The frame buffer file has the following general structure:
- Plane 1
- Plane 2 (optional)
- Plane 3 (optional)
- ...
- Epilogue
A plane is a complete two-dimensional array of pixels. An image may have only a single plane if each pixel is already a packed vector of color values, or it may split the color values of each pixel into separate planes. The color values may be expressed in different colorspaces, such as RGB, YUV, or even CMYK; the colorspace is specified in the epilogue. The beginning and end of each plane are page-aligned, and the page size is also specified in the epilogue. When an image is chroma-subsampled, the width and height of each subsampled plane may be independently scaled down by a small integer denominator.
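As a rough illustration of the page alignment and subsampling described above, here is a minimal sketch that lays out planes back to back. The function names, the 4096-byte page size, rounding subsampled dimensions up, and a stride with no row padding are all assumptions made for the example, not part of the format.

```python
def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def plan_planes(width, height, planes, page_size=4096):
    """Compute page-aligned begin/end offsets for each plane.

    planes is a list of (denom_x, denom_y, bytes_per_pixel) tuples.
    Assumptions: subsampled dimensions round up, and each row is
    stored without padding (stride == plane width * bytes per pixel).
    """
    layout = []
    offset = 0
    for denom_x, denom_y, bytes_per_pixel in planes:
        plane_width = -(-width // denom_x)    # ceiling division
        plane_height = -(-height // denom_y)
        stride = plane_width * bytes_per_pixel
        begin = offset
        end = align_up(begin + stride * plane_height, page_size)
        layout.append({"begin": begin, "end": end, "stride": stride,
                       "width": plane_width, "height": plane_height})
        offset = end
    return layout


# 1920x1080 YUV with 4:2:0 chroma subsampling, one byte per sample.
layout = plan_planes(1920, 1080, [(1, 1, 1), (2, 2, 1), (2, 2, 1)])
for plane in layout:
    print(plane)
```

With these assumptions the luma plane spans 507 pages and each chroma plane spans 127 pages, so every plane begins and ends on a page boundary.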
After the planes, the epilogue stores a descriptor of how to locate and interpret the planes. The epilogue needs only 32-bit word alignment, not page alignment, and may occupy unused space at the end of the last plane. It consists of the frame buffer descriptor as a binary-serialized protocol buffer, a small amount of padding, a 32-bit word-aligned crc32 checksum of the descriptor plus padding, and a 32-bit footer. The upper 16 bits of the footer are the signature 0xffbb, and the lower 16 bits are the size of the entire epilogue, including the padding, checksum, and footer. The signature allows a reader to detect the byte order. Since the epilogue size is a multiple of 4, its two lowest bits are zero, so the size field cannot be mistaken for the signature when the footer is read in the wrong byte order. Because the size field is 16 bits wide, the epilogue is limited to less than 64KB.
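The epilogue layout can be sketched as follows. This is only an illustration: `build_epilogue` is a hypothetical helper, the descriptor is treated as opaque bytes rather than a real protocol buffer, and the sketch assumes zero-byte padding up to the next 4-byte boundary and the crc32 variant implemented by zlib.

```python
import struct
import zlib

SIGNATURE = 0xFFBB   # upper 16 bits of the footer
MAX_SIZE = 0xFFFC    # largest multiple of 4 that fits in 16 bits


def build_epilogue(descriptor, byteorder="<"):
    """Return descriptor + padding + crc32 + footer as bytes.

    byteorder is "<" (little-endian) or ">" (big-endian).
    Assumption: padding is the minimum number of zero bytes needed
    to make the descriptor a multiple of 4 bytes long.
    """
    body = descriptor + b"\0" * (-len(descriptor) % 4)
    size = len(body) + 8   # plus 4-byte checksum and 4-byte footer
    if size > MAX_SIZE:
        raise ValueError("epilogue does not fit the 16-bit size field")
    checksum = zlib.crc32(body)   # covers descriptor plus padding
    footer = (SIGNATURE << 16) | size
    return body + struct.pack(byteorder + "II", checksum, footer)


epilogue = build_epilogue(b"hello")
print(len(epilogue), epilogue[-4:].hex())
```

For a 5-byte descriptor this gives 3 bytes of padding and a 16-byte epilogue. Written little-endian, the last four bytes are `10 00 bb ff`; a big-endian writer would produce `ff bb 00 10`.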
A reader would verify that the frame buffer file size is a multiple of 4, read the last 4 bytes to check the signature and find the epilogue size. It would then read the whole epilogue into memory, check the crc32 of the frame buffer descriptor plus padding, and decode the descriptor.
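The reader steps above might look like the sketch below. `read_epilogue` is a hypothetical name, the crc32 is assumed to be the zlib variant, and the function stops short of decoding the protocol buffer: it returns the descriptor bytes together with any padding, since how the two are separated is not specified here.

```python
import io
import struct
import zlib

SIGNATURE = 0xFFBB


def read_epilogue(f):
    """Locate and verify the epilogue of a seekable binary file.

    Returns (byteorder, body), where byteorder is "<" or ">" and
    body is the descriptor plus padding.
    """
    file_size = f.seek(0, io.SEEK_END)
    if file_size < 8 or file_size % 4 != 0:
        raise ValueError("file size is not a positive multiple of 4")

    f.seek(file_size - 4)
    raw_footer = f.read(4)
    # Try both byte orders; only one can show the signature on top.
    for byteorder in ("<", ">"):
        (footer,) = struct.unpack(byteorder + "I", raw_footer)
        if footer >> 16 == SIGNATURE:
            break
    else:
        raise ValueError("signature not found")

    size = footer & 0xFFFF
    if size < 8 or size % 4 != 0 or size > file_size:
        raise ValueError("bad epilogue size")

    f.seek(file_size - size)
    epilogue = f.read(size)
    body = epilogue[:-8]
    (stored_crc,) = struct.unpack(byteorder + "I", epilogue[-8:-4])
    if zlib.crc32(body) != stored_crc:
        raise ValueError("crc32 mismatch")
    return byteorder, body


# Usage: a fake one-page plane followed by a big-endian epilogue.
body = b"descr\0\0\0"
tail = struct.pack(">II", zlib.crc32(body), (SIGNATURE << 16) | 16)
print(read_epilogue(io.BytesIO(b"\0" * 4096 + body + tail)))
```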
I'll leave the actual frame buffer descriptor specification for another time. For now, it needs to be able to specify at least the page size, image width and height, colorspace, and the planes. Each plane will specify its own beginning and end file positions, subsample denominators, row (stride) size, and pixel format.
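Until that specification exists, the fields listed above can be pictured as plain records. Everything in this sketch is a placeholder: the class names, field names, string-valued colorspace and pixel format, and the `plane_problems` check are assumptions for illustration and say nothing about the eventual protocol buffer schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlaneDescriptor:
    begin: int          # file position where the plane starts
    end: int            # file position where the plane ends
    subsample_x: int    # width denominator (1 means full width)
    subsample_y: int    # height denominator (1 means full height)
    stride: int         # bytes per row
    pixel_format: str   # placeholder, e.g. "u8"


@dataclass
class FrameBufferDescriptor:
    page_size: int
    width: int
    height: int
    colorspace: str     # placeholder, e.g. "YUV"
    planes: List[PlaneDescriptor]


def plane_problems(desc, plane):
    """Return a list of consistency problems found for one plane."""
    problems = []
    if plane.begin % desc.page_size or plane.end % desc.page_size:
        problems.append("plane is not page-aligned")
    rows = -(-desc.height // plane.subsample_y)   # ceiling division
    if plane.stride * rows > plane.end - plane.begin:
        problems.append("rows do not fit between begin and end")
    return problems


desc = FrameBufferDescriptor(4096, 1920, 1080, "YUV", [
    PlaneDescriptor(0, 2076672, 1, 1, 1920, "u8"),
    PlaneDescriptor(2076672, 2596864, 2, 2, 960, "u8"),
    PlaneDescriptor(2596864, 3117056, 2, 2, 960, "u8"),
])
for plane in desc.planes:
    print(plane_problems(desc, plane))
```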
The frame buffer file format shall be used to store only one image, and not be used to specify more complex image compositions such as layering. A scene description language should be used for that purpose instead.