Creating an Image Format
July 31, 2024
While I was working on the CZ# image formats discussed in the last post (and the one before that), I started thinking... what if I made my own image format that uses some of the things I've discovered while working on that project? My thought was that I could end up with something that is both more simple for me to understand (and hopefully for others to understand too), so I can learn more about how formats that use lossy compression (for example JPEG) work, and try my hand at designing some kind of format. I also want to try to improve my writing skills by making a longer, more in-depth blog post!
Table of Contents
Planning
I started by planning how I wanted the format to work. I wanted the header
format to be pretty simple, just a few items would be good enough. Most file
formats begin with a signature
or "magic number",
which is a series of bytes that lets software reading the file know what format
it is. For my format, I decided to go with something related to my project with
friends, Dangoware, as I planned to put the
project under it, so I came up with dangoimg. This part doesn't actually
matter all that much, but it's good to at least make it unique![1]
After the signature I decided to include the width and height of the bitmap as
32-bit unsigned integers. This would allow for plenty of pixels
(
When I was done, the header struct looked something like the following:
struct BlockInfo {
len_compressed: u32,
len_original: u32,
}
struct Header {
signature: [u8; 8], // Must be set to b"dangoimg"
width: u32,
height: u32,
block_count: u32,
blocks: Vec<BlockInfo>,
}
After I was done with this, I was ready to move on to the first big thing I wanted to learn about.
Lossy Compression
Lossless compression can be very complex and intricate, and the best lossless compression algorithms are extremely impressive[2]. However, I think the general idea of lossless compression is more or less simple (I explored a stupid way to losslessly compress text in a previous post). I think the fact that you don't lose any data in lossless compression schemes makes it easier to reason about how you might go about reducing the size of the data. The same is not the case for lossy compression schemes, where you must decide based on human perception of the data what can be thrown away. If you went through an image and removed every other pixel then stored the image like that, it would compress the image data... but it would look really bad:
![]() |
![]() |
Original |
Modified |
Of course, images don't look like this when lossily compressed. For most image formats, a more complex approach is taken. The most common image compression in use today is JPEG, which you almost certainly know about from either the format itself or from JPEG artifacts caused as an... artifact of the compression. The way JPEG works is to take 8×8 blocks of pixels from each of the image's color channels, run something called the Discrete Cosine Transform (DCT) on them, and then remove some of the data. Normally, running the DCT on an input dataset can be reversed with no loss by running the Inverse DCT (IDCT) on the output. However, if you reduce the accuracy of the resulting matrix, you can obtain an approximation of the data instead of the original (which is what we want!).
Because running a 2-dimensional DCT results in pushing the low frequency portions of the input data into the upper left portion of the output matrix, and the high frequency portions into the lower right section, you can obtain an approximate mathematical representation of the original data that doesn't look bad to humans by reducing the high frequency components. This is accomplished through the use of a Quantization Matrix[3], which reduces the precision of only the frequencies we visually don't notice as much. This concept is quite complex, so if you didn't get a good idea of it from what I explained that's ok. I recommend reading some of the links I have included in the references to learn more.
Result
The upshot of all that is basically that we can create an approximation of some data in the image which can be more easily compressed than the original data. If we run my implementation of DCT on 8×8 blocks of the image shown above, we get something which looks like this:
![]() |
![]() |
Original |
Modified |
Wow! That certainly looks like JPEG compression to me! On a small image like this the "compressed" version actually does worse size-wise, but on a larger image, the compression can be upwards of 50% better than a PNG of the same image, and comes close to matching a JPEG in quality and size. JPEG actually performs some extra tricks to achieve a better size reduction, but that's not important here. Of course, my compression can be adjusted to not look so... bad. I just made it that bad for illustrative purposes, so you can easily see the 8×8 blocks.
Extending the Header
While designing the lossy part of the image format, I realize it would be nice to have a few more features. Namely, the ability to store both lossless and lossy data, to be able to change the quality of the lossy compression, and also to be able to store more color formats. For this, I added a couple more items to the header. The first was a color format identifier as an 8-bit enum, this allows for the image format to be extended in the future with different color formats. The second was a compression type enum, which allows for uncompressed, losslessly compressed (using techniques from the CZ4 format), and lossy compressed, using the DCT algorithm described above. And finally I added an 8-bit value from 1 → 100 which determines the quality of the compressed image, and allows for decoding using that same quality level. The final header looks like this:
struct BlockInfo {
len_compressed: u32,
len_original: u32,
}
pub struct Header {
magic: [u8; 8],
width: u32,
height: u32,
color_format: ColorFormat,
compression_type: CompressionType,
quality: u8,
block_count: u32,
blocks: Vec<BlockInfo>,
}
Thank you for reading. I tried to make a more technical post this time, with a lot more description of what was fresh in my mind. Feedback of course is always appreciated.
Oh, and the code for this project can be found in the following repository under the Apache-2.0 and MIT permissive open source licences:
https://github.com/Dangoware/sqp
References
- https://cs.stanford.edu/people/eroberts/courses/soco/projects/data-compression/lossy/jpeg/index.htm
- https://en.wikipedia.org/wiki/JPEG
- https://en.wikipedia.org/wiki/Discrete_cosine_transform
Some other examples of file format signatures are
GIF89afor GIF files,%PDFfor PDF files, and0xCAFEBABEfor Java bytecode (because what else would it be ;D). Interestingly, PNG files actually have a good reason for having the signature\211PNG\r\n\032\n, it's designed to detect corruption. ↩︎For example, LZ4 is absolutely crazy fast while achieving decent compression ratios, and Brotli can achieve compression ratios that are mind boggling on certain types of data. ↩︎
Quantization is a big scary term that really just means making a value less precise, snapping it to some lower bound of precision. Read more here: https://en.wikipedia.org/wiki/Quantization_(signal_processing) ↩︎


