Creating an Image Format

July 31, 2024

While I was working on the CZ# image formats discussed in the last post (and the one before that), I started thinking... what if I made my own image format that uses some of the things I've discovered while working on that project? My thought was that I could end up with something that is both more simple for me to understand (and hopefully for others to understand too), so I can learn more about how formats that use lossy compression (for example JPEG) work, and try my hand at designing some kind of format. I also want to try to improve my writing skills by making a longer, more in-depth blog post!

Planning
Lossy Compression
- Result
Extending the Header
References

Planning

I started by planning how I wanted the format to work. I wanted the header format to be pretty simple, just a few items would be good enough. Most file formats begin with a signature or "magic number", which is a series of bytes that lets software reading the file know what format it is. For my format, I decided to go with something related to my project with friends, Dangoware, as I planned to put the project under it, so I came up with dangoimg. This part doesn't actually matter all that much, but it's good to at least make it unique!^[1]

After the signature I decided to include the width and height of the bitmap as 32-bit unsigned integers. This would allow for plenty of pixels (!), so at least we're not limited in that regard. Following the dimensions, I planned on storing the compression data in a similar manner to the CZ# formats I looked at previously, with the number of compression blocks each of which contains two values: the size of the compressed data and the size of the original data.

When I was done, the header struct looked something like the following:

struct BlockInfo {
    len_compressed: u32,
    len_original: u32,
}

struct Header {
    signature: [u8; 8], // Must be set to b"dangoimg"
    width: u32,
    height: u32,
    block_count: u32,
    blocks: Vec<BlockInfo>,
}

After I was done with this, I was ready to move on to the first big thing I wanted to learn about.

Lossy Compression

Lossless compression can be very complex and intricate, and the best lossless compression algorithms are extremely impressive^[2]. However, I think the general idea of lossless compression is more or less simple (I explored a stupid way to losslessly compress text in a previous post). I think the fact that you don't lose any data in lossless compression schemes makes it easier to reason about how you might go about reducing the size of the data. The same is not the case for lossy compression schemes, where you must decide based on human perception of the data what can be thrown away. If you went through an image and removed every other pixel then stored the image like that, it would compress the image data... but it would look really bad:



Original	Modified

Of course, images don't look like this when lossily compressed. For most image formats, a more complex approach is taken. The most common image compression in use today is JPEG, which you almost certainly know about from either the format itself or from JPEG artifacts caused as an... artifact of the compression. The way JPEG works is to take 8×8 blocks of pixels from each of the image's color channels, run something called the Discrete Cosine Transform (DCT) on them, and then remove some of the data. Normally, running the DCT on an input dataset can be reversed with no loss by running the Inverse DCT (IDCT) on the output. However, if you reduce the accuracy of the resulting matrix, you can obtain an approximation of the data instead of the original (which is what we want!).

Because running a 2-dimensional DCT results in pushing the low frequency portions of the input data into the upper left portion of the output matrix, and the high frequency portions into the lower right section, you can obtain an approximate mathematical representation of the original data that doesn't look bad to humans by reducing the high frequency components. This is accomplished through the use of a Quantization Matrix^[3], which reduces the precision of only the frequencies we visually don't notice as much. This concept is quite complex, so if you didn't get a good idea of it from what I explained that's ok. I recommend reading some of the links I have included in the references to learn more.

Result

The upshot of all that is basically that we can create an approximation of some data in the image which can be more easily compressed than the original data. If we run my implementation of DCT on 8×8 blocks of the image shown above, we get something which looks like this:



Original	Modified

Wow! That certainly looks like JPEG compression to me! On a small image like this the "compressed" version actually does worse size-wise, but on a larger image, the compression can be upwards of 50% better than a PNG of the same image, and comes close to matching a JPEG in quality and size. JPEG actually performs some extra tricks to achieve a better size reduction, but that's not important here. Of course, my compression can be adjusted to not look so... bad. I just made it that bad for illustrative purposes, so you can easily see the 8×8 blocks.

Extending the Header

While designing the lossy part of the image format, I realize it would be nice to have a few more features. Namely, the ability to store both lossless and lossy data, to be able to change the quality of the lossy compression, and also to be able to store more color formats. For this, I added a couple more items to the header. The first was a color format identifier as an 8-bit enum, this allows for the image format to be extended in the future with different color formats. The second was a compression type enum, which allows for uncompressed, losslessly compressed (using techniques from the CZ4 format), and lossy compressed, using the DCT algorithm described above. And finally I added an 8-bit value from 1 → 100 which determines the quality of the compressed image, and allows for decoding using that same quality level. The final header looks like this:

struct BlockInfo {
    len_compressed: u32,
    len_original: u32,
}

pub struct Header {
    magic: [u8; 8],
    width: u32,
    height: u32,
    color_format: ColorFormat,
    compression_type: CompressionType,
    quality: u8,
    block_count: u32,
    blocks: Vec<BlockInfo>,
}

Thank you for reading. I tried to make a more technical post this time, with a lot more description of what was fresh in my mind. Feedback of course is always appreciated.

Oh, and the code for this project can be found in the following repository under the Apache-2.0 and MIT permissive open source licences:

https://github.com/Dangoware/sqp

References

Some other examples of file format signatures are GIF89a for GIF files, %PDF for PDF files, and 0xCAFEBABE for Java bytecode (because what else would it be ;D). Interestingly, PNG files actually have a good reason for having the signature \211PNG\r\n\032\n, it's designed to detect corruption. ↩︎
For example, LZ4 is absolutely crazy fast while achieving decent compression ratios, and Brotli can achieve compression ratios that are mind boggling on certain types of data. ↩︎
Quantization is a big scary term that really just means making a value less precise, snapping it to some lower bound of precision. Read more here: https://en.wikipedia.org/wiki/Quantization_(signal_processing) ↩︎

G2's Blog