Each step of decoding a PNG


Let's look at the complete process of decoding a PNG file. Here's the image I'll be using to demonstrate:

Several letter 'S's on top of each other, each with a different color.

Chunking

PNG files are stored as a series of chunks. Each chunk has a 4-byte length, a 4-byte name, the actual data, and a CRC. Most of the chunks contain metadata about the image, but the IDAT chunk contains the actual compressed image data. The capitalization of each letter of the chunk name determines how unexpected chunks should be handled:

              UPPERCASE        lowercase
1st letter    Critical         Ancillary       Is this chunk required to display the image correctly?
2nd letter    Public           Private         Is this a private, application-specific chunk?
3rd letter    Reserved         Reserved
4th letter    Unsafe to copy   Safe to copy    Do programs that modify the PNG data need to understand this chunk?

For example, cHRM: c = Ancillary, H = Public, R = Reserved, M = Unsafe to copy
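To make the layout concrete, here's a minimal sketch of a chunk reader in Python (the function names are mine, not from any particular library); it walks the length/type/data/CRC structure and reads the case bits described above:

```python
import struct
import zlib

def read_chunks(png_bytes):
    """Yield (type, data) pairs from a PNG byte string (a rough sketch)."""
    assert png_bytes[:8] == b"\x89PNG\r\n\x1a\n"   # the 8-byte signature (shown below)
    pos = 8
    while pos < len(png_bytes):
        # 4-byte big-endian length, then the 4-byte chunk type
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png_bytes[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and data fields, but not the length
        assert crc == zlib.crc32(ctype + data)
        yield ctype.decode("ascii"), data
        pos += 12 + length

def chunk_properties(ctype):
    """Read the case bits of a chunk type name."""
    return {
        "ancillary": ctype[0].islower(),      # lowercase first letter
        "private": ctype[1].islower(),        # lowercase second letter
        "safe_to_copy": ctype[3].islower(),   # lowercase fourth letter
    }

# chunk_properties("cHRM") -> {"ancillary": True, "private": False, "safe_to_copy": False}
```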

Here are the chunks in the image:

Signature
This identifies the image as a PNG. It contains newline characters and other bytes that commonly get mangled by text-mode transfers, so that any such corruption of the binary image data can be detected.
89 50 4E 47 0D 0A 1A 0A
Header
This contains metadata about the image.
Length 00 00 00 0D = 13 bytes long
Type 49 48 44 52 = IHDR
Data 00 00 01 00 00 00 01 00 08 02 00 00 00
Width: 256
Height: 256
Bit depth: 8
Color space: Truecolor RGB
Compression method: 0
Filter method: 0
Interlacing: disabled
CRC D3 10 3F 31
Gamma
This contains the image gamma, used to display more accurate colours.
Length 00 00 00 04 = 4 bytes long
Type 67 41 4D 41 = gAMA
Data 00 00 B1 8F = 45455 (the gamma times 100,000, i.e. a gamma of 0.45455)
CRC 0B FC 61 05
Color space information
This contains data about where in the full CIE color space the colors in the image are, so that monitors that support colors outside the standard sRGB space can display the image better.
Length 00 00 00 20 = 32 bytes long
Type 63 48 52 4D = cHRM
Data 00 00 7A 26 00 00 80 84 00 00 FA 00 00 00 80 E8 00 00 75 30 00 00 EA 60 00 00 3A 98 00 00 17 70
CRC 9C BA 51 3C
Physical dimensions
This contains the physical dimensions of the image, so it can be displayed at the right physical size when possible.
Length 00 00 00 09 = 9 bytes long
Type 70 48 59 73 = pHYs
Data 00 00 0B 13 00 00 0B 13 01 = 2835 pixels per metre, in both X and Y
CRC 00 9A 9C 18
Last modification date
This contains the time the image was last modified.
Length 00 00 00 07 = 7 bytes long
Type 74 49 4D 45 = tIME
Data 07 E5 05 0A 0A 2F 2C = 2021-05-10, 10:47:44 (UTC)
CRC 00 53 9E DD
Background color
This contains the background color to display behind the image, for example while the image is loading.
Length 00 00 00 06 = 6 bytes long
Type 62 4B 47 44 = bKGD
Data 00 FF 00 FF 00 FF = white
CRC A0 BD A7 93
Image data
This contains the compressed image data.
Length 00 00 F5 51 = 62801 bytes long
Type 49 44 41 54 = IDAT
Data ... (the compressed and filtered bytes of the image)
CRC BF EB 1B 15
Text
This can store arbitrary tagged textual data about the image.
Length 00 00 00 27 = 39 bytes long
Type 74 45 58 74 = tEXt
Data 46 69 6C 65 00 2F 68 6F 6D 65 2F 73 6D 69 74 2F 50 69 63 74 75 72 65 73 2F 69 63 6F 6E 33 2F 33 64 2E 62 6C 65 6E 64 = key: "File", value: "/home/smit/Pictures/icon3/3d.blend"
CRC 88 DA 55 E7
Text
This can store arbitrary tagged textual data about the image.
Length 00 00 00 13 = 19 bytes long
Type 74 45 58 74 = tEXt
Data 52 65 6E 64 65 72 54 69 6D 65 00 30 30 3A 31 32 2E 31 35 = key: "RenderTime", value: "00:12.15"
CRC 5E 2F 7A B4
Trailer
This empty chunk indicates the end of the PNG file.
Length 00 00 00 00 = 0 bytes long
Type 49 45 4E 44 = IEND
CRC AE 42 60 82
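
The fixed-size chunk payloads above can be unpacked with plain big-endian reads. Here's a rough sketch for IHDR and tIME (the function names and returned field names are mine):

```python
import struct

def parse_ihdr(data):
    """Unpack the 13-byte IHDR payload."""
    width, height, bit_depth, color_type, compression, filter_method, interlace = (
        struct.unpack(">IIBBBBB", data))
    return dict(width=width, height=height, bit_depth=bit_depth,
                color_type=color_type, compression=compression,
                filter_method=filter_method, interlace=interlace)

def parse_time(data):
    """Unpack the 7-byte tIME payload: 2-byte year, then month, day, hour, minute, second."""
    return struct.unpack(">HBBBBB", data)

# For this image:
#   parse_ihdr(bytes.fromhex("00000100 00000100 08 02 00 00 00"))
#     -> width 256, height 256, bit depth 8, color type 2 (truecolor), no interlacing
#   parse_time(bytes.fromhex("07E5050A0A2F2C")) -> (2021, 5, 10, 10, 47, 44)
```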

Extracting the image data

We take all of the IDAT chunks and concatenate their contents together. In this image there is only one IDAT chunk, but some images have several. Splitting the data lets streaming encoders avoid knowing the total data length up front, since each chunk's length comes at the beginning of that chunk. Multiple IDAT chunks are also needed when the compressed image data is longer than the largest possible chunk size (2³¹-1 bytes). Here's what we get if we treat the compressed data as raw image data:

Random noise on the top third, with the rest black.

It looks like random noise, and that means the compression algorithm did a good job: well-compressed data has higher entropy than the lower-entropy data it encodes. Also, it's less than a third of the height of the actual image: that's some good compression!
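
With the hypothetical read_chunks sketch from earlier, gathering the compressed stream is a one-liner; zlib doesn't care how the stream was split across chunks:

```python
idat = b"".join(data for ctype, data in read_chunks(png_bytes) if ctype == "IDAT")
```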

Decompressing

The most important chunk is the IDAT chunk, which contains the actual image data. To get the filtered image data, we concatenate the data in the IDAT chunks, and decompress it using zlib. There is no image-specific compression mechanism in play here: just normal zlib compression.

The example image, increasingly skewed to right from top to bottom. It is mostly black and white, with pixels of intense color scattered throughout.

Aside from the colors looking all wrong, the image also appears to be skewed horizontally. This is because each line of the image starts with an extra byte giving that line's filter type. Filters don't directly reduce the image size, but they get the data into a form that compresses better with zlib. I have written about PNG filters before.
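
A sketch of this step, assuming the idat bytes and the width/height from the earlier sketches; for an 8-bit truecolor image, each decompressed scanline is one filter-type byte followed by 3 × width bytes of filtered pixel data:

```python
import zlib

raw = zlib.decompress(idat)      # plain zlib, nothing PNG-specific
stride = 1 + 3 * width           # 1 filter byte + 3 bytes per RGB pixel
assert len(raw) == stride * height
scanlines = [raw[y * stride:(y + 1) * stride] for y in range(height)]
filter_types = [line[0] for line in scanlines]   # 0-4: None, Sub, Up, Average, Paeth
```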

Defiltering

We take the decompressed data and undo the filter on each line (there's a code sketch of this below). This gets us the decoded image, the same as the original! Here's how many lines of this image use each filter type:

None 0
Subtract 23
Up 77
Average 134
Paeth 22
The example image
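
Here's a minimal sketch of that reconstruction for an 8-bit truecolor image, using the scanlines from the previous sketch (helper names are mine). Each filtered byte gets a predictor added back to it modulo 256, computed from the already-reconstructed bytes to its left (a), above it (b), and above-left (c):

```python
def paeth(a, b, c):
    # Paeth predictor: whichever of a, b, c is closest to a + b - c
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def defilter(scanlines, width, bpp=3):
    """Undo the per-line filters; returns the raw RGB bytes (8-bit depth only)."""
    out = bytearray()
    prev = bytearray(width * bpp)            # the line above the first line is all zeros
    for line in scanlines:
        ftype, filt = line[0], line[1:]
        cur = bytearray(width * bpp)
        for i, x in enumerate(filt):
            a = cur[i - bpp] if i >= bpp else 0    # left
            b = prev[i]                            # up
            c = prev[i - bpp] if i >= bpp else 0   # up-left
            if ftype == 0:                         # None
                pred = 0
            elif ftype == 1:                       # Sub(tract)
                pred = a
            elif ftype == 2:                       # Up
                pred = b
            elif ftype == 3:                       # Average
                pred = (a + b) // 2
            else:                                  # Paeth
                pred = paeth(a, b, c)
            cur[i] = (x + pred) & 0xFF             # addition is modulo 256
        out += cur
        prev = cur
    return bytes(out)
```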

Interlacing

PNGs can optionally be interlaced, which splits the image into 7 different images, each covering a non-overlapping subset of the image's pixels:

Diagram of the 7 passes in an 8x8 area. The first is a single pixel in the top-left, and the seventh is every other row.

Each of the 7 images is loaded in sequence, adding more detail to the image as it loads. This interlacing scheme is called Adam7.
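
The pixels each pass covers follow a fixed pattern that repeats every 8×8 block. The offsets and steps below come from the PNG specification; the helper around them is my sketch:

```python
# (x_start, y_start, x_step, y_step) for Adam7 passes 1-7
ADAM7 = [
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
]

def pass_pixels(width, height, p):
    """Yield the (x, y) coordinates that Adam7 pass p (1-7) contributes."""
    x0, y0, dx, dy = ADAM7[p - 1]
    for y in range(y0, height, dy):
        for x in range(x0, width, dx):
            yield x, y
```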

The data for the 7 images is stored one after another in the image file. If we take the example image and enable interlacing, here's the raw uncompressed image data:

Same as the earlier skewed letter S, but there are multiple stacked on top of each other. The bottom half has 2, the third above that has 2 more, and above that it increasingly looks like random noise.

Here are the 7 passes that we can extract from that image data, which look like downscaled versions of the image:

Some of those sub-images are wider than they are tall even though the actual image is square, so while the image is loading there is more horizontal detail than vertical detail. Adam7 is arranged this way because horizontal detail matters more than vertical detail to human visual perception.
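
Using the ADAM7 offsets from the sketch above, the pass sizes for this 256×256 image work out to 32×32, 32×32, 64×32, 64×64, 128×64, 128×128 and 256×128; passes 3, 5 and 7 are twice as wide as they are tall, which is where the extra horizontal detail comes from:

```python
def pass_size(width, height, p):
    """Dimensions of Adam7 pass p for a width x height image."""
    x0, y0, dx, dy = ADAM7[p - 1]
    return ((width - x0 + dx - 1) // dx, (height - y0 + dy - 1) // dy)

# [pass_size(256, 256, p) for p in range(1, 8)]
#   -> [(32, 32), (32, 32), (64, 32), (64, 64), (128, 64), (128, 128), (256, 128)]
```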

Bonus: bugs

Here's what you get when you have a bug in the Average filter that handles overflow incorrectly (integer overflow in the Average filter's calculation is specified to be handled a bit differently than in the other filters):

The example image, but it looks glitchy starting a quarter of the way down.
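
To sketch the difference (with a and b as the already-reconstructed left and above bytes): the specification says the sum a + b must be computed with at least nine bits of precision before halving, so a decoder that wraps the sum to a byte first predicts the wrong value whenever a + b ≥ 256, and the error then propagates to later pixels:

```python
def average_predictor_correct(a, b):
    # The sum can be up to 510, so it must not be reduced modulo 256 before halving.
    return (a + b) // 2

def average_predictor_buggy(a, b):
    # Wrapping the sum to 8 bits first drops the carry bit.
    return ((a + b) & 0xFF) // 2

# a = 200, b = 100: correct prediction 150, buggy prediction 22.
# Since later bytes are reconstructed from earlier ones, one wrong
# prediction corrupts everything after it on that line and below.
```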