Each step of decoding a PNG


Let's look at the complete process of decoding a PNG file. Here's the image I'll be using to demonstrate:

Several letter 'S's on top of each other, each with a different color.

Chunking

PNG files are stored as a series of chunks. Each chunk has a 4-byte length, a 4-byte name, the actual data, and a CRC. Most of the chunks contain metadata about the image, but the IDAT chunk contains the actual compressed image data. The capitalization of each letter of the chunk name determines how unexpected chunks should be handled:

              UPPERCASE        lowercase
1st letter    Critical         Ancillary       Is this chunk required to display the image correctly?
2nd letter    Public           Private         Is this a private, application-specific chunk?
3rd letter    Reserved         Reserved
4th letter    Unsafe to copy   Safe to copy    Do programs that modify the PNG data need to understand this chunk?

For example, cHRM: c = Ancillary, H = Public, R = Reserved, M = Unsafe to copy
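To make the layout concrete, here's a minimal sketch of a chunk reader in Python (the function names are mine, not from any particular library); it walks the length/type/data/CRC structure and reads the case bits described above:

```python
import struct
import zlib

def read_chunks(png_bytes):
    """Yield (type, data) pairs from a PNG byte string (a rough sketch)."""
    assert png_bytes[:8] == b"\x89PNG\r\n\x1a\n"   # the 8-byte signature (shown below)
    pos = 8
    while pos < len(png_bytes):
        # 4-byte big-endian length, then the 4-byte chunk type
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png_bytes[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and data fields, but not the length
        assert crc == zlib.crc32(ctype + data)
        yield ctype.decode("ascii"), data
        pos += 12 + length

def chunk_properties(ctype):
    """Read the case bits of a chunk type name."""
    return {
        "ancillary": ctype[0].islower(),      # lowercase first letter
        "private": ctype[1].islower(),        # lowercase second letter
        "safe_to_copy": ctype[3].islower(),   # lowercase fourth letter
    }

# chunk_properties("cHRM") -> {"ancillary": True, "private": False, "safe_to_copy": False}
```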

Here are the chunks in the image:

Signature
This identifies the image as a PNG. It contains newline characters and other bytes that commonly get mangled by text-mode transfers, so that any such corruption of the binary image data can be detected.
89 50 4E 47 0D 0A 1A 0A
Header
This contains metadata about the image.
Length 00 00 00 0D = 13 bytes long
Type 49 48 44 52 = IHDR
Data 00 00 01 00 00 00 01 00 08 02 00 00 00
Width: 256
Height: 256
Bit depth: 8
Color space: Truecolor RGB
Compression method: 0
Filter method: 0
Interlacing: disabled
CRC D3 10 3F 31
Gamma
This contains the image gamma, used to display more accurate colours.
Length 00 00 00 04 = 4 bytes long
Type 67 41 4D 41 = gAMA
Data 00 00 B1 8F = 45455 (the gamma times 100,000, i.e. a gamma of 0.45455)
CRC 0B FC 61 05
Color space information
This contains data about where in the full CIE color space the colors in the image are, so that monitors that support colors outside the standard sRGB space can display the image better.
Length 00 00 00 20 = 32 bytes long
Type 63 48 52 4D = cHRM
Data 00 00 7A 26 00 00 80 84 00 00 FA 00 00 00 80 E8 00 00 75 30 00 00 EA 60 00 00 3A 98 00 00 17 70
CRC 9C BA 51 3C
Physical dimensions
This contains the physical dimensions of the image, so it can be displayed at the right physical size when possible.
Length 00 00 00 09 = 9 bytes long
Type 70 48 59 73 = pHYs
Data 00 00 0B 13 00 00 0B 13 01 = 2835 pixels per metre, in both X and Y
CRC 00 9A 9C 18
Last modification date
This contains the time the image was last modified.
Length 00 00 00 07 = 7 bytes long
Type 74 49 4D 45 = tIME
Data 07 E5 05 0A 0A 2F 2C = 2021-05-10, 10:47:44 (UTC)
CRC 00 53 9E DD
Background color
This contains the background color to display behind the image, for example while the image is loading.
Length 00 00 00 06 = 6 bytes long
Type 62 4B 47 44 = bKGD
Data 00 FF 00 FF 00 FF = white
CRC A0 BD A7 93
Image data
This contains the compressed image data.
Length 00 00 F5 51 = 62801 bytes long
Type 49 44 41 54 = IDAT
Data ... (the compressed and filtered bytes of the image)
CRC BF EB 1B 15
Text
This can store arbitrary tagged textual data about the image.
Length 00 00 00 27 = 39 bytes long
Type 74 45 58 74 = tEXt
Data 46 69 6C 65 00 2F 68 6F 6D 65 2F 73 6D 69 74 2F 50 69 63 74 75 72 65 73 2F 69 63 6F 6E 33 2F 33 64 2E 62 6C 65 6E 64 = key: "File", value: "/home/smit/Pictures/icon3/3d.blend"
CRC 88 DA 55 E7
Text
This can store arbitrary tagged textual data about the image.
Length 00 00 00 13 = 19 bytes long
Type 74 45 58 74 = tEXt
Data 52 65 6E 64 65 72 54 69 6D 65 00 30 30 3A 31 32 2E 31 35 = key: "RenderTime", value: "00:12.15"
CRC 5E 2F 7A B4
Trailer
This empty chunk indicates the end of the PNG file.
Length 00 00 00 00 = 0 bytes long
Type 49 45 4E 44 = IEND
CRC AE 42 60 82
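
The fixed-size chunk payloads above can be unpacked with plain big-endian reads. Here's a rough sketch for IHDR and tIME (the function names and returned field names are mine):

```python
import struct

def parse_ihdr(data):
    """Unpack the 13-byte IHDR payload."""
    width, height, bit_depth, color_type, compression, filter_method, interlace = (
        struct.unpack(">IIBBBBB", data))
    return dict(width=width, height=height, bit_depth=bit_depth,
                color_type=color_type, compression=compression,
                filter_method=filter_method, interlace=interlace)

def parse_time(data):
    """Unpack the 7-byte tIME payload: 2-byte year, then month, day, hour, minute, second."""
    return struct.unpack(">HBBBBB", data)

# For this image:
#   parse_ihdr(bytes.fromhex("00000100 00000100 08 02 00 00 00"))
#     -> width 256, height 256, bit depth 8, color type 2 (truecolor), no interlacing
#   parse_time(bytes.fromhex("07E5050A0A2F2C")) -> (2021, 5, 10, 10, 47, 44)
```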

Extracting the image data

We take all of the IDAT chunks and concatenate their contents together. In this image there is only one IDAT chunk, but some images have several. Splitting the data lets streaming encoders avoid knowing the total data length up front, since each chunk's length comes at the beginning of that chunk. Multiple IDAT chunks are also needed when the compressed image data is longer than the largest possible chunk size (2³¹-1 bytes). Here's what we get if we treat the compressed data as raw image data:

Random noise on the top third, with the rest black.

It looks like random noise, and that means the compression algorithm did a good job: well-compressed data has higher entropy than the lower-entropy data it encodes. Also, it's less than a third of the height of the actual image: that's some good compression!
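
With the hypothetical read_chunks sketch from earlier, gathering the compressed stream is a one-liner; zlib doesn't care how the stream was split across chunks:

```python
idat = b"".join(data for ctype, data in read_chunks(png_bytes) if ctype == "IDAT")
```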

Decompressing

The most important chunk is the IDAT chunk, which contains the actual image data. To get the filtered image data, we concatenate the data in the IDAT chunks, and decompress it using zlib. There is no image-specific compression mechanism in play here: just normal zlib compression.

The example image, increasingly skewed to right from top to bottom. It is mostly black and white, with pixels of intense color scattered throughout.

Aside from the colors looking all wrong, the image also appears to be skewed horizontally. This is because each line of the image starts with an extra byte giving that line's filter type. Filters don't directly reduce the image size, but they get the data into a form that compresses better with zlib. I have written about PNG filters before.
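
A sketch of this step, assuming the idat bytes and the width/height from the earlier sketches; for an 8-bit truecolor image, each decompressed scanline is one filter-type byte followed by 3 × width bytes of filtered pixel data:

```python
import zlib

raw = zlib.decompress(idat)      # plain zlib, nothing PNG-specific
stride = 1 + 3 * width           # 1 filter byte + 3 bytes per RGB pixel
assert len(raw) == stride * height
scanlines = [raw[y * stride:(y + 1) * stride] for y in range(height)]
filter_types = [line[0] for line in scanlines]   # 0-4: None, Sub, Up, Average, Paeth
```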

Defiltering

We take the decompressed data and undo the filter on each line (there's a code sketch of this below). This gets us the decoded image, the same as the original! Here's how many lines of this image use each filter type:

None 0
Subtract 23
Up 77
Average 134
Paeth 22
The example image
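
Here's a minimal sketch of that reconstruction for an 8-bit truecolor image, using the scanlines from the previous sketch (helper names are mine). Each filtered byte gets a predictor added back to it modulo 256, computed from the already-reconstructed bytes to its left (a), above it (b), and above-left (c):

```python
def paeth(a, b, c):
    # Paeth predictor: whichever of a, b, c is closest to a + b - c
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def defilter(scanlines, width, bpp=3):
    """Undo the per-line filters; returns the raw RGB bytes (8-bit depth only)."""
    out = bytearray()
    prev = bytearray(width * bpp)            # the line above the first line is all zeros
    for line in scanlines:
        ftype, filt = line[0], line[1:]
        cur = bytearray(width * bpp)
        for i, x in enumerate(filt):
            a = cur[i - bpp] if i >= bpp else 0    # left
            b = prev[i]                            # up
            c = prev[i - bpp] if i >= bpp else 0   # up-left
            if ftype == 0:                         # None
                pred = 0
            elif ftype == 1:                       # Sub(tract)
                pred = a
            elif ftype == 2:                       # Up
                pred = b
            elif ftype == 3:                       # Average
                pred = (a + b) // 2
            else:                                  # Paeth
                pred = paeth(a, b, c)
            cur[i] = (x + pred) & 0xFF             # addition is modulo 256
        out += cur
        prev = cur
    return bytes(out)
```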

Interlacing

PNGs can optionally be interlaced, which splits the image into 7 different images, each covering a non-overlapping subset of the image's pixels:

Diagram of the 7 passes in an 8x8 area. The first is a single pixel in the top-left, and the seventh is every other row.

Each of the 7 images is loaded in sequence, adding more detail to the image as it loads. This interlacing scheme is called Adam7.
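
The pixels each pass covers follow a fixed pattern that repeats every 8×8 block. The offsets and steps below come from the PNG specification; the helper around them is my sketch:

```python
# (x_start, y_start, x_step, y_step) for Adam7 passes 1-7
ADAM7 = [
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
]

def pass_pixels(width, height, p):
    """Yield the (x, y) coordinates that Adam7 pass p (1-7) contributes."""
    x0, y0, dx, dy = ADAM7[p - 1]
    for y in range(y0, height, dy):
        for x in range(x0, width, dx):
            yield x, y
```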

The data for the 7 images is stored one after another in the image file. If we take the example image and enable interlacing, here's the raw uncompressed image data:

Same as the earlier skewed letter S, but there are multiple stacked on top of each other. The bottom half has 2, the third above that has 2 more, and above that it increasingly looks like random noise.

Here are the 7 passes that we can extract from that image data, which look like downscaled versions of the image:

Some of those sub-images are wider than they are tall even though the actual image is square, so while the image is loading there is more horizontal detail than vertical detail. Adam7 is arranged this way because horizontal detail matters more than vertical detail to human visual perception.
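
Using the ADAM7 offsets from the sketch above, the pass sizes for this 256×256 image work out to 32×32, 32×32, 64×32, 64×64, 128×64, 128×128 and 256×128; passes 3, 5 and 7 are twice as wide as they are tall, which is where the extra horizontal detail comes from:

```python
def pass_size(width, height, p):
    """Dimensions of Adam7 pass p for a width x height image."""
    x0, y0, dx, dy = ADAM7[p - 1]
    return ((width - x0 + dx - 1) // dx, (height - y0 + dy - 1) // dy)

# [pass_size(256, 256, p) for p in range(1, 8)]
#   -> [(32, 32), (32, 32), (64, 32), (64, 64), (128, 64), (128, 128), (256, 128)]
```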

Bonus: bugs

Here's what you get when you have a bug in the Average filter that handles overflow incorrectly (integer overflow in the Average filter's calculation is specified to be handled a bit differently than in the other filters):

The example image, but it looks glitchy starting a quarter of the way down.
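
To sketch the difference (with a and b as the already-reconstructed left and above bytes): the specification says the sum a + b must be computed with at least nine bits of precision before halving, so a decoder that wraps the sum to a byte first predicts the wrong value whenever a + b ≥ 256, and the error then propagates to later pixels:

```python
def average_predictor_correct(a, b):
    # The sum can be up to 510, so it must not be reduced modulo 256 before halving.
    return (a + b) // 2

def average_predictor_buggy(a, b):
    # Wrapping the sum to 8 bits first drops the carry bit.
    return ((a + b) & 0xFF) // 2

# a = 200, b = 100: correct prediction 150, buggy prediction 22.
# Since later bytes are reconstructed from earlier ones, one wrong
# prediction corrupts everything after it on that line and below.
```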