JPEG: Image compression algorithm

Alex Townsend, March 2017

Contents

What is JPEG?

JPEG stands for Joint Photographic Experts Group, which was a group of image processing experts that devised a standard for compressing images (ISO).

So, JPEG (or JPG) is not really a file format but rather an image compression standard. The JPEG standard is complicated with many different options and color space regulations. It was not widely adopted. A much simpler standard version was advocated at the same time, called JFIF. This is the image compression algorithm that most people mean when they say JPEG compression, and the one that we will be describing in this class. Note that the file extensions .jpeg and .jpg have stuck, even though the underneath algorithm is (strictly speaking) JFIF compression.

The underlying assumptions of the JPEG algorithm

The JPEG algorithm is designed specifically for the human eye. It exploits the following biological properties of human sight:

(1) We are more sensitive to the illuminocity of color, rather than the chromatric value of an image, and

(2) We are not particularly sensitive to high-frequency content in images.

The algorithm can be neatly broken up into several stages: There is an input image I, which goes through the following process:

1) A colour transform, 2) A 2D discrete cosine transform on 8x8 blocks, 3) A quantization (filtering) stage, 4) Huffman encoding.

Finally, a compressed image is returned in the .jpg file format. This format contains the compressed image as well as information that is needed to uncompressed, with other information to allow for reexpanding the image.

An uncompressed image of mixed peppers

MATLAB has various images uploaded into MATLAB. Here, is one of some peppers:

I = imread('peppers.png');
imshow(I)

It currently requires about the following number of bits to store:

ImageSize = 8*prod(size(I))
ImageSize =
     4718592

Color transform: Convert RGB to YCbCr

Right now, the image is stored in RGB format. While this colorspace is convenient for projecting the image on the computer screen, it does not isolate the illuminance and color of an image. The intensity of color is intermixed in the colorspace. The YCbCr is a more convenitent colorspace for image compression because it separates the illuminance and the chromatic strength of an image.

Y = I;
A = [.229 .587 .114 ; -.168736 -.331264 .5 ; .5 -.418688 -.081312];
Y(:,:,1) = A(1,1)*I(:,:,1) + A(1,2)*I(:,:,2) + A(1,3)*I(:,:,3);
Y(:,:,2) = A(2,1)*I(:,:,1) + A(2,2)*I(:,:,2) + A(2,3)*I(:,:,3) + 128;
Y(:,:,3) = A(3,1)*I(:,:,1) + A(3,2)*I(:,:,2) + A(3,3)*I(:,:,3) + 128;

% Let's use MATLAB's inbuilt convert (because of the normalizations):
Y = rgb2ycbcr(I);

Plot Y'CbCr colorspace

Let's see what that colorspace looks like.

lb = {'Y', 'Cb', 'Cr'};

for channel = 1:3
    subplot(1,3,channel)
    Y_C = Y;
    Y_C(:,:,setdiff(1:3,channel)) = intmax(class(Y_C))/2;
    imshow(ycbcr2rgb(Y_C))
    title([lb{channel} ' component'],'fontsize',16)
end

Our eyes are senstitive to illuminance, but not color

Since our eyes are not particularly sensitive to chrominance, we can "downsample" that stuff. Here, we remove x100 amount of "color" from the image and see that it has barely changed:

subplot(1,2,1)
imshow( I )
title('Original')
subplot(1,2,2)
Y_d = Y;
Y_d(:,:,2) = 10*round(Y_d(:,:,2)/10);
Y_d(:,:,3) = 10*round(Y_d(:,:,3)/10);
imshow(ycbcr2rgb(Y_d))
title('Downsampled image')

Eyes are senstitive to intensity

However, if we downsample the illuminance by x10, then there is a noticeable difference. (You'll have to zoom in to see it.)

subplot(1,2,1)
imshow( I )
title('original')
subplot(1,2,2)
Y_d = Y;
Y_d(:,:,1) = 10*round(Y_d(:,:,1)/10);
imshow(ycbcr2rgb(Y_d))
title('Downsample illuminance')

JPEG downsampling

Here, we will be a little conservative and downsample the chrominance by only a factor of 2.

Y_d = Y;
Y_d(:,:,2) = 2*round(Y_d(:,:,2)/2);
Y_d(:,:,3) = 2*round(Y_d(:,:,3)/2);

The 2D discrete cosine transform

Once the image is in YCrCb color space and downsampled, it is partitioned into 8x8 blocks. Each block is transformed by the two-dimensional discrete cosine transform (DCT). Let's extract one 8x8 block of pixels for demonstration, shown here in white:

We apply the DCT to that box:

clf
box = Y_d;
II = box(200:207,200:207,1);
Y = chebfun.dct(chebfun.dct(II).').';
surf(log10(abs(Y))), title('DCT coefficients')
set(gca,'fontsize',16)

Quantization table

Next we apply a quantization table to Y, which filters out the high frequency DCT coefficients:

Q = [16 11 10 16 24 40 51 61 ;
     12 12 14 19 26 28 60 55 ;
     14 13 16 24 40 57 69 56 ;
     14 17 22 29 51 87 80 62 ;
     18 22 37 56 68 109 103 77 ;
     24 35 55 64 81 104 113 92 ;
     49 64 78 87 103 121 120 101;
     72 92 95 98 112 100 103 99];
before = nnz(Y);
Y = Q.*round(Y./Q);
after = nnz(Y);

The number of nonzero DCT coefficients after quantization is:

before, after
before =
    64
after =
     6

We now apply this compression to every 8x8 block. We obtain:

I = imread('peppers.png');
Y_d = rgb2ycbcr( I );
% Downsample:
Y_d(:,:,2) = 2*round(Y_d(:,:,2)/2);
Y_d(:,:,3) = 2*round(Y_d(:,:,3)/2);
% DCT compress:
A = zeros(size(Y_d));
B = A;
for channel = 1:3
    for j = 1:8:size(Y_d,1)-7
        for k = 1:8:size(Y_d,2)
            II = Y_d(j:j+7,k:k+7,channel);
            freq = chebfun.dct(chebfun.dct(II).').';
            freq = Q.*round(freq./Q);
            A(j:j+7,k:k+7,channel) = freq;
            B(j:j+7,k:k+7,channel) = chebfun.idct(chebfun.idct(freq).').';
        end
    end
end

subplot(1,2,1)
imshow(ycbcr2rgb(Y_d))
title('Original')
subplot(1,2,2)
imshow(ycbcr2rgb(uint8(B)));
title('Compressed')
shg

Compression so far

The quantization step has successfully compressed the image by about a factor of 7.

CompressedImageSize = 8*nnz(A(:,:,1)) + 7*nnz(A(:,:,2)) + 7*nnz(A(:,:,3))
CompressedImageSize/ImageSize
CompressedImageSize =
      701189
ans =
   0.148601320054796

The formula above is obtained by noting that we downsampled in Cr and Cb are downsampled.

Huffman encoding

We now encode the DCT blocks using Huffman encoding:

b = A(:);
b = b(:);
b(b==0)=[];  %remove zeros.
b = floor(255*(b-min(b))/(max(b)-min(b)));
symbols = unique(b);
prob = histcounts(b,length(symbols))/length(b);
dict = huffmandict(symbols, prob);
enco = huffmanenco(b, dict);
FinalCompressedImage = length(enco)
FinalCompressedImage =
      695755

Final compression

We have successfully reduced the pepper image by about x7, while being extremely conservative:

length(enco)/ImageSize   % x7 compression.
ans =
   0.147449705335829

A different image

The images of peppers is not ideal for JPEG. Here is an image for which JPEG gets a better compression rate (of about x30).

I = imread('saturn.png');
ImageSize = 8*prod(size(I));
Y_d = rgb2ycbcr( I );
% Downsample:
Y_d(:,:,2) = 2*round(Y_d(:,:,2)/2);
Y_d(:,:,3) = 2*round(Y_d(:,:,3)/2);
% DCT compress:
A = zeros(size(Y_d));
B = A;
for channel = 1:3
    for j = 1:8:size(Y_d,1)-7
        for k = 1:8:size(Y_d,2)-7
            II = Y_d(j:j+7,k:k+7,channel);
            freq = chebfun.dct(chebfun.dct(II).').';
            freq = Q.*round(freq./Q);
            A(j:j+7,k:k+7,channel) = freq;
            % do the inverse at the same time:
            B(j:j+7,k:k+7,channel) = chebfun.idct(chebfun.idct(freq).').';
        end
    end
end
b = A(:);
b = b(:);
b(b==0)=[];  %remove zeros.
b = floor(255*(b-min(b))/(max(b)-min(b)));
symbols = unique(b);
prob = histcounts(b,length(symbols))/length(b);
dict = huffmandict(symbols, prob);
enco = huffmanenco(b, dict);
FinalCompressedImage = length(enco);

FinalCompressedImage/ImageSize
ans =
   0.031958287037037

Here's what the images look like:

subplot(1,2,1)
imshow(I)
title('Original')
subplot(1,2,2)
imshow(ycbcr2rgb(uint8(B)));
title('Compressed')