Let's Build a JPEG Decoder (3): Huffman Tables

Overview

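In the previous part, I mentioned that most JPEG files apply an encoding technique on top of the image compression, in an attempt to remove any trace of redundant information from the image. The technique used by the most common form of JPEG encoding is an adaptation of one found throughout the world of data compression, called Huffman coding, so it's worth exploring the structure and implementation of a Huffman decoder in detail.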

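Because Huffman coding is the last thing a JPEG encoder does when saving an image file, it has to be the first thing our decoder does. There are two ways to go about this: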

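Reconstruction

The full image scan is decoded from its Huffman-coded state into a temporary placeholder, and further processing is performed on the temporary copy.

On-the-fly

The encoded image scan is processed one code at a time, and each JPEG block is processed as soon as enough information is available for it.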

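This article takes the second approach, saving memory at the expense of time; full reconstruction can be implemented in a very similar way using the code built below.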

The Huffman algorithm

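The concept behind Huffman coding and other entropy-based schemes is similar to the one behind the substitution cipher: each unique character in the input is transformed into a unique output character. The simplest example is the Caesar substitution, which can be represented in table form as follows: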

A => D
B => E
C => F
...
Y => B
Z => C

This is an example of a Caesar cipher
Wklv lv dq hadpsoh ri d Fdhvdu flskhu
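To make the substitution table concrete, here is a minimal C++ sketch of the Caesar substitution; the function name caesarEncode is hypothetical, and passing spaces and punctuation through unchanged is an assumption that matches the example above.

```cpp
#include <string>

// Caesar substitution: move each letter `shift` places along the alphabet,
// wrapping round from Z to A; spaces and anything else pass through unchanged.
// `shift` is expected to be between 0 and 25.
std::string caesarEncode(const std::string &text, int shift)
{
    std::string out = text;
    for (char &ch : out) {
        if (ch >= 'A' && ch <= 'Z') {
            ch = static_cast<char>('A' + (ch - 'A' + shift) % 26);
        } else if (ch >= 'a' && ch <= 'z') {
            ch = static_cast<char>('a' + (ch - 'a' + shift) % 26);
        }
    }
    return out;
}
```

With a shift of 3 this reproduces the mapping table above; decoding is the same call with a shift of 23.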

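The standard substitution cipher can be improved by recording the relative frequency of each character in the input, and designing a table that substitutes shorter codes for the common characters and longer codes for the rare ones. Looking at the frequencies of the letters in the example above, along with their ASCII representations, we can generate a table of increasing unique codes as follows: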

Character  ASCII     Frequency  Code
Space      00100000  7          00
a          01100001  5          01
e          01100101  4          100
i          01101001  3          1010
s          01110011  3          1011
h          01101000  2          11000
p          01110000  2          11001
r          01110010  2          11010
C          01000011  1          110110
T          01010100  1          110111
c          01100011  1          111000
f          01100110  1          111001
l          01101100  1          111010
m          01101101  1          111011
n          01101110  1          111100
o          01101111  1          111101
x          01111000  1          111110

Table 1: Frequencies of the characters in the string "This is an example of a Caesar cipher"
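The frequency column of Table 1 comes from a simple tally. A short C++ sketch follows; rankCharacters is a hypothetical helper, and breaking ties by ascending character code is an assumption that happens to reproduce the table's row order.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Tally each distinct character in `text`, then order the results as Table 1
// does: most frequent first, ties broken by ascending character code.
std::vector<std::pair<char, int>> rankCharacters(const std::string &text)
{
    std::map<char, int> freq;               // kept sorted by character code
    for (char ch : text) {
        freq[ch]++;
    }
    std::vector<std::pair<char, int>> ranked(freq.begin(), freq.end());
    // stable_sort preserves the map's ascending-code order among equal counts
    std::stable_sort(ranked.begin(), ranked.end(),
        [](const std::pair<char, int> &a, const std::pair<char, int> &b) {
            return a.second > b.second;
        });
    return ranked;
}
```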

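Substituting these codes for the characters of the original text shows that the encoded data is much smaller than the original: 18 bytes, against 37 bytes of ASCII.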

This is an example of a Caesar cipher

ASCII, in binary:
01010100 01101000 01101001 01110011 00100000 01101001 01110011 00100000
01100001 01101110 00100000 01100101 01111000 01100001 01101101 01110000
01101100 01100101 00100000 01101111 01100110 00100000 01100001 00100000
01000011 01100001 01100101 01110011 01100001 01110010 00100000 01100011
01101001 01110000 01101000 01100101 01110010

Huffman-coded, in binary:
110111 11000 1010 1011 00 1010 1011 00 01 111100 00 100 111110 01 111011
11001 111010 100 00 111101 111001 00 01 00 110110 01 100 1011 01 11010
00 111000 1010 11001 11000 100 11010

ASCII, in hex:
54 68 69 73 20 69 73 20 61 6E 20 65 78 61 6D 70 6C 65 20 6F 66 20 61 20
43 61 65 73 61 72 20 63 69 70 68 65 72

Huffman-coded, in hex:
DF 15 65 58 F8 4F 9E F3 D4 3D E4 4D 99 6E 8E 2B 38 9A
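The substitution and bit-packing can be sketched in C++ using the codes from Table 1, writing bits most significant first. huffmanEncode and kTable1Codes are hypothetical names. Padding a trailing partial byte with 1 bits is an assumption borrowed from the convention JPEG uses at the end of a scan; this example string needs no padding, since its 144 bits fill exactly 18 bytes.

```cpp
#include <stdint.h>
#include <map>
#include <string>
#include <vector>

// The code column of Table 1, written as strings of binary digits.
const std::map<char, std::string> kTable1Codes = {
    {' ', "00"},     {'a', "01"},     {'e', "100"},    {'i', "1010"},
    {'s', "1011"},   {'h', "11000"},  {'p', "11001"},  {'r', "11010"},
    {'C', "110110"}, {'T', "110111"}, {'c', "111000"}, {'f', "111001"},
    {'l', "111010"}, {'m', "111011"}, {'n', "111100"}, {'o', "111101"},
    {'x', "111110"}
};

// Replace each character with its code and pack the bits into bytes, most
// significant bit first. codes.at() throws std::out_of_range for a character
// that has no code. A final partial byte is padded with 1 bits.
std::vector<uint8_t> huffmanEncode(const std::string &text,
                                   const std::map<char, std::string> &codes)
{
    std::vector<uint8_t> out;
    uint8_t current = 0;
    int nbits = 0;
    for (char ch : text) {
        for (char bit : codes.at(ch)) {
            current = static_cast<uint8_t>((current << 1) | (bit == '1'));
            if (++nbits == 8) {
                out.push_back(current);
                current = 0;
                nbits = 0;
            }
        }
    }
    if (nbits > 0) {
        int pad = 8 - nbits;
        current = static_cast<uint8_t>((current << pad) | ((1u << pad) - 1));
        out.push_back(current);
    }
    return out;
}
```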

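The main drawback of Huffman coding is that the code table has to be stored along with the compressed data: in the example above, the string of encoded bytes on the last line would be meaningless without the corresponding code table. The codes and their corresponding characters could be recorded in full, but if you pay attention to the patterns in which the codes occur, there is a more space-efficient way to store them. Two things stand out: first, the codes increase in length; second, within each group of the same length, the codes are consecutive. This means the code table can be written as: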

2 codes of length two  , starting at 00
1 code  of length three, starting at 100
2 codes of length four , starting at 1010
3 codes of length five , starting at 11000
9 codes of length six  , starting at 110110
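That grouping can be done mechanically. A C++ sketch, assuming (as observed above) that the codes arrive sorted by length and then by value; CodeRun and groupByLength are hypothetical names.

```cpp
#include <string>
#include <vector>

// One run of codes sharing a length, e.g. "3 codes of length five, starting at 11000".
struct CodeRun {
    int length;         // code length in bits
    int count;          // number of codes of that length
    std::string start;  // first code in the run
};

// Collapse a sorted list of codes into (length, count, start) runs.
std::vector<CodeRun> groupByLength(const std::vector<std::string> &codes)
{
    std::vector<CodeRun> runs;
    for (const std::string &code : codes) {
        int len = static_cast<int>(code.size());
        if (runs.empty() || runs.back().length != len) {
            runs.push_back(CodeRun{len, 1, code});   // a new length starts a new run
        } else {
            runs.back().count++;                     // same length: extend the run
        }
    }
    return runs;
}
```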

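Looking more closely at the codes themselves lets us reduce the space needed to record the table even further. If we look at the codes alongside the list of code lengths above, we can count through them as follows: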

00 (zero)
01 (one)
Next code would be 10 (two)
100 (four)
Next code would be 101 (five)
1010 (ten)
1011 (eleven)
Next code would be 1100 (twelve)
11000 (twenty four)
11001 (twenty five)
11010 (twenty six)
Next code would be 11011 (twenty seven)
110110 (fifty four)
110111 (fifty five)
111000 (fifty six)
111001 (fifty seven)
111010 (fifty eight)
111011 (fifty nine)
111100 (sixty)
111101 (sixty one)
111110 (sixty two)

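In each case, once the required number of codes has been counted out for a given length, all that's needed is to double the counter and move on to the next code length. In other words, the "starting at" part of the code length list above doesn't need to be recorded at all, since it can be worked out by counting up from zero. The final code list therefore looks like this: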

2 codes of length two
1 code  of length three
2 codes of length four
3 codes of length five
9 codes of length six

The above codes correspond to the following characters, in this order:
Space,a,e,i,s,h,p,r,C,T,c,f,l,m,n,o,x
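The counting rule can be checked by rebuilding every code from nothing but the counts and the character order. In this C++ sketch, buildCodes is a hypothetical name, and the counts list starts at length one (with zero codes), as a JPEG table does.

```cpp
#include <stdint.h>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Rebuild every code from the per-length counts alone. counts[i] is the number
// of codes that are i + 1 bits long; symbols lists the characters in code order.
// Returns (symbol, code written as a string of binary digits) pairs.
std::vector<std::pair<char, std::string>>
buildCodes(const std::vector<int> &counts, const std::string &symbols)
{
    std::vector<std::pair<char, std::string>> table;
    uint32_t code = 0;      // the counter, starting from zero
    std::size_t next = 0;   // index of the next symbol to assign
    for (std::size_t i = 0; i < counts.size(); i++) {
        int length = static_cast<int>(i) + 1;
        for (int j = 0; j < counts[i]; j++) {
            std::string bits;
            for (int b = length - 1; b >= 0; b--) {
                bits += ((code >> b) & 1) ? '1' : '0';
            }
            table.push_back(std::make_pair(symbols.at(next++), bits));
            code++;         // codes of the same length are consecutive
        }
        code <<= 1;         // double the counter before the next length
    }
    return table;
}
```

Calling buildCodes({0, 2, 1, 2, 3, 9}, " aeishprCTcflmnox") yields the same seventeen codes as Table 1.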

The JPEG DHT segment

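Huffman tables in a JPEG file are recorded in exactly this way: a list of how many codes there are of each length (from 1 to 16), followed by the meaning of each code in turn. According to the JPEG standard, this information is held in the file's "Define Huffman Table" (DHT) segments. Each table starts with an ID byte: the high four bits give the table class (0 for DC, 1 for AC) and the low four bits give the destination (0 to 3), so at most eight tables can be in use at once, and a valid ID byte never exceeds 0x13.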
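In the JPEG standard (ITU-T T.81), the byte that opens each Huffman table packs the table class into its high nibble and the destination identifier into its low nibble. A small C++ sketch of pulling it apart; HuffTableId, splitTableId and isValidTableId are hypothetical names.

```cpp
#include <stdint.h>

// The byte that opens each table in a DHT segment holds two 4-bit fields.
struct HuffTableId {
    int tableClass;   // high nibble: 0 = DC (or lossless), 1 = AC
    int destination;  // low nibble: 0 to 3 (baseline JPEG uses only 0 and 1)
};

HuffTableId splitTableId(uint8_t id)
{
    HuffTableId t;
    t.tableClass = id >> 4;
    t.destination = id & 0x0F;
    return t;
}

// True if the ID names one of the eight table slots the standard defines.
bool isValidTableId(uint8_t id)
{
    HuffTableId t = splitTableId(id);
    return t.tableClass <= 1 && t.destination <= 3;
}
```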

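As shown above, data encoded by the Huffman algorithm ends up recorded as a series of codes packed back to back in a bitstream, and the same is true of the image scan in a JPEG file. A simple routine to read one code from the bitstream might look like this: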

Pseudocode for reading a Huffman-coded value:

Code = 0
Length = 0
Found = False

Do
    Code = Code << 1
    Code = Code | (The next bit in the stream)
    Length = Length + 1
    If ((Length, Code) is in the Huffman list) Then
        Found = True
    End If
While Found = False
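Here is a runnable C++ version of that loop, applied to the Caesar-cipher example rather than to a JPEG scan. huffmanDecode and HuffMap are hypothetical names, and passing in the number of symbols to decode (so any padding bits at the end are ignored) is an assumption of this sketch. Unlike the pseudocode, it gives up after 16 bits, the longest code length JPEG allows, so corrupt input cannot loop forever.

```cpp
#include <stdint.h>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// (length, code) -> character: the same key shape as the DHT handler's maps.
typedef std::map<std::pair<int, uint16_t>, char> HuffMap;

// Decode `count` characters from an MSB-first bitstream, growing the candidate
// code one bit at a time until (length, code) is found in the map.
std::string huffmanDecode(const std::vector<uint8_t> &data, const HuffMap &table,
                          std::size_t count)
{
    std::string out;
    std::size_t bitPos = 0;
    while (out.size() < count) {
        uint16_t code = 0;
        int length = 0;
        HuffMap::const_iterator hit = table.end();
        while (hit == table.end()) {
            if (length == 16 || bitPos >= data.size() * 8) {
                throw std::runtime_error("invalid or truncated Huffman data");
            }
            int bit = (data[bitPos / 8] >> (7 - bitPos % 8)) & 1;
            bitPos++;
            code = static_cast<uint16_t>((code << 1) | bit);
            length++;
            hit = table.find(std::make_pair(length, code));
        }
        out += hit->second;
    }
    return out;
}
```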

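To keep this algorithm simple, the Huffman codes should be stored in a way that lets us check quickly whether a given code of a given length is in the table. The canonical representation of a list of Huffman codes is a binary tree, in which the sequence of branches taken defines the code and the depth in the tree gives its length. Rather than building such a tree by hand, we can use the C++ STL's std::map, keyed on the (length, code) pair, to answer the same question.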

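Because the ID byte that identifies each Huffman table never exceeds 0x13, an array of 32 maps indexed directly by that byte is enough for our implementation. It's also worth defining at this point how the DHT segment handler will be called by the parseSeg method developed in the previous part of this series.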

Source code

jpeg.h

JPEG class definition

/**
* Let's Build a JPEG Decoder: Segment lister
* JPEG class definition [jpeg.h]
* Imran Nazar, Jan 2013
*/

#ifndef __JPEG_H_
#define __JPEG_H_

#include "inttypes.h"
#include <string>
#include <vector>
#include <map>
#include <stdio.h>

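// How to read a big-endian 16-bit word from the JPEG file. Each byte is read
// in its own statement, since the evaluation order of the operands of | is
// unspecified. At end of file the result is 0, which parseSeg rejects.
static inline u16 readWord(FILE *f)
{
    int hi = fgetc(f);
    int lo = fgetc(f);
    if (hi == EOF || lo == EOF) {
        return 0;
    }
    return (u16)((hi << 8) | lo);
}
#define READ_WORD() readWord(fp)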

// Segment parsing error codes
#define JPEG_SEG_ERR  0
#define JPEG_SEG_OK   1
#define JPEG_SEG_EOF -1

class JPEG {
    private:
        // Names of the possible segments
        std::string segNames[64];

        // Maps to hold the Huffman tables
        typedef std::pair<int, u16> huffKey;
        std::map<huffKey, u8> huffData[32];

        // The file to be read from, opened by constructor
        FILE *fp;

        // Segment parsing dispatcher
        int parseSeg();

        // Segment parsing handlers
        int DHT();

    public:
        // Construct a JPEG object given a filename
        JPEG(std::string);
};

#endif//__JPEG_H_

jpeg.cpp

JPEG class implementation

/**
* Let's Build a JPEG Decoder: Segment lister
* JPEG class implementation [jpeg.cpp]
* Imran Nazar, Jan 2013
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "jpeg.h"

//-------------------------------------------------------------------------
// Function: DHT segment handler (DHT)
// Purpose: Build a Huffman table given data in the file

int JPEG::DHT()
{
    int ctr = 0, i, j;
    u16 code = 0;

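    // First byte is the table ID: the table class (0 = DC, 1 = AC) in the
    // high nibble and the destination (0 to 3) in the low nibble, so a
    // valid ID never exceeds 0x13
    u8 table = fgetc(fp);
    ctr++;
    if ((table >> 4) > 1 || (table & 0x0F) > 3) {
        printf("Invalid Huffman table ID %02X\n", table);
        return -1;
    }

    // A table can be redefined later in the file; discard any old codes
    huffData[table].clear();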

    // Next sixteen bytes are the counts for each code length
    u8 counts[16];
    for (i = 0; i < 16; i++) {
        counts[i] = fgetc(fp);
        ctr++;
    }

    // Build the Huffman map of (length, code) -> value
    for (i = 0; i < 16; i++) {
        for (j = 0; j < counts[i]; j++) {
            huffData[table][huffKey(i + 1, code)] = fgetc(fp);
            code++;
            ctr++;
        }
        code <<= 1;
    }

    printf("Huffman table #%02X:\n", table);

    std::map<huffKey, u8>::iterator iter;
    for (iter = huffData[table].begin();
         iter != huffData[table].end();
         iter++
    ) {
        printf("    %04X at length %d = %02X\n",
            iter->first.second, iter->first.first, iter->second);
    }

    return ctr;
}

//-------------------------------------------------------------------------
// Function: Parse JPEG file segment (parseSeg)
// Purpose: Retrieves 16-bit block ID from file, dispatches to handlers

int JPEG::parseSeg()
{
    if (!fp) {
        printf("File failed to open.\n");
        return JPEG_SEG_ERR;
    }

    u32 fpos = ftell(fp);
    u16 id = READ_WORD(), size;
    if (id < 0xFFC0)
    {
        printf("Segment ID expected, not found.\n");
        return JPEG_SEG_ERR;
    }

    printf("Found segment at file position %u: %s\n", fpos, segNames[id-0xFFC0].c_str());

    switch (id) {
        // The DHT segment defines one or more Huffman tables. Read in
        // the lengths and build each table in turn; this should leave
        // us exactly at the end of the segment
        case 0xFFC4:
            size = READ_WORD() - 2;
            while (size > 0) {
                int used = DHT();
                if (used <= 0 || used > size) {
                    printf("Unexpected end of DHT segment\n");
                    return JPEG_SEG_ERR;
                }
                size -= used;
            }
            break;

        // The SOI and EOI segments are the only ones not to have
        // a length, and are always a fixed two bytes long; do
        // nothing to advance the file position
        case 0xFFD9:
            return JPEG_SEG_EOF;
        case 0xFFD8:
            break;

        // An SOS segment has a length determined only by the
        // length of the bitstream; for now, assume it's the rest
        // of the file less the two-byte EOI segment
        case 0xFFDA:
            fseek(fp, -2, SEEK_END);
            break;

        // Any other segment has a length specified at its start,
        // so skip over that many bytes of file
        default:
            size = READ_WORD() - 2;
            fseek(fp, size, SEEK_CUR);
            break;
    }
    
    return JPEG_SEG_OK;
}

//-------------------------------------------------------------------------
// Function: Array initialisation (constructor)
// Purpose: Fill in arrays used by the decoder, decode a file
// Parameters: filename (string) - File to decode

JPEG::JPEG(std::string filename)
{
    // Debug messages used by parseSeg to tell us which segment we're at
    segNames[0x00] = std::string("Baseline DCT; Huffman");
    segNames[0x01] = std::string("Extended sequential DCT; Huffman");
    segNames[0x02] = std::string("Progressive DCT; Huffman");
    segNames[0x03] = std::string("Spatial lossless; Huffman");
    segNames[0x04] = std::string("Huffman table");
    segNames[0x05] = std::string("Differential sequential DCT; Huffman");
    segNames[0x06] = std::string("Differential progressive DCT; Huffman");
    segNames[0x07] = std::string("Differential spatial; Huffman");
    segNames[0x08] = std::string("[Reserved: JPEG extension]");
    segNames[0x09] = std::string("Extended sequential DCT; Arithmetic");
    segNames[0x0A] = std::string("Progressive DCT; Arithmetic");
    segNames[0x0B] = std::string("Spatial lossless; Arithmetic");
    segNames[0x0C] = std::string("Arithmetic coding conditioning");
    segNames[0x0D] = std::string("Differential sequential DCT; Arithmetic");
    segNames[0x0E] = std::string("Differential progressive DCT; Arithmetic");
    segNames[0x0F] = std::string("Differential spatial; Arithmetic");
    segNames[0x10] = std::string("Restart");
    segNames[0x11] = std::string("Restart");
    segNames[0x12] = std::string("Restart");
    segNames[0x13] = std::string("Restart");
    segNames[0x14] = std::string("Restart");
    segNames[0x15] = std::string("Restart");
    segNames[0x16] = std::string("Restart");
    segNames[0x17] = std::string("Restart");
    segNames[0x18] = std::string("Start of image");
    segNames[0x19] = std::string("End of image");
    segNames[0x1A] = std::string("Start of scan");
    segNames[0x1B] = std::string("Quantisation table");
    segNames[0x1C] = std::string("Number of lines");
    segNames[0x1D] = std::string("Restart interval");
    segNames[0x1E] = std::string("Hierarchical progression");
    segNames[0x1F] = std::string("Expand reference components");
    segNames[0x20] = std::string("JFIF header");
    segNames[0x21] = std::string("[Reserved: application extension]");
    segNames[0x22] = std::string("[Reserved: application extension]");
    segNames[0x23] = std::string("[Reserved: application extension]");
    segNames[0x24] = std::string("[Reserved: application extension]");
    segNames[0x25] = std::string("[Reserved: application extension]");
    segNames[0x26] = std::string("[Reserved: application extension]");
    segNames[0x27] = std::string("[Reserved: application extension]");
    segNames[0x28] = std::string("[Reserved: application extension]");
    segNames[0x29] = std::string("[Reserved: application extension]");
    segNames[0x2A] = std::string("[Reserved: application extension]");
    segNames[0x2B] = std::string("[Reserved: application extension]");
    segNames[0x2C] = std::string("[Reserved: application extension]");
    segNames[0x2D] = std::string("[Reserved: application extension]");
    segNames[0x2E] = std::string("[Reserved: application extension]");
    segNames[0x2F] = std::string("[Reserved: application extension]");
    segNames[0x30] = std::string("[Reserved: JPEG extension]");
    segNames[0x31] = std::string("[Reserved: JPEG extension]");
    segNames[0x32] = std::string("[Reserved: JPEG extension]");
    segNames[0x33] = std::string("[Reserved: JPEG extension]");
    segNames[0x34] = std::string("[Reserved: JPEG extension]");
    segNames[0x35] = std::string("[Reserved: JPEG extension]");
    segNames[0x36] = std::string("[Reserved: JPEG extension]");
    segNames[0x37] = std::string("[Reserved: JPEG extension]");
    segNames[0x38] = std::string("[Reserved: JPEG extension]");
    segNames[0x39] = std::string("[Reserved: JPEG extension]");
    segNames[0x3A] = std::string("[Reserved: JPEG extension]");
    segNames[0x3B] = std::string("[Reserved: JPEG extension]");
    segNames[0x3C] = std::string("[Reserved: JPEG extension]");
    segNames[0x3D] = std::string("[Reserved: JPEG extension]");
    segNames[0x3E] = std::string("Comment");
    segNames[0x3F] = std::string("[Invalid]");

    // Open the requested file, keep parsing blocks until we run
    // out of file, then close it.
    fp = fopen(filename.c_str(), "rb");
    if (fp) {
        while(parseSeg() == JPEG_SEG_OK);
        fclose(fp);
    }
    else
    {
        perror("JPEG");
    }
}

inttypes.h

Architecture-independent integer size definitions

/**
* Let's Build a JPEG Decoder: Segment lister
* Architecture-independent integer size definitions [inttypes.h]
* Imran Nazar, Jan 2013
*/

#ifndef __INTTYPES_H_
#define __INTTYPES_H_

#include <stdint.h>

// Exact-width types, so the sizes hold on any platform
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

typedef int8_t  s8;
typedef int16_t s16;
typedef int32_t s32;

#endif//__INTTYPES_H_


main.cpp

Entry point

/**
* Let's Build a JPEG Decoder: Segment lister
* Entry point [main.cpp]
* Imran Nazar, Jan 2013
*/

#include "jpeg.h"

int main(int argc, char **argv)
{
    if (argc != 2) {
        printf("Usage: jpegparse <file.jpg>\n");
        return 1;
    }

    std::string in = std::string(argv[1]);

    JPEG j(in);
    return 0;
}

Makefile

CC = g++ -c -g
LD = g++

all: jpegparse

jpegparse: jpeg.o main.o
	$(LD) -o $@ $^

%.o: %.cpp
	$(CC) -o $@ $<

# Rebuild the objects whenever a header changes
jpeg.o main.o: jpeg.h inttypes.h

.PHONY: clean
clean:
	rm -rf jpegparse *.o

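As in the previous part, the JPEG class can be instantiated with a filename; when it is, the code above produces output along the following lines: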

Found segment at file position 177: Huffman table
Huffman table #00:
    0000 at length 2 = 04
    0002 at length 3 = 02
    0003 at length 3 = 03
    0004 at length 3 = 05
    0005 at length 3 = 06
    0006 at length 3 = 07
    000E at length 4 = 01
    001E at length 5 = 00
    003E at length 6 = 08
    007E at length 7 = 09
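To cross-check this listing without a JPEG file to hand, the C++ sketch below runs the same table-building loop over an in-memory buffer. The 27 payload bytes used with it are reconstructed from the listing above (ID 0x00; counts of 0, 1, 5, 1, 1, 1, 1 for lengths 1 to 7 and zero thereafter; then the ten symbol values in order), not copied from the original file. parseDhtTable and printTable are hypothetical names.

```cpp
#include <stdint.h>
#include <stdio.h>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef std::pair<int, uint16_t> HuffKey;   // (code length, code)

// Build one Huffman table from a DHT payload held in memory, starting at
// buf[pos]: an ID byte, sixteen per-length counts, then the symbol values in
// code order. Returns the number of bytes used, or -1 if the buffer runs out.
int parseDhtTable(const std::vector<uint8_t> &buf, std::size_t pos,
                  uint8_t &id, std::map<HuffKey, uint8_t> &table)
{
    if (pos + 17 > buf.size()) {
        return -1;
    }
    id = buf[pos];
    std::size_t next = pos + 17;       // first symbol value
    uint16_t code = 0;
    table.clear();
    for (int i = 0; i < 16; i++) {
        int count = buf[pos + 1 + i];  // number of codes of length i + 1
        for (int j = 0; j < count; j++) {
            if (next >= buf.size()) {
                return -1;
            }
            table[HuffKey(i + 1, code)] = buf[next++];
            code++;
        }
        code <<= 1;
    }
    return static_cast<int>(next - pos);
}

// Print a table in the same format as JPEG::DHT().
void printTable(uint8_t id, const std::map<HuffKey, uint8_t> &table)
{
    printf("Huffman table #%02X:\n", static_cast<unsigned>(id));
    std::map<HuffKey, uint8_t>::const_iterator it;
    for (it = table.begin(); it != table.end(); ++it) {
        printf("    %04X at length %d = %02X\n",
               static_cast<unsigned>(it->first.second), it->first.first,
               static_cast<unsigned>(it->second));
    }
}
```

Feeding it the bytes 00, then 00 01 05 01 01 01 01 followed by nine 00 counts, then 04 02 03 05 06 07 01 00 08 09, and passing the result to printTable, prints the same ten code lines as above.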

Coming up: Reading the bitstream

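Once the Huffman maps have been built for a JPEG file, the image scan can be decoded for further processing. In the next part, I'll look at Huffman decoding of the scan in a wider context, reading blocks out of the image and examining how they are transformed.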
