HuffYUV (HFYU) 编解码器说明（english version）

最新推荐文章于 2021-11-16 23:36:04 发布
baymoon
最新推荐文章于 2021-11-16 23:36:04 发布
阅读量3.6k
点赞数
分类专栏： English Studying 文章标签： compression table stream image documentation codec
English Studying 专栏收录该内容
21 篇文章 0 订阅
订阅专栏
Description of the HuffYUV (HFYU) Codec
by Roberto Togni (rtogni at bresciaonline dot it)
v0.8: March 24, 2003


=======================================================================
NOTE: The information in this document is now maintained in Wiki format
at:
  http://wiki.multimedia.cx/index.php?title=HuffYUV
=======================================================================


  Copyright (c) 2002-2003 Roberto Togni
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
  A copy of the license is included in the section entitled "GNU
  Free Documentation License".


Huffyuv v2.1.1, and previous version are written by Ben Rudiak-Gould.
http://www.math.berkeley.edu/~benrg/huffyuv.html


Introduction
============
HuffYUV is a lossless video compressor for AVI files written by Ben 
Rudiak-Gould. It's distributed in form of a vfw (Video for Windows) dll.
Source code is available and distributed (mostly) under GPL.
This codec was created to temporary store video data coming from a 
capture card for later editing, so its main features are:
- speed: uses simple predictors and original version is heavily optimized.
- quality: it's lossless, so image is not degraded if read/written many times.
- editable: every frame is a key frame, there is no temporal prediction.


Tech notes from author's page (referring to v 2.1.1)
====================================================
Huffyuv's algorithm is roughly the same as lossless JPEG: it predicts each 
sample and Huffman-encodes the error.  The predictor functions are:
"left" (predicts the previous sample from the same channel)
"gradient" (predicts Left+Above-AboveLeft)
"median" (predicts the median of Left, Above, and the gradient predictor).
Channels are compressed separately, but in RGB mode the channels used are 
actually R-G, G, and B-G. This yields much better compression than R, G, B.

The error signal in each channel is encoded with its own Huffman table.
On compression Huffyuv picks appropriate tables from its built-in collection.
These tables are then stored in the output file and used when decompressing.
This way future versions of Huffyuv can decompress old files without my 
having to explicitly support old tables.  A Huffyuv-savvy application can 
also specify the Huffman tables to be used for compression instead of 
accepting the defaults.



File Header
===========
All informations about the image format and predictor used are stored into 
the BITMAPINFOHEADER structure of avi header.

BITMAPINFOHEADER layout (only interesting fields are shown)
-----------------------------------------------------------
HUFFYUV_BITMAPINFOHEADER {
 int biSize
 [...]
 short biBitCount
 int biCompression
 [...]
 Extradata {
  BYTE method
  BYTE bpp_override
  BYTE unused[2]  
 }
 BYTE Compressed_Huffmantables[]
}

Extradata and Compressed_Huffmantables fields are not always there.

biCompression contains "HFYU" fourcc.

biSize is the size of the whole structure, and it's used to find out if extra 
fields are there, since size of standard BITMAPINFOHEADER is known.

To find out image format you need to get true bit count: it is stored in 
bpp_override if biSize > sizeof(BITMAPINFOHEADER), else (or if bpp_override 
is 0) it's stored in biBitCount.
Then you can find out image format with this table

+-----+-------+
| bit | image |
+-----+-------+
| 16  | YUY2  |
| 24  | RGB   |
| 32  | RGBA  |
+-----+-------+

In the rest of this document YUV will indicate image of type YUY2, and RGB 
will be used for both RGB and RGBA.

To find out the prediction method you have to look at the 3 LSbit of 
biBitCount (biBitCount & 0x07) and use this table

+-----+-------------+
| bit |   method    |
+-----+-------------+
| 000 | see below   |
| 001 | left        |
| 010 | left decorr |
| 011 | gradient *  |
| 100 | median      |
+-----+-------------+
*gradient if YUV, gradient with decorrelation if RGB.

If (biBitCount & 0x07) is 0, method is stored in extradata if biSize >
sizeof(BITMAPINFOHEADER), else method is predict_old (the method used in 
version 1.x, it's the same as predict left).
To find out method from extradata method byte use this table

+------+-----------------+
| byte |   method        |
+------+-----------------+
|  -2  | old             |
|   0  | left            |
|   1  | gradiend        |
|   2  | median          |
|  64  | left decorr     |
|  65  | gradient decorr |
+------+-----------------+


Huffman tables can be stored in file (v2.x) or not (v1.x).
If biSize = sizeof(BITMAPINFOHEADER) tables are not stored into file, and to 
decode data you have to use classic tables (known by the decoder) choosing 
RGB or YUV as appropriate.
Else tables are stored in file: if (biBitCount & 0x07) = 0 tables start at 
the end of BITMAPINFOHEADER, else they start after Extradata.



Huffman tables
--------------
Huffman tables are stored in file compressed with a run-length algorithm.
This is what Ben Rudiak-Gould says about it in source code (tables.cpp)

// The Huffman tables are run-length encoded.  (I could have Huffman-
// encoded them, but the madness has to end somewhere.)  The
// decompression algorithm is as follows: Read a byte.  The lower five
// bits are the value to repeat.  If the upper three bits are zero, the
// repeat count is in the next byte; otherwise, the upper three bits are
// the repeat count.  The tables are zero-terminated so that I can use
// string functions on them (a zero byte can never appear in the RLE
// data).

// Each array actually contains three Huffman tables, one for each sample
// type (e.g. Y,U,V, or R-G,G,B-G).  Each table expands to 256 bytes, and
// each byte is a code length.  The codes themselves are determined at
// run time and are not stored here--except for the "classic" tables,
// which don't fit the code-choosing algorithm.

When the 3 tables are available (either extracted and decompressed, or known 
because the file was compressed with v1.x of the codec) you can extract data 
from the encoded byte stream.

Tables are stored in file in this order:
YUV: Y_table, U_table, V_table
RGB: B_table, G_table, R_table
RGB with decorrelation: B-G_table, G_table, R-G_table



Frame decompression
===================

General info
------------
Frames are stored in avi file one per chunk.
All frames in file are marked as key frame.
You only need data from current chunk and file header to decompress a frame 
(no knowledge of previous frames or frame position in file is needed).



Operating algorithm
-------------------
HuffYUV codec compress image by predicting the value of a pixel from it's 
neighbourgs, and computes an error value (delta) by subtracting it from the 
effective pixel value. The result of this operation is then compressed with a 
Huffman algorithm, using a different table for each channel.
To decompress a frame you need to reverse this process: at first extract the 
error value from the compressed Huffman stream, then you can reconstruct the 
pixel value by computing the predictor value and adding it to the error value.
To make this process possible you need a start value (the first pixel in 
frame) and a predictor that uses only values from past pixel (already 
decompressed).
Obviously all predictors used in HuffYUV satisfy this requirement.

All pixel component values must be treated as unsigned bytes.
In all computations overflow or underflow causes a wrap-around (no 
saturation).

If decorrelation is used (RGB only), instead of compressing R, G, B 
components the codec compresses R-G, G, B-G (this gives a better compression).
To reconstruct B and R values you have to add them the value of G taken from 
the same pixel.



Bit stream
----------
The components of the image are compressed with different Huffman tables, and 
then the bit streams are interleaved.
As an example suppose to have 3 components (the values represent the 
difference between the pixel value and the predictor output).
The byte sequence [ABC][DEF][GHI][LMN] is compressed in this way:
- Compress each byte of the first triplet [ABC] using the Huffan table 
  associated with its component. You will get 3 codes of variable length. 
  Suppose you get aaaa by compression of A, bbbbbbb from B and ccc from C. 
  The output bit stream will be aaaabbbbbbccc.
- Repeat it for every triplet, adding the new bits at the end of the stream. 
  If compression of [DEF] will give dddd eee fffff, the output stream after 
  second pixel will be aaaabbbbbbcccddddeeefffff.

To decompress this sequence you keep extracting bits from the stream until 
you get a valid code (the table contains all valid codes), and then you 
decode it using the table associated with that component.
In the example above you will get the 4 bit aaaa and decode them to A. After 
decoding the current component you need to remove decoded bits from the 
stream (it will become bbbbbbcccddddeeefffff), and start decoding the next 
component. When the first triplet [ABC] si decoded the stream will be 
ddddeeefffff, ant you're ready to start again with the first component of the
next pixel.



Data order
----------
This is how data is stored into encoded stream.
The meaning of the stream format line is: a capital letter is used to 
indicate an uncompressed value 1 byte long, small letters represent Huffman 
encoded values and their length in bit depend on the size of the code word 
(after decompression all delta values are 1byte long).

YUV case
Image is stored left to right, top to bottom. Each pair of pixel is 
represented by 4 bytes, using YUY2 format. Frame width is always even.
Stream format: Y U Y V y u y v y u y v ....
where YUYV are the first two pixels, and y, u, v are the error values.
y is decompressed with Y_table, u with U_table and v with V_table.

RGB case
Image is stored left to right, bottom to top. Each pixel is represented by 3 
bytes, using BGR24 format. 
Stream format: X B G R b g r b g r ....
where X is unused, BGR is the first pixel, and b, g, r are the error values.
b is decompressed with B_table, g with G_table and r with R_table.

RGB with decorrelation case
Image is stored left to right, bottom to top. Each pixel is represented by 3 
bytes, using BGR24 format.
Stream format: X B G R g bg rg g bg rg ....
where X is unused, BGR is the first pixel, and g, bg, rg are the error values.
g is decompressed with G_table, bg with B-G_table and rg with R-G_table.

RGBA case
Image is stored left to right, bottom to top. Each pixel is represented by 4 
bytes, using BGR24 format plus alpha channel.
Stream format: B G R A b g r a b g r a ....
where BGRA is the first pixel, and b, g, r, a are the error values.
b is decompressed with B_table, g with G_table, r with R_table and a with 
R_table.

RGBA with decorrelation case
Image is stored left to right, bottom to top. Each pixel is represented by 4 
bytes, using BGR24 format plus alpha channel.
Stream format: B G R A g bg rg a g bg rg a ....
where BGRA is the first pixel, and g, bg, rg, a are the error values.
g is decompressed with G_table, bg with B-G_table, br with R-G_table and a 
with R-G_table.



Predictors
==========
Predictors are used to guess the value of the next pixel, so the codec needs 
to store only the error between the real pixel value and the prediction. If 
predictor is good the error will be small, and will compress better than 
pixel values.

Valid predictors are:
YUV: old, left, gradient, median.
RGB: old, left, left with decorrelation, gradient with decorrelation.

Other combinations (es. RGB gradient without decorrelation), while 
technically feasible, are not used.

While describing predictors, the following symbols are used:
[A B C]: components of a pixel
Ayz: component A of pixel at row y and column z
a, b, c: indexes used for columns
n, m: indexes used for rows
x: index used to indicate the last column of a frame



Predict left, predict left with decorrelation and predict old
-------------------------------------------------------------
Predict old is the same as predict left (without decorrelation for RGB case).

The predictor for a pixel component is the same component from previous pixel 
(previous pair for U, V in YUV case).

Some examples:
- random pixel inside a row, RGB
  ...... [Ban Gan Ran] [Bam Gam Ram] ......
  The predictor for Gam is Gan

- random pixel inside a row, YUV
  .......[Y1an Uan Y2an Van] [Y1am Uam Y2am Vam] .......
  The predictor for Uam is Uan, for Y1am is Y2an, for Y2am is Y1am

-first pixel in a row (not for first row), RGB
 [Ba1 Ga1 Ra1] ........ [Bax Gax Rax]
 [Bb1 Gb1 Rb1] ......
 The predictor for Rb1 is Rax

-first pixel in a row (not for first row), YUV
 [Y1a1 Ua1 Y2a1 Va1] ........ [Y1ax Uax Y2ax Vax]
 [Y1b1 Ub1 Y2b1 Vb1] ......
 The predictor for Ub1 is Uax, for Y2b1 is Y1b1, for Y1b1 is Y2ax


The first pixel of the picture is stored uncompressed, the rest of the frame 
is compressed with predict left algorithm.



Predict gradient and predict gradient with decorrelation
--------------------------------------------------------
The predictor for a pixel is given by the sum of the previous pixel (like 
predict left) and the above pixel (the pixel from the row above at the same 
column), minus the above left pixel (the pixel from the row above at the 
previous column).

Some examples:
- random pixel inside a row, RGB
  ...... [Ban Gan Ran] [Bam Gam Ram] ......
  .......[Bbn Gbn Rbn] [Bbm Gbm Rbm]
  The predictor for Gbm is (Gbn + Gam - Gan)

- random pixel inside a row, YUV
  .......[Y1an Uan Y2an Van] [Y1am Uam Y2am Vam] .......
  .......[Y1bn Ubn Y2bn Vbn] [Y1bm Ubm Y2bm Vbm] .......
  The predictor for Ubm is (Ubn + Uam - Uan)
  The predictor for Y1bm is (Y2bn + Y1am - Y2an)
  The predictor for Y2bm is (Y1bm + Y2am - Y1am)

-first pixel in a row (not for first row), RGB
 [Ba1 Ga1 R11] ........ [Bax Gax Rax]
 [Bb1 Gb1 Rb1] ........ [Bbx Gbx Rbx]
 [Bc1 Gc1 Rc1] ......
 The predictor for Rc1 is (Rbx + Rb1 - Rax)

-first pixel in a row (not for first row), YUV
 [Y1a1 Ua1 Y2a1 Va1] ........ [Y1ax Uax Y2ax Vax]
 [Y1b1 Ub1 Y2b1 Vb1] ........ [Y1bx Ubx Y2bx Vbx]
 [Y1c1 Uc1 Y2c1 Vc1] ......
 The predictor for Uc1 is (Ubx + Ub1 - Uax)
 The predictor for Y2c1 is (Y1c1 + Y2b1 - Y1b1)
 The predictor for Y1c1 is (Y2bx + Y1b1 - Y2ax)


The first pixel of the picture is stored uncompressed, the remaining of the 
first row is compressed with predict left algorithm. Other rows are 
compressed with predict gradient.
The above left value for the first pixel of the second row is 0.

Maybe other pixels of the second row should ignore above left values, but 
I'm not sure about it.




Predict median (YUV only)
--------------------------------------------------------
The predictor for a pixel is given by the median value among the left pixel, 
the pixel above and the gradient predictor (sum of the previous pixel and the 
above pixel, minus the above left pixel). The median is obtained by sorting 
the three values and taking the one in the middle, and have nothing to do 
with the mean value of the three numbers. As an example if the left pixel 
is 0x12, the above one is 0x4e and the gradient predictor is 0xaf, the median 
is 0x4e.

As Ben Rudiak-Gould says on his page, there is no technical reason for this 
method to be YUV only, he simply didn't implement it for RGB case.

Some examples:
- random pixel inside a row, YUV
  .......[Y1an Uan Y2an Van] [Y1am Uam Y2am Vam] .......
  .......[Y1bn Ubn Y2bn Vbn] [Y1bm Ubm Y2bm Vbm] .......
  The predictor for Ubm is the median value of Ubn, Uam and (Ubn + Uam - Uan)
  The predictor for Y1bm is the median value of Y2bn, Y1am and 
  (Y2bn + Y1am - Y2an)
  The predictor for Y2bm is the median value of Y1bm, Y2am and 
  (Y1bm + Y2am - Y1am)

-first pixel in a row (not for first row), YUV
 [Y1a1 Ua1 Y2a1 Va1] ........ [Y1ax Uax Y2ax Vax]
 [Y1b1 Ub1 Y2b1 Vb1] ........ [Y1bx Ubx Y2bx Vbx]
 [Y1c1 Uc1 Y2c1 Vc1] ......
 The predictor for Uc1 is the median value of Ubx, Ub1 and (Ubx + Ub1 - Uax)
 The predictor for Y2c1 is the median value of Y1c1, Y2b1 and 
 (Y1c1 + Y2b1 - Y1b1)
 The predictor for Y1c1 is the median value of Y2bx, Y1b1 and 
 (Y2bx + Y1b1 - Y2ax)


The first pixel of the picture is stored uncompressed. The remaining of the 
first row, and the first 4 pixel (2 pairs, 8 bytes) of the second row are 
compressed with predict left algorithm.
The rest of the second row and other rows are compressed with predict median.

Technically the second pixel of the second row could be compressed using 
median predictor; it's compressed with left predictor because original code 
uses MMX instructions and so handles 8 bytes a time.



Interlaced images
=================
If image height is greater than 288 pixel, image is treated as interlaced.
You have to build an image with double width and half height, obtaindeb by 
placing the two fields side by side, with odd field on the left.
The frame size reported in the file header is unchanged, the only way to find 
out interlacing is looking at image height and compare with 288.
As an example, suppose your original frame is

1111111111111111
2222222222222222
3333333333333333
4444444444444444
5555555555555555
6666666666666666
7777777777777777
8888888888888888

then the frame to be compressed will be

11111111111111112222222222222222
33333333333333334444444444444444
55555555555555556666666666666666
77777777777777778888888888888888

In this way every line on a field gets the values for the predictors from the 
previous line of the same field, instead of using the previous line of the 
frame.

Please note that it makes ad ifference only for prediction methods that uses 
values from the previous line (median and gradient algorithms): if a method 
refers only to the value of the previous pixel (left algorithms) interlacing 
can be ignored.



Final notes
===========
Frame width must be divisible by 4 (at least for the original codec).

Probably header parsing logic can be simplified a lot, since not every 
combination of options is valid. This is what I was able to create using dll 
v2.1.1 and 1.2.3/1.3.1 and VirtualDub under Windows:
- Old format (v1.x only) is only used with classical tables (not stored in 
  file)
- Other methods (v2.1.1) always store tables in file, and uses extradata to 
  store method.

The first two(?) pixels (or 4 bytes) of the second row in predict gradient 
could be stored in a different way, using only left and above pixel as 
predictor. None of my samples require this.




References
==========
HuffYUV home page
Huffyuv v2.1.1, by Ben Rudiak-Gould.
http://www.math.berkeley.edu/~benrg/huffyuv.html



ChangeLog
=========
v0.8: March 24, 2003
- licensed under GNU Free Documentation License

v0.7: June 16, 2002
- Added a section about interlaced images

v0.6: March 27, 2002
- Fixed typos, added some clarification on symbol used
- Added a small explanation of Huffman decompression

v0.5: March 25, 2002
- initial release


GNU Free Documentation License
==============================
see http://www.gnu.org/licenses/fdl.html


EOF

from：
http://multimedia.cx/huffyuv.txt