I recently used the PackBits compression algorithm in a project at work. Here is a summary of what I learned; corrections are welcome.
First, here is how TIFF Revision 6.0 explains the algorithm (TIFF Revision 6.0 Final, June 3, 1992, page 42):
A pseudo code fragment to unpack might look like this:

    Loop until you get the number of unpacked bytes you are expecting:
        Read the next source byte into n.
        If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
        Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
        Else if n is -128, noop.
    Endloop

In the inverse routine, it is best to encode a 2-byte repeat run as a replicate run
except when preceded and followed by a literal run. In that case, it is best to merge
the three runs into one literal run. Always encode 3-byte repeats as replicate runs.
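
The unpacking pseudocode quoted above can be transcribed almost line for line into C. The following is only a sketch to make the three cases concrete. The name `unpack_sketch` and its parameters are my own, not from the TIFF text or from the source code in this post, and the early-exit checks for truncated input or a too-small output buffer are an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a direct transcription of the TIFF pseudocode.
   Returns the number of bytes written to dst. Stops early if the source
   runs out or a run would not fit in the remaining output space. */
static size_t unpack_sketch(const unsigned char *src, size_t srclen,
                            unsigned char *dst, size_t want)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (out < want && in < srclen) {
        /* Interpret the flag byte as a signed value in -128..127. */
        int n = src[in] < 128 ? src[in] : (int)src[in] - 256;
        in++;

        if (n >= 0) {                    /* literal run of n+1 bytes */
            size_t len = (size_t)n + 1;
            if (len > srclen - in || len > want - out)
                break;
            memcpy(dst + out, src + in, len);
            in += len;
            out += len;
        } else if (n != -128) {          /* repeat the next byte -n+1 times */
            size_t len = (size_t)(-n) + 1;
            if (in >= srclen || len > want - out)
                break;
            memset(dst + out, src[in], len);
            in++;
            out += len;
        }
        /* n == -128: no-op, continue with the next flag byte */
    }
    return out;
}
```

For instance, the 15-byte stream FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA unpacks to 24 bytes: three AA, then 80 00 2A, four AA, then 80 00 2A 22, and finally ten AA.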
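That is the core of the algorithm, described in pseudocode. It should be readable as it stands; if not, the source code posted further down should make it clear.
In outline: the encoder walks the byte sequence and, at each position, checks whether the next 3 bytes are identical. If they are, it keeps scanning while the following bytes still match and counts the run length as count (at most 128). The output is two bytes: a flag byte computed as (1+256-count), which is -(count-1) written as an unsigned byte, followed by the repeated byte value. If fewer than 3 identical bytes start at the current position (for example only 2), the encoder instead copies bytes one by one until it reaches 3 identical bytes in a row, 128 bytes, or the end of the input, counting them as count. The output is then a flag byte (count-1) followed by the count copied bytes.
This explanation may not be very clear; if someone more knowledgeable can offer a better one, please do.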
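
To make the flag-byte arithmetic concrete, here is a small sketch of the two formulas and their inverse. The helper names `replicate_flag`, `literal_flag` and `run_length` are hypothetical and do not appear in the source code of this post; the range checks simply follow the ranges in the TIFF pseudocode.

```c
#include <assert.h>

/* Hypothetical helpers, not part of the original source. */

/* Flag byte for a replicate run of `count` identical bytes (2..128). */
static unsigned char replicate_flag(int count)
{
    assert(count >= 2 && count <= 128);
    return (unsigned char)(1 + 256 - count);   /* 255..129 */
}

/* Flag byte for a literal run of `count` bytes (1..128). */
static unsigned char literal_flag(int count)
{
    assert(count >= 1 && count <= 128);
    return (unsigned char)(count - 1);         /* 0..127 */
}

/* Inverse mapping: returns the run length encoded by a flag byte and
   sets *is_replicate. Returns 0 for the no-op flag 128. */
static int run_length(unsigned char flag, int *is_replicate)
{
    if (flag == 128) {
        *is_replicate = 0;
        return 0;
    }
    *is_replicate = flag > 128;
    return flag > 128 ? 1 + 256 - flag : flag + 1;
}
```

A run of 3 identical bytes therefore gets the flag 254 (0xFE), and a single literal byte gets the flag 0.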
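Below is the C source code. I found it online after a long search; it was extracted from a larger project and includes both the encoding and the decoding part.
The comments are fairly detailed, so it should be readable without further explanation.
First, the license notice: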
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
1. Encoding
#include <string.h> /* memcpy */

// Assuming compressor logic is maximally efficient,
// worst case input with no duplicate runs of 3 or more bytes
// will be compressed into a series of verbatim runs no longer
// than 128 bytes, each preceded by length byte.
// i.e. worst case output length is not more than 129*ceil(n/128)
// or slightly tighter, 129*floor(n/128) + 1 + (n%128)
unsigned int packbits(unsigned char *src, unsigned char *dst, unsigned int n){
    unsigned char *p, *q, *run, *dataend;
    int count, maxrun;

    dataend = src + n;
    for( run = src, q = dst; n > 0; run = p, n -= count ){
        // A run cannot be longer than 128 bytes.
        maxrun = n < 128 ? n : 128;
        if(run <= (dataend-3) && run[1] == run[0] && run[2] == run[0]){
            // 'run' points to at least three duplicated values.
            // Step forward until run length limit, end of input,
            // or a non matching byte:
            for( p = run+3; p < (run+maxrun) && *p == run[0]; )
                ++p;
            count = p - run;
            // replace this run in output with two bytes:
            *q++ = 1+256-count; /* flag byte, which encodes count (129..254) */
            *q++ = run[0];      /* byte value that is duplicated */
        }else{
            // If the input doesn't begin with at least 3 duplicated values,
            // then copy the input block, up to the run length limit,
            // end of input, or until we see three duplicated values:
            for( p = run; p < (run+maxrun); )
                if(p <= (dataend-3) && p[1] == p[0] && p[2] == p[0])
                    break; // 3 bytes repeated end verbatim run
                else
                    ++p;
            count = p - run;
            *q++ = count-1;        /* flag byte, which encodes count (0..127) */
            memcpy(q, run, count); /* followed by the bytes in the run */
            q += count;
        }
    }
    return q - dst;
}
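
The worst-case comment at the top of the encoder suggests how large the destination buffer should be. As a sketch, that bound can be wrapped in a helper. The name `packbits_max_size` is hypothetical and not part of the original source, and the formula assumes the encoder behaves as described in that comment (each literal run holds at most 128 bytes and costs one extra flag byte).

```c
#include <stddef.h>

/* Hypothetical helper, not part of the original source.
   Upper bound on the packed size of n input bytes:
   n data bytes plus ceil(n/128) flag bytes.
   The sum can wrap around for n close to SIZE_MAX. */
static size_t packbits_max_size(size_t n)
{
    return n + n / 128 + (n % 128 != 0);
}
```

A caller could allocate `packbits_max_size(n)` bytes for `dst` before calling `packbits(src, dst, n)`.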
Next, the decoding part.
2. Decoding
#include <stdio.h>  /* printf */
#include <string.h> /* memcpy, memset */

unsigned int unpackbits(unsigned char *outp, unsigned char *inp,
                        unsigned int outlen, unsigned int inlen)
{
    unsigned int i, len;
    int val;

    /* i counts output bytes; outlen = expected output size */
    for(i = 0; inlen > 1 && i < outlen;){
        /* get flag byte */
        len = *inp++;
        --inlen;

        if(len == 128) /* ignore this flag value */
            ; // warn_msg("RLE flag byte=128 ignored");
        else{
            if(len > 128){
                len = 1+256-len;

                /* get value to repeat */
                val = *inp++;
                --inlen;

                if((i+len) <= outlen)
                    memset(outp, val, len);
                else{
                    memset(outp, val, outlen-i); // fill enough to complete row
                    printf("unpacked RLE data would overflow row (run)\n");
                    len = 0; // effectively ignore this run, probably corrupt flag byte
                }
            }else{
                ++len;
                // checked before either copy below, so neither reads past the input
                if(len > inlen)
                    break; // abort - ran out of input data
                if((i+len) <= outlen){
                    /* copy verbatim run */
                    memcpy(outp, inp, len);
                    inp += len;
                    inlen -= len;
                }else{
                    memcpy(outp, inp, outlen-i); // copy enough to complete row
                    printf("unpacked RLE data would overflow row (copy)\n");
                    inp += len;   // skip the run's bytes so the next flag byte lines up
                    inlen -= len;
                    len = 0; // effectively ignore
                }
            }
            outp += len;
            i += len;
        }
    }
    if(i < outlen)
        printf("not enough RLE data for row\n");
    return i;
}
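
The decoder needs the expected output size (`outlen`) from the caller and only reports malformed input through `printf`. One possible complement, shown here purely as a sketch, is a pass that walks the packed stream without writing anything and reports how many bytes it would unpack to, or that it is truncated. The function `packbits_scan` is hypothetical and not part of the original source.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the original source.
   Walks a packed stream without producing output. On success stores the
   unpacked size in *outlen and returns 0; returns -1 if the stream ends
   in the middle of a run. */
static int packbits_scan(const unsigned char *in, size_t inlen, size_t *outlen)
{
    size_t pos = 0, total = 0;

    while (pos < inlen) {
        unsigned int flag = in[pos++];

        if (flag == 128)
            continue;                 /* no-op flag */
        if (flag > 128) {             /* replicate run: one data byte follows */
            if (pos >= inlen)
                return -1;
            pos += 1;
            total += 1 + 256 - flag;
        } else {                      /* literal run: flag+1 data bytes follow */
            size_t len = flag + 1;
            if (len > inlen - pos)
                return -1;
            pos += len;
            total += len;
        }
    }
    *outlen = total;
    return 0;
}
```

A caller could run this first, reject the buffer on -1, and otherwise pass the reported size as `outlen` to `unpackbits`.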
[End]