Alphabet size and order-N adaptive context in data compression and entropy coding

An order-N adaptive context-based modeler reads the next symbol S from the input stream and treats the N symbols preceding S as the current order-N context C of S. The model then estimates the probability P that S appears in the input data following the particular context C. In theory, the larger N is, the better the probability estimate (the prediction).
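
The modeler described above can be sketched in C as follows. This is only a minimal illustration under stated assumptions, not a reference implementation: the names `ContextModel`, `model_init`, `model_probability`, `model_update` and `model_free` are hypothetical, the counts live in a dense table (practical only for small A and N), the stream is assumed to be preceded by N copies of symbol 0, and add-one smoothing is an arbitrary choice for giving unseen symbols a nonzero probability.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical order-N adaptive model over a small alphabet.
   counts[ctx * alphabet + s] = how often symbol s has followed context ctx. */
typedef struct {
    int order;          /* N */
    int alphabet;       /* A */
    size_t contexts;    /* A^N */
    size_t ctx;         /* current context, read as an N-digit number in base A */
    unsigned *counts;   /* contexts * alphabet counters, all starting at 0 */
} ContextModel;

/* Returns 1 on success, 0 if the count table could not be allocated. */
int model_init(ContextModel *m, int order, int alphabet) {
    assert(order >= 0 && alphabet > 0);
    m->order = order;
    m->alphabet = alphabet;
    m->contexts = 1;
    for (int i = 0; i < order; i++) m->contexts *= (size_t)alphabet;
    m->ctx = 0;         /* assumption: N copies of symbol 0 precede the stream */
    m->counts = calloc(m->contexts * (size_t)alphabet, sizeof *m->counts);
    return m->counts != NULL;
}

/* Estimate P(s | current context) with add-one smoothing (an assumption). */
double model_probability(const ContextModel *m, int s) {
    assert(s >= 0 && s < m->alphabet);
    const unsigned *row = m->counts + m->ctx * (size_t)m->alphabet;
    unsigned long total = 0;
    for (int i = 0; i < m->alphabet; i++) total += row[i];
    return (row[s] + 1.0) / ((double)total + (double)m->alphabet);
}

/* Adapt: count s in the current context, then slide the context window
   so that the oldest symbol drops out and s becomes the newest one. */
void model_update(ContextModel *m, int s) {
    assert(s >= 0 && s < m->alphabet);
    m->counts[m->ctx * (size_t)m->alphabet + (size_t)s]++;
    m->ctx = (m->ctx * (size_t)m->alphabet + (size_t)s) % m->contexts;
}

void model_free(ContextModel *m) {
    free(m->counts);
    m->counts = NULL;
}
```

A coder would call `model_probability` for the next symbol S before encoding it and `model_update` afterwards, so that encoder and decoder adapt in the same way.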

If our symbols are the 7-bit ASCII codes, the alphabet size is 2^7 = 128 symbols. There are therefore 128^2 = 16,384 order-2 contexts, 128^3 = 2,097,152 order-3 contexts, and so on. The number of contexts grows exponentially as A^N, where A is the alphabet size.
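
The arithmetic above can be checked with a short C sketch. The function names `alphabet_size` and `context_count` are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Alphabet size for symbols that are `bits` bits wide: 2^bits. */
uint64_t alphabet_size(unsigned bits) {
    assert(bits < 64);
    return UINT64_C(1) << bits;
}

/* Number of order-N contexts, A^N.
   Returns 0 when the result does not fit in 64 bits. */
uint64_t context_count(uint64_t alphabet, unsigned order) {
    uint64_t n = 1;
    for (unsigned i = 0; i < order; i++) {
        if (alphabet != 0 && n > UINT64_MAX / alphabet) return 0;
        n *= alphabet;
    }
    return n;
}
```

For example, `context_count(alphabet_size(7), 3)` gives the 2,097,152 order-3 contexts quoted above. If a model kept one 4-byte counter for every (context, symbol) pair, an order-3 model over 128 symbols would need 128^4 × 4 bytes = 1 GiB, which is why large N quickly becomes impractical despite the better prediction.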

Alphabet: The set of all possible symbols in the input stream. In text compression, the alphabet is normally the set of 128 ASCII codes. In image compression, it is the set of values a pixel can take (2, 16, 256, or anything else). (See also Symbol.)

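Context: The N symbols preceding the next symbol. A context-based model uses this context to estimate the probability of that symbol.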

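The following is a simple C example that compresses an image file with an entropy coder. It was presented as Shannon coding, but the interval-narrowing scheme it uses is arithmetic coding with a static frequency table:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMBOLS 256
#define HALF      (1LL << 31)   /* midpoint of the 32-bit code range */
#define QUARTER   (1LL << 30)
#define MAX_TOTAL (1LL << 30)   /* keeps every symbol's interval non-empty */

typedef struct { unsigned char symbol; int frequency; } Symbol;

/* Cumulative-count interval [low, high) owned by one symbol. */
typedef struct { long long low; long long high; } Range;

typedef struct {
    Symbol symbol_list[MAX_SYMBOLS];
    int symbol_count;
    int index_of[MAX_SYMBOLS];  /* byte value -> index in symbol_list, -1 if unseen */
    long long total_frequency;
} FrequencyTable;

/* Coder state; it has to persist across input buffers. */
typedef struct {
    long long low, high;        /* current interval, 32-bit values */
    long pending_bits;          /* bits deferred while the interval straddles the midpoint */
    int bit_buffer, bit_count;  /* bits are packed into bytes before being written */
    FILE *output;
} Encoder;

void init_frequency_table(FrequencyTable *table) {
    memset(table, 0, sizeof *table);
    for (int i = 0; i < MAX_SYMBOLS; i++) table->index_of[i] = -1;
}

void update_frequency_table(FrequencyTable *table, const unsigned char *data, size_t size) {
    for (size_t i = 0; i < size; i++) {
        int j = table->index_of[data[i]];
        if (j < 0) {            /* first occurrence of this byte value */
            j = table->symbol_count++;
            table->symbol_list[j].symbol = data[i];
            table->index_of[data[i]] = j;
        }
        table->symbol_list[j].frequency++;
        table->total_frequency++;
    }
}

/* qsort comparator: higher frequency first. */
static int by_frequency_desc(const void *a, const void *b) {
    const Symbol *x = a, *y = b;
    return (x->frequency < y->frequency) - (x->frequency > y->frequency);
}

void sort_frequency_table(FrequencyTable *table) {
    qsort(table->symbol_list, (size_t)table->symbol_count, sizeof(Symbol), by_frequency_desc);
    for (int i = 0; i < table->symbol_count; i++)  /* positions changed: rebuild the lookup */
        table->index_of[table->symbol_list[i].symbol] = i;
}

void build_range_table(const FrequencyTable *table, Range *range_list) {
    long long low = 0;
    for (int i = 0; i < table->symbol_count; i++) {
        range_list[i].low = low;
        low += table->symbol_list[i].frequency;
        range_list[i].high = low;
    }
}

static void put_bit(Encoder *enc, int bit) {
    enc->bit_buffer = (enc->bit_buffer << 1) | bit;
    if (++enc->bit_count == 8) {
        fputc(enc->bit_buffer, enc->output);
        enc->bit_buffer = 0;
        enc->bit_count = 0;
    }
}

/* Write one bit, followed by all deferred bits, which take the opposite value. */
static void put_bit_plus_pending(Encoder *enc, int bit) {
    put_bit(enc, bit);
    for (; enc->pending_bits > 0; enc->pending_bits--) put_bit(enc, !bit);
}

void encode_data(Encoder *enc, const FrequencyTable *table, const Range *range_list,
                 const unsigned char *data, size_t size) {
    for (size_t i = 0; i < size; i++) {
        int symbol_index = table->index_of[data[i]];
        if (symbol_index < 0) {
            fprintf(stderr, "Error: symbol not found in range table.\n");
            exit(1);
        }
        Range range = range_list[symbol_index];
        long long range_size = enc->high - enc->low + 1;
        /* Narrow [low, high] to the symbol's share; high first, because it needs the old low. */
        enc->high = enc->low + range_size * range.high / table->total_frequency - 1;
        enc->low  = enc->low + range_size * range.low  / table->total_frequency;
        for (;;) {
            if (enc->high < HALF) {                 /* interval in the lower half */
                put_bit_plus_pending(enc, 0);
            } else if (enc->low >= HALF) {          /* interval in the upper half */
                put_bit_plus_pending(enc, 1);
                enc->low -= HALF;
                enc->high -= HALF;
            } else if (enc->low >= QUARTER && enc->high < 3 * QUARTER) {
                enc->pending_bits++;                /* straddles the midpoint: defer the bit */
                enc->low -= QUARTER;
                enc->high -= QUARTER;
            } else {
                break;
            }
            enc->low <<= 1;
            enc->high = (enc->high << 1) | 1;
        }
    }
}

/* Emit enough bits to identify the final interval, then pad the last byte with zeros. */
void finish_encoding(Encoder *enc) {
    enc->pending_bits++;
    put_bit_plus_pending(enc, enc->low >= QUARTER);
    while (enc->bit_count != 0) put_bit(enc, 0);
}

void encode_image(const char *input_filename, const char *output_filename) {
    FILE *input = fopen(input_filename, "rb");
    if (!input) {
        fprintf(stderr, "Error: failed to open input file.\n");
        exit(1);
    }
    FILE *output = fopen(output_filename, "wb");
    if (!output) {
        fprintf(stderr, "Error: failed to open output file.\n");
        exit(1);
    }

    unsigned char buffer[1024];
    size_t size;
    FrequencyTable table;
    Range range_list[MAX_SYMBOLS];

    /* Pass 1: gather the symbol statistics. */
    init_frequency_table(&table);
    while ((size = fread(buffer, 1, sizeof buffer, input)) > 0) {
        update_frequency_table(&table, buffer, size);
        if (table.total_frequency > MAX_TOTAL) {
            fprintf(stderr, "Error: input too large for this coder.\n");
            exit(1);
        }
    }
    sort_frequency_table(&table);
    build_range_table(&table, range_list);

    /* Header: the frequency table, which the decoder needs to rebuild the ranges. */
    fwrite(&table.symbol_count, sizeof(int), 1, output);
    for (int i = 0; i < table.symbol_count; i++) {
        fwrite(&table.symbol_list[i].symbol, sizeof(unsigned char), 1, output);
        fwrite(&table.symbol_list[i].frequency, sizeof(int), 1, output);
    }

    /* Pass 2: encode the data with one coder state for the whole file. */
    Encoder enc = { 0, (1LL << 32) - 1, 0, 0, 0, output };
    rewind(input);
    while ((size = fread(buffer, 1, sizeof buffer, input)) > 0)
        encode_data(&enc, &table, range_list, buffer, size);
    if (table.total_frequency > 0) finish_encoding(&enc);

    fclose(input);
    fclose(output);
}

int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s input_file output_file\n", argv[0]);
        return 1;
    }
    encode_image(argv[1], argv[2]);
    return 0;
}
```

This example program compresses the input image file and writes the result to the specified output file. It first counts how often each symbol occurs in the input and sorts the symbols from most to least frequent. It then builds a coding range for each symbol from its frequency, and finally uses those ranges to encode the input data and writes the encoded data to the file. To decompress, the same frequency table and coding ranges are all that is needed to restore the original data.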
