How to implement a Huffman tree in Python: how do I write Huffman codes to a file using Python?

I created a Python script to compress text by using the Huffman algorithm. Say I have the following string:

string = 'The quick brown fox jumps over the lazy dog'

Running my algorithm returns the following 'bits':

result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'

Comparing the number of bits in the result with the size of the input string suggests the algorithm works:

>>> print(len(result), len(string) * 8)

194 344

But now comes the question: how do I write this to a file while still being able to decode it? You can only write to a file one byte at a time, not one bit at a time. If I write the 'codes' out as bytes, there is no compression at all!

I am new to computer science, and the online resources just don't cut it for me. All help is much appreciated!

Edit: note that my codes look something like this (here for a different input string, 'xxxxxxxyzz'):

{'y': '00', 'x': '1', 'z': '10'}

The way I create the resulting string is by concatenating these codes in order of the input string:

result = '1111111001010'

How to get back to the original string from this result? Or am I getting this completely wrong? Thank you!

Solution

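First, convert the bit string to bytes:

def _to_Bytes(data):
    # Pack each group of 8 bits into one byte. If len(data) is not a
    # multiple of 8, the last group is shorter and its length is not recorded.
    b = bytearray()
    for i in range(0, len(data), 8):
        b.append(int(data[i:i+8], 2))
    return bytes(b)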

Then, open a file to write in binary mode:

result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'

with open('test.bin', 'wb') as f:
    f.write(_to_Bytes(result))
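One catch with `_to_Bytes`: when the bit string's length is not a multiple of 8 (194 here), the final chunk is shorter than 8 bits, and a reader cannot tell how many of the bits in the last byte are real. A minimal sketch of one way around this, assuming a one-byte header that stores the number of padding bits. The names `pack_bits` and `unpack_bits` are hypothetical and not part of the original answer.

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, prefixed by a pad count."""
    pad = (8 - len(bits) % 8) % 8          # zero bits needed to fill the last byte
    padded = bits + '0' * pad
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([pad]) + body             # header byte: how many bits to drop on read


def unpack_bits(data: bytes) -> str:
    """Invert pack_bits: recover the exact original bit string."""
    pad, body = data[0], data[1:]
    bits = ''.join(f'{byte:08b}' for byte in body)
    return bits[:len(bits) - pad]          # strip the padding added by pack_bits
```

With this layout, the 194-bit string above would occupy 26 bytes (one header byte plus 25 data bytes), and `unpack_bits(pack_bits(result))` gives back `result` unchanged.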

To compare sizes, write the original string to a file as well and look at both file sizes:

import os

# 'w' rather than 'a', so rerunning the script does not keep growing the file
with open('test_compare.txt', 'w') as f:
    f.write('The quick brown fox jumps over the lazy dog')

_o = os.path.getsize('test_compare.txt')
_c = os.path.getsize('test.bin')

print(f'Original file: {_o} bytes')
print(f'Compressed file: {_c} bytes')
print('Compressed file is about {}% smaller than the original'.format(round(((_o - _c) / _o) * 100, 0)))

Output:

Original file: 43 bytes
Compressed file: 25 bytes
Compressed file is about 42.0% smaller than the original
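Note that this comparison counts only the packed bits. To decode the file later, the reader also needs the code table, and that overhead can outweigh the savings on very short inputs. A rough sketch of storing both together, assuming a made-up container layout (a 4-byte big-endian table length, the table as JSON, one pad-count byte, then the packed bits). `dump_huffman` and `load_huffman` are hypothetical names chosen for illustration.

```python
import json
import struct


def dump_huffman(bits: str, codes: dict) -> bytes:
    """Serialize the code table and the packed bits into one bytes object."""
    table = json.dumps(codes).encode('utf-8')
    pad = (8 - len(bits) % 8) % 8
    padded = bits + '0' * pad
    payload = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    # Assumed layout: table length (uint32, big-endian) | table | pad count | payload
    return struct.pack('>I', len(table)) + table + bytes([pad]) + payload


def load_huffman(blob: bytes):
    """Return (bits, codes) from a blob produced by dump_huffman."""
    (table_len,) = struct.unpack('>I', blob[:4])
    codes = json.loads(blob[4:4 + table_len].decode('utf-8'))
    pad = blob[4 + table_len]
    payload = blob[5 + table_len:]
    bits = ''.join(f'{byte:08b}' for byte in payload)
    return bits[:len(bits) - pad], codes
```

Writing `dump_huffman(result, codes)` to a file opened in `'wb'` mode and comparing its size with the original gives a more honest picture of the savings. JSON is used here only because it is easy to read; a real format would likely store the table more compactly.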

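To get back to the original string, you can write a function that repeatedly picks the code at the front of the remaining bits. Because '1' is a prefix of '10' in this table, the function also checks that whatever follows the chosen code is either empty or starts with another valid code:

from typing import Generator

d = {'y': '00', 'x': '1', 'z': '10'}
result = '1111111001010'

def reverse_encoding(content: str, _lookup) -> Generator[str, None, None]:
    # _lookup maps code -> character
    while content:
        # Keep codes that match the front of content and leave a remainder
        # that is empty or begins with another code (one step of lookahead).
        _options = [i for i in _lookup if content.startswith(i) and (any(content[len(i):].startswith(b) for b in _lookup) or not content[len(i):])]
        if not _options:
            raise Exception("Decoding error")
        yield _lookup[_options[0]]
        content = content[len(_options[0]):]

print(''.join(reverse_encoding(result, {b: a for a, b in d.items()})))

Output:

xxxxxxxyzz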
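The lookahead above is only needed because the table `{'y': '00', 'x': '1', 'z': '10'}` is not prefix-free: `'1'` is a prefix of `'10'`, so some bit strings can be split in more than one way. A code read off a Huffman tree is prefix-free by construction, which lets the decoder work in a single left-to-right pass. A minimal sketch, assuming the usual heap-based construction; `build_codes` and `decode` are illustrative names and not the asker's implementation.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(text: str) -> dict:
    """Build a prefix-free Huffman code table {char: bitstring} for text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                      # single distinct symbol: give it a 1-bit code
        return {next(iter(freq)): '0'}
    tiebreak = count()                      # unique counter so dicts are never compared
    heap = [(w, next(tiebreak), {ch: ''}) for ch, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: '0' + code for ch, code in left.items()}
        merged.update({ch: '1' + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def decode(bits: str, codes: dict) -> str:
    """Accumulate bits until they match a code (valid only for prefix-free tables)."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ''
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ''
    if current:
        raise ValueError('trailing bits do not form a complete code')
    return ''.join(out)
```

For example, encoding with `''.join(codes[c] for c in text)` where `codes = build_codes(text)` and then calling `decode` on the result should return `text`. Each subtree is represented here as a dict of partial codes rather than as explicit tree nodes, which keeps the sketch short at the cost of some extra copying.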
