目录
Huffman编码:
Huffman编码来自哈夫曼树(即最优二叉树),带权路径长度最小的二叉树,经常应用于数据压缩。出现频率(概率)高的字符码长就短,出现频率(概率)低的字符码长就长。且Huffman编码是异前置码字,即任一码字不会是另一码字的前面部分,这使各码字可以连在一起传送,中间不需另加隔离符号,只要传送时不出错,收端仍可分离各个码字,不致混淆。
题目:
这是一到关于Huffman编码解码的题目:
ciphertext:170310021e0b065f0144005a13014707525055560b065b011342035b525e1306550601065418
ciphertext[x] = (flag[x] ^ keyword[x]).encode('hex')
enckeyword = 1100011010101111010000111111111100011010110011100110101000011000101101101110001000111111011100110101001010
keyword tip:Huffman Coding
q--6
o--7
b--4
c--5
a--3
e--10
h--1
w--2
ciphertext[x] = (flag[x] ^ keyword[x]).encode('hex')
则有ciphertext[x].decode('hex')=flag[x]^keyword[x]
flag[x]=ciphertext[x].decode('hex')^keyword[x]
解题:
import binascii
dic = {'q': '001', 'o': '11', 'b': '101', 'c': '100', 'a': '0001', 'e': '01', 'h': '00001', 'w': '00000'}
Ctext ='1100011010101111010000111111111100011010110011100110101000011000101101101110001000111111011100110101001010'
def get_key(val):
for key, value in dic.items():
if val == value:
return key
return "There is no such Key"
i , j = 0 , 0
Mtext = ''
while j < len(Ctext):
if Ctext[i:j] in dic.values():
Ch=get_key(Ctext[i:j])
# print(Mtext)
Mtext += Ch
i=j
j+=1
else:
j+=1
print (Mtext)
ciphertextHex='170310021e0b065f0144005a13014707525055560b065b011342035b525e1306550601065418'
hex =ciphertextHex.encode('utf-8')
print(hex)
ciphertextBin=binascii.unhexlify(hex)
print(ciphertextBin)
ciphertext=ciphertextBin.decode('utf-8')
print (ciphertext)
print(len(Mtext))
print(len(ciphertext))
flag =''
for i in range (len(ciphertext)):
flag+= chr (ord(ciphertext[i]) ^ ord(Mtext[i]))
print (flag)
解码解出来:keyword=‘oabeeobhoooocebecocoeehceebboaaooboqbeqe’
flag是:xbrg{dd7n+o5pd%b1?69nc3bv'a9=?ri:dnw6}(感觉不符合预期)
疑问:
对于相同的字符出现频率的情况,Huffman编码不唯一,这就导致Huffman编码解码的时候得到的结果不唯一。
如何能保证这种唯一性,希望有大佬可以解答