Huffman Encoding Trees

SICP Exercises 2.69 - 2.70

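Representing the code tree

Leaves

First, a code tree needs leaf nodes that hold the symbols being encoded. The path from the root to a leaf is the code for that leaf's symbol.

A leaf can be represented as (leaf <symbol> <weight>):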

; leaf
(define (make-leaf symbol weight)
  (list 'leaf symbol weight))

(define (leaf? object)
  (eq? (car object) 'leaf))

(define (symbol-leaf x) (cadr x))
(define (weight-leaf x) (caddr x))

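Tree roots

Encoding a symbol amounts to starting at the root and finding the one and only path to that symbol's leaf. Since the program will be written recursively, each root should be able to decide on the spot whether to go left or right, which requires it to record the symbols contained in its left and right subtrees.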

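; tree
; A non-leaf node is (left right symbols weight): both subtrees,
; the combined symbol list, and the combined weight.
(define (make-code-tree left right)
  (list left
        right
        (append (symbols left) (symbols right))
        (+ (weight left) (weight right))))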

(define (left-branch tree) (car tree))

(define (right-branch tree) (cadr tree))

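Building the code tree means repeatedly merging the subtrees with the smallest weights, so a root should also record the total weight of its subtrees. Because leaves and roots are represented differently, it is convenient to write "polymorphic" procedures that return the symbol set and the total weight for either kind of node.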

(define (symbols tree)
  (if (leaf? tree)
      (list (symbol-leaf tree))
      (caddr tree)))

(define (weight tree)
  (if (leaf? tree)
      (weight-leaf tree)
      (cadddr tree)))

Encoding looks at the symbol sets of subtrees; building the tree looks at their weights.
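To make the representation concrete, here is a small sketch that writes out the list make-code-tree would build for two leaves and picks it apart with the same list accessors the selectors use. The name small-tree is made up for this illustration, and the tree is given as a quoted literal only so the snippet runs on its own.

```scheme
;; The list that (make-code-tree (make-leaf 'A 4) (make-leaf 'B 2)) builds,
;; written as a literal. small-tree is a hypothetical name.
(define small-tree '((leaf A 4) (leaf B 2) (A B) 6))

(car small-tree)     ; left branch  => (leaf A 4)
(cadr small-tree)    ; right branch => (leaf B 2)
(caddr small-tree)   ; symbol set   => (A B)
(cadddr small-tree)  ; total weight => 6
```

Note that the car of a non-leaf node is itself a list, never the symbol leaf, which is why leaf? can tell the two kinds of node apart.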

Decoding

Suppose we already have a Huffman tree:

(define sample-tree
  (make-code-tree (make-leaf 'A 4)
                  (make-code-tree
                   (make-leaf 'B 2)
                   (make-code-tree (make-leaf 'D 1)
                                   (make-leaf 'C 1)))))

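Decoding turns a sequence of 0s and 1s back into a sequence of symbols. Each bit selects the left or right branch. If the branch reached is a leaf, its symbol is added to the result and decoding restarts from the root; otherwise decoding continues from that branch.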

(define (decode bits tree)
  (define (decode-1 bits current-branch)
    (if (null? bits)
        '()
        (let ((next-branch
               (choose-branch (car bits) current-branch)))
          (if (leaf? next-branch)
              (cons (symbol-leaf next-branch)
                    (decode-1 (cdr bits) tree))
              (decode-1 (cdr bits) next-branch)))))
  (decode-1 bits tree))

(define (choose-branch bit branch)
  (cond ((= bit 0) (left-branch branch))
        ((= bit 1) (right-branch branch))
        (else (error "bad bit -- CHOOSE-BRANCH" bit))))
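As a worked example, the sketch below decodes the 13-bit message used in SICP Exercise 2.67 against sample-tree. So that it runs on its own, it repeats the procedures it needs from this post and writes sample-tree as the literal list that make-code-tree would build; sample-message is the name the exercise uses, not one defined in this post.

```scheme
;; Procedures repeated from the post so the snippet is self-contained.
(define (leaf? object) (eq? (car object) 'leaf))
(define (symbol-leaf x) (cadr x))
(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))

(define (choose-branch bit branch)
  (cond ((= bit 0) (left-branch branch))
        ((= bit 1) (right-branch branch))
        (else (error "bad bit -- CHOOSE-BRANCH" bit))))

(define (decode bits tree)
  (define (decode-1 bits current-branch)
    (if (null? bits)
        '()
        (let ((next-branch
               (choose-branch (car bits) current-branch)))
          (if (leaf? next-branch)
              (cons (symbol-leaf next-branch)
                    (decode-1 (cdr bits) tree))
              (decode-1 (cdr bits) next-branch)))))
  (decode-1 bits tree))

;; sample-tree as list structure. Codes: A = 0, B = 10, D = 110, C = 111.
(define sample-tree
  '((leaf A 4)
    ((leaf B 2)
     ((leaf D 1) (leaf C 1) (D C) 2)
     (B D C) 4)
    (A B D C) 8))

(define sample-message '(0 1 1 0 0 1 0 1 0 1 1 1 0))

(decode sample-message sample-tree)
;; The bits group as 0 | 110 | 0 | 10 | 10 | 111 | 0
;; => (A D A B B C A)
```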

Building the code tree

Construction starts from a list of (<symbol> <frequency>) pairs:

(define pairs '((A 4) (C 1) (D 1) (B 2)))

First, turn the pairs into a set of leaves ordered by weight:

(define (make-leaf-set pairs)
  (if (null? pairs)
      '()
      (let ((pair (car pairs)))
        (adjoin-set (make-leaf (car pair)
                               (cadr pair))
                    (make-leaf-set (cdr pairs))))))
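make-leaf-set calls adjoin-set, which this post does not define. The sketch below assumes the definition given in SICP section 2.3.4, which keeps a list of trees in increasing order of weight; whether the post relies on exactly this version is an assumption. The other procedures are repeated from the post so the snippet runs on its own.

```scheme
(define (make-leaf symbol weight) (list 'leaf symbol weight))
(define (leaf? object) (eq? (car object) 'leaf))
(define (weight-leaf x) (caddr x))
(define (weight tree)
  (if (leaf? tree)
      (weight-leaf tree)
      (cadddr tree)))

;; Assumed definition (as in SICP 2.3.4): insert x into a list of trees
;; sorted by increasing weight. On a tie, x goes after the elements
;; that already have the same weight.
(define (adjoin-set x set)
  (cond ((null? set) (list x))
        ((< (weight x) (weight (car set))) (cons x set))
        (else (cons (car set)
                    (adjoin-set x (cdr set))))))

(define (make-leaf-set pairs)
  (if (null? pairs)
      '()
      (let ((pair (car pairs)))
        (adjoin-set (make-leaf (car pair)
                               (cadr pair))
                    (make-leaf-set (cdr pairs))))))

(make-leaf-set '((A 4) (C 1) (D 1) (B 2)))
;; => ((leaf D 1) (leaf C 1) (leaf B 2) (leaf A 4))
```

The pairs are processed from the end of the list, so D is inserted before C; C then lands after D because the comparison is a strict <.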

Then repeatedly merge the two lowest-weight trees until only one tree is left:

(define (generate-huffman-tree pairs)
  (successive-merge (make-leaf-set pairs)))

(define (successive-merge leaf-set)
  (define (merge rest)
    (cond ((null? rest) '())
          ((null? (cdr rest)) (car rest))
          (else (let ((tree1 (car rest))
                      (tree2 (cadr rest)))
                  (merge
                   (adjoin-set
                    (make-code-tree tree1 tree2)
                    (cddr rest)))))))
  (merge leaf-set))
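The merging can be followed by hand for the four pairs above. The sketch below records the ordered set after each step, abbreviating every tree to (symbols weight), and checks two invariants: the set shrinks by one tree per merge, and the total weight never changes. It assumes adjoin-set places a new tree after existing trees of equal weight, as SICP's version does; merge-trace and state-weight are names made up for this illustration.

```scheme
;; Each state is the ordered set at one point in successive-merge,
;; with every tree abbreviated to (symbols weight).
(define merge-trace
  '(((D 1) (C 1) (B 2) (A 4))     ; initial leaf set
    ((B 2) ((D C) 2) (A 4))       ; D and C merged
    ((A 4) ((B D C) 4))           ; B and {D C} merged
    (((A B D C) 8))))             ; A and {B D C} merged: one tree left

;; Sum of the weights of all trees in a state.
(define (state-weight state)
  (apply + (map cadr state)))

(map length merge-trace)        ; => (4 3 2 1)
(map state-weight merge-trace)  ; => (8 8 8 8)
```

Under that tie-breaking assumption, the final tree has A on the left, then B, then D and C, which is the same shape as sample-tree.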

Encoding

Encode the message one symbol at a time:

(define (encode message tree)
  (if (null? message)
      '()
      (append (encode-symbol (car message) tree)
              (encode (cdr message) tree))))

; result is a continuation: given the bits still to come,
; it prepends the bits already chosen on the way down.
(define (encode-symbol symbol tree)
  (define (iter tree result)
    (cond ((leaf? tree) (result '()))
          ((memq symbol (symbols (left-branch tree)))
           (iter (left-branch tree) (lambda (x) (result (cons 0 x)))))
          ((memq symbol (symbols (right-branch tree)))
           (iter (right-branch tree) (lambda (x) (result (cons 1 x)))))
          (else (error "bad symbol -- ENCODE-SYMBOL" symbol))))
  (iter tree (lambda (x) x)))
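The result argument of iter is a chain of closures, which can be hard to read at first. The sketch below unrolls that chain by hand for symbol D in sample-tree, whose path is right, right, left. The names k0 to k3 are made up for this illustration; in encode-symbol these closures are anonymous.

```scheme
;; One closure per step down the tree, built the way iter builds them.
(define k0 (lambda (x) x))                 ; at the root: identity
(define k1 (lambda (x) (k0 (cons 1 x))))   ; after going right
(define k2 (lambda (x) (k1 (cons 1 x))))   ; after going right again
(define k3 (lambda (x) (k2 (cons 0 x))))   ; after going left, at leaf D

;; At the leaf, iter calls the accumulated closure on the empty list.
(k3 '())  ; => (1 1 0)
```

The innermost call conses the last bit first, so the bits come out in root-to-leaf order without a final reverse.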

Testing

(define sample-song
  '(GET A JOB SHA NA NA NA NA NA NA NA NA
        GET A JOB SHA NA NA NA NA NA NA NA NA
        WAH YIP YIP YIP YIP YIP YIP YIP YIP YIP
        SHA BOOM))

(define rock-pairs
  '((A 2) (NA 16) (BOOM 1) (SHA 3) (GET 2) (YIP 9) (JOB 2) (WAH 1)))

(define ht
  (generate-huffman-tree rock-pairs))

> (length (encode sample-song ht))
84

> (* 3 (length sample-song))
108

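As the results show, when the frequency estimates are reasonable, a Huffman code can be shorter than a fixed-length code: 84 bits here, against 108 bits for a fixed-length code that spends 3 bits on each of the 36 symbols (8 distinct symbols need 3 bits each).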
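The figure of 84 bits can also be checked by hand as the sum of frequency times code length over all symbols. The code lengths below were worked out by tracing the merges with SICP's adjoin-set, so they are an assumption about the exact tree; a different tie-breaking rule may give different lengths for individual symbols, though the total for an optimal tree stays the same. code-lengths and total-bits are names made up for this illustration.

```scheme
;; Each entry is (symbol frequency code-length).
;; The lengths assume ties are broken as in SICP's adjoin-set.
(define code-lengths
  '((NA 16 1) (YIP 9 2) (A 2 4) (SHA 3 4)
    (GET 2 5) (JOB 2 5) (WAH 1 5) (BOOM 1 5)))

;; Sum of frequency * code-length over the whole table.
(define (total-bits table)
  (if (null? table)
      0
      (+ (* (cadr (car table)) (caddr (car table)))
         (total-bits (cdr table)))))

(total-bits code-lengths)  ; => 84
```

That works out to 84 / 36, or about 2.33 bits per symbol on average, compared with 3 for the fixed-length code.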
