Huffman Code Algorithm
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, with code lengths based on character frequencies: the more frequent a character, the shorter its code.
The Huffman tree should not be discussed in isolation, because it is only an intermediate step in solving the compression problem.
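To make the benefit of variable-length codes concrete, here is a minimal sketch that compares the encoded size of a text under a fixed-length code and under a variable-length prefix code. The character counts and both code tables are made-up illustrations, and `encodedBits` is a hypothetical helper, not part of any library.

```go
package main

import "fmt"

// encodedBits returns the total number of bits needed to encode a text whose
// character counts are given by freq, using the code words in codes.
func encodedBits(freq map[rune]int, codes map[rune]string) int {
	total := 0
	for ch, n := range freq {
		total += n * len(codes[ch])
	}
	return total
}

func main() {
	// Assumed character counts for a 9-character text.
	freq := map[rune]int{'a': 5, 'b': 2, 'c': 1, 'd': 1}

	// Fixed-length code: every character costs 2 bits.
	fixed := map[rune]string{'a': "00", 'b': "01", 'c': "10", 'd': "11"}
	// Variable-length prefix code: the frequent 'a' gets the shortest code word.
	variable := map[rune]string{'a': "0", 'b': "10", 'c': "110", 'd': "111"}

	fmt.Println(encodedBits(freq, fixed))    // 18
	fmt.Println(encodedBits(freq, variable)) // 15
}
```

With these assumed counts the variable-length code saves 3 bits out of 18, because the most frequent character pays only 1 bit per occurrence.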
Steps to build Huffman Tree
The input is an array of unique characters with their frequencies of occurrence, and the output is the Huffman tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min heap is used as a priority queue. The frequency field is used to compare two nodes in the min heap. Initially, the least frequent character is at the root.)
2. Extract the two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this node to the min heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node, and the tree is complete.
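The steps above can be sketched in Go as follows. This is an illustrative sketch rather than a reference implementation: the names `node`, `nodeHeap`, `buildTree`, and `assignCodes` are made up for this example, and the convention of "0" for a left edge and "1" for a right edge is an assumption (the opposite choice works equally well).

```go
package main

import (
	"container/heap"
	"fmt"
)

// node is a Huffman tree node; leaves carry a character, internal nodes do not.
type node struct {
	ch          rune
	freq        int
	left, right *node
}

// nodeHeap implements heap.Interface as a min heap ordered by frequency.
type nodeHeap []*node

func (h nodeHeap) Len() int            { return len(h) }
func (h nodeHeap) Less(i, j int) bool  { return h[i].freq < h[j].freq }
func (h nodeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *nodeHeap) Push(x interface{}) { *h = append(*h, x.(*node)) }
func (h *nodeHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// buildTree returns the root of the Huffman tree, or nil for empty input.
func buildTree(freq map[rune]int) *node {
	h := &nodeHeap{}
	for ch, f := range freq { // step 1: one leaf per character
		*h = append(*h, &node{ch: ch, freq: f})
	}
	heap.Init(h)
	for h.Len() > 1 { // step 4: repeat until one node remains
		a := heap.Pop(h).(*node) // step 2: the two minimum-frequency nodes
		b := heap.Pop(h).(*node)
		// step 3: merge them under a new internal node
		heap.Push(h, &node{freq: a.freq + b.freq, left: a, right: b})
	}
	if h.Len() == 0 {
		return nil
	}
	return heap.Pop(h).(*node)
}

// assignCodes walks the tree and records the code word of every leaf,
// appending "0" for a left edge and "1" for a right edge.
func assignCodes(n *node, prefix string, codes map[rune]string) {
	if n == nil {
		return
	}
	if n.left == nil && n.right == nil {
		codes[n.ch] = prefix
		return
	}
	assignCodes(n.left, prefix+"0", codes)
	assignCodes(n.right, prefix+"1", codes)
}

func main() {
	// Assumed character counts, for illustration only.
	root := buildTree(map[rune]int{'a': 5, 'b': 2, 'c': 1, 'd': 1})
	codes := map[rune]string{}
	assignCodes(root, "", codes)
	for _, ch := range []rune{'a', 'b', 'c', 'd'} {
		fmt.Printf("%c: %s\n", ch, codes[ch])
	}
}
```

For these counts the code lengths are 1, 2, 3, and 3 bits for a, b, c, and d. The exact bits for c and d can differ between runs, because ties between equal frequencies are broken arbitrarily. Note also that a single-character input yields an empty code word in this sketch.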
Why use minimum heap to construct Huffman Tree?
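Each merge step needs the two nodes with the lowest frequencies among those remaining. A min heap extracts the minimum in O(log n) time and accepts the merged node in O(log n) time, so building the whole tree takes O(n log n) time. Note that the Huffman tree itself is not a min heap: every internal node's frequency is the sum of its children's frequencies, so a parent is never smaller than its children.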
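LeetCode example:
Xiao Ming is not good at arithmetic. Computing a+b takes him (a+b) units of time. Given n numbers, Xiao Ming has to compute their sum. Find the minimum total time he needs to add up these n numbers.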
input:
3
1 2 3
output:
9
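The sample answer can be checked by hand. The short worked calculation below compares two merge orders for the input 1 2 3; it suggests why always adding the two smallest numbers first, as in Huffman tree construction, gives the minimum.

```latex
\begin{align*}
\text{two smallest first:}\quad & (1+2) + (3+3) = 3 + 6 = 9 \\
\text{two largest first:}\quad  & (2+3) + (5+1) = 5 + 6 = 11
\end{align*}
```

Numbers merged early are counted again in every later addition, so the small numbers should be merged first.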
Golang Code
package main

import (
	"container/heap"
	"fmt"
)

// TreeNode is a node of the Huffman tree; Val holds its weight.
type TreeNode struct {
	Val   int
	Left  *TreeNode
	Right *TreeNode
}

// treeHeap is a min heap of tree nodes ordered by Val.
type treeHeap []*TreeNode

func (h *treeHeap) Less(i, j int) bool {
	return (*h)[i].Val < (*h)[j].Val
}

func (h *treeHeap) Swap(i, j int) {
	(*h)[i], (*h)[j] = (*h)[j], (*h)[i]
}

func (h *treeHeap) Len() int {
	return len(*h)
}

func (h *treeHeap) Push(node interface{}) {
	*h = append(*h, node.(*TreeNode))
}

func (h *treeHeap) Pop() interface{} {
	n := len(*h)
	x := (*h)[n-1]
	*h = (*h)[0 : n-1]
	return x
}

func main() {
	var n int
	fmt.Scan(&n)
	nums := make([]int, n)
	for i := range nums {
		fmt.Scan(&nums[i])
	}
	fmt.Println(getSumResult(nums))
}

// getSumResult builds a Huffman tree over arr and returns the minimum
// total time needed to add all numbers together.
func getSumResult(arr []int) int {
	if len(arr) == 0 {
		return 0
	}
	th := treeHeap{}
	for _, v := range arr {
		th = append(th, &TreeNode{Val: v})
	}
	// Establish the heap order once; heap.Push and heap.Pop maintain it.
	heap.Init(&th)
	for th.Len() > 1 {
		// t1 is popped first, so t1.Val <= t2.Val always holds.
		t1 := heap.Pop(&th).(*TreeNode)
		t2 := heap.Pop(&th).(*TreeNode)
		heap.Push(&th, &TreeNode{
			Val:   t1.Val + t2.Val,
			Left:  t1,
			Right: t2,
		})
	}
	root := th[0]
	// Sum the values of all nodes with a pre-order traversal.
	var sum int
	preOrderTraverse(root, &sum)
	// The answer is the sum of all internal node values (one per addition).
	// The root's value equals the sum of all leaf values, so summing every
	// node and subtracting the root gives the same total.
	return sum - root.Val
}

// preOrderTraverse adds the value of every node in the subtree to *sum.
func preOrderTraverse(node *TreeNode, sum *int) {
	if node != nil {
		*sum += node.Val
		preOrderTraverse(node.Left, sum)
		preOrderTraverse(node.Right, sum)
	}
}
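Since this problem only asks for the total cost, the tree itself does not have to be stored. The sketch below is an alternative, not the approach used above: it keeps plain integers in the heap and accumulates the cost of each addition directly. The names `intHeap` and `minTotalCost` are made up for this example.

```go
package main

import (
	"container/heap"
	"fmt"
)

// intHeap implements heap.Interface as a min heap of ints.
type intHeap []int

func (h intHeap) Len() int            { return len(h) }
func (h intHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h intHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *intHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *intHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// minTotalCost returns the minimum total time to add up nums when
// computing a+b costs a+b. It returns 0 for fewer than two numbers.
func minTotalCost(nums []int) int {
	h := make(intHeap, len(nums))
	copy(h, nums) // leave the caller's slice untouched
	heap.Init(&h)
	total := 0
	for h.Len() > 1 {
		a := heap.Pop(&h).(int)
		b := heap.Pop(&h).(int)
		total += a + b // cost of this addition
		heap.Push(&h, a+b)
	}
	return total
}

func main() {
	fmt.Println(minTotalCost([]int{1, 2, 3})) // 9
}
```

Each value pushed back onto the heap corresponds to one internal node of the Huffman tree, so the running total equals the sum of internal node values computed by the tree-based version.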
Think carefully about when to use a Huffman tree and what kinds of problems it can solve.