Data Structure Lecture Note (Week 3, Lecture 8)

Latest recommended article published on 2024-01-13 21:53:28

ZJ_Frank

Category: Data Structures and Algorithms. Tags: huffman tree, data structures

Article link: https://blog.csdn.net/ZJ_11701/article/details/106922728


Expressions and trees. [How can we convert a string into a tree? ]

For an infix expression: find the lowest-priority[^recall the brackets matching problem] operator that is not inside any brackets, then split the expression at that operator. If the whole expression is wrapped in brackets (…), remove them first.
For a prefix expression?
- Design idea: recursion. If we see an operator, recurse to build the left subtree first, then the right subtree; if we see a number, stop and return a leaf. To design the recursion, find the repeating structure and the subproblem.
- Implementation:
```
TreeNode * prefix_to_tree(string prefix);
```
(The implementation of infix2tree is longer but conceptually easier; prefix2tree is shorter to implement.)
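
The recursive design idea above can be sketched as follows. This is only one possible reading of the notes: the `TreeNode` layout, the helper names `is_operator` and `parse`, and the assumption that tokens are separated by spaces and that every operator is binary (`+ - * /`) are not given in the lecture.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical node layout; the notes do not define TreeNode.
struct TreeNode {
    std::string token;            // operator symbol or number literal
    TreeNode *left = nullptr;
    TreeNode *right = nullptr;
};

// Assumption: only these four binary operators appear.
static bool is_operator(const std::string &t) {
    return t == "+" || t == "-" || t == "*" || t == "/";
}

// Consume exactly one complete sub-expression from the token stream.
static TreeNode *parse(std::istringstream &in) {
    std::string t;
    in >> t;
    assert(!t.empty() && "expression ended too early");
    TreeNode *node = new TreeNode;
    node->token = t;
    if (is_operator(t)) {
        node->left = parse(in);   // left subtree first
        node->right = parse(in);  // then right subtree
    }
    return node;                  // a number is a leaf: stop and return
}

TreeNode *prefix_to_tree(std::string prefix) {
    std::istringstream in(prefix);
    return parse(in);
}
```

For example, `prefix_to_tree("+ 1 * 2 3")` would return a root `+` whose left child is the leaf `1` and whose right child is the subtree for `* 2 3`. The sketch never frees nodes; a real implementation would need an owner for them.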

Evaluate an expression tree:

Traversal:
- pre-order: visit the current node; recurse on the node’s left child; recurse on the node’s right child
- in-order: recurse on the left child; visit the current node; recurse on the right child
- post-order: recurse on the left child; recurse on the right child; visit the current node (this is the divide-and-conquer style)

Each traversal visits every node exactly once, so the running time is linear in the number of nodes.
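
To make the three orders concrete, here is a small sketch on an expression tree. The `ExprNode` type and the helpers `leaf` and `binary` are hypothetical, and integer operands are assumed, so `/` is integer division. Evaluation is naturally post-order: both subtrees are evaluated before the operator at the current node is applied.

```cpp
#include <cassert>
#include <string>

// Hypothetical expression node: a leaf holds a number, an internal node an operator.
struct ExprNode {
    char op;          // '+', '-', '*', '/' for internal nodes; unused for leaves
    int value;        // used for leaves only
    ExprNode *left;
    ExprNode *right;
};

ExprNode *leaf(int v) { return new ExprNode{'\0', v, nullptr, nullptr}; }
ExprNode *binary(char op, ExprNode *l, ExprNode *r) { return new ExprNode{op, 0, l, r}; }

// Post-order: evaluate left, evaluate right, then combine at the current node.
int evaluate(const ExprNode *n) {
    assert(n != nullptr);
    if (n->left == nullptr) return n->value;
    int a = evaluate(n->left);
    int b = evaluate(n->right);
    switch (n->op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:  return a / b;   // assumed to be '/'
    }
}

// Pre-order: current node, then left, then right. This yields the prefix form.
std::string to_prefix(const ExprNode *n) {
    if (n->left == nullptr) return std::to_string(n->value);
    return std::string(1, n->op) + " " + to_prefix(n->left) + " " + to_prefix(n->right);
}

// In-order: left, current node, right. Brackets keep the grouping explicit.
std::string to_infix(const ExprNode *n) {
    if (n->left == nullptr) return std::to_string(n->value);
    return "(" + to_infix(n->left) + " " + n->op + " " + to_infix(n->right) + ")";
}
```

For the tree `binary('+', leaf(1), binary('*', leaf(2), leaf(3)))`, `evaluate` gives 7, `to_prefix` gives `+ 1 * 2 3`, and `to_infix` gives `(1 + (2 * 3))`.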

Use an array to store a complete binary tree

tree_node A[1 0 2 4]

Suppose you are at the node with index j (counting from 0); then its left child’s index is 2j+1 and its right child’s index is 2j+2. (Just draw a graph to see the pattern.)
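
The index pattern can be checked with a short sketch. The function names are illustrative, and `parent` is simply the inverse of the two child formulas rather than something stated in the lecture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 0-based index arithmetic for a complete binary tree stored level by level.
std::size_t left_child(std::size_t j)  { return 2 * j + 1; }
std::size_t right_child(std::size_t j) { return 2 * j + 2; }
std::size_t parent(std::size_t j) {
    assert(j > 0);                  // the root has no parent
    return (j - 1) / 2;
}

// Pre-order walk over the array representation; results are appended to `out`.
void preorder(const std::vector<int> &a, std::size_t j, std::vector<int> &out) {
    if (j >= a.size()) return;      // an index past the end means "no such node"
    out.push_back(a[j]);
    preorder(a, left_child(j), out);
    preorder(a, right_child(j), out);
}
```

With the array `{1, 0, 2, 4}`, node 1 is the root, 0 and 2 are its children, and 4 is the left child of 0, so a pre-order walk from index 0 visits 1, 0, 4, 2.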

Dictionary ADT

unique keys with some values

one may bind a value to a key

delete a key

look up a value by its key

So how does dictionary get implemented?

Hashtable, Tree, More sophisticated trees

dictionary by binary search tree (BST)

template<class Key, class Data>
struct BST{
	BST<Key, Data> *left, *right;
	Key key;
	Data data;
};

This is a special tree such that for each node n:

n.left->key <= n.key <= n.right->key

The ordering applies to whole subtrees, not just the direct children: every key in n's left subtree is at most n.key, and every key in n's right subtree is at least n.key.

The ordering can be any type of total ordering such as string comparison or numerical.

Operations:

Find: access the node with key = K

start from the root

compare K with the node’s key:

if K > key, go right

if K < key, go left

if K == key, stop

if the child we need to move to does not exist, K is not in the tree

Build up the tree from empty

For an input sequence S of N (key, value) pairs:

Put S[0] at the root, and let j = 1.
If S[j] <= root, put it on the left; if the left child is not empty, recurse into the left subtree instead.
If S[j] > root, put it on the right; if the right child is not empty, recurse into the right subtree instead.
Repeat for j = 2, …, N − 1.
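
A minimal sketch of find and of the naive build-up, reusing the `BST` struct from these notes. The function names `bst_find`, `bst_insert`, and `height` are illustrative. Following the rule above, a key equal to the current node's key goes to the left; a dictionary with unique keys would more likely overwrite the stored value instead.

```cpp
#include <cassert>
#include <string>

template <class Key, class Data>
struct BST {
    BST<Key, Data> *left, *right;
    Key key;
    Data data;
};

// Find: start from the root and go right if K > key, left if K < key.
// Returns nullptr when K is not in the tree.
template <class Key, class Data>
BST<Key, Data> *bst_find(BST<Key, Data> *root, const Key &k) {
    BST<Key, Data> *cur = root;
    while (cur != nullptr) {
        if (k > cur->key) cur = cur->right;
        else if (k < cur->key) cur = cur->left;
        else break;                       // K == key: stop
    }
    return cur;
}

// Insert: keys <= the current key go left, larger keys go right.
// Returns the (possibly new) root of the subtree.
template <class Key, class Data>
BST<Key, Data> *bst_insert(BST<Key, Data> *root, const Key &k, const Data &d) {
    if (root == nullptr) return new BST<Key, Data>{nullptr, nullptr, k, d};
    if (k > root->key) root->right = bst_insert(root->right, k, d);
    else root->left = bst_insert(root->left, k, d);
    return root;
}

// Height in edges; an empty tree has height -1.
template <class Key, class Data>
int height(const BST<Key, Data> *root) {
    if (root == nullptr) return -1;
    int l = height(root->left), r = height(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting the keys 4, 2, 6, 1, 3 in that order gives a tree of height 2, while inserting 1, 2, 3, 4, 5 gives a right-leaning chain of height 4, so the shape depends heavily on the input order.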

Deletion?

Remove the node and move up either its left or its right child?

How about 4’s left and right child …

Will be discussed in week 7

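If we follow this naive construction, the worst-case time complexity of search is O(N): for example, inserting keys in sorted order produces a chain in which every node has only one child.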

We hope the tree is “balanced”

The height of any binary tree with N elements is $\Omega(\log N)$; the minimum possible height is $\lfloor \log_2 N \rfloor$.

A tree whose height is O(log N) is called a balanced tree, and insert, delete, and find in a balanced binary search tree cost O(log N) in the worst case.

Balanced trees are not trivial to construct; they will be discussed in week 7.

Huffman Coding, Compression Algorithms

Compression problem:

input is a text of N letters (words, spaces …)

Output is an encoding of the letters so that the size of text is smaller in binary

Encoding: a mapping from letters to 01 sequences

e.g. text = “aaaaaaaaa”

How to encode? Encode a = 0, so the output is “00000…”. Each letter now takes one bit instead of one byte (8 bits), so the compression ratio is 8.

But what if the text contains more than 2 distinct letters?

Good encoding: the encoding can actually be decoded without ambiguity

Prefix (free) encoding

An encoding E such that for any two distinct letters A and B, E(A) is not a prefix of E(B)

Example: letters = {‘a’, ‘b’, ‘c’}, E(‘a’) = 0, E(‘b’) = 11, E(‘c’) = 10

use tree to encode and decode

The tree will have all the letters at the leaves

internal nodes are routers

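each edge is labeled with 0 or 1, and the encoding of a letter is the sequence of labels on the path from the root to its leaf

Such tree encodings are prefix-free codes: since letters appear only at leaves, the path to one letter never passes through another letter.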

To decode, you can also just follow the path on the tree, the decoding ends at a leaf.
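
As an illustration of tree-based decoding, the sketch below builds the tree from a code table and then walks it bit by bit. The `CodeNode` type and the function names are hypothetical; the code table is passed as a `std::map` from letter to bit string.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical code-tree node: letters live only at leaves.
struct CodeNode {
    char letter = '\0';
    CodeNode *child[2] = {nullptr, nullptr};   // child[0] follows edge 0, child[1] edge 1
    bool is_leaf() const { return !child[0] && !child[1]; }
};

// Build the tree by walking each code word from the root, creating nodes as needed.
CodeNode *build_code_tree(const std::map<char, std::string> &code) {
    CodeNode *root = new CodeNode;
    for (const auto &entry : code) {
        CodeNode *cur = root;
        for (char bit : entry.second) {
            int b = bit - '0';
            if (!cur->child[b]) cur->child[b] = new CodeNode;
            cur = cur->child[b];
        }
        cur->letter = entry.first;
    }
    return root;
}

// Encode: concatenate the code words of the letters.
std::string encode(const std::map<char, std::string> &code, const std::string &text) {
    std::string bits;
    for (char c : text) bits += code.at(c);
    return bits;
}

// Decode: follow edges bit by bit; at a leaf, emit its letter and restart at the root.
std::string decode(const CodeNode *root, const std::string &bits) {
    std::string text;
    const CodeNode *cur = root;
    for (char bit : bits) {
        cur = cur->child[bit - '0'];
        assert(cur != nullptr && "bit sequence is not valid for this code");
        if (cur->is_leaf()) {
            text += cur->letter;
            cur = root;
        }
    }
    return text;
}
```

With the code a = 0, b = 11, c = 10, the text `abca` encodes to `011100`, and decoding that bit string recovers `abca` without any separators between code words.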

A naïve tree to build

almost full binary tree

the height is about log2(number of distinct letters in the text)

every letter gets an encoding of almost the same length (almost balanced tree)

Let’s say we have letters = {‘a’, ‘b’ , ‘c’, ‘d’, ‘e’}

Not optimal! Different letters may have different frequencies, and frequent letters should get shorter codes.

Huffman algorithm to build the tree smartly

Calculate the frequency of the letters

then build a prefix tree …

Given an alphabet A = {a1, a2, …, an} and a frequency distribution f(ai), find a binary prefix code C for A such that the total number of bits is minimized:
$\sum_{i=1}^n f(a_i)\, L(C(a_i))$

L is the function that calculates the length of a code word.

Idea

we try to build a tree, with leaves being the alphabet

We try to minimize the total weighted depth of the leaves, where the weight of each leaf is the frequency of its letter.

Step 1: Pick the two letters x, y from alphabet A with the smallest frequencies and create a subtree that has these two letters as leaves (the greedy idea). Label the root of this subtree z. We always put the tree with the smaller frequency on the left, and if the frequencies are the same, we put the tree with fewer letters on the left.[^This is just to ensure uniqueness]

Step 2: Set the frequency f(z) = f(x) + f(y). Remove x and y and add z, creating the new alphabet A0 = (A ∪ {z}) − {x, y}. Note that |A0| = |A| − 1.

Repeat this procedure, called merge, with the new alphabet A0 until an alphabet with only one symbol is left. The resulting tree defines the Huffman code.
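
The merge procedure can be sketched with a priority queue. This is a sketch under assumptions, not the lecture's implementation: the `HuffNode` layout, the function names, and the choice of labeling left edges 0 and right edges 1 are all illustrative. When two trees tie on both frequency and letter count, the order in which they are picked is left unspecified here.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct HuffNode {
    long freq;        // f(z) = f(x) + f(y) for merged nodes
    int letters;      // number of leaves below this node, used for tie-breaking
    char letter;      // meaningful for leaves only
    HuffNode *left;
    HuffNode *right;
};

// Makes the queue pop the smallest frequency first, then the fewest letters.
struct HuffGreater {
    bool operator()(const HuffNode *a, const HuffNode *b) const {
        if (a->freq != b->freq) return a->freq > b->freq;
        return a->letters > b->letters;
    }
};

HuffNode *build_huffman(const std::map<char, long> &freq) {
    assert(!freq.empty());
    std::priority_queue<HuffNode *, std::vector<HuffNode *>, HuffGreater> pq;
    for (const auto &e : freq)
        pq.push(new HuffNode{e.second, 1, e.first, nullptr, nullptr});
    while (pq.size() > 1) {
        HuffNode *x = pq.top(); pq.pop();   // smallest: goes on the left
        HuffNode *y = pq.top(); pq.pop();   // second smallest: goes on the right
        pq.push(new HuffNode{x->freq + y->freq, x->letters + y->letters, '\0', x, y});
    }
    return pq.top();
}

// Collect code words; assumption: left edge = 0, right edge = 1.
void collect_codes(const HuffNode *n, const std::string &path,
                   std::map<char, std::string> &out) {
    if (!n->left && !n->right) {
        out[n->letter] = path.empty() ? "0" : path;   // single-letter alphabet
        return;
    }
    collect_codes(n->left, path + "0", out);
    collect_codes(n->right, path + "1", out);
}
```

For frequencies a: 5, b: 2, c: 1, d: 1, the merges are (c, d), then (b, cd), then (bcd, a). Letter a gets the 1-bit code `1`, b gets `00`, and c and d get 3-bit codes, for a total of 5·1 + 2·2 + 1·3 + 1·3 = 15 bits.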

In fact, the Huffman code is optimal: no other prefix code gives a smaller total number of bits for the given frequencies.