1. Code: Today's main task is to define the fields used by the Huffman tree and to read the text file that stores the characters.
package datastructure.tree;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Collectors;
public class Huffman {
/*
* An inner class for Huffman nodes.
*/
class HuffmanNode {
/*
* The char. Only valid for leaf nodes.
*/
char character;
/*
* Weight. It can also be double.
*/
int weight;
/*
* The left child.
*/
HuffmanNode leftChild;
/*
* The right child.
*/
HuffmanNode rightChild;
/*
* The parent. It helps constructing the Huffman code of each character.
*/
HuffmanNode parent;
/**
* ******************** The first constructor ********************
*/
public HuffmanNode(char paraCharacter, int paraWeight, HuffmanNode paraLeftChild, HuffmanNode paraRightChild,
HuffmanNode paraParent) {
character = paraCharacter;
weight = paraWeight;
leftChild = paraLeftChild;
rightChild = paraRightChild;
parent = paraParent;
}// Of HuffmanNode
/**
* ******************** To string. ********************
*/
public String toString() {
String resultString = "(" + character + ", " + weight + ")";
return resultString;
}// of toString
}// Of class HuffmanNode
/*
* The number of characters. 256 covers every 8-bit code (plain ASCII only needs 128).
*/
public static final int NUM_CHARS = 256;
/*
* The input text. It is stored in a string for simplicity.
*/
String inputText;
/*
* The length of the alphabet, also the number of leaves.
*/
int alphabetLength;
/*
* The alphabet.
*/
char[] alphabet;
/*
* The count of chars. The length is 2 * alphabetLength - 1 to include non-leaf
* nodes.
*/
int[] charCounts;
/*
* The mapping of chars to the indices in the alphabet.
*/
int[] charMapping;
/*
* Codes for each char in the alphabet. It should have the same length as
* alphabet.
*/
String[] huffmanCodes;
/*
* All nodes. The last node is the root.
*/
HuffmanNode[] nodes;
/**
*********************
* The first constructor.
*
* @param paraFilename The text filename.
*********************
*/
public Huffman(String paraFilename) {
charMapping = new int[NUM_CHARS];
readText(paraFilename);
}// Of the first constructor
/**
*********************
* Read text.
*
* @param paraFilename The text filename.
*********************
*/
public void readText(String paraFilename) {
try {
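// readAllLines closes the file itself, so no reader is left open.
inputText = String.join("\n", Files.readAllLines(Paths.get(paraFilename), StandardCharsets.UTF_8));
} catch (Exception ee) {
System.out.println(ee);
// A non-zero exit status signals that reading the file failed.
System.exit(1);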
} // Of try
System.out.println("The text is: \r\n" + inputText);
}// Of readText
/**
*********************
* The entrance of the program.
*
* @param args Not used now.
*********************
*/
public static void main(String args[]) {
Huffman tempHuffman = new Huffman("C:/Users/01/Desktop/huffman.txt");
}// Of main
}// Of class Huffman
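The fields `alphabet`, `charCounts`, `charMapping` and `alphabetLength` are declared above but not yet filled in. The following is a minimal sketch of one way they could be populated from the input text, based only on the field comments; it is not the author's implementation. The class name `CharCountSketch`, the method `countChars`, the use of `-1` for absent characters, and the choice to skip characters whose code is 256 or above are all assumptions made for illustration.

```java
import java.util.Arrays;

public class CharCountSketch {
    public static final int NUM_CHARS = 256;

    public char[] alphabet;
    public int[] charCounts;
    public int[] charMapping = new int[NUM_CHARS];
    public int alphabetLength;

    /** Fills alphabet, charCounts and charMapping from the given text. */
    public void countChars(String text) {
        // Count occurrences of every char whose code is below NUM_CHARS.
        int[] rawCounts = new int[NUM_CHARS];
        for (char c : text.toCharArray()) {
            if (c < NUM_CHARS) { // Assumption: chars outside the table are skipped.
                rawCounts[c]++;
            }
        }

        // The alphabet consists of the chars that actually occur.
        alphabetLength = 0;
        for (int count : rawCounts) {
            if (count > 0) {
                alphabetLength++;
            }
        }

        alphabet = new char[alphabetLength];
        // Leaves come first; the remaining alphabetLength - 1 slots are
        // reserved for the weights of non-leaf nodes.
        charCounts = new int[Math.max(2 * alphabetLength - 1, 0)];
        Arrays.fill(charMapping, -1); // Assumption: -1 marks absent chars.

        int index = 0;
        for (int i = 0; i < NUM_CHARS; i++) {
            if (rawCounts[i] > 0) {
                alphabet[index] = (char) i;
                charCounts[index] = rawCounts[i];
                charMapping[i] = index;
                index++;
            }
        }
    }

    public static void main(String[] args) {
        CharCountSketch sketch = new CharCountSketch();
        sketch.countChars("abracadabra");
        System.out.println(Arrays.toString(sketch.alphabet));
        System.out.println(Arrays.toString(sketch.charCounts));
    }
}
```

Because the file is read as UTF-8, the text may contain characters whose code exceeds 255; a fixed 256-entry table cannot index them, which is why the sketch skips them rather than writing past the end of the array.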
2. Run result:
3. Summary:
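a. Huffman coding is an encoding scheme and a kind of variable-length code (VLC). Huffman proposed it in 1952. Based entirely on how often each character occurs, the method constructs prefix-free codewords (no codeword is a prefix of another) with the shortest average length, so it is sometimes called optimal coding.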
b. The Huffman tree for a given set of weights is not unique, but every such tree has the same weighted path length.
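c. Building a Huffman tree:
1. Given n weights {w1, w2, w3 ... wn}, create n binary trees that each consist of a single root node, where the root of the j-th tree has weight wj.
2. From the forest, pick the two trees whose roots have the smallest weights, make them the left and right subtrees of a new binary tree, and set the new root's weight to the sum of the two subtree roots' weights.
3. Remove those two trees from the forest and add the new tree to it. (In other words, the two smallest roots have been merged into one new node.)
4. Repeat steps 2 and 3 until only one tree remains. That tree is the Huffman tree.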
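
The construction steps above can be sketched as follows. This is a minimal illustration rather than the `Huffman` class from section 1: it keeps the forest in a `java.util.PriorityQueue` instead of a node array, and the names `HuffmanBuildSketch`, `Node`, `build` and `weightedPathLength` are hypothetical. The sample weights are arbitrary.

```java
import java.util.PriorityQueue;

public class HuffmanBuildSketch {
    /** A minimal node: a weight plus optional children. */
    public static class Node {
        public final int weight;
        public final Node left;
        public final Node right;

        public Node(int weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        public boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Builds a Huffman tree from the weights and returns its root (null if empty). */
    public static Node build(int[] weights) {
        // Step 1: a forest of n single-node trees, ordered by root weight.
        PriorityQueue<Node> forest =
                new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (int w : weights) {
            forest.add(new Node(w, null, null));
        }
        // Steps 2 to 4: merge the two lightest trees until one tree remains.
        while (forest.size() > 1) {
            Node first = forest.poll();
            Node second = forest.poll();
            forest.add(new Node(first.weight + second.weight, first, second));
        }
        return forest.poll();
    }

    /** Weighted path length: the sum over all leaves of weight * depth. */
    public static int weightedPathLength(Node node, int depth) {
        if (node == null) {
            return 0;
        }
        if (node.isLeaf()) {
            return node.weight * depth;
        }
        return weightedPathLength(node.left, depth + 1)
                + weightedPathLength(node.right, depth + 1);
    }

    public static void main(String[] args) {
        Node root = build(new int[] { 5, 9, 12, 13, 16, 45 });
        System.out.println("Root weight: " + root.weight); // prints 100
        System.out.println("WPL: " + weightedPathLength(root, 0)); // prints 224
    }
}
```

When two roots have equal weight, the queue may return either one first, which is one reason the resulting tree is not unique; the weighted path length is unaffected by that choice.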