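1. Building the Huffman tree
1.1 The idea behind Huffman tree construction
A Huffman tree is built to minimize the weighted path length of the tree, so it is constructed bottom-up, which places nodes with larger weights closer to the root. Taking the binary Huffman tree as an example: select the two nodes with the smallest weights, create a parent node and link it to both (by convention the smaller one becomes the left child and the larger one the right child), set the parent's weight to the sum of the two, and repeat, treating the new parent as a candidate, until all nodes are linked into a single tree.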
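To make "weighted path length" concrete, here is a minimal sketch that computes it for a set of leaf weights. It is not the implementation used in this post: it uses java.util.PriorityQueue instead of array scanning, and the class name WplSketch and the sample weights are assumptions chosen for illustration. It relies on the fact that the weighted path length equals the sum of the weights of all internal nodes created during merging.

```java
import java.util.PriorityQueue;

public class WplSketch {
    /**
     * Returns the weighted path length of a Huffman tree built from the
     * given leaf weights, by summing the weight of every merged parent.
     */
    public static long weightedPathLength(int[] weights) {
        if (weights.length < 2) {
            return 0; // Zero or one leaf: the only node sits at depth 0.
        }
        PriorityQueue<Long> queue = new PriorityQueue<>();
        for (int weight : weights) {
            queue.add((long) weight);
        }
        long total = 0;
        while (queue.size() > 1) {
            // Merge the two smallest weights into a new parent.
            long merged = queue.poll() + queue.poll();
            total += merged;
            queue.add(merged);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sample weights; prints 224.
        System.out.println(weightedPathLength(new int[] {5, 9, 12, 13, 16, 45}));
    }
}
```

With these weights the merges are 5+9=14, 12+13=25, 14+16=30, 25+30=55 and 45+55=100, which sum to 224.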
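1.2 Building the alphabet
- Initialize charMapping and create tempCharCounts. Scan the input string and, for each character, increment the entry of tempCharCounts at the index given by the character's ASCII code;
- Scan tempCharCounts and count its nonzero entries to set alphabetLength;
- Allocate alphabet and charCounts according to alphabetLength. For each nonzero entry of tempCharCounts, append the character with that ASCII code to alphabet, record its number of occurrences in charCounts, and store its index in alphabet at the position of charMapping given by its ASCII code.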
public void constructAlphabet() {
// Initialize.
Arrays.fill(charMapping, -1);
// The count for each char. At most NUM_CHARS chars.
int[] tempCharCounts = new int[NUM_CHARS];
// The index of the char in the ASCII charset.
int tempCharIndex;
// Step 1. Scan the string to obtain the counts.
char tempChar;
for (int i = 0; i < inputText.length(); i++) {
tempChar = inputText.charAt(i);
tempCharIndex = (int) tempChar;
System.out.print("" + tempCharIndex + " ");
tempCharCounts[tempCharIndex]++;
} // Of for i
// Step 2. Scan to determine the size of the alphabet.
alphabetLength = 0;
for (int i = 0; i < NUM_CHARS; i++) {
if (tempCharCounts[i] > 0) {
alphabetLength++;
} // Of if
} // Of for i
// Step 3. Compress to the alphabet
alphabet = new char[alphabetLength];
charCounts = new int[2 * alphabetLength - 1];
int tempCounter = 0;
for (int i = 0; i < NUM_CHARS; i++) {
if (tempCharCounts[i] > 0) {
alphabet[tempCounter] = (char) i;
charCounts[tempCounter] = tempCharCounts[i];
charMapping[i] = tempCounter;
tempCounter++;
} // Of if
} // Of for i
System.out.println("The alphabet is: " + Arrays.toString(alphabet));
System.out.println("Their counts are: " + Arrays.toString(charCounts));
System.out.println("The char mappings are: " + Arrays.toString(charMapping));
}// Of constructAlphabet
Arrays.fill(): fills every element of an array (or of a specified index range) with a given value.
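The three arrays are easier to picture on a tiny input. The following is a simplified, self-contained sketch of the same three steps, not the class from this post: the name AlphabetSketch, the value NUM_CHARS = 256, and the sample string "abbccc" are assumptions, and counts is sized to the alphabet only, whereas the original charCounts reserves 2 * alphabetLength - 1 slots for the internal nodes added later.

```java
import java.util.Arrays;

public class AlphabetSketch {
    // Assumed charset size; the original NUM_CHARS value is not shown.
    static final int NUM_CHARS = 256;

    public final char[] alphabet;
    public final int[] counts;
    public final int[] mapping = new int[NUM_CHARS];

    public AlphabetSketch(String text) {
        Arrays.fill(mapping, -1); // -1 marks characters that never occur.

        // Step 1. Count occurrences (assumes every char code is below NUM_CHARS).
        int[] tempCounts = new int[NUM_CHARS];
        for (int i = 0; i < text.length(); i++) {
            tempCounts[text.charAt(i)]++;
        }

        // Step 2. Count the distinct characters.
        int length = 0;
        for (int count : tempCounts) {
            if (count > 0) {
                length++;
            }
        }

        // Step 3. Compress into the alphabet, in ascending character code order.
        alphabet = new char[length];
        counts = new int[length];
        int next = 0;
        for (int i = 0; i < NUM_CHARS; i++) {
            if (tempCounts[i] > 0) {
                alphabet[next] = (char) i;
                counts[next] = tempCounts[i];
                mapping[i] = next;
                next++;
            }
        }
    }

    public static void main(String[] args) {
        AlphabetSketch sketch = new AlphabetSketch("abbccc");
        System.out.println(Arrays.toString(sketch.alphabet)); // [a, b, c]
        System.out.println(Arrays.toString(sketch.counts));   // [1, 2, 3]
        System.out.println(sketch.mapping['b']);              // 1
    }
}
```

Here mapping['b'] is 1 because 'b' is the second distinct character in ASCII order, so looking up a character's position in alphabet takes constant time.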
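1.3 Building the Huffman tree
- Allocate space for the nodes and create a matching tempProcessed array that records whether each node has already been selected;
- Create all leaf nodes, initializing their characters and weights from alphabet and charCounts;
- Find the index of the unprocessed node with the smallest weight, use it as the left child, and set tempProcessed at that index to true. Then find the node with the smallest weight among the remaining ones as the right child, create a parent node, and link it to both. Repeat until only the root remains.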
/**
*********************
* Construct the tree.
*********************
*/
public void constructTree() {
// Step 1. Allocate space.
nodes = new HuffmanNode[alphabetLength * 2 - 1];
boolean[] tempProcessed = new boolean[alphabetLength * 2 - 1];
// Step 2. Initialize leaves.
for (int i = 0; i < alphabetLength; i++) {
nodes[i] = new HuffmanNode(alphabet[i], charCounts[i], null, null, null);
} // Of for i
// Step 3. Construct the tree.
int tempLeft, tempRight, tempMinimal;
for (int i = alphabetLength; i < 2 * alphabetLength - 1; i++) {
// Step 3.1 Select the first minimal as the left child.
tempLeft = -1;
tempMinimal = Integer.MAX_VALUE;
for (int j = 0; j < i; j++) {
if (tempProcessed[j]) {
continue;
} // Of if
if (tempMinimal > charCounts[j]) {
tempMinimal = charCounts[j];
tempLeft = j;
} // Of if
} // Of for j
tempProcessed[tempLeft] = true;
// Step 3.2 Select the second minimal as the right child.
tempRight = -1;
tempMinimal = Integer.MAX_VALUE;
for (int j = 0; j < i; j++) {
if (tempProcessed[j]) {
continue;
} // Of if
if (tempMinimal > charCounts[j]) {
tempMinimal = charCounts[j];
tempRight = j;
} // Of if
} // Of for j
tempProcessed[tempRight] = true;
System.out.println("Selecting " + tempLeft + " and " + tempRight);
// Step 3.3 Construct the new node.
charCounts[i] = charCounts[tempLeft] + charCounts[tempRight];
nodes[i] = new HuffmanNode('*', charCounts[i], nodes[tempLeft], nodes[tempRight], null);
// Step 3.4 Link with children.
nodes[tempLeft].parent = nodes[i];
nodes[tempRight].parent = nodes[i];
System.out.println("The children of " + i + " are " + tempLeft + " and " + tempRight);
} // Of for i
}// Of constructTree
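The selection order is easier to follow without the node objects. The sketch below repeats the same two-minimum scan on plain arrays and returns, for each node index, the index of its parent. The class name SelectionSketch, the helper pickMinimal, and the sample counts are assumptions made for illustration; like the listing above, ties go to the lower index.

```java
import java.util.Arrays;

public class SelectionSketch {
    /** Returns parents[i] = index of node i's parent, or -1 for the root. */
    public static int[] buildParents(int[] leafCounts) {
        int n = leafCounts.length;
        if (n == 0) {
            return new int[0];
        }
        int total = 2 * n - 1; // n leaves plus n - 1 internal nodes.
        int[] counts = Arrays.copyOf(leafCounts, total);
        int[] parents = new int[total];
        Arrays.fill(parents, -1);
        boolean[] processed = new boolean[total];

        for (int i = n; i < total; i++) {
            int left = pickMinimal(counts, processed, i);
            int right = pickMinimal(counts, processed, i);
            counts[i] = counts[left] + counts[right];
            parents[left] = i;
            parents[right] = i;
        }
        return parents;
    }

    /** Marks and returns the unprocessed index below limit with the smallest count. */
    private static int pickMinimal(int[] counts, boolean[] processed, int limit) {
        int best = -1;
        for (int j = 0; j < limit; j++) {
            if (!processed[j] && (best == -1 || counts[j] < counts[best])) {
                best = j;
            }
        }
        processed[best] = true;
        return best;
    }

    public static void main(String[] args) {
        // Leaves 0, 1, 2 have counts 1, 2, 3; prints [3, 3, 4, 4, -1].
        System.out.println(Arrays.toString(buildParents(new int[] {1, 2, 3})));
    }
}
```

For counts 1, 2, 3 the first round merges nodes 0 and 1 into node 3 (weight 3), and the second round merges nodes 2 and 3 into node 4, the root. Each round scans every node created so far, so the whole construction takes quadratic time in the alphabet size, which is fine for an alphabet of at most a few hundred characters.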
1.4 Getting the root
/**
*********************
* Get the root of the binary tree.
*
* @return The root.
*********************
*/
public HuffmanNode getRoot() {
return nodes[nodes.length - 1];
}// Of getRoot
2. Test driver
public static void main(String args[]) {
Huffman tempHuffman = new Huffman("E:/postgraduate/csdn/temp/huffmantext-small.txt");
tempHuffman.constructAlphabet();
tempHuffman.constructTree();
HuffmanNode tempRoot = tempHuffman.getRoot();
System.out.println("The root is: " + tempRoot);
}// Of main
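Output:
Note: the charMapping array is too long to fit in the screenshot. At the position given by each occurring character's ASCII code it stores that character's index in alphabet; unused positions hold -1.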