In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I have run into a big problem: Huffman codes are NOT unique. For example, given the string "aaaxuaxz", the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or alternatively as {'a'=1, 'x'=01, 'u'=001, 'z'=000}; both compress the string into 14 bits. Another valid set of codes is {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct, since "aaaxuaxz" and "aazuaxax" both encode to 00001011001001, so that bit string cannot be decoded unambiguously. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
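To see the ambiguity concretely, here is a small Java sketch that encodes both strings with the incorrect code table, and also checks the 14-bit length under the first valid code. The class PrefixDemo and its encode helper are hypothetical, not part of the problem; Map.of requires Java 9 or later.

```java
import java.util.Map;

// Reproduces the example from the problem statement by encoding strings
// with a character -> code table. PrefixDemo and encode() are hypothetical names.
class PrefixDemo {
    // Concatenates the code of every character in s.
    static String encode(String s, Map<Character, String> code) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(code.get(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> good = Map.of('a', "0", 'x', "10", 'u', "110", 'z', "111");
        Map<Character, String> bad = Map.of('a', "0", 'x', "01", 'u', "011", 'z', "001");

        // A valid Huffman code: 14 bits in total
        System.out.println(encode("aaaxuaxz", good).length()); // 14
        // The invalid code: two different strings produce the same bits
        System.out.println(encode("aaaxuaxz", bad));            // 00001011001001
        System.out.println(encode("aazuaxax", bad));            // 00001011001001
    }
}
```

The collision happens because 'a'=0 is a prefix of 'x'=01 and 'z'=001, which is exactly the prefix condition the judge has to check.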
Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2 <= N <= 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (<=1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i] is the i-th character and code[i] is a string of '0's and '1's.
Output Specification:
For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.
Sample Input:
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output:
Yes
Yes
No
No
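Approach:
A submission is judged by two conditions:
1. Huffman codes are not unique, but their WPL (weighted path length) is: every Huffman code for the same frequencies has the same, minimum, WPL.
2. No code may be a prefix of another, i.e. a shorter code must never be the prefix of a longer one.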
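First, use a priority queue (priority_queue in the C++ STL; the Java code below uses java.util.PriorityQueue) to build the Huffman tree and compute the reference WPL1, using the fact that WPL = the sum of the weights of all non-leaf nodes.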
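Because WPL equals the sum of the internal-node weights, the reference WPL1 can be computed without building the tree at all: each merge of the two smallest weights creates one internal node. A minimal sketch, assuming only the frequencies are needed (the class name HuffmanWpl is hypothetical):

```java
import java.util.PriorityQueue;

// Minimum WPL of a Huffman tree, computed without building nodes:
// every merge creates an internal node whose weight is the sum of the
// two smallest weights, and WPL is the sum of those internal weights.
class HuffmanWpl {
    static int wpl(int[] freqs) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int f : freqs) {
            pq.add(f);
        }
        int wpl = 0;
        // N leaves need N - 1 merges; stop when only the root remains
        while (pq.size() > 1) {
            int merged = pq.poll() + pq.poll();
            wpl += merged; // weight of the new internal node
            pq.add(merged);
        }
        return wpl;
    }

    public static void main(String[] args) {
        System.out.println(wpl(new int[] {1, 1, 1, 3, 3, 6, 6})); // 53 (sample input)
        System.out.println(wpl(new int[] {4, 2, 1, 1}));          // 14 ("aaaxuaxz")
    }
}
```

For the "aaaxuaxz" frequencies this gives 14, matching the 14 bits mentioned in the problem statement.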
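Then compute each submission's WPL2 = the sum over all leaves of depth × weight, where the depth of a leaf is the length of its code.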
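WPL2 needs no tree either, since a code of length L sits at depth L. A sketch with hypothetical names (SubmissionWpl, codes); for brevity it parses a submission written as "char code char code ..." on one line rather than one pair per line. Map.of requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.Map;

// WPL of one student submission: a code of length L is a leaf at depth L,
// so WPL2 = sum over characters of freq[c] * code[c].length().
// SubmissionWpl and codes() are hypothetical names used for illustration.
class SubmissionWpl {
    static int wpl(Map<Character, Integer> freq, Map<Character, String> code) {
        int total = 0;
        for (Map.Entry<Character, String> e : code.entrySet()) {
            total += freq.get(e.getKey()) * e.getValue().length();
        }
        return total;
    }

    // Parses "c1 code1 c2 code2 ..." into a character -> code map.
    static Map<Character, String> codes(String line) {
        String[] t = line.trim().split("\\s+");
        Map<Character, String> m = new HashMap<>();
        for (int i = 0; i + 1 < t.length; i += 2) {
            m.put(t[i].charAt(0), t[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq =
                Map.of('A', 1, 'B', 1, 'C', 1, 'D', 3, 'E', 3, 'F', 6, 'G', 6);
        // Sample submission 1: WPL2 = 53, equal to the optimal WPL
        System.out.println(wpl(freq, codes("A 00000 B 00001 C 0001 D 001 E 01 F 10 G 11")));
        // Sample submission 3: WPL2 = 63, rejected before any prefix check
        System.out.println(wpl(freq, codes("A 000 B 001 C 010 D 011 E 100 F 101 G 110")));
    }
}
```

Sample submission 3 already fails here (63 vs. the optimal 53), which is why its expected output is No.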
Separately, check whether any code is a prefix of another.
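The Java code below checks the prefix condition by sorting the codes by length and comparing all pairs. An alternative, named plainly here as a swapped-in technique rather than the author's method: sort the codes lexicographically. If some code is a prefix of another, it is then also a prefix of its immediate successor, so only the N - 1 adjacent pairs need checking. A sketch (PrefixCheck is a hypothetical name):

```java
import java.util.Arrays;

// Prefix-free check via lexicographic sort: if code a is a prefix of code c,
// every code sorted between them also starts with a, so a prefix relation
// always appears between neighbours.
class PrefixCheck {
    static boolean isPrefixFree(String[] codes) {
        String[] sorted = codes.clone();
        Arrays.sort(sorted);
        for (int i = 0; i + 1 < sorted.length; i++) {
            // startsWith is also true for two identical codes
            if (sorted[i + 1].startsWith(sorted[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample submission 1: prefix-free
        System.out.println(isPrefixFree(new String[] {"00000", "00001", "0001", "001", "01", "10", "11"})); // true
        // Sample submission 4: "00" is a prefix of "00000"
        System.out.println(isPrefixFree(new String[] {"00000", "00001", "0001", "001", "00", "10", "11"})); // false
    }
}
```

Sample submission 4 has the optimal WPL (53), but E=00 is a prefix of A=00000, so only the prefix check rejects it.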
If both conditions hold (WPL1 == WPL2 and no code is a prefix of another), print Yes; otherwise print No.
Code:
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Scanner;

class Tree {
    private Node root;

    public Node getRoot() {
        return root;
    }

    public void setRoot(Node root) {
        this.root = root;
    }

    static class Node implements Comparable<Node> {
        private char c = ' ';    // character
        private int f = 0;       // frequency of the character
        private Node parent;     // parent node
        private Node leftNode;   // left child
        private Node rightNode;  // right child

        // Order by frequency so the PriorityQueue polls the smallest first
        @Override
        public int compareTo(Node o) {
            return f - o.f;
        }

        public char getC() { return c; }
        public void setC(char c) { this.c = c; }
        public int getF() { return f; }
        public void setF(int f) { this.f = f; }
        public Node getParent() { return parent; }
        public void setParent(Node parent) { this.parent = parent; }
        public Node getLeftNode() { return leftNode; }
        public void setLeftNode(Node leftNode) { this.leftNode = leftNode; }
        public Node getRightNode() { return rightNode; }
        public void setRightNode(Node rightNode) { this.rightNode = rightNode; }

        @Override
        public String toString() {
            return "Node [c=" + c + ", f=" + f + "]";
        }
    }

    /* Build the Huffman tree.
     * Returns its weighted path length (WPL) = the sum of the weights of all non-leaf nodes.
     */
    public static int encode(PriorityQueue<Node> queue) {
        int WPL = 0;
        // N leaves need N - 1 merges
        int num = queue.size() - 1;
        for (int i = 0; i < num; i++) {
            // Take the two nodes with the highest priority (smallest frequency)
            Node left = queue.poll();
            Node right = queue.poll();
            // Their combined frequency becomes a new node, added back to the queue
            Node newNode = new Node();
            newNode.setF(left.f + right.f);
            WPL += left.f + right.f;
            queue.add(newNode);
            // Make the two nodes the left and right children of the new node
            newNode.leftNode = left;
            newNode.rightNode = right;
            // Set the new node as their parent
            left.parent = newNode;
            right.parent = newNode;
        }
        // The last node left in the queue is the root
        return WPL;
    }

    // Breadth-first traversal of the tree (debugging helper, not called by Main)
    public void printTreeBFS() {
        LinkedList<Node> queue = new LinkedList<Node>();
        if (root != null) {
            queue.add(root);
            while (!queue.isEmpty()) {
                Node node = queue.poll();
                System.out.print(node + ":");
                if (node.leftNode != null) queue.add(node.leftNode);
                if (node.rightNode != null) queue.add(node.rightNode);
            }
        }
    }

    // Returns true if no code is a prefix of another code
    public static boolean jude(HashMap<Character, String> hashMap) {
        // Sort the entries by code length, shortest first
        List<Map.Entry<Character, String>> infoIds =
                new ArrayList<Map.Entry<Character, String>>(hashMap.entrySet());
        Collections.sort(infoIds, new Comparator<Map.Entry<Character, String>>() {
            public int compare(Map.Entry<Character, String> o1, Map.Entry<Character, String> o2) {
                return o1.getValue().length() - o2.getValue().length();
            }
        });
        // Compare each code with every code that is at least as long
        for (int i = 0; i < infoIds.size(); i++) {
            String code = infoIds.get(i).getValue();
            for (int j = i + 1; j < infoIds.size(); j++) {
                String nextcode = infoIds.get(j).getValue();
                if (nextcode.startsWith(code)) {
                    return false;
                }
            }
        }
        return true;
    }
}

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        // Priority queue holding the nodes
        PriorityQueue<Tree.Node> queue = new PriorityQueue<Tree.Node>();
        // Read the integer N
        int n = scanner.nextInt();
        // help[c] stores the frequency of character c
        int[] help = new int[130];
        for (int i = 0; i < n; i++) {
            char c = scanner.next().charAt(0);
            int f = scanner.nextInt();
            help[c] = f;
            Tree.Node node = new Tree.Node();
            node.setC(c);
            node.setF(f);
            queue.add(node);
        }
        // Build the Huffman tree and get the optimal WPL
        int WPL = Tree.encode(queue);
        // Read M
        int M = scanner.nextInt();
        for (int i = 0; i < M; ++i) {
            // WPL of this submission
            int WPL2 = 0;
            // Codes of this submission
            HashMap<Character, String> hashMap = new HashMap<Character, String>();
            for (int j = 0; j < n; ++j) {
                char c = scanner.next().charAt(0);
                String code = scanner.next();
                WPL2 += help[c] * code.length();
                hashMap.put(c, code);
            }
            if (WPL == WPL2) {
                // Check whether any shorter code is a prefix of a longer one
                if (Tree.jude(hashMap))
                    System.out.println("Yes");
                else
                    System.out.println("No");
            } else {
                System.out.println("No");
            }
        }
    }
}
The output is correct, but one test case exceeds the time limit. For a C++ version, see the references below.
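One plausible cause of the timeout (an assumption, not verified against the judge) is java.util.Scanner, which is slow on large inputs; here there can be up to M × N = 63,000 code lines. The sketch below keeps the same two checks but reads tokens through BufferedReader and StringTokenizer, computes the optimal WPL with a priority queue of plain integers, checks prefixes by sorting the codes lexicographically and comparing neighbours, and buffers all output in a StringBuilder. The class name FastHuffmanJudge is hypothetical; on the judge it would presumably have to be renamed to the public class Main.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.PriorityQueue;
import java.util.StringTokenizer;

// Same checks as the Scanner version (optimal WPL + prefix-free), but with
// buffered token input and buffered output. FastHuffmanJudge is a
// hypothetical name; on the judge the class would be renamed to Main.
class FastHuffmanJudge {
    private final BufferedReader in;
    private StringTokenizer tok;

    FastHuffmanJudge(BufferedReader in) {
        this.in = in;
    }

    // Next whitespace-separated token; reads further lines as needed.
    private String next() {
        try {
            while (tok == null || !tok.hasMoreTokens()) {
                tok = new StringTokenizer(in.readLine());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tok.nextToken();
    }

    // Reads one test case and returns one "Yes"/"No" line per submission.
    String run() {
        int n = Integer.parseInt(next());
        int[] freq = new int[128]; // all allowed characters are ASCII
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int i = 0; i < n; i++) {
            char c = next().charAt(0);
            freq[c] = Integer.parseInt(next());
            pq.add(freq[c]);
        }
        int best = 0; // optimal WPL = sum of internal node weights
        while (pq.size() > 1) {
            int merged = pq.poll() + pq.poll();
            best += merged;
            pq.add(merged);
        }
        int m = Integer.parseInt(next());
        String[] codes = new String[n];
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < m; i++) {
            int wpl = 0;
            for (int j = 0; j < n; j++) { // always consume all N lines
                char c = next().charAt(0);
                codes[j] = next();
                wpl += freq[c] * codes[j].length();
            }
            boolean ok = wpl == best;
            if (ok) {
                // After a lexicographic sort, a prefix relation always shows
                // up between neighbours, so N - 1 comparisons suffice.
                Arrays.sort(codes);
                for (int j = 0; ok && j + 1 < n; j++) {
                    ok = !codes[j + 1].startsWith(codes[j]);
                }
            }
            out.append(ok ? "Yes" : "No").append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.print(new FastHuffmanJudge(br).run());
    }
}
```

On the sample input above this should print Yes, Yes, No, No, one per line.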
References:
http://blog.csdn.net/AXuan_K/article/details/45583335
http://shmilyaw-hotmail-com.iteye.com/blog/2009929