Huffman Encoding and Decoding in Java (with File Compression and Decompression)

一位开朗的网友

Tags: data structures, java, intellij-idea, Huffman tree

Article link: https://blog.csdn.net/qq_62658309/article/details/126995808

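Before we start:

A quick rant: I've gone balder, but I've also gotten stronger!

I recommend reading this after watching Han Shunping's data structures course (it only makes my code quicker to pick up, so no need to worry if you haven't).

This post focuses on fixing the bug in the code from Han Shunping's videos. For the other parts (the ones I personally found easier), have a look at other people's write-ups (I was too lazy to cover them, haha).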

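Cause of the problem:

Han's code only considers two things: the last byte is converted directly, and a non-last byte that is non-negative gets its high bits padded. That is why index-out-of-bounds errors keep showing up.

An example: if the last byte is negative, converting it directly is fine, because its low 8 bits always come out as a complete 8-bit binary string (this is basic binary representation, so I won't go into it here). The non-negative case is where you have to be careful.

The zero case:

Take this Huffman bit string that I generated: 001111111010101100. Storing it 8 bits at a time leaves 00 at the end,
that is, two 0s (other inputs can of course leave more). But the byte array only records a single 0, like this: [63, -85, 0]
So the bit string decoded with the teacher's method is one 0 short, because the last byte is converted directly: 00111111101010110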
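To make the zero case reproducible, here is a minimal sketch of the pack and naive-unpack steps on the same 18-bit string. The class and method names (`NaiveRoundTrip`, `pack`, `unpackNaive`) are made up for illustration and are not taken from the course code; the unpacking only imitates the behavior described above (pad every byte except a non-negative last one).

```java
import java.util.Arrays;

class NaiveRoundTrip {
    // Pack a bit string into bytes, 8 bits at a time (the last chunk may be shorter).
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0, k = 0; i < bits.length(); i += 8, k++) {
            String chunk = bits.substring(i, Math.min(i + 8, bits.length()));
            out[k] = (byte) Integer.parseInt(chunk, 2);
        }
        return out;
    }

    // Naive unpacking: every byte becomes 8 bits, except a non-negative
    // last byte, which is converted directly and so loses its leading zeros.
    static String unpackNaive(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            boolean last = (i == bytes.length - 1);
            if (last && bytes[i] >= 0) {
                sb.append(Integer.toBinaryString(bytes[i]));
            } else {
                String s = Integer.toBinaryString(bytes[i] | 256);
                sb.append(s.substring(s.length() - 8)); // keep the low 8 bits
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "001111111010101100";           // 18 bits
        byte[] packed = pack(bits);
        System.out.println(Arrays.toString(packed));  // [63, -85, 0]
        System.out.println(unpackNaive(packed));      // 00111111101010110 (17 bits)
    }
}
```

The packed array has no way to tell a tail of `0` from a tail of `00`, which is exactly the information the decoder is missing.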

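Now it is clear why the index-out-of-bounds error keeps appearing. The error is raised in the decode method; see the code below:

        // decode by looking up the decoding table
        List<Byte> li = new ArrayList<>();
        int len = sb.length();
        for (int i = 0; i < len; ) {
            int j = i + 1;
            while (true) {
                String t = sb.substring(i, j); // the exception is reported here
                if (m.get(t) != null) {
                    li.add(m.get(t));
                    break;
                }
                j++;
            }
            i = j;
        }

At the time I was puzzled: was the decoding table wrong, so that no code matched and the search just kept going until it ran out of bounds? But with a correct table a lookup should never fail. So I spent ages debugging the decoding and encoding tables, found nothing wrong, and had to rethink. Back to the point: you should have read 00 for m.get(t) to find a match, but the faulty method hands you only a single 0, so the loop keeps extending the search, runs past the end of the string, throws, and then you give up, haha (just kidding).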
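The failure can be reproduced with a tiny prefix-code table. This is only a sketch: the three-symbol table and the names `TruncatedDecodeDemo`, `table` and `decodeCount` are assumptions for illustration, while the loop has the same shape as the decode loop above.

```java
import java.util.HashMap;
import java.util.Map;

class TruncatedDecodeDemo {
    // A made-up prefix code: a=00, b=01, c=1.
    static Map<String, Character> table() {
        Map<String, Character> m = new HashMap<>();
        m.put("00", 'a');
        m.put("01", 'b');
        m.put("1", 'c');
        return m;
    }

    // Grow the window [i, j) until it matches a code; return how many symbols were decoded.
    static int decodeCount(String bits, Map<String, Character> table) {
        int decoded = 0;
        for (int i = 0; i < bits.length(); ) {
            int j = i + 1;
            while (true) {
                String t = bits.substring(i, j); // throws once j passes bits.length()
                if (table.containsKey(t)) {
                    decoded++;
                    break;
                }
                j++;
            }
            i = j;
        }
        return decoded;
    }

    public static void main(String[] args) {
        System.out.println(decodeCount("100", table())); // 2 symbols: "1", then "00"
        try {
            decodeCount("10", table());                  // the final "00" has lost one 0
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("ran past the end of the bit string");
        }
    }
}
```

With the full bit string the loop always stops on a match; with one bit missing, the last window can never match, so `j` keeps growing until `substring` throws.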

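The positive case:

Take 00010: it is stored as the byte 2, and the faulty method decodes it as just 10, a full three 0s short, and then the error is thrown. By now the root cause should be clear: leading zeros get lost on the way through encoding and decoding!

So if you never hit the error, you just haven't tried enough inputs; you probably landed on friendly tails such as 0, 10 or 1101, where no leading zeros get lost.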
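A quick way to check which tails are "friendly" is to push each one through the same string-to-byte-to-string conversion. This is a small sketch with a hypothetical helper `survives`; it only models a final chunk shorter than 8 bits.

```java
class TailSurvival {
    // True if a short final chunk comes back unchanged from parseInt -> byte -> toBinaryString.
    static boolean survives(String tail) {
        byte b = (byte) Integer.parseInt(tail, 2);
        return Integer.toBinaryString(b).equals(tail);
    }

    public static void main(String[] args) {
        for (String tail : new String[] {"0", "10", "1101", "00", "00010"}) {
            byte b = (byte) Integer.parseInt(tail, 2);
            System.out.println(tail + " -> " + b + " -> " + Integer.toBinaryString(b)
                    + (survives(tail) ? "  (ok)" : "  (leading zeros lost)"));
        }
    }
}
```

The first three tails come back intact, while `00` and `00010` come back as `0` and `10`.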

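The fix:

I searched through a lot of blogs and came across this approach: move the leading zeros to the back! As soon as I saw it I went and implemented it myself. So neat!

My approach: when handling the last byte, keep a static variable COUNT = 0 that records the number of leading zeros. If the chunk is all zeros, such as 0000, just emit the zeros in a loop; if the value is positive, first apply >>= COUNT and then prepend COUNT zeros.

OK, here is the code for the two key parts: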

    // Build the encoded result: bit string -> byte array
    public static byte[] zip(byte[] bytes, Map<Byte, String> huffmanCodes) {
        StringBuilder sb2 = new StringBuilder();
        for (byte b : bytes) {
            sb2.append(huffmanCodes.get(b));
        }
        int len = (sb2.length() + 7) / 8;
        byte[] huffmanCodeBytes = new byte[len];
        int index = 0;
        String str;
        COUNT = 0; // reset, so an earlier call cannot leak its count into this one
        for (int i = 0; i < sb2.length(); i += 8) {
            if (i + 8 >= sb2.length()) { // last chunk, including a full 8-bit one
                str = sb2.substring(i);
                // Watch out for tails such as 0010 or 000: count the leading zeros and
                // move them to the end; decoding shifts them out and prepends them again
                for (; COUNT < str.length(); COUNT++) {
                    if (str.charAt(COUNT) != '0') break;
                }
                if (COUNT > 0 && COUNT < str.length())
                    str = str.substring(COUNT) + str.substring(0, COUNT);
            } else {
                str = sb2.substring(i, i + 8);
            }
            huffmanCodeBytes[index++] = (byte) Integer.parseInt(str, 2);
        }
        return huffmanCodeBytes;
    }

    // Convert a byte into a binary string; flag is true for the last byte
    public static String byteToString(boolean flag, byte b) {
        String s = "";
        if (flag) {
            // unsigned value: a rotated full 8-bit chunk can have its top bit set
            int tmp = b & 0xFF;
            for (int i = 0; i < COUNT; i++)
                s += '0';
            if (tmp == 0)
                return s; // the chunk was all zeros
            tmp >>= COUNT; // drop the zeros that were moved to the end
            return s + Integer.toBinaryString(tmp);
        } else {
            int tmp = b;
            if (tmp >= 0)
                tmp |= 256; // pad the high bits so that 8 bits can be sliced off
            s += Integer.toBinaryString(tmp);
            return s.substring(s.length() - 8);
        }
    }
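To look at the rotation idea on its own, here is a small stateless sketch. It is not the code above: the names (`RotateTail`, `leadingZeros`, `encode`, `decode`) are invented for illustration, it works on `int` values instead of bytes, and it passes the zero count explicitly instead of using the static `COUNT`.

```java
class RotateTail {
    // Number of leading '0' characters in the final chunk.
    static int leadingZeros(String tail) {
        int n = 0;
        while (n < tail.length() && tail.charAt(n) == '0') n++;
        return n;
    }

    // Encode: move the leading zeros to the end, so they survive as low-order bits.
    static int encode(String tail) {
        int z = leadingZeros(tail);
        String rotated = (z > 0 && z < tail.length())
                ? tail.substring(z) + tail.substring(0, z)
                : tail;
        return Integer.parseInt(rotated, 2);
    }

    // Decode: shift the trailing zeros out again and put them back in front.
    static String decode(int value, int zeros) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < zeros; i++) sb.append('0');
        if (value != 0) sb.append(Integer.toBinaryString(value >> zeros));
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String tail : new String[] {"00010", "00", "0110", "101"}) {
            int z = leadingZeros(tail);
            int v = encode(tail);
            System.out.println(tail + " -> value " + v + ", zeros " + z + " -> " + decode(v, z));
        }
    }
}
```

For `00010` this gives the rotated string `10000` (value 16) with three zeros recorded; shifting right by 3 gives `10`, and prepending the three zeros restores `00010`.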

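A detour on the debugging road:

When I first hit the out-of-bounds error, I looked at my own code and figured all it needed was a termination condition: break as soon as j > len.

The code as I changed it back then (the correct version doesn't actually need this change, haha):

        // decode by looking up the decoding table
        List<Byte> li = new ArrayList<>();
        int len = sb.length();
        for (int i = 0; i < len; ) {
            int j = i + 1;
            while (true) {
                if (j > len) break; // stop instead of running past the end
                String t = sb.substring(i, j);
                if (m.get(t) != null) {
                    li.add(m.get(t));
                    break;
                }
                j++;
            }
            i = j;
        }

The error went away, but the decoded output was missing its last byte. I decompressed a photo and it looked fine, but when I opened its properties it turned out to be one byte smaller than the original, and I had a strong feeling that things weren't that simple, haha. Teaching yourself programming really is full of setbacks, which is part of the fun, and I wrote this post partly to record how excited I am right now. My writing is limited, so if anything is unclear or you have other questions, feel free to comment or message me. Pointers and optimizations from experts are welcome too!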
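The silent loss from the guarded loop can be seen on a toy input as well. This is a sketch under assumptions: the three-symbol code table and the names `GuardedDecodeDemo`, `table` and `decodeGuarded` are invented for illustration, and only the `j > len` guard mirrors the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

class GuardedDecodeDemo {
    // A made-up prefix code: a=00, b=01, c=1.
    static Map<String, Character> table() {
        Map<String, Character> m = new HashMap<>();
        m.put("00", 'a');
        m.put("01", 'b');
        m.put("1", 'c');
        return m;
    }

    // Decode loop with the "break when j > len" guard.
    static String decodeGuarded(String bits, Map<String, Character> table) {
        StringBuilder out = new StringBuilder();
        int len = bits.length();
        for (int i = 0; i < len; ) {
            int j = i + 1;
            while (true) {
                if (j > len) break;            // give up on the unfinished code
                Character c = table.get(bits.substring(i, j));
                if (c != null) {
                    out.append(c);
                    break;
                }
                j++;
            }
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeGuarded("100", table())); // ca
        System.out.println(decodeGuarded("10", table()));  // c  (the final symbol is dropped)
    }
}
```

The guard hides the exception but not the missing bit, so the last symbol simply never gets decoded.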

Full code:

import java.io.*;
import java.util.*;

public class HuffmanTreeCode {
    public static void main(String[] args) throws Exception {
        // Huffman coding is a prefix code: 0 for the left branch, 1 for the right
        String str = "i like like like java do you like a java";
        byte[] bytes = str.getBytes();
        // compress
        byte[] t = huffmanZip(bytes);
        // decompress
        byte[] d_t = decode(huffmanCodes, t);
        // Arrays.toString would print the numeric byte values; new String decodes them with a charset
        System.out.println(new String(d_t));

        // file compression and decompression test
       /* String src = "stu_module/draft/ff.jpg";
        String det = "stu_module/draft/ff1.zip";
        fileHuffmanCode(src, det);
        fileDecode(det, "stu_module/draft/ff2.jpg");*/
    }

    // Count how often each byte occurs and wrap each (byte, count) pair in a node
    public static List<CNode> getNodes(byte[] bytes) {
        List<CNode> nodes = new ArrayList<>();
        Map<Byte, Integer> m = new HashMap<>();
        for (byte b : bytes) {
            Integer count = m.get(b);
            if (count == null) m.put(b, 1);
            else m.put(b, count + 1);
        }
        for (Map.Entry<Byte, Integer> entry : m.entrySet()) {
            nodes.add(new CNode(entry.getKey(), entry.getValue()));
        }
        return nodes;
    }

    public static CNode createHuffmanTree(List<CNode> nodes) {
        while (nodes.size() > 1) {
            Collections.sort(nodes);
            CNode left = nodes.get(0);
            CNode right = nodes.get(1);
            CNode parent = new CNode(null, left.weight + right.weight);
            parent.left = left;
            parent.right = right;
            nodes.add(parent);
            nodes.remove(left);
            nodes.remove(right);
        }
        return nodes.get(0);
    }

    static Map<Byte, String> huffmanCodes = new HashMap<>(); // the encoding table
    static StringBuilder sb = new StringBuilder();
    static int COUNT = 0; // number of leading zeros moved to the end of the last chunk

    // Build the Huffman encoding table and store it in the map
    public static void getCodes(CNode cNode, String code, StringBuilder sb) {
        StringBuilder sb2 = new StringBuilder(sb);
        sb2.append(code);
        if (cNode != null) {
            if (cNode.data == null) {
                // internal node: recurse into both children
                getCodes(cNode.left, "0", sb2);
                getCodes(cNode.right, "1", sb2);
            } else {
                // leaf node: the path so far is this byte's code
                huffmanCodes.put(cNode.data, sb2.toString());
            }
        }
    }

    // Build the encoded result: bit string -> byte array
    public static byte[] zip(byte[] bytes, Map<Byte, String> huffmanCodes) {
        StringBuilder sb2 = new StringBuilder();
        for (byte b : bytes) {
            sb2.append(huffmanCodes.get(b));
        }
        int len = (sb2.length() + 7) / 8;
        byte[] huffmanCodeBytes = new byte[len];
        int index = 0;
        String str;
        COUNT = 0; // reset, so an earlier call cannot leak its count into this one
        for (int i = 0; i < sb2.length(); i += 8) {
            if (i + 8 >= sb2.length()) { // last chunk, including a full 8-bit one
                str = sb2.substring(i);
                // Watch out for tails such as 0010 or 000: count the leading zeros and
                // move them to the end; decoding shifts them out and prepends them again
                for (; COUNT < str.length(); COUNT++) {
                    if (str.charAt(COUNT) != '0') break;
                }
                if (COUNT > 0 && COUNT < str.length())
                    str = str.substring(COUNT) + str.substring(0, COUNT);
            } else {
                str = sb2.substring(i, i + 8);
            }
            huffmanCodeBytes[index++] = (byte) Integer.parseInt(str, 2);
        }
        return huffmanCodeBytes;
    }

    // Compress into a byte array; this just wraps the steps in one method for convenience
    public static byte[] huffmanZip(byte[] bytes) {
        huffmanCodes.clear(); // drop codes left over from a previous call
        List<CNode> nodes = getNodes(bytes);
        CNode root = createHuffmanTree(nodes);
        // build the encoding table
        getCodes(root, "", sb);
        // compress
        byte[] zip = zip(bytes, huffmanCodes);
        return zip;
    }

    // Convert a byte into a binary string; flag is true for the last byte
    public static String byteToString(boolean flag, byte b) {
        String s = "";
        if (flag) {
            // unsigned value: a rotated full 8-bit chunk can have its top bit set
            int tmp = b & 0xFF;
            for (int i = 0; i < COUNT; i++)
                s += '0';
            if (tmp == 0)
                return s; // the chunk was all zeros
            tmp >>= COUNT; // drop the zeros that were moved to the end
            return s + Integer.toBinaryString(tmp);
        } else {
            int tmp = b;
            if (tmp >= 0)
                tmp |= 256; // pad the high bits so that 8 bits can be sliced off
            s += Integer.toBinaryString(tmp);
            return s.substring(s.length() - 8);
        }
    }

    // Decode
    public static byte[] decode(Map<Byte, String> map, byte[] bytes) {
        StringBuilder sb = new StringBuilder(); // holds the reconstructed bit string
        boolean flag;
        for (int i = 0; i < bytes.length; i++) {
            flag = (i == bytes.length - 1);
            sb.append(byteToString(flag, bytes[i]));
        }
        // build the decoding table by inverting the encoding table
        Map<String, Byte> m = new HashMap<>();
        for (Map.Entry<Byte, String> entry : map.entrySet()) {
            m.put(entry.getValue(), entry.getKey());
        }
        // decode by looking up the decoding table
        List<Byte> li = new ArrayList<>();
        int len = sb.length();
        for (int i = 0; i < len; ) {
            int j = i + 1;
            while (true) {
                String t = sb.substring(i, j);
                if (m.get(t) != null) {
                    li.add(m.get(t));
                    break;
                }
                j++;
            }
            i = j;
        }
        byte[] b = new byte[li.size()];
        for (int i = 0; i < b.length; i++) {
            b[i] = li.get(i);
        }
        return b;
    }

    // Compress a file
    public static void fileHuffmanCode(String src, String det) throws Exception {
        try (InputStream io = new FileInputStream(src);
             ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(det))) {
            byte[] bytes = io.readAllBytes();
            byte[] zip = huffmanZip(bytes);
            // object stream: write the compressed bytes, the encoding table and COUNT
            oos.writeObject(zip);
            oos.writeObject(huffmanCodes);
            oos.writeInt(COUNT); // decoding needs COUNT, so store it alongside the data
        }
    }

    // Decompress a file
    @SuppressWarnings("unchecked")
    public static void fileDecode(String src, String det) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(src));
             OutputStream os = new FileOutputStream(det)) {
            byte[] bytes = (byte[]) ois.readObject();
            Map<Byte, String> map = (Map<Byte, String>) ois.readObject();
            COUNT = ois.readInt(); // restore the count saved at compression time
            byte[] decode = decode(map, bytes);
            os.write(decode);
        }
    }
}

class CNode implements Comparable<CNode> {
    Byte data; // the byte value; the wrapper type allows null for internal nodes
    int weight; // weight (number of occurrences)
    CNode left;
    CNode right;

    public CNode(Byte data, int weight) {
        this.data = data;
        this.weight = weight;
    }

    public void preOrder() {
        System.out.println(this);
        if (this.left != null) this.left.preOrder();
        if (this.right != null) this.right.preOrder();
    }

    @Override
    public int compareTo(CNode o) {
        return this.weight - o.weight;
    }

    @Override
    public String toString() {
        return "CNode{" +
                "data=" + data +
                ", weight=" + weight +
                '}';
    }
}
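For comparison, a different and fairly common way to keep the tail intact is to store how many bits of the final byte are valid, instead of rotating zeros. The sketch below is an alternative technique, not the approach used in this post; the class name `BitLengthPacking`, the method names, and the one-byte header layout are all assumptions made for illustration.

```java
import java.util.Arrays;

class BitLengthPacking {
    // Pack a string of '0'/'1' characters. Byte 0 is a header holding the number
    // of valid bits in the final data byte (1..8, or 0 for empty input).
    static byte[] pack(String bits) {
        int n = bits.length();
        byte[] out = new byte[1 + (n + 7) / 8];
        out[0] = (byte) (n == 0 ? 0 : (n - 1) % 8 + 1);
        for (int i = 0; i < n; i++) {
            if (bits.charAt(i) == '1') {
                out[1 + i / 8] |= 0x80 >> (i % 8); // most significant bit first
            }
        }
        return out;
    }

    // Rebuild the exact bit string, reading only the valid bits of the last byte.
    static String unpack(byte[] packed) {
        int dataBytes = packed.length - 1;
        if (dataBytes == 0) return "";
        int totalBits = (dataBytes - 1) * 8 + packed[0];
        StringBuilder sb = new StringBuilder(totalBits);
        for (int i = 0; i < totalBits; i++) {
            int bit = (packed[1 + i / 8] >> (7 - i % 8)) & 1;
            sb.append(bit == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "001111111010101100";
        byte[] packed = pack(bits);
        System.out.println(Arrays.toString(packed));      // [2, 63, -85, 0]
        System.out.println(unpack(packed).equals(bits));  // true
    }
}
```

Because the length travels with the data, the decoder needs no static state, at the cost of one extra byte per compressed block.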