Hand-rolling a console-based Java compression program: encoding preprocessing, part 2

Article link: https://blog.csdn.net/m0_74083116/article/details/131235603

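This article shows how to implement file compression in Java: it counts byte frequencies and builds a Huffman tree to derive a code table tuned to the file, which is then used to compress it. The code covers the setup in the Main function and the methods of the Compress class: I/O setup, frequency counting, node construction, and encoding.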


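To compress a file, we need to build a code table dedicated to that file, so that every bit we write carries as much information as possible.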
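To make the idea concrete, here is a minimal sketch that compares the size of a short input under fixed 8-bit bytes with its size under a variable-length code table. The class name CodeSizeDemo, the method encodedBits, and the hand-written table are all assumptions made for illustration; they are not part of the article's program, which derives its table from a Huffman tree instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class CodeSizeDemo {
    // Sums the code lengths of all input bytes, i.e. the encoded size in bits.
    static long encodedBits(byte[] data, Map<Byte, String> table) {
        long bits = 0;
        for (byte b : data) {
            bits += table.get(b).length();
        }
        return bits;
    }

    // A hand-written prefix-free table (an assumption for this demo):
    // the most frequent byte gets the shortest code.
    static Map<Byte, String> sampleTable() {
        Map<Byte, String> table = new HashMap<>();
        table.put((byte) 'a', "0");
        table.put((byte) 'b', "10");
        table.put((byte) 'c', "110");
        table.put((byte) 'd', "111");
        return table;
    }

    public static void main(String[] args) {
        byte[] data = "aaaaabbcd".getBytes(StandardCharsets.US_ASCII);
        System.out.println("fixed:    " + data.length * 8 + " bits");
        System.out.println("variable: " + encodedBits(data, sampleTable()) + " bits");
    }
}
```

For the nine-byte input "aaaaabbcd" the fixed encoding needs 72 bits, while the table above needs 5×1 + 2×2 + 3 + 3 = 15 bits, not counting the space needed to store the table itself.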

1. Writing the Main function (not yet complete)

import java.util.Scanner;

public class Main {
    public static void main(String[] args){
        Scanner scanner = new Scanner(System.in);
        String filefrom =null;
        String fileto = null;
        filefrom  = scanner.nextLine();
        fileto = scanner.nextLine();
        Compress compress = new Compress(filefrom,fileto);
        compress.compress();
    }
}

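The idea is as follows:

The user enters the path of the file to compress and then the name of the compressed output file; the program then calls the compress method of the Compress class to do the compression.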

2. Writing the Compress class

import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.PriorityQueue;

public class Compress {
    String filefrom;
    String fileto;
    FileInputStream fileInputStream;
    BufferedInputStream bufferedInputStream;
    FileOutputStream fileOutputStream;
    BufferedOutputStream bufferedOutputStream;
    int[] fre = new int[256];
    int all = 0;
    CodeNode root;
    HashMap<Byte,String> easytable = new HashMap<>();

    public Compress(String filefrom,String fileto) {
        this.filefrom = filefrom;
        this.fileto = fileto;
    }

    public void compress(){
        io();
        frequency();
        doNode();
        doTree();
        encode();
    }

    public void io(){
        try {
            fileInputStream = new FileInputStream(filefrom);
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
        bufferedInputStream = new BufferedInputStream(fileInputStream);
        try{
            fileOutputStream = new FileOutputStream(fileto);
            bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
        } catch(Exception e){
            System.out.println(e.getMessage());
        }
    }
    public void frequency() {
        int length;
        byte[] bytes = new byte[1024];
        try {
            while ((length = bufferedInputStream.read(bytes)) > 0) {
                for (int i = 0; i<length;i++){
                    fre[bytes[i]+128]++;
                }
                all+=length;
            }
        } catch (Exception e){
            System.out.println("Failed to read from the buffered input stream: " + e.getMessage());
        }
        System.out.println("origin: " + all + " bytes");
    }

    public void doNode(){
        PriorityQueue<CodeNode> codeNodes = new PriorityQueue<>(Comparator.comparing(CodeNode::getFre));
        for(int i=0;i<256;i++){
            if(fre[i]!=0){
                codeNodes.add(new CodeNode((byte)(i-128),fre[i],true));
            }
        }

        while(codeNodes.size()!=1){
            CodeNode left = codeNodes.poll();
            CodeNode right = codeNodes.poll();
            CodeNode codeNode = new CodeNode((byte)0,left.fre + right.fre,false);
            codeNode.left = left;
            codeNode.right = right;
            codeNodes.add(codeNode);
        }
        root = codeNodes.poll();
    }

    public void doTree(){
        search(root,"");

    }

    public void search(CodeNode codeNode,String code){
        if(codeNode ==null){
            return ;
        }
        if(codeNode.leaf){
            easytable.put(codeNode.value,code);
        }

        search(codeNode.left,code+"0");
        search(codeNode.right,code+"1");
    }

    public void encode(){

    }
}

(1) First, the constructor stores the source file path and the output file name

    public Compress(String filefrom,String fileto) {
        this.filefrom = filefrom;
        this.fileto = fileto;
    }

(2) Next comes the compress method

    public void compress(){
        io();
        frequency();
        doNode();
        doTree();
        encode();
    }

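Here we define the io, frequency, doNode, doTree, and encode methods.

Respectively, they initialize the buffered input and output streams for the files, count how often each byte value occurs in the source file, build the nodes of the Huffman tree, build the code table from the Huffman tree, and write the compressed file to disk.

Next, let's go through the source one method at a time.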

(3) The io method

    public void io(){
        try {
            fileInputStream = new FileInputStream(filefrom);
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
        bufferedInputStream = new BufferedInputStream(fileInputStream);
        try{
            fileOutputStream = new FileOutputStream(fileto);
            bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
        } catch(Exception e){
            System.out.println(e.getMessage());
        }
    }

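This method is very simple: it creates the streams so that the later methods can read and write the files. Note: the stream variables must be declared as fields, outside the method, so that the other methods can use them.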
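One thing to watch in io is that a failed open only prints a message, so the later methods would run with a null stream. As a rough sketch of an alternative, and not the approach the article takes, the example below opens a buffered stream with try-with-resources, lets any IOException propagate to the caller, and closes the stream automatically. The names StreamSketch and countBytes are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamSketch {
    // Reads the whole file through a buffered stream and returns its size in bytes.
    // The stream is closed when the try block exits, even if reading fails.
    static long countBytes(Path source) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source))) {
            byte[] buffer = new byte[1024];
            long total = 0;
            int length;
            while ((length = in.read(buffer)) != -1) {
                total += length;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // Uses a temporary file so the demo does not depend on any existing path.
        Path temp = Files.createTempFile("stream-sketch", ".bin");
        try {
            Files.write(temp, new byte[3000]);
            System.out.println(countBytes(temp) + " bytes");
        } finally {
            Files.delete(temp);
        }
    }
}
```

The trade-off is that a stream opened this way lives only inside one method, so each pass over the file would open its own stream instead of sharing a field.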

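(4) The frequency method

We need to count how often each byte value occurs in the source file in order to build the optimal code table, that is, the code best suited to this particular file: frequent byte values get short codes and no bits are wasted on values that never appear. See the theory in my previous post for details; I won't repeat it here.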

    public void frequency() {
        int length;
        byte[] bytes = new byte[1024];
        try {
            while ((length = bufferedInputStream.read(bytes)) > 0) {
                for (int i = 0; i<length;i++){
                    fre[bytes[i]+128]++;
                }
                all+=length;
            }
        } catch (Exception e){
            System.out.println("Failed to read from the buffered input stream: " + e.getMessage());
        }
        System.out.println("origin: " + all + " bytes");
    }

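We use the buffered input stream to read up to 1024 bytes (1 KB) into the bytes array on each call, and count how often each byte value occurs in it. Why bytes[i]+128? A Java byte is signed, so the values read range from -128 to 127, while array indices must be at least 0; adding 128 shifts the range to 0~255. You might ask whether the byte gets widened to an int by padding 24 zero bits in front, which would make the +128 unnecessary. It does not: Java widens a byte to an int with sign extension, so a negative byte stays negative. I tried indexing without the offset, and the program failed with an ArrayIndexOutOfBoundsException. A larger offset such as 200 also works, provided the fre array is enlarged to match (at least 328 entries for an offset of 200) and the same offset is subtracted again when decompressing. I also added the variable all to track the total size; you can leave it out.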
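The index arithmetic for signed bytes can be checked with a small sketch. It shows the +128 shift used in frequency next to a common alternative, masking with 0xFF, which yields the unsigned value of the byte. The names ByteIndexDemo, shiftIndex, maskIndex, and count are made up for this illustration. Note that the two mappings produce different indices, so compression and decompression have to use the same one.

```java
class ByteIndexDemo {
    // Shifts the signed range -128..127 to 0..255, as the frequency method does.
    static int shiftIndex(byte b) {
        return b + 128;
    }

    // Alternative: mask away the sign extension to get the unsigned value 0..255.
    static int maskIndex(byte b) {
        return b & 0xFF;
    }

    // Counts occurrences of each byte value, indexed by (value + 128).
    static int[] count(byte[] data) {
        int[] fre = new int[256];
        for (byte b : data) {
            fre[shiftIndex(b)]++;
        }
        return fre;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE4; // stored as -28 because byte is signed
        System.out.println("as int:      " + (int) b);
        System.out.println("shift index: " + shiftIndex(b));
        System.out.println("mask index:  " + maskIndex(b));
    }
}
```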

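(5) The doNode method

The purpose of this method is to build the nodes of the Huffman tree and link them together. It creates one leaf for every byte value that occurs and puts the leaves into a priority queue ordered by frequency, then repeatedly takes out the two least frequent nodes and merges them under a new parent until only the root remains. Each node holds references to its children, so it looks a bit like a linked-list node.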

    public void doNode(){
        PriorityQueue<CodeNode> codeNodes = new PriorityQueue<>(Comparator.comparing(CodeNode::getFre));
        for(int i=0;i<256;i++){
            if(fre[i]!=0){
                codeNodes.add(new CodeNode((byte)(i-128),fre[i],true));
            }
        }

        while(codeNodes.size()!=1){
            CodeNode left = codeNodes.poll();
            CodeNode right = codeNodes.poll();
            CodeNode codeNode = new CodeNode((byte)0,left.fre + right.fre,false);
            codeNode.left = left;
            codeNode.right = right;
            codeNodes.add(codeNode);
        }
        root = codeNodes.poll();
    }

Here we need to add one more class

public class CodeNode {
    byte value;
    int fre;
    boolean leaf;
    CodeNode left;
    CodeNode right;

    public CodeNode(byte value, int fre, boolean leaf) {
        this.value = value;
        this.fre = fre;
        this.leaf = leaf;
    }

    public int getFre(){
        return fre;
    }

    public boolean getLeaf(){
        return leaf;
    }
}
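As a hedged sketch of how node building and code-table building fit together, the self-contained example below follows the same priority-queue approach as doNode and the same recursive walk as search. The names HuffmanSketch, Node, buildTree, and buildTable are made up for illustration. It also makes two assumptions that the code above does not: an empty input produces no tree instead of failing on a null node, and a file with only one distinct byte value gets the code "0" instead of an empty string.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    // Hypothetical node type for this sketch; it mirrors the fields of CodeNode.
    static final class Node {
        final byte value;
        final long fre;
        final Node left;
        final Node right;

        Node(byte value, long fre, Node left, Node right) {
            this.value = value;
            this.fre = fre;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Builds a Huffman tree from a 256-entry table indexed by (byte value + 128).
    // Returns null when every count is zero, i.e. for an empty input.
    static Node buildTree(int[] fre) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>(Comparator.comparingLong((Node n) -> n.fre));
        for (int i = 0; i < fre.length; i++) {
            if (fre[i] != 0) {
                queue.add(new Node((byte) (i - 128), fre[i], null, null));
            }
        }
        if (queue.isEmpty()) {
            return null;
        }
        // Merge the two least frequent nodes until only the root is left.
        while (queue.size() > 1) {
            Node left = queue.poll();
            Node right = queue.poll();
            queue.add(new Node((byte) 0, left.fre + right.fre, left, right));
        }
        return queue.poll();
    }

    // Derives the code table: "0" for a left branch, "1" for a right branch.
    static Map<Byte, String> buildTable(Node root) {
        Map<Byte, String> table = new HashMap<>();
        if (root == null) {
            return table;
        }
        if (root.isLeaf()) {
            table.put(root.value, "0"); // assumption: one-symbol files get a 1-bit code
            return table;
        }
        walk(root, "", table);
        return table;
    }

    private static void walk(Node node, String code, Map<Byte, String> table) {
        if (node.isLeaf()) {
            table.put(node.value, code);
            return;
        }
        walk(node.left, code + "0", table);
        walk(node.right, code + "1", table);
    }

    public static void main(String[] args) {
        int[] fre = new int[256];
        fre['a' + 128] = 5;
        fre['b' + 128] = 2;
        fre['c' + 128] = 1;
        fre['d' + 128] = 1;
        buildTable(buildTree(fre))
                .forEach((value, code) -> System.out.println((char) (byte) value + " -> " + code));
    }
}
```

With these frequencies the code lengths come out as 1 bit for a, 2 bits for b, and 3 bits each for c and d; which of two equally frequent nodes ends up on the left depends on the queue's tie-breaking, so the exact bit patterns may differ.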

To be continued...