解压乱码，判断 zip 压缩包编码是 GBK，还是 UTF-8

最新推荐文章于 2025-03-07 16:29:04 发布

梦-回

最新推荐文章于 2025-03-07 16:29:04 发布

阅读量1.6w

点赞数 6

文章标签：乱码

本文链接：https://blog.csdn.net/dianyitongxiao_/article/details/108076211

版权

本文介绍了解决Java程序解压ZIP文件时出现乱码的问题，通过自定义解压方法并使用异常判断编码方式，确保解压过程正确处理编码。同时，针对特定情况，如UTF-8和GBK编码的文件，提供了详细的测试方法和代码实现。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

在这里插入图片描述

个人文档地址 http://blog.sqber.com/articles/zip-file-gbk-utf-8.html

问题

java 程序解压 zip 类型的压缩包，使用 WinRAR 压缩的解压出来就会乱码。
我使用的是 zip4j 解压的，根本原因还是解压时候编码问题。

如何鉴别文件编码

网络上提供的几种都以失败告终：

1、判断文件首字节
2、使用 cpdetector

鉴别编码失败，怎么办？

有个有趣的现象：如果自己写解压程序的话，编码不对会报异常，而不是乱码了。
因此我们可以使用异常来判断是否可行。

自己写解压

public static void doUnArchiver(File srcfile, String destpath) throws IOException {
    byte[] buf = new byte[1024];
    FileInputStream fis = new FileInputStream(srcfile);
    BufferedInputStream bis = new BufferedInputStream(fis);
    ZipInputStream zis = new ZipInputStream(bis);
    ZipEntry zn = null;
    while ((zn = zis.getNextEntry()) != null) {
        File f = new File(destpath + "/" + zn.getName());
        if (zn.isDirectory()) {
            f.mkdirs();
        } else {
            /*
             * 父目录不存在则创建
             */
            File parent = f.getParentFile();
            if (!parent.exists()) {
                parent.mkdirs();
            }
            FileOutputStream fos = new FileOutputStream(f);
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            int len;
            while ((len = zis.read(buf)) != -1) {
                bos.write(buf, 0, len);
            }
            bos.flush();
            bos.close();
        }
        zis.closeEntry();
    }
    zis.close();
}

对其改造，进行判断

private boolean testEncoding(String filepath, Charset charset) throws FileNotFoundException {
    FileInputStream fis = new FileInputStream(new File(filepath));
    BufferedInputStream bis = new BufferedInputStream(fis);
    ZipInputStream zis = new ZipInputStream(bis, charset);
    ZipEntry zn = null;
    try {
        while ((zn = zis.getNextEntry()) != null) {
            // do nothing
        }
    } catch (Exception e) {
        return false;
    }finally {
        try {
            zis.close();
            bis.close();
            fis.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return true;
}

测试结果如下：

System.out.println(testEncoding("d:/UTF8的文件.zip", Charset.forName("UTF-8"))); // true
System.out.println(testEncoding("d:/UTF8的文件.zip", Charset.forName("GBK"))); //true

System.out.println(testEncoding("d:/GBK的文件.zip", Charset.forName("UTF-8"))); //false
System.out.println(testEncoding("d:/GBK的文件.zip", Charset.forName("GBK"))); //true

注意：
对于 UTF-8 的文件，使用 GBK 返回的居然是 true
所以我们在使用 testEncoding 方法的时候，只能是测试 “UTF-8” 返回的是不是 false
如果是 false，则使用 GBK 编码

if (testEncoding(file, Charset.forName("UTF-8")) == false){
    zipFile.setCharset(Charset.forName("GBK"));
}

整体代码

pom.xml 引入 zip4j

<dependency>
	<groupId>net.lingala.zip4j</groupId>
	<artifactId>zip4j</artifactId>
	<version>2.6.1</version>
</dependency>

解压代码

// ZipFile 默认是使用 UTF-8 编码处理文件
ZipFile zipFile = new ZipFile(file);
if (testEncoding(file, Charset.forName("UTF-8")) == false){
    zipFile.setCharset(Charset.forName("GBK"));
}
zipFile.extractAll(baseDir);

private boolean testEncoding(String filepath, Charset charset) throws FileNotFoundException {
    FileInputStream fis = new FileInputStream(new File(filepath));
    BufferedInputStream bis = new BufferedInputStream(fis);
    ZipInputStream zis = new ZipInputStream(bis, charset);
    ZipEntry zn = null;
    try {
        while ((zn = zis.getNextEntry()) != null) {
            // do nothing
        }
    } catch (Exception e) {
        return false;
    }finally {
        try {
            zis.close();
            bis.close();
            fis.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return true;
}

更新

在使用上面的解压程序时，又出现了新的问题。

有一压缩包 shuju.zip ，里面的内容为：

├── shuju
------└── 新建文件夹
-------------└── 李贽与晚明文学思想.pdf
------└── China-Africa Relations， Political Conditions，and Ngũgĩ Wa Thiong’o’s Wizard of the Crow Peter Leman.PDF
------└── 一手数据.xlsx

shuju.zip 文件下载地址：链接：https://pan.baidu.com/s/12ugoP2Vx8Ui7chKVGaq-1w 提取码：w6qq

解压之后，中文文件没有乱码，但其中一个英文文件乱码了。如下图：
在这里插入图片描述

解决办法：

改用 apache 下的一款解压缩工具： Apache Commons Compress

https://commons.apache.org/proper/commons-compress/examples.html
https://github.com/apache/commons-compress

具体使用如下：

pom.xml 文件引入

<!-- 解压缩 https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.18</version>
</dependency>

解压代码：

public static void decompressor(String zipFile, String targetDir) throws IOException, ArchiveException {

        File archiveFile = new File(zipFile);
        // 文件不存在，跳过
        if (!archiveFile.exists())
            return;

        ArchiveInputStream input = new ArchiveStreamFactory().createArchiveInputStream(new BufferedInputStream(new FileInputStream(zipFile)));
        ArchiveEntry entry = null;
        while ((entry = input.getNextEntry()) != null) {
            if (!input.canReadEntryData(entry)) {
                // log something?
                continue;
            }
            String name = Paths.get(targetDir, entry.getName()).toString();
            File f = new File(name);
            if (entry.isDirectory()) {
                if (!f.isDirectory() && !f.mkdirs()) {
                    throw new IOException("failed to create directory " + f);
                }
            } else {
                File parent = f.getParentFile();
                if (!parent.isDirectory() && !parent.mkdirs()) {
                    throw new IOException("failed to create directory " + parent);
                }
                try (OutputStream o = Files.newOutputStream(f.toPath())) {
                    IOUtils.copy(input, o);
                }
            }
        }
        input.close();

    }

在这里插入图片描述