Compression and decompression (zip, gzip) in Java, with support for Chinese paths

Latest recommended article published 2023-06-15 10:37:17

ttldxl


Article link: https://blog.csdn.net/ttldxl/article/details/83946402


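ZIP plays two roles, archiving and compression; gzip does not archive files, it only compresses a single file. That is why, on UNIX platforms, the tar command is usually used to create an archive file, which the gzip command then compresses.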

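The Java I/O library also includes classes that read and write streams in compressed formats. To add compression, you simply wrap them around existing I/O classes. These classes are not subclasses of Reader and Writer but of InputStream and OutputStream, because compression algorithms work on bytes, not characters.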
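As a minimal sketch of this wrapping (the class GzipWrapDemo and its helpers are made-up names for illustration), the compression stream is simply placed around an in-memory ByteArrayOutputStream, and the decompression stream around a ByteArrayInputStream. Note that the text is converted to bytes before it reaches the compression stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipWrapDemo {
    // Compression: wrap an ordinary OutputStream; bytes are deflated on the way through
    // and close() writes the GZIP trailer.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decompression: wrap an ordinary InputStream and read it as usual.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        // Text has to become bytes first: the compression streams only deal in bytes.
        byte[] original = "hello gzip, hello gzip, hello gzip".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(original);
        byte[] unpacked = gunzip(packed);
        System.out.println(original.length + " bytes -> " + packed.length + " compressed");
        System.out.println(new String(unpacked, StandardCharsets.UTF_8));
    }
}
```

The rest of the code is ordinary stream handling; only the constructor calls differ from uncompressed I/O.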

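Related classes and interfaces:
Checksum interface: implemented by the classes Adler32 and CRC32
Adler32: computes a checksum using the Adler-32 algorithm
CRC32: computes a checksum using the CRC-32 algorithm

CheckedInputStream: an InputStream subclass that maintains a checksum of the data read from the stream, used to verify data integrity
CheckedOutputStream: an OutputStream subclass that maintains a checksum of the data written to the stream, used to verify data integrity

DeflaterOutputStream: base class of the compression classes
ZipOutputStream: a subclass of DeflaterOutputStream that compresses data into the ZIP file format
GZIPOutputStream: a subclass of DeflaterOutputStream that compresses data into the GZIP file format

InflaterInputStream: base class of the decompression classes
ZipInputStream: a subclass of InflaterInputStream that decompresses ZIP-format data
GZIPInputStream: a subclass of InflaterInputStream that decompresses GZIP-format data

ZipEntry class: represents a ZIP file entry
ZipFile class: used to read entries from a ZIP file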
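To see several of these classes working together, here is a small sketch (ChecksumDemo and its methods are hypothetical names) that feeds data through a CheckedOutputStream placed in front of a DeflaterOutputStream, then back through a CheckedInputStream around an InflaterInputStream. Equal CRC32 values on both sides indicate that the data came back intact. On its own, DeflaterOutputStream writes the raw zlib format, without any ZIP or GZIP container.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;
import java.util.zip.Checksum;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class ChecksumDemo {
    // Checksum is the shared interface, so CRC32 and Adler32 are interchangeable here.
    static long checksumOf(Checksum sum, byte[] data) {
        sum.update(data, 0, data.length);
        return sum.getValue();
    }

    // data -> CheckedOutputStream (updates crc) -> DeflaterOutputStream -> byte array
    static byte[] deflate(byte[] data, CRC32 crc) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CheckedOutputStream out =
                     new CheckedOutputStream(new DeflaterOutputStream(sink), crc)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // byte array -> InflaterInputStream -> CheckedInputStream (updates crc) -> result
    static byte[] inflate(byte[] compressed, CRC32 crc) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (CheckedInputStream in = new CheckedInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)), crc)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        // "123456789" is the usual check input; its standard CRC-32 value is 0xCBF43926
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32   = %08X%n", checksumOf(new CRC32(), check));
        System.out.printf("Adler32 = %08X%n", checksumOf(new Adler32(), check));

        CRC32 written = new CRC32();
        CRC32 read = new CRC32();
        byte[] restored = inflate(deflate(check, written), read);
        System.out.println("checksums match: " + (written.getValue() == read.getValue()));
        System.out.println("restored: " + new String(restored, StandardCharsets.US_ASCII));
    }
}
```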

Compressing a single file with GZIP

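GZIP has a fairly simple interface, so it is the one to use when you only need to compress a single stream. It can compress text as well as binary data (text is encoded to bytes first); the example below compresses a GBK-encoded text file.
The compression classes are very easy to use: just wrap the output stream in a GZIPOutputStream or ZipOutputStream, and the input stream in a GZIPInputStream or ZipInputStream. Everything else is ordinary I/O.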

Java code

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GZIPcompress {
	public static void main(String[] args) throws IOException {
		// Prepare to compress a text file; the source file here must be GBK-encoded
		BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(
				"e:/tmp/source.txt"), "GBK"));
		// Wrap the OutputStream in a GZIPOutputStream to give it compression. This produces
		// test.txt.gz; gunzip-style tools extract it as test.txt (the name is derived from the
		// .gz file name, because GZIPOutputStream does not record an original file name)
		BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new GZIPOutputStream(
				new FileOutputStream("test.txt.gz")), "GBK"));
		System.out.println("Writing the compressed file...");
		int c;
		while ((c = in.read()) != -1) {
			/*
			 * The file was read as a character stream, so c is already a (UTF-16) Unicode
			 * char, not a GBK byte. Writing it directly to a byte stream would lose
			 * information, so the OutputStreamWriter encodes it back to GBK first.
			 */
			out.write(c);
		}
		in.close();
		out.close(); // also finishes the GZIP stream and writes its trailer
		System.out.println("Reading the compressed file...");
		// Wrap the InputStream in a GZIPInputStream to give it decompression
		BufferedReader in2 = new BufferedReader(new InputStreamReader(
				new GZIPInputStream(new FileInputStream("test.txt.gz")), "GBK"));
		String s;
		// Read the contents of the compressed file
		while ((s = in2.readLine()) != null) {
			System.out.println(s);
		}
		in2.close();
	}
}
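Because GZIP works on bytes, a file that only has to be compressed does not need to be decoded as GBK at all. The following sketch (GzipFiles, gzipFile and gunzipFile are illustrative names, not part of any library) copies raw bytes instead, so it works for any file regardless of its character encoding:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipFiles {
    // Copy all bytes from in to out with a fixed-size buffer.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Compress source into target (e.g. test.txt -> test.txt.gz) without decoding the bytes.
    static void gzipFile(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = new GZIPOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(target)))) {
            copy(in, out);
        }
    }

    // Decompress a .gz file back into a plain file.
    static void gunzipFile(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(
                     new BufferedInputStream(Files.newInputStream(source)));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java GzipFiles e:/tmp/source.txt
        Path source = Paths.get(args[0]);
        Path gz = Paths.get(args[0] + ".gz");
        gzipFile(source, gz);
        gunzipFile(gz, Paths.get(args[0] + ".restored"));
    }
}
```

The restored file is byte-for-byte identical to the source, so a GBK file stays GBK without the program ever knowing its encoding.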

Compressing multiple files with ZIP

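Java's ZIP support is fairly comprehensive: with it you can compress multiple files into a single archive. The library uses the standard ZIP format, so it is compatible with many compression tools.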
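A minimal sketch with the JDK's own java.util.zip classes, assuming the inputs are regular files whose bare file names are distinct (ZipOutputStream rejects a duplicate entry name with a ZipException); ZipMany and zipFiles are hypothetical names. Each file becomes one ZipEntry:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipMany {
    // Store each regular file under its bare file name, one entry per file.
    static void zipFiles(Path archive, List<Path> files) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(archive))) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zos);   // write the file's bytes into the current entry
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ZipMany out.zip a.txt b.txt ...
        List<Path> files = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            files.add(Paths.get(args[i]));
        }
        zipFiles(Paths.get(args[0]), files);
    }
}
```

Directories are not handled here; the author's ZipCompress class below walks a directory tree recursively instead.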

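ZipOutputStream lets you set the compression method and, for the deflate method, the compression level. zipOutputStream.setMethod(int method) sets the default compression method for entries: whenever an individual ZIP entry does not specify its own method, it is stored with the method set on the ZipOutputStream. The default is ZipOutputStream.DEFLATED (the entry is compressed); it can also be set to STORED (the entry is only archived, without compression). With the method set to DEFLATED, setLevel(int level) further sets the compression level, from 0 to 9 (ten levels; higher values compress more strongly). The default is Deflater.DEFAULT_COMPRESSION, i.e. -1. A single entry can also be given its own method with ZipEntry.setMethod.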
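The sketch below (hypothetical class ZipMethods, using java.util.zip) writes one entry with the stream-wide DEFLATED method at level 9 and overrides a second entry to STORED, then reads the archive back to report each entry's method. One detail the JDK imposes: for a STORED entry, the size and CRC-32 must be set on the ZipEntry before putNextEntry(), otherwise a ZipException is thrown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipMethods {
    // Build an in-memory archive holding the same data twice: once deflated, once stored.
    static byte[] build(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            zos.setMethod(ZipOutputStream.DEFLATED);   // default method for entries
            zos.setLevel(Deflater.BEST_COMPRESSION);   // level 9

            // This entry inherits DEFLATED and level 9 from the stream.
            zos.putNextEntry(new ZipEntry("deflated.txt"));
            zos.write(data);
            zos.closeEntry();

            // This entry overrides the method; STORED needs size and CRC-32 up front.
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            ZipEntry stored = new ZipEntry("stored.txt");
            stored.setMethod(ZipEntry.STORED);
            stored.setSize(data.length);
            stored.setCrc(crc.getValue());
            zos.putNextEntry(stored);
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Read the archive back and report "name:method" per entry (DEFLATED = 8, STORED = 0).
    static List<String> methods(byte[] zip) {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) {
                result.add(e.getName() + ":" + e.getMethod());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10000];   // all zeros, so it deflates very well
        byte[] zip = build(data);
        System.out.println(methods(zip)); // prints [deflated.txt:8, stored.txt:0]
        System.out.println("archive size: " + zip.length + " bytes");
    }
}
```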

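The ZipEntry class describes a compressed file stored in a ZIP file, and has various methods for setting and getting information about a ZIP entry. ZipEntry is used by ZipFile [zipFile.getInputStream(ZipEntry entry)] and ZipInputStream to read ZIP files, and by ZipOutputStream to write them. Useful methods include getName(), which returns the entry name; isDirectory(), which returns true for a directory entry (defined as an entry whose name ends with '/'); and setMethod(int method), which sets the entry's compression method to either ZipOutputStream.STORED or ZipOutputStream.DEFLATED.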
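As a sketch of reading with ZipFile, assuming the JDK's java.util.zip.ZipFile (the Apache ZipFile used later exposes getEntries() rather than entries()), this hypothetical ZipListing class walks all entries, uses isDirectory() to tell directories from files, and reads each file through getInputStream(entry):

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipListing {
    // Describe every entry; directory entries are those whose names end with '/'.
    static List<String> describe(File archive) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipFile zf = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    lines.add("dir  " + entry.getName());
                } else {
                    long bytes = 0;
                    try (InputStream in = zf.getInputStream(entry)) {
                        byte[] buf = new byte[4096];
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            bytes += n;   // count the decompressed bytes
                        }
                    }
                    lines.add("file " + entry.getName() + " (" + bytes + " bytes)");
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ZipListing e:/tmp/test.zip
        for (String line : describe(new File(args[0]))) {
            System.out.println(line);
        }
    }
}
```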

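The example below uses Apache's zip package (org.apache.tools.zip, shipped in ant.jar), because the zip classes bundled with the JDK at the time did not support Chinese paths: before Java 7 they always encoded entry names as UTF-8, and only Java 7 added constructors that take a Charset. The two libraries are used in the same way; the Apache classes simply add a way to set the encoding. Note also that an archive written with org.apache.tools.zip.ZipOutputStream must be read back with org.apache.tools.zip.ZipFile rather than java.util.zip.ZipInputStream (which decodes names as UTF-8 by default); Apache does not provide a ZipInputStream class.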
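For comparison, on Java 7 or later the JDK classes can handle GBK names themselves, because ZipOutputStream, ZipInputStream and ZipFile gained constructors that take a Charset. A sketch under that assumption (ZipCharsetDemo and its methods are illustrative names, and the GBK charset must be available in the runtime); the reader has to be given the same charset the writer used:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipCharsetDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Java 7+: the Charset argument controls how entry names are encoded
    // (without it, java.util.zip uses UTF-8).
    static byte[] write(List<String> names, Charset cs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink, cs)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                if (!name.endsWith("/")) {
                    zos.write(name.getBytes(cs));   // placeholder content: the entry's own name
                }
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decode entry names with the given charset; it must match the one used for writing.
    static List<String> readNames(byte[] zip, Charset cs) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), cs)) {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("新建文件夹/");
        names.add("新建文件夹/新建文本文档.txt");
        byte[] zip = write(names, GBK);
        System.out.println(readNames(zip, GBK));
    }
}
```

On Java 6 and earlier these constructors do not exist, which is why the article falls back on the Apache classes.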

Java code

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;
import java.util.zip.Deflater;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import org.apache.tools.zip.ZipEntry;
import org.apache.tools.zip.ZipFile;
import org.apache.tools.zip.ZipOutputStream;
/**
*
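* Compresses a single file or a directory; can optionally include the source directory itself in the archive, and supports Chinese paths.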
*
* @author jzj
*/
public class ZipCompress {
private static boolean isCreateSrcDir = true;// whether to include the source directory itself in the archive
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
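String src = "m:/新建文本文档.txt";// compression source: a directory or a file
String decompressDir = "e:/tmp/decompress";// extraction directory
String archive = "e:/tmp/test.zip";// archive path
String comment = "Java Zip test.";// archive comment
//---- compress the file or directory
writeByApacheZipOutputStream(src, archive, comment);
/*
* Reading with the JDK classes is commented out: the archive was written with the
* Apache classes, so the java.util.zip decompression class fails on it here
*/
//readByZipInputStream(archive, decompressDir);
//---- read the archive with Apache's ZipFile
readByApacheZipFile(archive, decompressDir);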
}
public static void writeByApacheZipOutputStream(String src, String archive,
String comment) throws FileNotFoundException, IOException {
//---- compress:
FileOutputStream f = new FileOutputStream(archive);
// create an output stream that computes a CRC32 checksum of everything written to it
CheckedOutputStream csum = new CheckedOutputStream(f, new CRC32());
ZipOutputStream zos = new ZipOutputStream(csum);
// encode entry names as GBK so that Chinese paths survive
zos.setEncoding("GBK");
BufferedOutputStream out = new BufferedOutputStream(zos);
// set the archive comment
zos.setComment(comment);
// enable compression
zos.setMethod(ZipOutputStream.DEFLATED);
// strongest compression level, at the cost of somewhat more time
zos.setLevel(Deflater.BEST_COMPRESSION);
File srcFile = new File(src);
if (!srcFile.exists() || (srcFile.isDirectory() && srcFile.list().length == 0)) {
throw new FileNotFoundException(
"File must exist and ZIP file must have at least one entry.");
}
// work out the parent directory of the compression source
src = src.replaceAll("\\\\", "/");
String prefixDir = null;
if (srcFile.isFile()) {
prefixDir = src.substring(0, src.lastIndexOf("/") + 1);
} else {
prefixDir = (src.replaceAll("/$", "") + "/");
}
// unless the prefix is a drive root such as "m:/", drop its last directory so that
// this directory is kept at the start of every entry name
if (prefixDir.indexOf("/") != (prefixDir.length() - 1) && isCreateSrcDir) {
prefixDir = prefixDir.replaceAll("[^/]+/$", "");
}
// start compressing
writeRecursive(zos, out, srcFile, prefixDir);
out.close();
// Note: the checksum covers every byte written, so read it only after the stream is closed
System.out.println("Checksum: " + csum.getChecksum().getValue());
}
/**
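* Extracts an archive with org.apache.tools.zip.ZipFile. It is used the same way as
* java.util.zip.ZipFile in the Java class library, except that it also lets you set
* the encoding.
*
* Note: Apache does not provide a ZipInputStream class, so the archive can only be
* read with the ZipFile it provides.
* @param archive archive path
* @param decompressDir extraction directory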
* @throws IOException
* @throws FileNotFoundException
* @throws ZipException
*/
public static void readByApacheZipFile(String archive, String decompressDir)
throws IOException, FileNotFoundException, ZipException {
BufferedInputStream bi;
ZipFile zf = new ZipFile(archive, "GBK");// decode entry names as GBK (Chinese support)
Enumeration e = zf.getEntries();
while (e.hasMoreElements()) {
ZipEntry ze2 = (ZipEntry) e.nextElement();
String entryName = ze2.getName();
String path = decompressDir + "/" + entryName;
if (ze2.isDirectory()) {
System.out.println("Creating extraction directory - " + entryName);
File decompressDirFile = new File(path);
if (!decompressDirFile.exists()) {
decompressDirFile.mkdirs();
}
} else {
System.out.println("Creating extracted file - " + entryName);
String fileDir = path.substring(0, path.lastIndexOf("/"));
File fileDirFile = new File(fileDir);
if (!fileDirFile.exists()) {
fileDirFile.mkdirs();
}
BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(
decompressDir + "/" + entryName));
bi = new BufferedInputStream(zf.getInputStream(ze2));
byte[] readContent = new byte[1024];
int readCount = bi.read(readContent);
while (readCount != -1) {
bos.write(readContent, 0, readCount);
readCount = bi.read(readContent);
}
bi.close();// release the entry's input stream as well
bos.close();
}
}
zf.close();
}
/**
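* Extracts an archive with the ZipInputStream class from the Java API. If the archive
* was created with org.apache.tools.zip.ZipOutputStream rather than the class library's
* java.util.zip.ZipOutputStream, this method cannot be used because the encodings do
* not match; at run time it throws:
* java.lang.IllegalArgumentException
* at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:290)
*
* An archive created with java.util.zip.ZipOutputStream reads back without problems,
* but (before Java 7) that class does not support Chinese entry names.
*
* @param archive archive path
* @param decompressDir extraction directory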
* @throws FileNotFoundException
* @throws IOException
*/
public static void readByZipInputStream(String archive, String decompressDir)
throws FileNotFoundException, IOException {
BufferedInputStream bi;
//---- extract (extracting a ZIP file is essentially reading data from an input stream):
System.out.println("Reading the archive");
FileInputStream fi = new FileInputStream(archive);
CheckedInputStream csumi = new CheckedInputStream(fi, new CRC32());
ZipInputStream in2 = new ZipInputStream(csumi);
bi = new BufferedInputStream(in2);
java.util.zip.ZipEntry ze;// archive entry
// iterate over the entries in the archive
while ((ze = in2.getNextEntry()) != null) {
String entryName = ze.getName();
if (ze.isDirectory()) {
System.out.println("Creating extraction directory - " + entryName);
File decompressDirFile = new File(decompressDir + "/" + entryName);
if (!decompressDirFile.exists()) {
decompressDirFile.mkdirs();
}
} else {
System.out.println("Creating extracted file - " + entryName);
BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(
decompressDir + "/" + entryName));
byte[] buffer = new byte[1024];
int readCount = bi.read(buffer);
while (readCount != -1) {
bos.write(buffer, 0, readCount);
readCount = bi.read(buffer);
}
bos.close();
}
}
bi.close();
System.out.println("Checksum: " + csumi.getChecksum().getValue());
}
/**
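* Recursive compression
*
* Compresses with org.apache.tools.zip.ZipOutputStream, whose advantage is that it supports
* Chinese paths; the Java class library's java.util.zip.ZipOutputStream produces garbled
* Chinese file names in the archive. The Apache class is used the same way as the class
* library's; it just also lets you set the encoding.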
* @param zos
* @param bo
* @param srcFile
* @param prefixDir
* @throws IOException
* @throws FileNotFoundException
*/
private static void writeRecursive(ZipOutputStream zos, BufferedOutputStream bo,
File srcFile, String prefixDir) throws IOException, FileNotFoundException {
ZipEntry zipEntry;
String filePath = srcFile.getAbsolutePath().replaceAll("\\\\", "/").replaceAll(
"//", "/");
if (srcFile.isDirectory()) {
filePath = filePath.replaceAll("/$", "") + "/";
}
String entryName = filePath.replace(prefixDir, "").replaceAll("/$", "");
if (srcFile.isDirectory()) {
if (!"".equals(entryName)) {
System.out.println("Creating directory - " + srcFile.getAbsolutePath()
+ " entryName=" + entryName);
// a directory entry's name must end with /
zipEntry = new ZipEntry(entryName + "/");
zos.putNextEntry(zipEntry);
}
File srcFiles[] = srcFile.listFiles();
for (int i = 0; i < srcFiles.length; i++) {
writeRecursive(zos, bo, srcFiles[i], prefixDir);
}
} else {
System.out.println("Writing file - " + srcFile.getAbsolutePath() + " entryName="
+ entryName);
BufferedInputStream bi = new