This error is caused by an encoding problem: the file's encoding has to be detected first. The modified code is as follows:
File file = new File(sfile);
// Use Tika's AutoDetectReader class to detect the file's encoding
dr = new AutoDetectReader(new FileInputStream(file));
String charset = dr.getCharset().name();
System.out.println("********charset********:" +charset);
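input = new FileInputStream(file);
// Earlier version, which relies on the default charset (UTF-8):
// ZipInputStream zip = new ZipInputStream(input);
BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(sfile));
// Earlier version, which hard-codes UTF-8:
// ZipInputStream zip = new ZipInputStream(bufferedInputStream, Charset.forName("utf-8"));
// Pass the detected charset so that ZipInputStream decodes entry names with it
ZipInputStream zip = new ZipInputStream(bufferedInputStream, dr.getCharset());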
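
To see the effect of the charset argument in isolation, here is a minimal sketch that uses only the JDK (no Tika). It builds a small ZIP in memory whose entry name is encoded with a non-UTF-8 charset, then reads it back with the same charset passed to `ZipInputStream`. The class `ZipCharsetDemo` and the helpers `buildZip` and `listEntryNames` are hypothetical names made up for this illustration, and ISO-8859-1 is only a stand-in for whatever charset is detected in the real case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original code.
class ZipCharsetDemo {

    // Builds a one-entry ZIP in memory; the entry name is encoded with `charset`.
    static byte[] buildZip(String entryName, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out, charset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Lists the entry names, decoding them with the given charset.
    static List<String> listEntryNames(InputStream in, Charset charset) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in, charset)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-1 stands in for the detected legacy charset.
        Charset legacy = StandardCharsets.ISO_8859_1;
        byte[] zipBytes = buildZip("caf\u00e9.txt", legacy);
        // Reading with the same charset recovers the original entry name.
        System.out.println(listEntryNames(new ByteArrayInputStream(zipBytes), legacy));
    }
}
```

In the code above, `dr.getCharset()` plays the role of `legacy`: whatever charset the entry names were written with has to be the one handed to the `ZipInputStream` constructor.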