This example uses the UniversalDetector class, which has to be added as a Maven dependency:
<!-- File encoding detection library -->
<dependency>
    <groupId>com.github.albfernandez</groupId>
    <artifactId>juniversalchardet</artifactId>
    <version>2.3.0</version>
</dependency>
Once the dependency is in place, UniversalDetector can be used to detect a file's encoding:
package org.example;

import org.mozilla.universalchardet.UniversalDetector;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

public class TestCharSet {

    // Detects the encoding of the file at filePath.
    // Falls back to the JVM default charset when nothing is detected.
    public static Charset detectFileEncoding(String filePath) throws IOException {
        UniversalDetector detector = new UniversalDetector(null);
        byte[] buffer = new byte[4096];
        // try-with-resources closes the stream even if reading throws
        try (InputStream in = new BufferedInputStream(new FileInputStream(filePath))) {
            int bytesRead;
            // Stop feeding data as soon as the detector has made up its mind
            while (!detector.isDone() && (bytesRead = in.read(buffer)) != -1) {
                detector.handleData(buffer, 0, bytesRead);
            }
        }
        // Signal end of input so the detector can finalize its guess
        detector.dataEnd();
        String encoding = detector.getDetectedCharset();
        detector.reset();
        return encoding != null ? Charset.forName(encoding) : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        String filePath = "E:/123.txt";
        try {
            Charset fileEncoding = detectFileEncoding(filePath);
            System.out.println("File Encoding: " + fileEncoding.name());
        } catch (IOException e) {
            System.out.println("Error occurred while detecting file encoding: " + e.getMessage());
        }
    }
}
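Since the detector works from byte statistics, its answer is a best guess rather than a guarantee. As a rough sketch of one way to sanity-check a guess, the snippet below tries to decode the bytes strictly with the candidate charset and reports whether that succeeds. The class name CharsetCheck and the method isDecodable are hypothetical helpers made up for illustration; they are not part of juniversalchardet, and only standard JDK classes are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Hypothetical helper (not a juniversalchardet API):
    // returns true if the bytes decode cleanly under the given charset.
    public static boolean isDecodable(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   // REPORT makes the decoder throw instead of silently
                   // substituting the replacement character
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "h\u00e9llo" encoded as UTF-8 is a valid UTF-8 byte sequence
        byte[] valid = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // 0xC3 must be followed by a continuation byte; 0x28 is not one
        byte[] broken = {(byte) 0xC3, (byte) 0x28};

        System.out.println(isDecodable(valid, StandardCharsets.UTF_8));   // true
        System.out.println(isDecodable(broken, StandardCharsets.UTF_8));  // false
    }
}
```

A check like this could be applied to the file's bytes after detectFileEncoding returns. Note that it can only rule a charset out: single-byte charsets such as ISO-8859-1 accept every byte sequence, so passing the check does not prove the guess is right.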
Running TestCharSet.main produces the following result: