Fixing Garbled Text When Reading Files: A Summary


tacleech


Article link: https://blog.csdn.net/tacleech/article/details/9450879


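I recently needed to read a txt file and find-and-replace parts of its content, and ran into garbled text when reading and writing. Below is a method I found for detecting a file's encoding, although it does not work very well.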

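// A UTF-8 text file saved with a byte order mark (BOM) starts with the 3 bytes 0xEF 0xBB 0xBF, which are -17, -69, -65 as signed Java bytes, so the following snippet checks for them: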

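import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Test {

	public static void main(String[] args) {
		File f = new File("name of the text file to check");
		// try-with-resources closes the stream even if read() throws
		try (InputStream in = new FileInputStream(f)) {
			byte[] b = new byte[3];
			int n = in.read(b);
			// Compare against the UTF-8 BOM only if 3 bytes were actually read
			if (n == 3 && b[0] == -17 && b[1] == -69 && b[2] == -65)
				System.out.println(f.getName() + " is encoded as UTF-8 (with BOM)");
			else
				System.out.println(f.getName() + " may be GBK");
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}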
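The BOM test above only recognizes UTF-8 files that begin with a BOM, and many UTF-8 files do not. As a complementary sketch (not part of the method found online), the bytes can be decoded strictly as UTF-8, treating any decoding error as "not UTF-8". The class name `Utf8Check` and method name `isValidUtf8` are made up for this illustration. Note that pure ASCII content also passes, because ASCII is a subset of UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8Check {

    // Returns true if the whole byte array decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A file for which this returns false is certainly not UTF-8. A file for which it returns true is very likely UTF-8 or ASCII, but the check alone cannot say which other encoding a non-UTF-8 file uses.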

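The check above, which I found online, did not work well in practice (it misses UTF-8 files saved without a BOM), so I kept searching and found the following: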

For more sophisticated encoding detection, you can use the open-source project cpdetector, as follows:

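detector is the detector proxy; it delegates the detection work to instances of concrete detector implementations. cpdetector ships with some commonly used implementations, such as ParsingDetector, JChardetFacade, ASCIIDetector, and UnicodeDetector, whose instances can be registered through the add method. detector returns the detected charset on a "whichever detector first returns a non-null result wins" basis.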
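The "first non-null result wins" rule can be pictured with a toy chain. This is only a sketch of the principle: `ToyDetector` and `ToyDetectorChain` are hypothetical names invented here, and the code does not use or reproduce cpdetector's actual classes.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration; not the cpdetector API.
interface ToyDetector {
    // Returns the detected charset, or null when this detector cannot tell.
    Charset detect(byte[] data);
}

// Hypothetical chain that asks each registered detector in order.
final class ToyDetectorChain {
    private final List<ToyDetector> detectors = new ArrayList<>();

    void add(ToyDetector detector) {
        detectors.add(detector);
    }

    Charset detect(byte[] data) {
        for (ToyDetector detector : detectors) {
            Charset result = detector.detect(data);
            if (result != null) {
                return result; // the first non-null answer is final
            }
        }
        return null; // no detector could decide
    }
}
```

Because the first answer is final, registration order matters in such a chain: a detector that always returns something should be added last.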

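info.monitorenter.cpdetector.io.CodepageDetectorProxy detector =
info.monitorenter.cpdetector.io.CodepageDetectorProxy.getInstance();

ParsingDetector can be used to check the encoding of HTML, XML, and similar files or character streams. The constructor argument indicates whether to print details of the detection process; false turns this off.

detector.add(new info.monitorenter.cpdetector.io.ParsingDetector(false));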

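JChardetFacade wraps JChardet, provided by the Mozilla organization, and can determine the encoding of most files. So in general, this one detector is enough to meet the needs of most projects.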


Below is a test I wrote myself:

import info.monitorenter.cpdetector.io.ASCIIDetector;
import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
import info.monitorenter.cpdetector.io.JChardetFacade;
import info.monitorenter.cpdetector.io.UnicodeDetector;

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;

public class CharsetUtil {
	/**
	 * Detects the encoding of a file.
	 * @param path path of the file to check
	 * @return the name of the file's charset, or null if detection fails
	 */
	public static String getCharset(String path) {
		CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();
		detector.add(JChardetFacade.getInstance());
		detector.add(ASCIIDetector.getInstance());
		detector.add(UnicodeDetector.getInstance());
		File file = new File(path);
		try {
			// File.toURL() is deprecated; toURI().toURL() escapes special characters
			Charset charset = detector.detectCodepage(file.toURI().toURL());
			return charset != null ? charset.name() : null;
		} catch (IOException e) {
			// Also covers MalformedURLException, a subclass of IOException
			e.printStackTrace();
			return null;
		}
	}
}

Note: two JARs, cpdetector_1.0.8.jar and jchardet-1.0.jar, must also be added to the project; otherwise the code fails with errors.
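Once a charset name is known, the original task (read the file, find and replace, write it back) comes down to decoding and encoding with that same charset. The sketch below uses only the JDK; the class name `ReplaceInFile` and its method are hypothetical, and the charset name is assumed to come from something like `getCharset` above.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class ReplaceInFile {

    // Replaces every occurrence of target in the file, reading and writing
    // with the same charset so that no characters get garbled on the way.
    static void replaceInFile(Path file, String charsetName,
                              String target, String replacement) throws IOException {
        Charset charset = Charset.forName(charsetName);
        // Decode explicitly instead of relying on the platform default charset
        String text = new String(Files.readAllBytes(file), charset);
        String updated = text.replace(target, replacement);
        // Encode with the same charset the file was read with
        Files.write(file, updated.getBytes(charset));
    }
}
```

Since `getCharset` can return null when detection fails, a caller would need to check for that before passing the name on, for example by falling back to a charset it knows the files are likely to use.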
