Detecting the Character Encoding of a Text File in Java


carroll0911



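Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.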

Article link: https://blog.csdn.net/Core_Star/article/details/5580536


jchardet is a Java port of Mozilla's automatic charset detection algorithm. Its source code can be downloaded from SourceForge.

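import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.mozilla.intl.chardet.nsDetector;
import org.mozilla.intl.chardet.nsICharsetDetectionObserver;

// Returns the detected character encoding of a file, or null if none is found.
public String getCharSetEncoding(File file) throws IOException
{
    // Holds the charset reported by the detector. An array is used so that
    // the anonymous observer below can write to it.
    final String[] detected = new String[1];

    /*
     * Create the nsDetector. The constructor that takes an int "lang"
     * gives the detector a language hint. The available hints are:
     *
     *   1. Japanese
     *   2. Chinese
     *   3. Simplified Chinese
     *   4. Traditional Chinese
     *   5. Korean
     *   6. Don't know (default)
     */
    // nsDetector det = new nsDetector(lang);
    nsDetector det = new nsDetector();

    // The observer is notified when the detector settles on a charset.
    det.Init(new nsICharsetDetectionObserver() {
        public void Notify(String charset) {
            detected[0] = charset;
        }
    });

    boolean done = false;
    boolean isAscii = true;
    byte[] buf = new byte[1024];
    int length;

    // try-with-resources closes the stream even if reading fails.
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
        while ((length = in.read(buf)) != -1) {
            // Keep checking whether everything read so far is pure ASCII.
            if (isAscii) {
                isAscii = det.isAscii(buf, length);
            }
            // Feed non-ASCII data to the detector until it reports a result.
            if (!isAscii && !done) {
                done = det.DoIt(buf, length, false);
            }
        }
    }
    det.DataEnd();

    if (isAscii) {
        return "ASCII";
    }
    if (detected[0] != null) {
        return detected[0];
    }
    // No definite result: fall back to the first probable charset.
    String[] probable = det.getProbableCharsets();
    if (probable.length > 0) {
        return probable[0];
    }
    return null;
}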
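
Once a charset name has been detected, the usual next step is to decode the file with it. The following is a minimal sketch of that step, not part of jchardet: the class `DetectedCharsetReader` and its methods `toCharset` and `readText` are hypothetical names introduced here for illustration, and falling back to UTF-8 when the name is missing or unsupported is an assumption you may want to change. It uses only the standard `java.nio.charset.Charset` API.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: turns a detected charset name into decoded text.
public class DetectedCharsetReader {

    // Maps a charset name to a Charset. Returns the fallback when the name
    // is null, syntactically illegal, or not supported by this JVM.
    public static Charset toCharset(String name, Charset fallback) {
        if (name == null) {
            return fallback;
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : fallback;
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException for malformed names.
            return fallback;
        }
    }

    // Reads the whole file and decodes it with the detected charset,
    // using UTF-8 as the (assumed) fallback.
    public static String readText(Path path, String detectedName) throws IOException {
        Charset charset = toCharset(detectedName, StandardCharsets.UTF_8);
        return new String(Files.readAllBytes(path), charset);
    }
}
```

With the method above, a call might look like `DetectedCharsetReader.readText(file.toPath(), getCharSetEncoding(file))`. Validating the name first matters because a detector can return a name that the running JVM does not support, or no name at all.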
