Java text file encoding

Latest recommended article published on 2022-11-27 08:06:09

梅梅子MMZ

Tags: Java text file encoding

I have a text file that may be ANSI (with the ISO-8859-2 charset), UTF-8, or UCS-2 (big- or little-endian).

Is there any way to detect the encoding of the file so that I can read it properly?

Or is it possible to read a file without specifying the encoding, so that it is read "as it is"?

(There are several programs that can detect and convert the encoding/format of text files.)

Solution

UTF-8 and UCS-2/UTF-16 can be distinguished reasonably easily via a byte order mark at the start of the file. If this exists then it's a pretty good bet that the file is in that encoding - but it's not a dead certainty. You may well also find that the file is in one of those encodings, but doesn't have a byte order mark.
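As a rough illustration of the byte order mark check, here is a minimal sketch. The class and method names (`BomSniffer`, `detect`) are made up for this example, and it only looks at the UTF-8 and UTF-16 marks (EF BB BF, FE FF, FF FE). A `null` result just means "no mark found", not "not Unicode".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class BomSniffer {

    /**
     * Returns the charset suggested by a leading byte order mark,
     * or null if the first bytes do not form a UTF-8/UTF-16 mark.
     * Pass in (at least) the first three bytes of the file.
     */
    static Charset detect(byte[] head) {
        // UTF-8 mark: EF BB BF
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            // UTF-16 big-endian mark: FE FF
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE;
            }
            // UTF-16 little-endian mark: FF FE
            // (FF FE 00 00 would be UTF-32LE; this sketch ignores that case.)
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE;
            }
        }
        return null; // no mark: the caller has to guess some other way
    }
}
```

If a mark is found, the caller would normally skip those bytes before decoding, since otherwise the decoded text starts with U+FEFF.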

I don't know much about ISO-8859-2, but I wouldn't be surprised if almost every file is a valid text file in that encoding. The best you'll be able to do is check it heuristically. Indeed, the Wikipedia page talking about it would suggest that only byte 0x7f is invalid.
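One possible heuristic, sketched under the assumption that the file has no byte order mark and is either UTF-8 or ISO-8859-2: try a strict UTF-8 decode first and fall back to ISO-8859-2 only if the bytes are not well-formed UTF-8. The name `FallbackDecoder` is invented for this example, and `Charset.forName("ISO-8859-2")` assumes the JRE ships that charset (it is not among the ones Java guarantees).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class FallbackDecoder {

    /**
     * Decodes as UTF-8 if the bytes are well-formed UTF-8,
     * otherwise as ISO-8859-2. This is a guess, not a proof:
     * some ISO-8859-2 text happens to be valid UTF-8 too.
     */
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // Throw on bad input instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the single-byte encoding.
            return new String(bytes, Charset.forName("ISO-8859-2"));
        }
    }
}
```

The order matters: because nearly any byte sequence decodes without error as ISO-8859-2, trying it first would never fail and so would tell you nothing.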

There's no such thing as reading a file "as it is" and still getting text out. A file is a sequence of bytes, so you have to apply a character encoding in order to decode those bytes into characters.
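To make that concrete, here is a small sketch in which the charset is always passed explicitly. `ExplicitCharsetRead` and its methods are names made up for this example. The same two bytes, C5 82, come out as one character under UTF-8 and as two under ISO-8859-2.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class ExplicitCharsetRead {

    /** Turns raw bytes into text using the charset the caller chose. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Reads the whole file as bytes, then decodes with the given charset. */
    static String read(Path file, Charset charset) throws IOException {
        return decode(Files.readAllBytes(file), charset);
    }
}
```

For instance, `decode(new byte[] {(byte) 0xC5, (byte) 0x82}, StandardCharsets.UTF_8)` gives the single character U+0142, while the same call with `Charset.forName("ISO-8859-2")` (assuming the JRE provides it) gives two characters, the first being U+0139.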
