Determining the encoding of a string read from an input stream (UTF-8 environment)

wolfshadow.cn


Article link: https://blog.csdn.net/u010188178/article/details/84637576


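When you read content through an input stream from a file whose encoding is unknown, what do you do if it comes out garbled?

If you do not know the string's encoding, about all you can do is try the common encodings until one of them yields correctly decoded text.

For example, "#鍑借喘鍚岃櫣娆惧紡f" is a line read from some file. How should it be handled: by trying encodings one at a time? Instead, consider letting a program determine the encoding and convert the text to UTF-8 at the same time.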
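To make the example concrete, the sketch below shows one way such garbled text can arise and be undone. It assumes the original line was "#函购同虹款式f" (the sample string used in the test method of this article) stored as UTF-8 and mistakenly decoded as GBK. The class and method names (`MojibakeDemo`, `garble`, `repair`) are hypothetical, chosen only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Simulate the mistake: take UTF-8 bytes and decode them as GBK.
    static String garble(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("GBK"));
    }

    // Undo the mistake: re-encode with GBK to recover the raw bytes,
    // then decode those bytes as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(Charset.forName("GBK"));
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = garble("#函购同虹款式f");
        System.out.println(garbled);
        System.out.println(repair(garbled));
    }
}
```

The repair only works when the wrong decoding step kept every byte recoverable, which happens to hold for this sample under GBK but is not guaranteed in general.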

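Notes:

(1) The code below only works in a UTF-8 compilation environment, i.e. the Java source files are saved as UTF-8.

(2) The code lists only a few common encodings; add others yourself if you are interested.

(3) Converting from some of the encodings to UTF-8 does not succeed, and I do not yet know how to solve this. If you know, please share your advice; many thanks in advance.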
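One likely reason for the failures in note (3) is that decoding bytes with the wrong charset can be lossy. Java's `String(byte[], Charset)` constructor silently replaces byte sequences that are invalid in the given charset, so the original bytes can no longer be recovered by re-encoding. The sketch below checks whether a wrong decode survives a round trip; `RoundTripCheck` and `survives` are hypothetical names used only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTripCheck {
    // True if the UTF-8 bytes of `text`, decoded with `wrong` and then
    // re-encoded with `wrong`, come back byte-for-byte unchanged.
    static boolean survives(String text, Charset wrong) {
        byte[] original = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(original, wrong);
        byte[] back = misread.getBytes(wrong);
        return Arrays.equals(original, back);
    }

    public static void main(String[] args) {
        String sample = "#函购同虹款式f";
        for (String name : new String[] {"ISO-8859-1", "GBK", "GB2312", "UTF-16"}) {
            boolean ok = survives(sample, Charset.forName(name));
            System.out.println(name + " round trip lossless: " + ok);
        }
    }
}
```

Where the round trip is not lossless, no conversion applied to the already-decoded string can restore the text; the fix has to start from the raw bytes.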

1. First, write an enum class

/**
 * Enum of byte signatures used to guess which charset a string was decoded with.
 * Important: only suitable when the Java sources are compiled as UTF-8.
 * Each signature is the first three UTF-8 bytes of the sample string "#函购同虹款式f"
 * after its UTF-8 bytes were decoded with the named charset, so a signature
 * only matches text that begins with "#函".
 * @author WolfShadow
 * @date 2018-11-28
 */
public enum UnicodeEnum {
	
	UTF_8("UTF-8",(byte)35 , (byte)-27 , (byte)-121),
	UTF_16("UTF-16",(byte)-30 , (byte)-113 , (byte)-91),
	GBK("GBK",(byte)35 , (byte)-23 , (byte)-115),
	GB2312("GB2312",(byte)35 , (byte)-17 , (byte)-65),
	ISO_8859_1("ISO-8859-1",(byte)35 , (byte)-61 , (byte)-91),
	
	NULL("unknown encoding",(byte)-1 , (byte)-1 , (byte)-1);
	
	// Enum constants are shared singletons, so the fields are final and have no setters
	private final String encoding; // charset name
	private final byte byte1; // first byte
	private final byte byte2; // second byte
	private final byte byte3; // third byte
	
	private UnicodeEnum(String encoding, byte byte1,byte byte2, byte byte3) {
		this.encoding = encoding;
		this.byte1 = byte1;
		this.byte2 = byte2;
		this.byte3 = byte3;
	}
	
	/** Returns the entry whose three-byte signature matches, or NULL if none does. */
	public static UnicodeEnum getUnicodeEnum(byte byte1,byte byte2, byte byte3){
		for(UnicodeEnum enum1 : UnicodeEnum.values()){
			if (enum1.getByte1()==byte1 && enum1.getByte2()==byte2 && enum1.getByte3()==byte3) {
				return enum1;
			}
		}
		return NULL;
	}

	public String getEncoding() {
		return encoding;
	}

	public byte getByte1() {
		return byte1;
	}

	public byte getByte2() {
		return byte2;
	}

	public byte getByte3() {
		return byte3;
	}
}
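The byte triples in the enum can be reproduced rather than hard-coded by hand. The following sketch assumes they come from the sample string "#函购同虹款式f": encode it as UTF-8, decode those bytes with each charset, re-encode the result as UTF-8, and keep the first three bytes. `SignatureProbe` and `signature` are hypothetical names used only here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SignatureProbe {
    // First three UTF-8 bytes of `sample` after its UTF-8 bytes
    // were decoded with the `assumed` charset.
    static byte[] signature(String sample, Charset assumed) {
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, assumed);
        return Arrays.copyOf(decoded.getBytes(StandardCharsets.UTF_8), 3);
    }

    public static void main(String[] args) {
        String sample = "#函购同虹款式f";
        for (String name : new String[] {"UTF-8", "UTF-16", "GBK", "GB2312", "ISO-8859-1"}) {
            byte[] sig = signature(sample, Charset.forName(name));
            System.out.println(name + " -> " + Arrays.toString(sig));
        }
    }
}
```

Running the same probe on a different sample string gives different triples, which shows that the enum recognizes only input starting with the same characters as the sample and is not a general-purpose detector.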

2. Next, add a utility class

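import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

/**
 * String encoding utility class
 * (1) Detects the encoding of a string
 * (2) Converts between encodings (extend as needed)
 * (3) Covers UTF-8, UTF-16, GBK, GB2312, ISO-8859-1, etc.
 * @author WolfShadow
 * @date 2018-11-28
 */
public class UnicodeUtil {
	
	/**
	 * Returns the name of the charset the string appears to have been decoded with.
	 * @param str the string to inspect
	 * @return the charset name, the label of UnicodeEnum.NULL if no signature matches,
	 *         or null if the string is empty or shorter than three bytes
	 * @author WolfShadow
	 * @date 2018-11-28
	 */
	public static String getUnicode(String str){
		if (str == null || str.isEmpty()) {
			return null;
		}
		// Encode explicitly as UTF-8 so the result does not depend on the platform default charset
		byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
		if (bytes.length < 3) {
			return null; // too short to compare against a three-byte signature
		}
		// getUnicodeEnum never returns null; it falls back to UnicodeEnum.NULL
		return UnicodeEnum.getUnicodeEnum(bytes[0], bytes[1], bytes[2]).getEncoding();
	}
	
	/**
	 * Converts the string to UTF-8 text.
	 * @param str the string to convert
	 * @return the converted string, or null if the encoding could not be determined
	 * @throws UnsupportedEncodingException if the detected charset is not supported
	 * @author WolfShadow
	 * @date 2018-11-28
	 */
	public static String getUTF_8(String str) throws UnsupportedEncodingException{
		String unicode = getUnicode(str);
		if (unicode == null || unicode.equals(UnicodeEnum.NULL.getEncoding())) {
			return null;
		}
		// Re-encode with the detected charset to get the raw bytes back, then decode them as UTF-8
		return new String(str.getBytes(unicode),UnicodeEnum.UTF_8.getEncoding());
	}
}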
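When the raw bytes are still available (for example straight from the input stream, before any `String` has been built), a different approach is to try each candidate charset with a strict decoder and accept the first one that reports no errors. This is not the method used in this article, only a minimal sketch of the "try the common encodings" idea; the candidate list, its order, and the names `StrictCharsetGuess`, `decodeStrictly` and `guess` are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Optional;

final class StrictCharsetGuess {
    // Assumed candidates, strictest first. ISO-8859-1 is left out because
    // it accepts every byte sequence and would therefore always "match".
    private static final String[] CANDIDATES = {"UTF-8", "GBK"};

    // Decode the bytes, failing instead of substituting replacement characters.
    static Optional<String> decodeStrictly(byte[] raw, Charset charset) {
        try {
            return Optional.of(charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    // Return the first candidate charset that decodes the bytes without error.
    static Optional<Charset> guess(byte[] raw) {
        for (String name : CANDIDATES) {
            Charset charset = Charset.forName(name);
            if (decodeStrictly(raw, charset).isPresent()) {
                return Optional.of(charset);
            }
        }
        return Optional.empty();
    }
}
```

The bytes could be obtained with `InputStream.readAllBytes()` on Java 9 or later. The result is still a guess: a byte sequence can be valid in more than one encoding, so the order of the candidates decides ties.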

3. Write a test method (or create a new test class)

The main method is:

public static void main(String[] args) throws UnsupportedEncodingException {
	String test = "#函购同虹款式f";
	// Encode the sample explicitly as UTF-8, then decode those bytes with each charset
	byte[] utf8Bytes = test.getBytes("UTF-8");
	String str1 = new String(utf8Bytes,"UTF-8");
	String str2 = new String(utf8Bytes,"GBK");
	String str3 = new String(utf8Bytes,"ISO-8859-1");
	String str4 = new String(utf8Bytes,"UTF-16");
	String str5 = new String(utf8Bytes,"GB2312");
	String str6 = new String(utf8Bytes,"Unicode");
	
	// Print the charset each string appears to have been decoded with
	System.out.println(UnicodeUtil.getUnicode(str1));
	System.out.println(UnicodeUtil.getUnicode(str2));
	System.out.println(UnicodeUtil.getUnicode(str3));
	System.out.println(UnicodeUtil.getUnicode(str4));
	System.out.println(UnicodeUtil.getUnicode(str5));
	System.out.println(UnicodeUtil.getUnicode(str6));
	
	// Try to convert each string back to UTF-8 text
	System.out.println(UnicodeUtil.getUTF_8(str6));
	System.out.println(UnicodeUtil.getUTF_8(str5));
	System.out.println(UnicodeUtil.getUTF_8(str4));
	System.out.println(UnicodeUtil.getUTF_8(str3));
	System.out.println(UnicodeUtil.getUTF_8(str2));
	System.out.println(UnicodeUtil.getUTF_8(str1));
}

4. Output