(IO) Code Tables and Garbled Text

schy_hqh

Category: WEB后台@JavaSE    Tags: java

Article link: https://blog.csdn.net/schy_hqh/article/details/84499841

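ASCII: American Standard Code for Information Interchange

Uses 7 bits of a single byte; the highest bit is unused (always 0).

ISO8859-1: the Latin-1 code table (Western European)

Uses all 8 bits of a single byte.

GB2312: China's national code table for simplified Chinese

GBK: an extension of GB2312 that adds many more Chinese characters

Unicode: the international standard character set, covering the scripts of all countries

Unicode assigns every character (English letters, Chinese characters, and so on) a code point; it does not by itself fix a byte length. The familiar "2 bytes per character" rule describes the 16-bit unit of UCS-2/UTF-16, which is also the size of a Java char. Characters outside the Basic Multilingual Plane need two such units (4 bytes).

UTF-8: a variable-length encoding of Unicode; a common Chinese character takes 3 bytes

Each character takes 1 to 4 bytes: whatever fits in 1 byte is stored in 1 byte (ASCII), whatever needs 2 bytes is stored in 2 bytes, and so on.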
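The byte counts listed above can be checked directly. The following is a minimal sketch, not part of the original article: the class name `EncodedLengthDemo` and the helper `byteLength` are made up for illustration, and it assumes the running JDK provides the GBK charset. The sample strings are written as Unicode escapes so the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {

    // Number of bytes needed to encode `text` in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Assumption: GBK is available (it is an optional charset, present in standard JDK builds).
        Charset gbk = Charset.forName("GBK");

        // "A", "é", "你", and an emoji outside the Basic Multilingual Plane (U+1F600)
        String[] samples = {"A", "\u00e9", "\u4f60", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + " chars=" + s.length()
                    + " UTF-8=" + byteLength(s, StandardCharsets.UTF_8)
                    + " UTF-16BE=" + byteLength(s, StandardCharsets.UTF_16BE)
                    + " GBK=" + byteLength(s, gbk));
        }
    }
}
```

In UTF-8 the four samples take 1, 2, 3, and 4 bytes respectively, while "你" takes 2 bytes in GBK. The emoji occupies two Java chars, which is why "one character = 2 bytes" does not hold in general.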

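When converting between strings and bytes without specifying a charset, the platform default charset is used. That default depends on the operating system and JVM settings (from JDK 18 onward it is UTF-8 unless overridden).

FileWriter, when constructed without a charset, uses the platform default code table.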
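To see which code table "platform default" means on a given machine, the JDK exposes it through `Charset.defaultCharset()`. A small sketch follows; the class name `DefaultCharsetDemo` and the helper `usesPlatformDefault` are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultCharsetDemo {

    // True if the no-argument getBytes() produces the same bytes as an
    // explicit call that passes the platform default charset.
    static boolean usesPlatformDefault(String text) {
        byte[] implicit = text.getBytes();
        byte[] explicit = text.getBytes(Charset.defaultCharset());
        return Arrays.equals(implicit, explicit);
    }

    public static void main(String[] args) {
        // The printed name varies by machine and JVM options.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(usesPlatformDefault("\u4f60\u597d")); // "你好"
    }
}
```

Because the printed charset differs between machines, code that relies on the default can behave differently once it is deployed elsewhere.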

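To specify a charset explicitly, use the conversion streams InputStreamReader and OutputStreamWriter. (Since Java 11, FileReader and FileWriter also have constructors that accept a Charset.)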
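A sketch of the conversion streams in use, writing to a temporary file and reading it back. The class `ConversionStreamDemo` and its helpers `write`, `read`, and `roundTrip` are illustrative names rather than anything from the article, and the sketch assumes GBK is available in the running JDK.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ConversionStreamDemo {

    // Writes `text` to `file`, encoding it with the given charset.
    static void write(Path file, String text, Charset charset) {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file.toFile()), charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file, decoding its bytes with the given charset.
    static String read(Path file, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new FileInputStream(file.toFile()), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Writes with one charset into a temporary file and reads it back with another.
    static String roundTrip(String text, Charset writeWith, Charset readWith) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                write(file, text, writeWith);
                return read(file, readWith);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d"; // "你好"
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(roundTrip(text, gbk, StandardCharsets.UTF_8).equals(text)); // false
    }
}
```

Passing a `Charset` object instead of a charset name avoids the checked `UnsupportedEncodingException`.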

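More than one code table can represent Chinese: GBK and UTF-8 both include Chinese characters, but they map them to different bytes.

If encoding and decoding use different code tables, the text comes out garbled.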

Encoding and decoding with a specified code table

package com.gc.encode;

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class CharsetDemo {
	public static void main(String[] args) throws UnsupportedEncodingException {
		
		String str = "你好";
		
		// The first result assumes the platform default charset is UTF-8
		useDefaultCharset(str);//[-28, -67, -96, -27, -91, -67]
		useCharset(str,"GBK");//[-60, -29, -70, -61]
		useCharset(str,"GB2312");//[-60, -29, -70, -61]
		useCharset(str,"GB18030");//[-60, -29, -70, -61]
		useCharset(str,"UTF-8");//[-28, -67, -96, -27, -91, -67]
		
	}
	
	/**
	 * Encode and decode with the platform default code table
	 */
	private static void useDefaultCharset(String str) {
		byte[] bytes = str.getBytes();
		System.out.println(Arrays.toString(bytes));
		
		String decodeStr = new String(bytes);
		System.out.println(decodeStr);
	}
	
	/**
	 * Encode with a specified code table, then decode in two ways
	 */
	private static void useCharset(String str, String charset) throws UnsupportedEncodingException {
		// Encode with the specified code table
		byte[] bytes = str.getBytes(charset);
		System.out.println(Arrays.toString(bytes));
		
		// Decode with the platform default code table: may produce garbled text
		String decodeStr = new String(bytes);
		System.out.println(decodeStr);// garbled when encoded with GBK/GB2312/GB18030 but decoded with a UTF-8 default
		
		// Decode with the same code table that was used for encoding: never garbled
		String decodeStr2 = new String(bytes,charset);
		System.out.println(decodeStr2);//你好
	}
}
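The same comparison can be written as a yes/no check for any pair of code tables. This is a sketch under assumptions: `CharsetPairDemo` and `survives` are hypothetical names, and GB2312 and GBK are assumed to be available in the running JDK. It uses `Charset` objects rather than charset names, so no checked exception needs to be declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetPairDemo {

    // Encodes with one charset, decodes with another, and reports whether
    // the original text came back unchanged.
    static boolean survives(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith).equals(text);
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d"; // "你好"
        Charset gb2312 = Charset.forName("GB2312"); // assumption: available in this JDK
        Charset gbk = Charset.forName("GBK");       // assumption: available in this JDK

        System.out.println(survives(text, gbk, gbk));                    // true
        System.out.println(survives(text, gb2312, gbk));                 // true: same bytes for these characters
        System.out.println(survives(text, gbk, StandardCharsets.UTF_8)); // false
        System.out.println(survives(text, StandardCharsets.UTF_8, gbk)); // false
    }
}
```

GB2312 and GBK agree for "你好" because both produce the bytes [-60, -29, -70, -61], as the listing above shows; UTF-8 uses entirely different bytes for the same characters.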

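Decoding with the wrong code table: a case where the data can be recovered

ISO8859-1 maps every possible byte value to exactly one character, so decoding with it loses no information and the original bytes can be obtained again.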

package com.gc.encode;

import java.io.UnsupportedEncodingException;

public class CharsetDemo {
	
	/**
	 *  After decoding with the ISO8859-1 code table, the byte array can still be restored
	 */
	public static void main(String[] args) throws UnsupportedEncodingException {
		
		String str = "你好";
		
		// Encode with GBK
		byte[] bytes = str.getBytes("GBK");
		
		
		// Some servers decode incoming data with ISO8859-1 by default, which garbles it.
		// Why ISO8859-1: it can represent American and Western European characters.
		// ISO8859-1 does not support Chinese, so the result is garbled.
		String temp = new String(bytes,"ISO8859-1");
		System.out.println("GBK-->ISO8859-1:"+temp);//ÄãºÃ
		
		// Fix: restore the byte array, then decode it with GBK to get the correct characters
		// Turn the garbled text back into the original byte array
		byte[] recoverBytes = temp.getBytes("ISO8859-1");
		// Decode the byte array again, this time with GBK
		String recoverStr = new String(recoverBytes,"GBK");
		System.out.println("garbled-->ISO8859-1-->GBK:"+recoverStr);//你好
	}
	
}
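The recovery above works because an ISO8859-1 round trip preserves every byte. The sketch below checks that claim for all 256 byte values; `Latin1RoundTripDemo` and `throughLatin1` are names invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTripDemo {

    // Decodes the bytes as ISO8859-1 and encodes the resulting string again.
    static byte[] throughLatin1(byte[] bytes) {
        String garbled = new String(bytes, StandardCharsets.ISO_8859_1);
        return garbled.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 through 0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        System.out.println(Arrays.equals(all, throughLatin1(all))); // true
    }
}
```

Since nothing is lost, the intermediate string may look wrong on screen, yet it still carries the original bytes and can be decoded again with the correct code table.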

Decoding with the wrong code table: a case where the data cannot be recovered

package com.gc.encode;

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class CharsetDemo {
	
	/**
	 *  Encode with GBK, decode with UTF-8: garbled text
	 *  The garbled text cannot be converted back into the original byte array
	 *  This situation must be avoided!
	 */
	public static void main(String[] args) throws UnsupportedEncodingException {
		
		String str = "你好";
		
		// Encode with GBK
		byte[] bytes = str.getBytes("GBK");
		System.out.println(Arrays.toString(bytes));//[-60, -29, -70, -61]
		
		
		// Decode the GBK bytes with UTF-8: garbled, and not recoverable
		String temp = new String(bytes,"UTF-8");
		// -17, -65, -67 (the UTF-8 bytes of the replacement character U+FFFD) appears 3 times; the original data is gone
		System.out.println(Arrays.toString(temp.getBytes("UTF-8")));//[-17, -65, -67, -17, -65, -67, -17, -65, -67]
		System.out.println("GBK-->UTF-8:"+temp);//GBK-->UTF-8:���
		
		// Because data was lost, the garbled text cannot be turned back into the byte array that existed before UTF-8 decoding
		byte[] recoverBytes = temp.getBytes("UTF-8");
		String recoverStr = new String(recoverBytes,"GBK");
		System.out.println("garbled-->UTF-8-->GBK:"+recoverStr);//garbled-->UTF-8-->GBK:锟斤拷锟�
	}
	
}
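One way to keep this from happening silently is to decode strictly, so invalid bytes raise an error instead of being replaced with U+FFFD. The following sketch uses the JDK's `CharsetDecoder` for that; the class `StrictDecodeDemo` and the helper `decodeStrict` are hypothetical names, and returning null on failure is a simplification made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {

    // Returns the decoded text, or null if the bytes are not valid in the charset.
    static String decodeStrict(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available
        byte[] bytes = "\u4f60\u597d".getBytes(gbk); // "你好" as GBK bytes

        System.out.println(decodeStrict(bytes, StandardCharsets.UTF_8)); // null: not valid UTF-8
        System.out.println(decodeStrict(bytes, gbk));                    // 你好
    }
}
```

With the failure surfaced, the caller still holds the untouched byte array and can retry with another code table. A strict decoder only catches bytes that are invalid in the chosen table; bytes that happen to be valid in the wrong table still decode to wrong characters.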
