Java IO 04: Character Sets, Encoding, and Decoding

Latest recommended article published on 2022-10-11 16:12:46

Re_view

Category column: Java IO. Tags: character sets, encoding, decoding

Article link: https://blog.csdn.net/Re_view/article/details/97555960

Character Sets

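Definition: Java stores each char as a 16-bit (two-byte) UTF-16 code unit, but the data in real files may be written in any of several character sets. Bytes must be decoded with the same charset that was used to encode them; otherwise the text comes out garbled.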
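
To make the 16-bit claim concrete, here is a minimal sketch (the class name `CharWidthDemo` and its helper methods are made up for illustration). It assumes nothing beyond the standard `String` and `Character` APIs, and it writes the non-ASCII text with Unicode escapes so the encoding of the source file itself does not matter.

```java
class CharWidthDemo {

    // Number of 16-bit char units the string occupies (what String.length() reports).
    public static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points, which can be smaller than the unit count.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u4F60";                  // U+4F60, inside the Basic Multilingual Plane
        String supplementary = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair

        System.out.println("bits per char: " + Character.SIZE);
        System.out.println(utf16Units(bmp) + " unit(s), " + codePoints(bmp) + " code point(s)");
        System.out.println(utf16Units(supplementary) + " unit(s), "
                + codePoints(supplementary) + " code point(s)");
    }
}
```

A character outside the Basic Multilingual Plane takes two char units, which is why "one Java char" and "one character" are not always the same thing.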

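Charset	Description
US-ASCII	7-bit ASCII: basic English letters, digits, and punctuation
ISO-8859-1	Latin-1, a single-byte charset for Western European languages; it cannot represent Chinese or Japanese
UTF-8	Variable-length Unicode encoding (1 to 4 bytes per character; most Chinese characters take 3), the international standard
UTF-16BE	Unicode in 16-bit units (2 bytes per character, 4 for supplementary characters), big-endian: the high byte sits at the low address, so 0x12 comes first in the usual 0x12345678 illustration
UTF-16LE	Unicode in 16-bit units (2 bytes per character, 4 for supplementary characters), little-endian: the low byte sits at the low address, so 0x78 comes first in the same illustration
UTF-16	The byte order is declared at the start of the file by a BOM (Byte Order Mark): FE FF means big-endian, FF FE means little-endian.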
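
The table can be checked against real bytes. The sketch below (class `CharsetBytesDemo` and method `hex` are hypothetical names) encodes the single character U+4F60 (你) with each charset and prints the bytes in hex. It relies only on `String.getBytes(Charset)` and the constants in `StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetBytesDemo {

    // Encodes text with the given charset and renders the bytes as "XX XX ...".
    public static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ni = "\u4F60"; // U+4F60, written as an escape to avoid source-encoding issues
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + hex(ni, cs));
        }
    }
}
```

US-ASCII and ISO-8859-1 cannot represent the character, so `getBytes` substitutes `?` (3F). UTF-16BE and UTF-16LE show the same two bytes in opposite order, and Java's UTF-16 encoder writes the big-endian BOM (FE FF) before the data.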

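Supplement:
GBK
Purpose: an extension of GB2312 that adds support for traditional Chinese characters while staying compatible with GB2312.
Width: each Chinese character takes 2 bytes (ASCII characters still take 1); it defines 21,886 characters.
Range: the high (lead) byte runs from 0x81 to 0xFE, the low (trail) byte from 0x40 to 0xFE (excluding 0x7F).
Bytes --> characters (decoding)
Characters --> bytes (encoding)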
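
The GBK byte ranges can be verified with a small sketch. The class `GbkRangeDemo` and both helpers are hypothetical, and the exclusion of 0x7F as a trail byte is my addition to the range given above. It assumes the running JDK ships the GBK charset, which standard full JDK builds do.

```java
import java.nio.charset.Charset;

class GbkRangeDemo {

    // True when (lead, trail) lies in the GBK double-byte range:
    // lead 0x81..0xFE, trail 0x40..0xFE except 0x7F.
    public static boolean inGbkDoubleByteRange(int lead, int trail) {
        return lead >= 0x81 && lead <= 0xFE
                && trail >= 0x40 && trail <= 0xFE && trail != 0x7F;
    }

    // Counts two-byte characters in data that is assumed to be valid GBK.
    public static int countDoubleByteChars(byte[] gbk) {
        int count = 0;
        int i = 0;
        while (i < gbk.length) {
            int lead = gbk[i] & 0xFF;
            if (lead >= 0x81 && i + 1 < gbk.length
                    && inGbkDoubleByteRange(lead, gbk[i + 1] & 0xFF)) {
                count++;
                i += 2; // consume lead and trail byte together
            } else {
                i++;    // single-byte (ASCII) character
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] data = "\u4F60\u597Da".getBytes(gbk); // the text 你好a
        System.out.println("total bytes: " + data.length);
        System.out.println("double-byte characters: " + countDoubleByteChars(data));
        System.out.println("round trip ok: " + new String(data, gbk).equals("\u4F60\u597Da"));
    }
}
```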

Encoding and Decoding

Encoding

package com.io.cx;

import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

/**
 * Encoding: String --> byte array
 */
public class Encode {

	public static void main(String[] args) throws UnsupportedEncodingException {
		String msg = "你好a";
		// Encode with the JVM default charset (often the project or platform setting)
		byte[] datas = msg.getBytes();
		System.out.println(datas.length); // 7 if the default is UTF-8 (3 bytes per Chinese character), 5 if it is GBK (2 bytes)

		// Encode with an explicitly chosen charset
		datas = msg.getBytes(StandardCharsets.UTF_16LE); // 2 bytes per character: 6
		System.out.println(datas.length);

		datas = msg.getBytes("GBK"); // 2 bytes per Chinese character, 1 per ASCII character: 5
		System.out.println(datas.length);
		datas = msg.getBytes(StandardCharsets.UTF_8); // 3 bytes per Chinese character, 1 per ASCII character: 7
		System.out.println(datas.length);
	}

}
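
Because the no-argument `getBytes()` depends on whatever the JVM default charset happens to be, the byte count can differ between machines. A minimal sketch of how one might inspect the default and compare it with explicit charsets follows; `DefaultCharsetDemo` and `encodedLength` are names made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {

    // Number of bytes the text occupies once encoded with the given charset.
    public static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String msg = "\u4F60\u597Da"; // the text 你好a, escaped to avoid source-encoding issues

        // The charset that getBytes() without arguments falls back to.
        Charset def = Charset.defaultCharset();
        System.out.println("default charset: " + def.name());
        System.out.println("default length:  " + encodedLength(msg, def));

        // Explicit charsets give the same answer on every machine.
        System.out.println("UTF-8 length:    " + encodedLength(msg, StandardCharsets.UTF_8));
        System.out.println("UTF-16LE length: " + encodedLength(msg, StandardCharsets.UTF_16LE));
        System.out.println("GBK length:      " + encodedLength(msg, Charset.forName("GBK")));
    }
}
```

Passing the charset explicitly, as the last three calls do, keeps the result independent of the environment.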

Decoding

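package com.io.cx;

import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

/**
 * Decoding: byte array --> String
 */
public class Decode {

	public static void main(String[] args) throws UnsupportedEncodingException {
		String msg = "你好a";
		// Encode explicitly as UTF-8 so the steps below do not depend on the default charset
		byte[] datas = msg.getBytes(StandardCharsets.UTF_8); // 7 bytes

		// Decode: String(byte[] bytes, int offset, int length, String charsetName)
		msg = new String(datas, 0, datas.length, "UTF-8");
		System.out.println(msg); // 你好a

		// Garbled output:
		// 1) Too few bytes
		msg = new String(datas, 0, datas.length - 2, "UTF-8"); // cuts 好 in the middle, so it decodes to a replacement character
		System.out.println(msg);
		msg = new String(datas, 0, datas.length - 1, "UTF-8"); // drops only the 1-byte 'a', so this still prints 你好
		System.out.println(msg);

		// 2) Mismatched charsets: UTF-8 bytes decoded as GBK
		msg = new String(datas, 0, datas.length, "GBK");
		System.out.println(msg);
	}

}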
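
The `String` constructors silently replace bad bytes, which hides the "too few bytes" problem. One way to detect it is a `CharsetDecoder` configured to report errors instead of replacing them. This is a sketch, not the article's method; `StrictDecodeDemo`, `decodeStrict`, and `isWellFormed` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StrictDecodeDemo {

    // Decodes data, throwing instead of inserting replacement characters.
    public static String decodeStrict(byte[] data, Charset cs) throws CharacterCodingException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    // True when data is a complete, valid byte sequence in the given charset.
    public static boolean isWellFormed(byte[] data, Charset cs) {
        try {
            decodeStrict(data, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] full = "\u4F60\u597Da".getBytes(StandardCharsets.UTF_8); // 你好a, 7 bytes
        byte[] truncated = Arrays.copyOf(full, full.length - 2);        // ends inside 好

        System.out.println("full is well-formed:      " + isWellFormed(full, StandardCharsets.UTF_8));
        System.out.println("truncated is well-formed: " + isWellFormed(truncated, StandardCharsets.UTF_8));
    }
}
```

A strict decoder catches truncated or malformed input, but it cannot always catch a mismatched charset: bytes written in one charset may happen to be a valid sequence in another and simply decode to the wrong characters.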