Learning Java Character Encoding


azhning


Category: Java. Tags: java, encoding, unicode

Article link: https://blog.csdn.net/u012396308/article/details/47024569


Reference article: http://www.ibm.com/developerworks/cn/java/j-lo-chinesecoding/

Background:

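Java represents text internally as Unicode, using the UTF-16 encoding.
A char in Java is one 16-bit UTF-16 code unit, i.e. two bytes; characters outside the Basic Multilingual Plane take two chars (a surrogate pair).
Encoding: converting characters to bytes, from char[] to byte[].
Decoding: converting bytes to characters, from byte[] to char[].
Java performs encoding and decoding through the Charset class: a CharsetEncoder obtained from Charset.newEncoder() encodes via its encode method, and a CharsetDecoder obtained from Charset.newDecoder() decodes via its decode method.
Situations in Java that require encoding and decoding:
- I/O operations
- Converting a String to bytes in memory
- Java web applications (still to be studied further)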
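To make the Charset, CharsetEncoder and CharsetDecoder relationship concrete, here is a minimal sketch of a round trip. The class name CharsetRoundTrip and its helper methods are made up for illustration, and UTF-8 is chosen only because it is always available; the article itself does not use this code.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class CharsetRoundTrip {

    // Encoding: chars -> bytes, through an explicit CharsetEncoder.
    static byte[] encode(String text, Charset charset) {
        try {
            ByteBuffer buffer = charset.newEncoder().encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Thrown when the text cannot be represented in the charset.
            throw new UncheckedIOException(e);
        }
    }

    // Decoding: bytes -> chars, through an explicit CharsetDecoder.
    static String decode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Thrown when the bytes are not valid in the charset.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d"; // the character '中', written as an escape
        byte[] utf8 = encode(original, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 3
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

In everyday code String.getBytes(Charset) and new String(byte[], Charset) wrap the same machinery; the explicit encoder and decoder are mainly useful when malformed input should raise an error instead of being silently replaced.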

String to bytes in memory: the getBytes() method is the key point

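import java.io.IOException;
import java.io.StringReader;

public class EncodeCompare {

	public static void main(String[] args) throws IOException {
		// Part 1: a char holds one UTF-16 code unit; print it in decimal and hex.
		char a = '中';
		System.out.println("this is from char:");
		System.out.println((int)a);
		System.out.println(Integer.toHexString(a));
		
		// Part 2: Reader.read() returns the next char as an int.
		StringReader strReader = new StringReader("中");
		int b = strReader.read();
		System.out.println("this is from reader:");
		System.out.println(b);
		System.out.println(Integer.toHexString(b));
		
		// Part 3: encode the string to bytes with the gb2312 charset.
		// getBytes(String) throws UnsupportedEncodingException (an IOException)
		// if the charset name is unknown.
		String c = "中";
		byte[] result = c.getBytes("gb2312");
		System.out.println("this is from string:");
		for (int i = 0; i < result.length; i++) {
			System.out.println(result[i]); // bytes are signed, so values print as negatives
		}
	}

}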

Actual output:

this is from char:
20013
4e2d
this is from reader:
20013
4e2d
this is from string:
-42
-48

Finally, what stages does converting the string into a byte array go through in memory?

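Loading the program into memory: at this stage the character is represented in memory in Java's own encoding as 0x4e2d, occupying two bytes.
Encoding conversion: the char-to-byte table of gb2312 is consulted for the bytes corresponding to 0x4e2d, which gives 0xd6d0.
The result is stored in the byte array; because Java bytes are signed, 0xd6 and 0xd0 print as -42 and -48.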
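The three stages above can be sketched as follows. The class Gb2312Bytes and the toHex helper are hypothetical names introduced here, and the sketch assumes the running JDK provides the GB2312 charset, as the original program also does.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, for illustration only.
public class Gb2312Bytes {

    // Render bytes as lowercase hex, masking with 0xff to undo sign extension.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String c = "\u4e2d"; // the character '中'

        // Stage 1: in memory the character is the UTF-16 code unit 0x4e2d.
        System.out.println(Integer.toHexString(c.charAt(0))); // 4e2d

        // Stage 2: the GB2312 encoder maps that code unit to two bytes.
        // Assumes GB2312 is available in this JDK.
        byte[] encoded = c.getBytes(Charset.forName("GB2312"));

        // Stage 3: the bytes sit in the array; hex and signed views of the same data.
        System.out.println(toHex(encoded));                // d6d0
        System.out.println(encoded[0] + " " + encoded[1]); // -42 -48
    }
}
```

The mask b & 0xff is what reconciles the hex value 0xd6d0 in the explanation with the negative numbers in the program output.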

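The first part of the program shows that a char holds its character as a UTF-16 code unit: casting '中' to int gives 0x4e2d, the same two bytes produced by encoding the string as "UTF-16BE" (plain "UTF-16" additionally prepends the byte order mark 0xfeff).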
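The original program does not actually call getBytes with a UTF-16 charset, so here is a small sketch that performs that check. The class name Utf16Bytes and the toHex helper are illustrative, not part of the article's code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class Utf16Bytes {

    // Render bytes as space-separated lowercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String c = "\u4e2d"; // the character '中'

        // Big-endian, no byte order mark: exactly the code unit 0x4e2d.
        System.out.println(toHex(c.getBytes(StandardCharsets.UTF_16BE))); // 4e 2d

        // Plain UTF-16: Java's encoder writes a big-endian byte order mark first.
        System.out.println(toHex(c.getBytes(StandardCharsets.UTF_16)));   // fe ff 4e 2d

        // Little-endian: same code unit, bytes swapped.
        System.out.println(toHex(c.getBytes(StandardCharsets.UTF_16LE))); // 2d 4e
    }
}
```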

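The second part checks how the int returned by a character stream's read method is derived. Comparing the results shows that it is the value of the character's UTF-16 code unit (printed here in decimal as 20013), in the range 0 to 65535, with -1 signalling the end of the stream.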
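As a further check on what read returns, the sketch below reads every code unit from a StringReader, including a character outside the Basic Multilingual Plane, which comes back as two separate ints. The class ReaderCodeUnits and the method readAll are hypothetical names, and the supplementary character U+1F600 is an example chosen here, not taken from the article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, for illustration only.
public class ReaderCodeUnits {

    // Collect every value returned by read() until it signals end of stream with -1.
    static List<Integer> readAll(String text) {
        List<Integer> units = new ArrayList<>();
        try (StringReader reader = new StringReader(text)) {
            int unit;
            while ((unit = reader.read()) != -1) {
                units.add(unit);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return units;
    }

    public static void main(String[] args) {
        // '中' is a single UTF-16 code unit.
        System.out.println(readAll("\u4e2d"));       // [20013]

        // U+1F600 needs a surrogate pair, so read() returns two values.
        System.out.println(readAll("\uD83D\uDE00")); // [55357, 56832]
    }
}
```

So read hands back code units rather than whole code points; String.codePointAt is one way to recombine a surrogate pair.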
