Java & Character Encoding

Contents

I. Java & Character Encoding

1. The Java char type

2. The String class

3. Charsets available in Java

4. Converting bytes to a hex string

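I. Java & Character Encoding

1. The Java char type

When Java processes characters internally, it uses Unicode, encoded as UTF-16 (UTF-16BE when written out high byte first). As a quick recap, UTF-16 uses two or four bytes per character: code points below 65536 (U+0000 to U+FFFF) take two bytes, and code points beyond that range take four bytes. BE (big endian) means the high-order byte is written first, then the low-order byte, which is the same byte order Java uses when it writes integers (for example through DataOutputStream).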
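The two-byte versus four-byte rule can be checked directly. The following is a minimal sketch; the class name Utf16Demo and the helper utf16beHex are illustrative names, not part of the JDK.

```java
import java.nio.charset.StandardCharsets;

class Utf16Demo {

    // Encode a string as UTF-16BE and render the bytes as upper-case hex.
    static String utf16beHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_16BE)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+8D24 is below 65536, so it takes one 2-byte code unit.
        System.out.println(utf16beHex("\u8D24"));        // 8D24
        // U+20BB7 is above 65535, so it takes two code units (4 bytes).
        System.out.println(utf16beHex("\uD842\uDFB7"));  // D842DFB7
    }
}
```

The high-order byte comes first in both outputs, which is what "big endian" means here.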

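A char is essentially an unsigned integer that always occupies two bytes. Its value is a UTF-16 code unit; for characters within range, that value equals the character's Unicode code point.

Because it is fixed at two bytes, a char can only represent characters whose code point is below 65536 (U+0000 to U+FFFF). It cannot represent characters beyond that range.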
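The numeric nature of char can be seen with a few constants from java.lang.Character. This is a small sketch; the class name CharAsInt and the helper codeUnit are illustrative.

```java
class CharAsInt {

    // Widen a char to int to expose the UTF-16 code unit it stores.
    static int codeUnit(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);            // 2
        System.out.println((int) Character.MIN_VALUE);  // 0
        System.out.println((int) Character.MAX_VALUE);  // 65535
        System.out.println(codeUnit('\u8D24'));         // 36132
        // char arithmetic works on the numeric value: 'a' + 1 is 'b'.
        System.out.println((char) ('a' + 1));           // b
    }
}
```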

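So how are characters beyond that range represented? A single char is not enough: they need two chars (a surrogate pair), which are typically held in a String (an int code point or a char[] also works). For example, the Chinese character "𠮷" has the code point 0x20BB7, which clearly exceeds 65535, so it cannot be a char literal. When it is pasted into source code, it is automatically converted to the two chars "\uD842\uDFB7".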

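public class CharTest {

	public static void main(String[] args) {
		char c = '贤';             // character literal
		System.out.println(c);
		char c1 = 0x8d24;          // the same character as a hex value
		System.out.println(c1);
		char c2 = 36132;           // the same value in decimal
		System.out.println(c2);
		char c3 = '\u8d24';        // the same value as a Unicode escape
		System.out.println(c3);
		// A surrogate pair is two chars, so the next line does not compile:
//		char c4 = '\uD842\uDFB7';
		String s = "\uD842\uDFB7"; // U+20BB7 written as a surrogate pair
		System.out.println(s);
	}

}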
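Because a supplementary character occupies two chars, String.length() counts code units rather than characters. The sketch below uses the code point APIs on String and Character to show the difference; the class name SupplementaryDemo and the helper codePoints are illustrative.

```java
class SupplementaryDemo {

    // Number of Unicode code points (not char code units) in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Build the string from the code point instead of pasting surrogates.
        String s = new String(Character.toChars(0x20BB7));
        System.out.println(s.length());                              // 2
        System.out.println(codePoints(s));                           // 1
        System.out.println(Integer.toHexString(s.codePointAt(0)));   // 20bb7
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
        System.out.println(s.equals("\uD842\uDFB7"));                // true
    }
}
```

When iterating over text that may contain such characters, working with code points (for example via codePointAt) avoids splitting a surrogate pair.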

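2. The String class

getBytes(): called without arguments, this method encodes the string using the default charset. On JDK 8 that charset is taken from the file.encoding system property, which can be set as a parameter when launching the java command (for example -Dfile.encoding=UTF-8). Since JDK 18 the default is UTF-8 unless it is overridden.

Under the hood, getBytes() relies on Charset.defaultCharset() (JDK 8 source):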

    public static Charset defaultCharset() {
        if (defaultCharset == null) {
            synchronized (Charset.class) {
                String csn = AccessController.doPrivileged(
                    new GetPropertyAction("file.encoding"));
                Charset cs = lookup(csn);
                if (cs != null)
                    defaultCharset = cs;
                else
                    defaultCharset = forName("UTF-8");
            }
        }
        return defaultCharset;
    }

Example:

import java.util.Arrays;

public class StringTest {

	public static void main(String[] args) {
		System.out.println(System.getProperty("file.encoding"));
		String str = "你好";
		byte[] bytes=str.getBytes();
		System.out.println(Arrays.toString(bytes));
	}

}

Specifying the charset explicitly:

import java.util.Arrays;

public class StringTest {

	public static void main(String[] args) throws Exception {
		String str = "你好";
		byte[] bytes = str.getBytes("UTF-8");
		System.out.println(Arrays.toString(bytes));//[-28, -67, -96, -27, -91, -67]
		byte[] gbks = str.getBytes("GBK");
		System.out.println(Arrays.toString(gbks));//[-60, -29, -70, -61]
		
		byte[] bytes1 = {-28, -67, -96, -27, -91, -67};
		String str1 = new String(bytes1,"UTF-8");
		System.out.println(str1);//你好
		
		byte[] bytes2 = {-60, -29, -70, -61};
		String str2 = new String(bytes2,"GBK");
		System.out.println(str2);//你好
	}

}
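The string-named overloads above throw the checked UnsupportedEncodingException. As an alternative sketch, the constants in java.nio.charset.StandardCharsets avoid both the exception and typos in charset names; the class name ExplicitCharsetDemo and its helpers are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitCharsetDemo {

    // Encode with UTF-8 regardless of the platform default charset.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes that are known to be UTF-8.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str = "\u4F60\u597D"; // 你好
        byte[] bytes = toUtf8(str);
        System.out.println(Arrays.toString(bytes));       // [-28, -67, -96, -27, -91, -67]
        System.out.println(fromUtf8(bytes).equals(str));  // true
    }
}
```

StandardCharsets only covers the charsets every Java platform must support (US-ASCII, ISO-8859-1, UTF-8 and the UTF-16 variants), so GBK still has to be looked up by name.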

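Demo: recoverable garbled text (the original bytes are intact, so decoding them again with the correct charset works)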

	public static void lmknCode() throws Exception {
		String str = "你好";
		byte[] bytes = str.getBytes("GBK");
		System.out.println(Arrays.toString(bytes));
		String str1 = new String(bytes,"UTF-8"); // wrong charset: garbled output
		System.out.println(str1);
		String str2 = new String(bytes,"GBK");   // right charset: 你好
		System.out.println(str2);
	}
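The demo above recovers the text only because the original byte array is still available. Once only the garbled String is left, recovery depends on the charset that was wrongly used for decoding. The sketch below contrasts ISO-8859-1, which maps every byte to exactly one char, with UTF-8, which replaces malformed input; the class name RoundTripDemo and the helper roundTrip are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {

    // Decode bytes with a charset, then encode the result with the same charset.
    static byte[] roundTrip(byte[] bytes, Charset cs) {
        return new String(bytes, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // GBK bytes of "你好", copied from the output shown earlier.
        byte[] gbk = {-60, -29, -70, -61};
        // ISO-8859-1 maps every byte to one char, so nothing is lost.
        System.out.println(Arrays.equals(gbk, roundTrip(gbk, StandardCharsets.ISO_8859_1))); // true
        // UTF-8 replaces malformed sequences with U+FFFD, so the bytes change.
        System.out.println(Arrays.equals(gbk, roundTrip(gbk, StandardCharsets.UTF_8)));      // false
    }
}
```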

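Demo: unrecoverable garbled text (ISO-8859-1 cannot encode these characters, so each one is replaced with '?' and the original information is lost)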

	public static void lmbknCode() throws Exception {
		String str = "你好";
		byte[] bytes = str.getBytes("ISO-8859-1");
		System.out.println(Arrays.toString(bytes));//[63, 63]
		String str1 = new String(bytes,"GBK");
		System.out.println(str1);//??
		String str2 = new String(bytes,"UTF-8");
		System.out.println(str2);//??
	}
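Since getBytes substitutes '?' silently, it can help to check up front whether a charset can represent the text. One way, sketched here with CharsetEncoder.canEncode, is shown below; the class name LossCheckDemo and the helper canEncode are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossCheckDemo {

    // True if every character of s can be represented in the given charset.
    static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String str = "\u4F60\u597D"; // 你好
        System.out.println(canEncode(str, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(str, StandardCharsets.UTF_8));      // true
        System.out.println(canEncode("hello", StandardCharsets.US_ASCII)); // true
        // Without the check, each unmappable character silently becomes '?' (63).
        System.out.println(Arrays.toString(str.getBytes(StandardCharsets.ISO_8859_1))); // [63, 63]
    }
}
```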

3. Charsets available in Java

import java.nio.charset.Charset;
import java.util.Set;

public class JavaCode {

	public static void main(String[] args) {
		Set<String> charsetNames = Charset.availableCharsets().keySet();
		System.out.println("-----JDK1.8 charset is "+charsetNames.size()+"----- ");
		for (String str : charsetNames) {
			System.out.println(str);
		}
	}
}

Output:

-----JDK1.8 charset is 170----- 
Big5
Big5-HKSCS
CESU-8
EUC-JP
EUC-KR
GB18030
GB2312
GBK
IBM-Thai
IBM00858
IBM01140
IBM01141
IBM01142
IBM01143
IBM01144
IBM01145
IBM01146
IBM01147
IBM01148
IBM01149
IBM037
IBM1026
IBM1047
IBM273
IBM277
IBM278
IBM280
IBM284
IBM285
IBM290
IBM297
IBM420
IBM424
IBM437
IBM500
IBM775
IBM850
IBM852
IBM855
IBM857
IBM860
IBM861
IBM862
IBM863
IBM864
IBM865
IBM866
IBM868
IBM869
IBM870
IBM871
IBM918
ISO-2022-CN
ISO-2022-JP
ISO-2022-JP-2
ISO-2022-KR
ISO-8859-1
ISO-8859-13
ISO-8859-15
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
JIS_X0201
JIS_X0212-1990
KOI8-R
KOI8-U
Shift_JIS
TIS-620
US-ASCII
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UTF-8
windows-1250
windows-1251
windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
windows-31j
x-Big5-HKSCS-2001
x-Big5-Solaris
x-euc-jp-linux
x-EUC-TW
x-eucJP-Open
x-IBM1006
x-IBM1025
x-IBM1046
x-IBM1097
x-IBM1098
x-IBM1112
x-IBM1122
x-IBM1123
x-IBM1124
x-IBM1166
x-IBM1364
x-IBM1381
x-IBM1383
x-IBM300
x-IBM33722
x-IBM737
x-IBM833
x-IBM834
x-IBM856
x-IBM874
x-IBM875
x-IBM921
x-IBM922
x-IBM930
x-IBM933
x-IBM935
x-IBM937
x-IBM939
x-IBM942
x-IBM942C
x-IBM943
x-IBM943C
x-IBM948
x-IBM949
x-IBM949C
x-IBM950
x-IBM964
x-IBM970
x-ISCII91
x-ISO-2022-CN-CNS
x-ISO-2022-CN-GB
x-iso-8859-11
x-JIS0208
x-JISAutoDetect
x-Johab
x-MacArabic
x-MacCentralEurope
x-MacCroatian
x-MacCyrillic
x-MacDingbat
x-MacGreek
x-MacHebrew
x-MacIceland
x-MacRoman
x-MacRomania
x-MacSymbol
x-MacThai
x-MacTurkish
x-MacUkraine
x-MS932_0213
x-MS950-HKSCS
x-MS950-HKSCS-XP
x-mswin-936
x-PCK
x-SJIS_0213
x-UTF-16LE-BOM
X-UTF-32BE-BOM
X-UTF-32LE-BOM
x-windows-50220
x-windows-50221
x-windows-874
x-windows-949
x-windows-950
x-windows-iso2022jp
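The names in this list are canonical names. Lookups by name are case-insensitive and also accept aliases, so a name can be checked and normalized before use. A minimal sketch follows; the class name CharsetLookupDemo and the helper canonicalName are illustrative.

```java
import java.nio.charset.Charset;

class CharsetLookupDemo {

    // Canonical name for a charset name or alias, or null if it is unsupported.
    static String canonicalName(String name) {
        return Charset.isSupported(name) ? Charset.forName(name).name() : null;
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("utf8"));             // UTF-8
        System.out.println(canonicalName("latin1"));           // ISO-8859-1
        System.out.println(canonicalName("no-such-charset"));  // null
        System.out.println(Charset.availableCharsets().containsKey("UTF-8")); // true
    }
}
```

The exact set of available charsets depends on the JDK version and installed modules, so the count of 170 above applies to the author's JDK 1.8 installation.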

4. Converting bytes to a hex string

import ch.qos.logback.core.encoder.ByteArrayUtil;  // from logback-core-1.2.10.jar

public class ByteTest {
    public static void main(String[] args) {
        System.out.println(ByteArrayUtil.toHexString(new byte[10]));
    }
}
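If adding logback just for this is undesirable, the conversion is short enough to write with the JDK alone. The following is a sketch of one common approach, not the logback implementation; the class name HexDemo and the helper toHex are illustrative.

```java
class HexDemo {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Convert each byte to two lower-case hex digits.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;         // mask so the byte is treated as unsigned
            out[2 * i] = HEX[v >>> 4];       // high nibble
            out[2 * i + 1] = HEX[v & 0x0F];  // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[10]));                // 00000000000000000000
        System.out.println(toHex(new byte[] {-28, -67, -96}));  // e4bda0
    }
}
```

The mask with 0xFF matters because Java bytes are signed; without it, negative bytes would produce a negative array index.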

Record a little every day, with care. The content may not matter much, but the habit does!
