Encoding and Decoding in Java

风絮_

Article link: https://blog.csdn.net/li3455277925/article/details/104348851

char

Java's char type occupies two bytes (one UTF-16 code unit) and can be assigned in three ways:

1. Direct assignment

char c ='a';
char c1='中';
System.out.println(c); // a
System.out.println(c1); // 中

2. Assignment with a hexadecimal or decimal value (the value is the character's numeric code in the code table)

char c2= 0x8d24;
char c3 = 36132;
System.out.println(c2); // 贤
System.out.println(c3); // 贤

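3. Assignment with a Unicode escape

Here \ starts an escape sequence, and \u means that the four hexadecimal digits that follow are a Unicode code value.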

char c4 = '\u8d24';
System.out.println(c4); // 贤

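4. A char is two bytes, so it can only represent characters in the Basic Multilingual Plane (BMP) of UTF-16. A character in a supplementary plane is made up of four bytes (a surrogate pair), so it can only be represented with a String.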

String c5 = "\ud842\udfb7";
System.out.println(c5); // 𠮷
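
To make the point about supplementary characters concrete, here is a minimal sketch using standard String and Character methods. The class name SurrogateDemo and the helper describe are made up for illustration; the string is the same surrogate pair as above (U+20BB7).

```java
public class SurrogateDemo {
    // Hypothetical helper: summarizes how a string looks in UTF-16 terms.
    static String describe(String s) {
        return String.format("units=%d codePoints=%d first=U+%X",
                s.length(),                       // number of char values (UTF-16 code units)
                s.codePointCount(0, s.length()),  // number of Unicode code points
                s.codePointAt(0));                // first code point, surrogates combined
    }

    public static void main(String[] args) {
        String s = "\ud842\udfb7";
        System.out.println(describe(s)); // units=2 codePoints=1 first=U+20BB7

        // Character.toChars goes the other way: one code point to its char values
        char[] units = Character.toChars(0x20BB7);
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
    }
}
```

So length() counts char values rather than visible characters, which is why code that walks a String char by char can split a character like this one in half.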

String encoding and decoding

Encoding

A String can be encoded with the getBytes method:

public static void main(String[] args) {
    String str = "芊雨";
    byte[] bytes = str.getBytes();
    for (int i = 0; i < bytes.length; i++) {
        System.out.printf("0x%x ", bytes[i]);
    } // 0xe8 0x8a 0x8a 0xe9 0x9b 0xa8 
}

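As the output shows, each Chinese character takes three bytes here: when the program is run from IDEA, String's getBytes encodes with UTF-8 by default.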

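The source code shows that getBytes calls an encode method internally. The encode method uses the default charset (defaultCharset), and the default charset is read from a system property named file.encoding.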

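So if we change the file.encoding property (a JVM system property, typically passed as -Dfile.encoding=<charset> at startup), we also change the encoding that getBytes uses. After switching to a double-byte Chinese charset such as GBK, each Chinese character takes two bytes.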
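
A quick way to see which default a given JVM is actually using is to print the property and the default charset. This is a minimal sketch; the class name DefaultCharsetDemo and the helper encodeWithDefault are made up for illustration, and the string is "芊雨" written with Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Hypothetical helper: encodes with whatever charset the JVM treats as its default.
    static byte[] encodeWithDefault(String s) {
        return s.getBytes(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // Both lines depend on the JVM and its startup options
        System.out.println(System.getProperty("file.encoding"));
        System.out.println(Charset.defaultCharset());

        String str = "\u828a\u96e8";
        // The no-argument getBytes() and the explicit default charset give the same bytes
        System.out.println(Arrays.equals(str.getBytes(), encodeWithDefault(str))); // true
    }
}
```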

We can also pass a charset name to getBytes to choose the encoding explicitly:

public static void main(String[] args) throws UnsupportedEncodingException {
    String str = "芊雨";
    byte[] b1 = str.getBytes("UTF-8");
    System.out.println(Arrays.toString(b1)); // [-24, -118, -118, -23, -101, -88]
    byte[] b2 = str.getBytes("GBK");
    System.out.println(Arrays.toString(b2)); // [-36, -73, -45, -22]
}
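
As a variation on the example above, getBytes also accepts a Charset object, which avoids the checked UnsupportedEncodingException. The sketch below assumes the runtime ships the GBK charset (it is not one of the charsets every JVM is required to support), so that part is guarded; the class and helper names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetObjectDemo {
    // Hypothetical helper: UTF-8 encoding without a checked exception.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str = "\u828a\u96e8"; // the two characters used in the article
        System.out.println(Arrays.toString(utf8Bytes(str))); // [-24, -118, -118, -23, -101, -88]

        // GBK is an extended charset, so check for it before using it
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            byte[] encoded = str.getBytes(gbk);
            System.out.println(encoded.length);
            System.out.println(new String(encoded, gbk).equals(str));
        }
    }
}
```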

Decoding

A String can be decoded from bytes with its constructor:

public static void main(String[] args) throws UnsupportedEncodingException {
    byte[] b1 = {-24, -118, -118, -23, -101, -88};
    byte[] b2 = {-36, -73, -45, -22};
    System.out.println(new String(b1)); // 芊雨
    System.out.println(new String(b1, "UTF-8")); // 芊雨
    System.out.println(new String(b2, "GBK")); // 芊雨
}

Reversible encoding

public static void main(String[] args) throws UnsupportedEncodingException {
    String str = "芊雨";
    byte[] b1 = str.getBytes("GBK");
    String str2 = new String(b1, "UTF-8");
    System.out.println(str2); // ܷ��
    String str3 = new String(b1, "GBK");
    System.out.println(str3); //芊雨
}

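Although the GBK-encoded bytes above were decoded with UTF-8 and produced garbled text, the underlying byte array b1 was never modified, so decoding it again with GBK restores the original string.

Irreversible encoding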

public static void main(String[] args) throws UnsupportedEncodingException {
    String str = "芊雨";
    byte[] b1 = str.getBytes("ISO-8859-1");
    System.out.println(Arrays.toString(b1)); // [63, 63]
    String str2 = new String(b1, "ISO-8859-1");
    System.out.println(str2); // ??
}

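Although both the encoding and the decoding above use ISO-8859-1, ISO-8859-1 cannot represent Chinese characters, so each one is replaced with ? (byte value 63) during encoding. At that point the underlying bytes have already changed, so decoding can only produce ?.

Decoding pitfalls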
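
One way to catch this kind of loss before it happens is to ask the charset's encoder whether it can represent the text. This is a minimal sketch using CharsetEncoder.canEncode; the class name CanEncodeDemo and the helper isEncodable are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CanEncodeDemo {
    // Hypothetical helper: true if every character of text can be represented in charset.
    static boolean isEncodable(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String chinese = "\u828a\u96e8"; // the two characters used in the article
        System.out.println(isEncodable(chinese, StandardCharsets.ISO_8859_1)); // false
        System.out.println(isEncodable("abc", StandardCharsets.ISO_8859_1));   // true
        System.out.println(isEncodable(chinese, StandardCharsets.UTF_8));      // true
    }
}
```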

public static void main(String[] args) throws UnsupportedEncodingException {
    String str = "芊雨";
    byte[] b1 = str.getBytes("GBK");
    System.out.println(Arrays.toString(b1)); // [-36, -73, -45, -22]
    String str2 = new String(b1, "UTF-8");
    System.out.println(str2); // ܷ��
    byte[] b2 = str2.getBytes("UTF-8");
    System.out.println(Arrays.toString(b2)); // [-36, -73, -17, -65, -67, -17, -65, -67]
    String str3 = new String(b2, "GBK");
    System.out.println(str3); // 芊锟斤拷
}

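In the example above, the GBK-encoded bytes are decoded with UTF-8. The first two bytes happen to form a valid UTF-8 sequence, but the decoder cannot recognize the last two, so it replaces each of them with U+FFFD, the replacement character (usually displayed as a question mark inside a solid diamond). Encoding str2 with UTF-8 again therefore encodes the replacement characters themselves. By then the underlying byte array has changed, so the final GBK decoding cannot recover the original text.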
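
The String constructor replaces bad input silently. If it is more useful to find out that the bytes are not valid UTF-8, a CharsetDecoder can be configured to report errors instead. The sketch below is one possible approach, not the only one; the class name StrictDecodeDemo and the helper decodeUtf8Strict are made up for illustration, and the first byte array is the GBK output printed in the example above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {
    // Hypothetical helper: returns the decoded text, or null if bytes are not valid UTF-8.
    static String decodeUtf8Strict(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {-36, -73, -45, -22};                  // GBK bytes from the article
        byte[] utf8Bytes = {-24, -118, -118, -23, -101, -88};    // UTF-8 bytes from the article
        System.out.println(decodeUtf8Strict(gbkBytes) == null);  // true
        System.out.println(decodeUtf8Strict(utf8Bytes) != null); // true
    }
}
```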

public static void main(String[] args) throws UnsupportedEncodingException {
    String str = "芊雨";
    byte[] b1 = str.getBytes("GBK");
    System.out.println(Arrays.toString(b1)); // [-36, -73, -45, -22]
    String str2 = new String(b1, "ISO-8859-1");
    System.out.println(str2); // Ü·Óê
    byte[] b2 = str2.getBytes("ISO-8859-1");
    System.out.println(Arrays.toString(b2)); // [-36, -73, -45, -22]
    String str3 = new String(b2, "GBK");
    System.out.println(str3); // 芊雨
}

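With the same approach, switching to ISO-8859-1 avoids the problem above. In ISO-8859-1 every byte value maps to exactly one character, so there is never a byte that cannot be mapped, which means the underlying byte array does not change.

Character stream encoding and decoding

The key point with character streams is that the charset used to read a file must match the charset the file was written in.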
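
This property can be checked for every possible byte value. The sketch below round-trips all 256 values through a String; the class name Latin1RoundTripDemo and the helper roundTrips are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTripDemo {
    // Hypothetical helper: decode then re-encode, and compare with the input bytes.
    static boolean roundTrips(byte[] original, Charset charset) {
        String decoded = new String(original, charset);
        return Arrays.equals(original, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i; // every byte value from 0x00 to 0xFF
        }
        System.out.println(roundTrips(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(all, StandardCharsets.UTF_8));      // false
    }
}
```

Note that the intermediate String is still garbled text; the round trip only guarantees that the bytes survive.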

// File: a.txt, encoding: UTF-8, content: 芊雨
public static void main(String[] args) throws IOException {
    InputStreamReader in = new InputStreamReader(new FileInputStream("a.txt"), "UTF-8");
    int ch;
    while ((ch = in.read()) != -1) {
        System.out.print((char) ch);
    } // 芊雨
    in.close();
}
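
To see what a mismatch looks like, the following sketch writes the same two characters to a temporary file as UTF-8 and then reads them back with a matching and a non-matching charset. The temporary file stands in for a.txt, and the class name ReaderCharsetDemo and the helper readAll are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReaderCharsetDemo {
    // Hypothetical helper: reads the whole file as text using the given charset.
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(Files.newInputStream(file), charset)) {
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u828a\u96e8"; // the two characters used in the article
        Path file = Files.createTempFile("charset-demo", ".txt");
        try (Writer out = new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
        System.out.println(readAll(file, StandardCharsets.UTF_8).equals(text));      // true
        System.out.println(readAll(file, StandardCharsets.ISO_8859_1).equals(text)); // false
        Files.delete(file);
    }
}
```

The try-with-resources blocks close the streams even if reading or writing throws, which the explicit close() call in the example above does not guarantee.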

Copying an image with character streams

public static void main(String[] args) throws IOException {
    InputStreamReader in = new InputStreamReader(new FileInputStream("a.png"), "UTF-8");
    OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream("b.png"), "UTF-8");
    int ch;
    while ((ch = in.read()) != -1) {
        out.write(ch);
    }
    in.close();
    out.close();
}
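
The article does not show the result of this copy, but it is expected to produce a damaged file: a PNG starts with the byte 0x89, which is not valid UTF-8, so the reader turns it into U+FFFD and the writer then emits three different bytes. The sketch below reproduces this in memory on the 8-byte PNG signature and compares it with a plain byte-stream copy. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryCopyDemo {
    // Copies through a Reader/Writer pair, like the example above.
    static byte[] copyAsChars(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
             Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            int ch;
            while ((ch = in.read()) != -1) {
                out.write(ch);
            }
        }
        return sink.toByteArray();
    }

    // Copies raw bytes with no decoding step.
    static byte[] copyAsBytes(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] pngSignature = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
        System.out.println(Arrays.equals(pngSignature, copyAsChars(pngSignature))); // false
        System.out.println(Arrays.equals(pngSignature, copyAsBytes(pngSignature))); // true
    }
}
```

For real files the same idea applies with FileInputStream and FileOutputStream in place of the in-memory streams: binary data should be copied as bytes, and character streams kept for text.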