Exploring how Java encodes a String

luffy_1993

Article link: https://blog.csdn.net/luffy_1993/article/details/82908846

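In Java, String.getBytes(String charsetName) conceptually treats the string as a sequence of characters and encodes them one by one into the named charset. (Strictly speaking, a Java char is a UTF-16 code unit, so a character outside the Basic Multilingual Plane occupies two chars, a surrogate pair, and that pair is encoded as a single character.)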
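One way to check this claim is to encode each code point on its own and compare the concatenated bytes with the result of encoding the whole string. The sketch below does that for UTF-8, which writes no BOM and keeps no state between characters. The class name PerCodePointEncoding and the helper encodePerCodePoint are made up for illustration; they are not part of the JDK. The strings are written with Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PerCodePointEncoding {
    // Hypothetical helper: encodes every code point separately as UTF-8
    // and concatenates the resulting byte sequences.
    static byte[] encodePerCodePoint(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s.codePoints().forEach(cp -> {
            byte[] b = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        });
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "\u4f60\u597dlkf&*"; // the same test string as in CharsetTest
        byte[] whole = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(whole, encodePerCodePoint(s))); // prints "true"

        // U+1F600 lies outside the BMP: two chars, one code point, four UTF-8 bytes.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());                               // prints "2"
        System.out.println(emoji.codePointCount(0, emoji.length()));      // prints "1"
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length); // prints "4"
    }
}
```

The same comparison would not hold for the "utf-16" charset, because every separate getBytes call there emits its own BOM.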



import java.io.UnsupportedEncodingException;

public class CharsetTest {
    public static void main(String[] args) {

        // Unicode escapes: U+0061 is 'a', U+6C49 is '汉'
        String s3 = "\u0061";
        String s4 = "\u6c49";
        System.out.println(s3);
        System.out.println(s4 + "\n");

        System.out.println("test string.getChars(...):");
        String s = "你好lkf&*";
        printChars(s);
        System.out.println();

        System.out.println("test string.getBytes(charset):\n");
        String s1 = "汉";
        String s2 = "a";
        // The source file itself is saved as UTF-8
        System.out.println("Encodings of \"" + s1 + "\":");
        printEncoding(s1, null);
        System.out.println("-------------------------");
        System.out.println("Encodings of \"" + s2 + "\":");
        printEncoding(s2, null);

        System.out.println("\nBOM: byte order mark; 0xfeff means big-endian, 0xfffe means little-endian");
    }

    // Prints the bytes of s1 in each named charset; a null array selects the default list.
    public static void printEncoding(String s1, String[] encodings) {
        String[] encodes = encodings == null
                ? new String[]{"utf-8", "utf-16", "utf-16le", "utf-16be", "iso-8859-1", "us-ascii", "gbk", "gb2312", "gb18030", "unicode"}
                : encodings;
        for (String encode : encodes) {
            try {
                System.out.print(encode + ":");
                byte[] bytes = s1.getBytes(encode);
                System.out.println(toHexString(bytes));
            } catch (UnsupportedEncodingException e) {
                // The charset name is not supported by this JVM
                e.printStackTrace();
            }
        }
    }

    // Prints each char (UTF-16 code unit) of s on its own line.
    public static void printChars(String s) {
        char[] chars = new char[s.length()];
        s.getChars(0, s.length(), chars, 0);
        for (char aChar : chars) {
            System.out.println(aChar);
        }
    }

    // Formats the bytes as space-separated two-digit hex values, e.g. 0x(e6 b1 89).
    public static StringBuilder toHexString(byte[] bytes) {
        StringBuilder b = new StringBuilder("0x(");
        for (int i = 0; i < bytes.length; i++) {
            b.append(Character.forDigit((bytes[i] >> 4) & 0xF, 16)); // high nibble
            b.append(Character.forDigit(bytes[i] & 0xF, 16));        // low nibble
            if (i < bytes.length - 1) {
                b.append(" ");
            }
        }
        b.append(")");
        return b;
    }
}

Output:

a
汉

test string.getChars(...):
你
好
l
k
f
&
*

test string.getBytes(charset):

Encodings of "汉":
utf-8:0x(e6 b1 89)
utf-16:0x(fe ff 6c 49)
utf-16le:0x(49 6c)
utf-16be:0x(6c 49)
iso-8859-1:0x(3f)
us-ascii:0x(3f) // 0x3f is '?', meaning the character cannot be encoded in this charset
gbk:0x(ba ba)
gb2312:0x(ba ba)
gb18030:0x(ba ba)
unicode:0x(fe ff 6c 49) // 0xfeff is the big-endian BOM
-------------------------
Encodings of "a":
utf-8:0x(61)
utf-16:0x(fe ff 00 61)
utf-16le:0x(61 00)
utf-16be:0x(00 61)
iso-8859-1:0x(61)
us-ascii:0x(61)
gbk:0x(61)
gb2312:0x(61)
gb18030:0x(61)
unicode:0x(fe ff 00 61)

BOM: byte order mark; 0xfeff means big-endian, 0xfffe means little-endian

The CharsetTest class above is test code I wrote myself.
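The 0x3f bytes in the iso-8859-1 and us-ascii rows show that getBytes silently substitutes '?' for a character the charset cannot represent. If that silent substitution is unwanted, one option is to ask a CharsetEncoder first, or to encode through it so that the failure surfaces as an exception. The following is only a sketch: the class name UnmappableCheck and the helpers canEncode and encodeStrict are hypothetical names, while Charset, CharsetEncoder and CharacterCodingException are standard java.nio.charset types.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnmappableCheck {
    // Hypothetical helper: true if every character of s is representable in cs.
    static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    // Hypothetical helper: like getBytes, but throws instead of substituting '?'.
    // A fresh encoder reports unmappable characters by default.
    static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
        ByteBuffer buf = cs.newEncoder().encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String han = "\u6c49"; // the character used in CharsetTest

        System.out.println(canEncode(han, StandardCharsets.UTF_8));      // prints "true"
        System.out.println(canEncode(han, StandardCharsets.ISO_8859_1)); // prints "false"
        System.out.println(canEncode(han, StandardCharsets.US_ASCII));   // prints "false"

        // getBytes falls back to the replacement byte 0x3f ('?').
        System.out.println(han.getBytes(StandardCharsets.US_ASCII)[0] == 0x3f); // prints "true"

        try {
            encodeStrict(han, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            System.out.println("strict encoding failed: " + e);
        }
    }
}
```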

Further reading (reposted reference): "On Unicode encoding: a brief explanation of UCS, UTF, BMP, BOM and related terms", http://www.fmddlmyy.cn/text6.html
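As a small companion to the BOM discussion, the sketch below shows the three UTF-16 variants side by side and then decodes a little-endian byte sequence that starts with a BOM. The class name BomDemo and the helper hex are assumptions made for this example; the charset constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Hypothetical helper: space-separated two-digit hex, e.g. "fe ff 00 61".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "a";
        // Java's "UTF-16" encoder writes a big-endian BOM followed by big-endian data.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // prints "fe ff 00 61"
        // The explicit-endianness variants write no BOM.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // prints "00 61"
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // prints "61 00"

        // When decoding, "UTF-16" reads the BOM to pick the byte order and drops it.
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x61, 0x00};
        System.out.println(new String(littleEndianWithBom, StandardCharsets.UTF_16)); // prints "a"
    }
}
```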
