Understanding Encoding on Android (2)

http://jiapumin.iteye.com/blog/1006144

http://blog.csdn.net/zhouwubin123/article/details/6672594


http://m.blog.csdn.net/blog/wlgj452555712/12129443


http://blog.csdn.net/jerry_bj/article/details/5714745

[Original] Android JNI: garbled text and crashes when returning Chinese in a jstring

2013-9-28

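My view of JNI is that it is tedious to the point of being bewildering; even passing simple Chinese text across it is a chore. I went through all sorts of material. Some say to save the C file as UTF-8, some say to replace the jstring with a jbyteArray, some copy-and-paste posts talk about converting the C source to UTF-8 on Windows, and the more elaborate ones even bring in the iconv library. Whether any of them actually got Chinese text from the JNI C layer up to the Java layer is never shown; they ramble for a while and then simply stop. To save others the detour, I consulted one article and only published this post after I had actually gotten Chinese text to display.

Back to the point. The cause of garbled Chinese is well understood: the Java layer and the C layer use different encodings. The code below performs the conversion directly.

A Java String holds Unicode text as UTF-16 code units, while C code mostly handles text as UTF-8 byte sequences.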
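To make that difference concrete, here is a minimal sketch that counts the same character both ways. The class and method names are made up for illustration and are not from the original post; the character is written as a \u escape so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

final class Utf16VersusUtf8 {
    // Number of UTF-16 code units held by the String (what length() reports).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character 中
        System.out.println("UTF-16 code units: " + utf16Units(zhong));
        System.out.println("code unit (hex):   " + Integer.toHexString(zhong.charAt(0)));
        System.out.println("UTF-8 bytes:       " + utf8ByteCount(zhong));
    }
}
```

One code unit on the Java side becomes three bytes on the UTF-8 side, which is why the two layers cannot simply share a buffer without an explicit conversion.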


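In Java, String's getBytes() method returns a byte array in the platform's default encoding. This means the result can differ from one operating system to another! (On the Android device used for the output further down, the default is UTF-8.)


String.getBytes(String charsetName) returns the byte array that represents the string in the named encoding, for example: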
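A small sketch of what "default" means here, assuming a standard Java runtime. The class and method names are illustrative only. It checks that the no-argument getBytes() gives the same bytes as an explicit call with Charset.defaultCharset().

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class DefaultCharsetDemo {
    // True when the no-argument getBytes() matches an explicit call
    // that passes the platform default charset.
    static boolean noArgMatchesDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // The name printed here depends on the machine the code runs on.
        System.out.println("default charset: " + Charset.defaultCharset().name());
        System.out.println("matches: " + noArgMatchesDefault("\u4e2d\u6587"));
    }
}
```

Because the default varies by platform, passing the charset explicitly is the safer habit whenever the bytes leave the process.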

byte[] b_gbk = "中".getBytes("GBK");
byte[] b_utf8 = "中".getBytes("UTF-8");
byte[] b_iso88591 = "中".getBytes("ISO8859-1");


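These calls return the byte-array representations of the character “中” in GBK, UTF-8 and ISO8859-1 respectively. Here b_gbk has length 2, b_utf8 has length 3, and b_iso88591 has length 1.

In the other direction, new String(byte[], charsetName) can restore “中”: it decodes the byte[] into a string using the named encoding.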
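The lengths can be checked with a short sketch like the one below. The helper name encodedLength is hypothetical. GBK is not one of the charsets every Java runtime must provide, so the sketch only tries it when Charset.isSupported reports it.

```java
import java.nio.charset.Charset;

final class EncodedLengthDemo {
    // Length in bytes of the text once encoded with the named charset.
    // Unmappable characters are replaced by the charset's replacement bytes.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character 中
        System.out.println("UTF-8:      " + encodedLength(zhong, "UTF-8"));
        System.out.println("ISO-8859-1: " + encodedLength(zhong, "ISO-8859-1"));
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK:        " + encodedLength(zhong, "GBK"));
        }
    }
}
```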

String s_gbk = new String(b_gbk,"GBK");
String s_utf8 = new String(b_utf8,"UTF-8");
String s_iso88591 = new String(b_iso88591,"ISO8859-1");

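Printing s_gbk, s_utf8 and s_iso88591 shows that s_gbk and s_utf8 are both “中”, while s_iso88591 alone is an unrecognizable character. Why can “中” not be restored after a round trip through ISO8859-1? The reason is simple: the ISO8859-1 code table contains no Chinese characters, so "中".getBytes("ISO8859-1") cannot produce a real code value for “中” (it substitutes the byte for '?', 0x3F), and new String() then has nothing to restore it from.

Therefore, when obtaining a byte[] with String.getBytes(String charsetName), make sure the named charset can actually represent the characters in the String; only then can the byte[] be decoded back correctly.

Sometimes Chinese text has to fit a special requirement (for example, HTTP headers are expected to carry ISO8859-1 content), so the characters are encoded byte by byte, like this:

String s_iso88591 = new String("中".getBytes("UTF-8"),"ISO8859-1");

The resulting s_iso88591 string actually consists of three ISO8859-1 characters. Once these characters reach their destination, the receiving program reverses the process with String s_utf8 = new String(s_iso88591.getBytes("ISO8859-1"),"UTF-8") to recover the correct character “中”. This both complies with the protocol and supports Chinese.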
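One way to perform that check before encoding is CharsetEncoder.canEncode. The sketch below is only an illustration; the class and method names are assumptions, not part of the original code.

```java
import java.nio.charset.Charset;

final class EncodableCheck {
    // True when every character of the text has a representation
    // in the named charset, so encoding will not lose information.
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character 中
        System.out.println("UTF-8:      " + canEncode(zhong, "UTF-8"));
        System.out.println("ISO-8859-1: " + canEncode(zhong, "ISO-8859-1"));
    }
}
```

If the check fails, getBytes would silently substitute replacement bytes, so it is usually better to pick a different charset than to encode anyway.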
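The round trip can be sketched as a pair of helpers. The names wrap and unwrap are made up for this example. It works because ISO8859-1 maps each of the 256 byte values to exactly one character, so no byte is lost along the way.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Tunnel {
    // Sender side: UTF-8 bytes reinterpreted as ISO-8859-1 characters.
    static String wrap(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Receiver side: recover the original bytes, then decode them as UTF-8.
    static String unwrap(String carrier) {
        return new String(carrier.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d"; // the character 中
        String carrier = wrap(original);
        System.out.println("carrier length: " + carrier.length());
        System.out.println("round trip ok:  " + unwrap(carrier).equals(original));
    }
}
```

The carrier string is not readable text; it is only a transport form, and both ends have to agree on the two charsets involved.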


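    // These helpers need: import java.io.UnsupportedEncodingException;

    // Encodes with the platform default charset (UTF-8 in the run shown below).
    private byte[] getBytesWithDefaultEncoding(String content) {
        System.out.println("\nEncode with default encoding\n");
        byte[] bytes = content.getBytes();
        return bytes;
    }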

    private byte[] getBytesWithGivenEncoding(String content, String encoding) {
        System.out.println("\nEncode with given encoding : " + encoding + "\n");
        try {
            byte[] bytes = content.getBytes(encoding);
            return bytes;
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return null;
        }
    }

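    private void printBytes(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            System.out.print(" byte[" + i + "] = " + bytes[i]);
            // Integer.toHexString widens the byte to int with sign extension,
            // so negative bytes print as ffffffXX; mask with & 0xFF for two digits.
            System.out
                    .println(" hex string = " + Integer.toHexString(bytes[i]));
        }
    }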

    private void printCharArray(String inStr) {
        char[] charArray = inStr.toCharArray();

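        for (int i = 0; i < inStr.length(); i++) {
            // Narrowing casts: byte keeps only the low 8 bits of the UTF-16
            // code unit; short keeps all 16 bits but treats them as signed.
            byte b = (byte) charArray[i];
            short s = (short) charArray[i];
            String hexB = Integer.toHexString(b).toUpperCase();
            String hexS = Integer.toHexString(s).toUpperCase();
            StringBuffer sb = new StringBuffer();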

            // print char
            sb.append("char[");
            sb.append(i);
            sb.append("]='");
            sb.append(charArray[i]);
            sb.append("'\t");

            // byte value
            sb.append("byte=");
            sb.append(b);
            sb.append(" \\u");
            sb.append(hexB);
            sb.append('\t');

            // short value
            sb.append("short=");
            sb.append(s);
            sb.append(" \\u");
            sb.append(hexS);
            sb.append('\t');

            // Unicode Block
            sb.append(Character.UnicodeBlock.of(charArray[i]));

            System.out.println(sb.toString());
        }
        System.out.println("\nCharacters length: " + charArray.length);
    }


    private void setPreviewWindowRect() throws Exception {

/*        String content = "中文";
        String defaultEncoding = System.getProperty("file.encoding");
        String defaultLnaguage = System.getProperty("user.language");
        System.out.println("System default encoding --- " + defaultEncoding);
        System.out.println("System default language --- " + defaultLnaguage);

        getCharWithDefaultEncoding(content);
        getCharWithGivenEncoding(content, "ISO-8859-1");
        getCharWithGivenEncoding(content, "GBK");
        getCharWithGivenEncoding(content, "UTF-8");
*/

/*        String content = "中文";
        String defaultEncoding = System.getProperty("file.encoding");
        String defaultLnaguage = System.getProperty("user.language");
        System.out.println("System default encoding --- " + defaultEncoding);
        System.out.println("System default language --- " + defaultLnaguage);

        byte[] defaultBytes = getBytesWithDefaultEncoding(content);
        printBytes(defaultBytes);

        byte[] iso8859Bytes = getBytesWithGivenEncoding(content,
                "ISO-8859-1");
        printBytes(iso8859Bytes);

        byte[] gbkBytes = getBytesWithGivenEncoding(content, "GBK");
        printBytes(gbkBytes);

        byte[] utfBytes = getBytesWithGivenEncoding(content, "UTF-8");
        printBytes(utfBytes);
*/
        String content = "中文";
        String defaultEncoding = System.getProperty("file.encoding");
        String defaultLanguage = System.getProperty("user.language");
        System.out.println("System default encoding --- " + defaultEncoding);
        System.out.println("System default language --- " + defaultLanguage);

        printCharArray(content);

        byte[] defaultBytes = getBytesWithDefaultEncoding(content);
        printBytes(defaultBytes);

        // Assumes the default charset is UTF-8, as it is in the run below.
        String encodedString1 = new String(defaultBytes, "UTF-8");
        printCharArray(encodedString1);

        byte[] iso8859Bytes = getBytesWithGivenEncoding(content,
                "ISO-8859-1");
        printBytes(iso8859Bytes);

        String encodedString2 = new String(iso8859Bytes, "ISO-8859-1");
        printCharArray(encodedString2);

        byte[] gbkBytes = getBytesWithGivenEncoding(content, "GBK");
        printBytes(gbkBytes);
        String encodedString3 = new String(gbkBytes, "GBK");
        printCharArray(encodedString3);

        byte[] utfBytes = getBytesWithGivenEncoding(content, "UTF-8");
        printBytes(utfBytes);
        String encodedString4 = new String(utfBytes, "UTF-8");
        printCharArray(encodedString4);

    }


The output is:

root@ardbeg:/ # am stack setpreviewrect 0 500 500 1100 900                     
System default encoding --- UTF-8
System default language --- zh
char[0]='中'    byte=45 \u2D    short=20013 \u4E2D    CJK_UNIFIED_IDEOGRAPHS
char[1]='文'    byte=-121 \uFFFFFF87    short=25991 \u6587    CJK_UNIFIED_IDEOGRAPHS

Characters length: 2

Encode with default encoding

 byte[0] = -28 hex string = ffffffe4
 byte[1] = -72 hex string = ffffffb8
 byte[2] = -83 hex string = ffffffad
 byte[3] = -26 hex string = ffffffe6
 byte[4] = -106 hex string = ffffff96
 byte[5] = -121 hex string = ffffff87
char[0]='中'    byte=45 \u2D    short=20013 \u4E2D    CJK_UNIFIED_IDEOGRAPHS
char[1]='文'    byte=-121 \uFFFFFF87    short=25991 \u6587    CJK_UNIFIED_IDEOGRAPHS

Characters length: 2

Encode with given encoding : ISO-8859-1

 byte[0] = 63 hex string = 3f
 byte[1] = 63 hex string = 3f
char[0]='?'    byte=63 \u3F    short=63 \u3F    BASIC_LATIN
char[1]='?'    byte=63 \u3F    short=63 \u3F    BASIC_LATIN

Characters length: 2

Encode with given encoding : GBK

 byte[0] = -42 hex string = ffffffd6
 byte[1] = -48 hex string = ffffffd0
 byte[2] = -50 hex string = ffffffce
 byte[3] = -60 hex string = ffffffc4
char[0]='中'    byte=45 \u2D    short=20013 \u4E2D    CJK_UNIFIED_IDEOGRAPHS
char[1]='文'    byte=-121 \uFFFFFF87    short=25991 \u6587    CJK_UNIFIED_IDEOGRAPHS

Characters length: 2

Encode with given encoding : UTF-8

 byte[0] = -28 hex string = ffffffe4
 byte[1] = -72 hex string = ffffffb8
 byte[2] = -83 hex string = ffffffad
 byte[3] = -26 hex string = ffffffe6
 byte[4] = -106 hex string = ffffff96
 byte[5] = -121 hex string = ffffff87
char[0]='中'    byte=45 \u2D    short=20013 \u4E2D    CJK_UNIFIED_IDEOGRAPHS
char[1]='文'    byte=-121 \uFFFFFF87    short=25991 \u6587    CJK_UNIFIED_IDEOGRAPHS

Characters length: 2
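The ffffffXX values in the hex columns above come from sign extension: Integer.toHexString takes an int, so a negative byte is widened before it is formatted. A possible way to print two hex digits per byte is sketched below. The helper names are illustrative and are not part of the original code.

```java
final class HexMasking {
    // What the listing above does: the byte is widened to int with its sign.
    static String signExtendedHex(byte b) {
        return Integer.toHexString(b);
    }

    // Masking with 0xFF keeps only the low 8 bits before formatting.
    static String maskedHex(byte b) {
        return String.format("%02x", b & 0xFF);
    }

    public static void main(String[] args) {
        byte first = (byte) -28; // first UTF-8 byte of 中 in the output above
        System.out.println(signExtendedHex(first)); // ffffffe4
        System.out.println(maskedHex(first));       // e4
    }
}
```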

