J2ME Experience Notes: A GB2312 Converter Using Unicode chars




Category: J2ME. Tags: j2me, byte, exception, nokia, string, java

Article link: https://blog.csdn.net/hunhun1981/article/details/1845576


Author: hunhun1981
Source: http://blog.csdn.net/hunhun1981/

My previous article described how to convert GB2312 to UTF-8 in a J2ME environment:
http://blog.csdn.net/hunhun1981/archive/2007/10/15/1825472.aspx

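While continuing to study character encodings and the char type, I came across some interesting points.
First, a char variable in Java is stored as Unicode; more precisely, it holds an unsigned 16-bit UTF-16 code unit.
So the following approach works: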

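Given a byte array of Unicode (UTF-16) data stored low byte first (little-endian), every two bytes can be combined into one char.
A String, in turn, is built on top of a char array. See the CLDC source code for details.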

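    // Decodes UTF-16LE bytes (low byte first) into a String.
    // A trailing odd byte, if present, is ignored instead of causing
    // an ArrayIndexOutOfBoundsException.
    public static String read_Uni(byte[] word_unicode) {
        StringBuffer stringbuffer = new StringBuffer(word_unicode.length / 2);
        for (int j = 0; j + 1 < word_unicode.length; j += 2) {
            int l = word_unicode[j] & 0xff;      // low byte
            int h = word_unicode[j + 1] & 0xff;  // high byte
            stringbuffer.append((char) ((h << 8) | l));
        }
        return stringbuffer.toString();
    }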
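To see that a char really is just the 16-bit code unit, the sketch below encodes a String into little-endian byte pairs (the order read_Uni expects) and decodes it back with the same pairing logic. The class UnicodeDemo and the helpers toLittleEndian and readUni are hypothetical names added for illustration; the sketch targets desktop Java SE rather than a handset.

```java
public class UnicodeDemo {

    // Hypothetical helper: writes each char as two bytes, low byte first (UTF-16LE).
    static byte[] toLittleEndian(String s) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) (c & 0xFF);   // low byte
            out[2 * i + 1] = (byte) (c >> 8); // high byte
        }
        return out;
    }

    // Same pairing logic as read_Uni: low byte first, then high byte.
    static String readUni(byte[] data) {
        StringBuffer sb = new StringBuffer(data.length / 2);
        for (int j = 0; j + 1 < data.length; j += 2) {
            int l = data[j] & 0xFF;
            int h = data[j + 1] & 0xFF;
            sb.append((char) ((h << 8) | l));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // the two characters 中文
        byte[] bytes = toLittleEndian(text);
        // U+4E2D is stored as 0x2D, 0x4E
        System.out.println(Integer.toHexString(bytes[0] & 0xFF) + " "
                + Integer.toHexString(bytes[1] & 0xFF)); // prints "2d 4e"
        System.out.println(readUni(bytes).equals(text)); // prints "true"
    }
}
```

No charset lookup is involved anywhere: the bytes are just the numeric value of each char split in two, which is why a table of Unicode values is enough to build a String.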

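The same holds in J2ME.
As a result, the fast direct GB2312-to-UTF-8 conversion offered in my first converter class now looks superfluous: there is no need to produce UTF-8 at all when a lookup table can yield chars directly.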

Based on this, the updated converter class is as follows:

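    import java.io.IOException;
    import java.io.InputStream;

    public class HGB2312 {

        // Lookup table: 81 GB2312 rows x 94 cells x 2 bytes (big-endian char).
        // Rows 1-9 (symbols) come first, then rows 16-87 (hanzi);
        // the empty rows 10-15 are not stored.
        private byte[] map = new byte[15228];

        public HGB2312() throws IOException {
            InputStream is = getClass().getResourceAsStream("/gb2u.dat");
            if (is == null) {
                throw new IOException("resource /gb2u.dat not found");
            }
            try {
                // read() may return fewer bytes than requested, so loop until full
                int off = 0;
                while (off < map.length) {
                    int n = is.read(map, off, map.length - off);
                    if (n < 0) {
                        throw new IOException("gb2u.dat is truncated");
                    }
                    off += n;
                }
            } finally {
                is.close();
            }
        }

        // Decodes GB2312 bytes into a String (despite the name, no UTF-8 is produced).
        // Bytes outside GB2312 are mapped to (char) 0, as in the original version.
        public String gb2utf8(byte[] gb) {
            StringBuffer sb = new StringBuffer(gb.length);
            for (int i = 0; i < gb.length;) {
                if (gb[i] >= 0) {
                    // single-byte ASCII
                    sb.append((char) gb[i++]);
                } else if (i + 1 < gb.length) {
                    // double-byte: turn lead/trail bytes into 0-based row/cell
                    int h = byte2Int(gb[i++]) - 0xA1;
                    int l = byte2Int(gb[i++]) - 0xA1;
                    if (h < 0 || h > 86 || l < 0 || l > 93 || (h >= 9 && h <= 14)) {
                        // outside GB2312, or in the empty rows 10-15
                        sb.append((char) 0);
                    } else {
                        if (h > 14) {
                            h -= 6; // skip the 6 rows not stored in the table
                        }
                        int ind = (h * 94 + l) << 1;
                        int c = (byte2Int(map[ind]) << 8) | byte2Int(map[ind + 1]);
                        sb.append((char) c);
                    }
                } else {
                    // lone lead byte at the end of the input
                    sb.append((char) 0);
                    i++;
                }
            }
            return sb.toString();
        }

        // Converts a signed byte to its unsigned value 0-255.
        private int byte2Int(byte b) {
            return b & 0xFF;
        }
    }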

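This version is much faster than the first one: each character is a direct table lookup appended to the String, with no intermediate UTF-8 encoding.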
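The index arithmetic inside gb2utf8 can be checked by hand. The sketch below pulls it into a hypothetical helper, tableIndex, which returns the byte offset of a GB2312 pair in the 15228-byte table (81 stored rows × 94 cells × 2 bytes), or -1 for pairs the table does not cover. For 啊 (bytes 0xB0 0xA1, row 16 cell 1) the 0-based row is 15, which becomes 9 after skipping the six empty rows, so the offset is (9 × 94 + 0) × 2 = 1692.

```java
public class GbIndex {

    static final int ROWS_STORED = 9 + 72;              // rows 1-9 and 16-87
    static final int TABLE_SIZE = ROWS_STORED * 94 * 2; // 15228 bytes

    // Hypothetical helper mirroring the arithmetic in gb2utf8.
    // Returns the offset of the big-endian char for (lead, trail), or -1.
    static int tableIndex(int lead, int trail) {
        int h = lead - 0xA1;  // 0-based row
        int l = trail - 0xA1; // 0-based cell
        if (h < 0 || h > 86 || l < 0 || l > 93 || (h >= 9 && h <= 14)) {
            return -1;        // outside GB2312 or in the empty rows 10-15
        }
        if (h > 14) {
            h -= 6;           // rows 10-15 are not stored
        }
        return (h * 94 + l) << 1;
    }

    public static void main(String[] args) {
        System.out.println(TABLE_SIZE);             // 15228
        System.out.println(tableIndex(0xA1, 0xA1)); // 0 (row 1, cell 1)
        System.out.println(tableIndex(0xB0, 0xA1)); // 1692 (first hanzi)
        System.out.println(tableIndex(0xF7, 0xFE)); // 15226 (last stored cell)
        System.out.println(tableIndex(0xAA, 0xA1)); // -1 (row 10 is empty)
    }
}
```

The last stored cell lands at offset 15226, so its second byte is 15227, the final byte of the table. That is why the row check stops at 86 (lead byte 0xF7): any higher lead byte would read past the end of map.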

The data file can be downloaded from http://download.csdn.net/source/263609
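If that download is unavailable, a table with the layout gb2utf8 reads can plausibly be rebuilt on desktop Java SE. The sketch below assumes that layout, which is inferred from the lookup code rather than documented: rows 1-9 then 16-87, 94 cells each, every entry a big-endian 16-bit char, and 0 where no character is assigned. It relies on the JDK's GB2312 charset, which standard JDK builds ship but the Java SE specification does not guarantee. The class name BuildGb2uTable is hypothetical, and its output has not been compared byte for byte with the original gb2u.dat.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class BuildGb2uTable {

    // Builds the table in memory: 81 stored rows x 94 cells x 2 bytes.
    static byte[] build() {
        // Throws UnsupportedCharsetException if this JDK lacks GB2312.
        Charset gb2312 = Charset.forName("GB2312");
        byte[] table = new byte[81 * 94 * 2];
        int off = 0;
        for (int row = 1; row <= 87; row++) {
            if (row >= 10 && row <= 15) {
                continue; // empty rows are not stored
            }
            for (int cell = 1; cell <= 94; cell++) {
                byte[] gb = { (byte) (0xA0 + row), (byte) (0xA0 + cell) };
                String s = new String(gb, gb2312);
                char c = s.length() == 1 ? s.charAt(0) : 0;
                if (c == '\uFFFD') {
                    c = 0; // unassigned code point
                }
                table[off++] = (byte) (c >> 8);   // high byte first
                table[off++] = (byte) (c & 0xFF);
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        FileOutputStream out = new FileOutputStream("gb2u.dat");
        try {
            out.write(build());
        } finally {
            out.close();
        }
    }
}
```

The generated file would then be packaged into the MIDlet JAR at the root, so that getResourceAsStream("/gb2u.dat") can find it.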

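In short, a Java char stores a Unicode (UTF-16) value. This has been tested successfully on a Nokia 5300 so far.
I expect other devices to behave the same way. If anyone knows of documentation on this, please let me know.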

For more, follow hunhun1981's column.