A summary of character set problems encountered during testing

<p>Author: 赵璨</p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US">Over the past couple of days we ran into garbled text while using dbunit to insert data into the IM database. We did some preliminary research and are sharing it here, since it may help other testers as well. If you spot any mistakes, please point them out.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><strong><span style="">Part 1: Theory</span></strong><strong></strong></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US"> </span><span style="">Character encoding problems in programming usually involve the following factors:</span></span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-size: small;">1.</span><span style='font: 7pt "Times New Roman";'> </span></span></span><span style="font-size: small;"><span style="">File encoding</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">For example, an XML file specifies its encoding in the declaration on its first line. In Java you can read a file through an InputStream and decode the bytes with an InputStreamReader. 陈教兽's blog also covers this (</span><span style="color: #9d9d9d;" lang="EN-US"><a href="http://blog.csdn.net/linkyou/archive/2009/03/14/3990547.aspx" target="_blank"><span style="color: gray;"><span style="font-family: Times New Roman;">http://blog.csdn.net/linkyou/archive/2009/03/14/3990547.aspx</span></span></a></span><span style="" lang="EN-US">). For example:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="color: black;" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">InputStream is = new FileInputStream("z.xml"); <br>InputStreamReader streamReader = new InputStreamReader(is, "GBK");</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="">Of course, there are other ways to decode as well, such as </span><span style="color: black;" lang="EN-US"><span style="font-family: Times New Roman;">Dom4j</span></span><span style="">.</span></span></p>
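<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">To make the point about file encoding concrete, here is a minimal sketch of reading a file with an explicitly named charset. The class name ExplicitCharsetRead, the helper readAll and the temporary file are illustrative assumptions, not part of our test code. It also assumes the JDK ships the GBK charset, which standard JDK distributions do.</span></span></p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {

    // Reads the whole file and decodes its bytes with the given charset.
    static String readAll(Path file, Charset charset) throws IOException {
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK");
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes

        // Write the text as GBK bytes into a temporary file (hypothetical test data).
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        Files.write(tmp, text.getBytes(gbk));

        // Matching charset: the text comes back unchanged.
        System.out.println(text.equals(readAll(tmp, gbk)));
        // Mismatched charset: the bytes are not valid UTF-8, so the text is damaged.
        System.out.println(text.equals(readAll(tmp, StandardCharsets.UTF_8)));

        Files.delete(tmp);
    }
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">The first comparison holds and the second does not, which is the whole problem in miniature: the bytes on disk are only meaningful together with the charset that produced them.</span></span></p>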
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-size: small;">2.</span><span style='font: 7pt "Times New Roman";'> </span></span></span><span style="font-size: small;"><span style="" lang="EN-US">JVM</span><span style=""> encoding</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">Java strings use the Unicode character set internally, which makes internationalization easier. So whenever text in another character set is involved, we have to encode or decode it explicitly. For example:</span></span></p>
<p>
</p>
<table class="MsoNormalTable" style="border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0"><tbody><tr style="">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 426.1pt; padding-top: 0cm; background-color: transparent;" width="568" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> String <span style="background: silver;">s</span> = </span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"abc"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">;</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> </span><strong><span style='font-size: 10pt; color: #7f0055; font-family: "Courier New";' lang="EN-US">byte</span></strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">[] b = <span style="background: silver;">s</span>.getBytes(</span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"UTF-8"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">);</span></p>
</td>
</tr></tbody></table>
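<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">With "abc" the choice of charset is hard to see, because ASCII letters encode to the same bytes almost everywhere. The following sketch compares byte counts for an ASCII string and a Chinese character. The class EncodedLengthDemo and the helper encodedLength are made-up names for illustration, and the GBK charset is assumed to be available in the JDK.</span></span></p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengthDemo {

    // Number of bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String ascii = "abc";
        String chinese = "\u4e2d"; // one Chinese character, written as an escape

        System.out.println(encodedLength(ascii, StandardCharsets.UTF_8));   // 3
        System.out.println(encodedLength(ascii, gbk));                      // 3
        System.out.println(encodedLength(chinese, StandardCharsets.UTF_8)); // 3
        System.out.println(encodedLength(chinese, gbk));                    // 2
    }
}
```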
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="" lang="EN-US"><span style="font-size: small;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="">Or there is another form:</span></span></p>
<p>
</p>
<table class="MsoNormalTable" style="border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0"><tbody><tr style="">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 426.1pt; padding-top: 0cm; background-color: transparent;" width="568" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">String sql = </span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"abc";</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">String s2 = </span><strong><span style='font-size: 10pt; color: #7f0055; font-family: "Courier New";' lang="EN-US">new</span></strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> String(sql.getBytes(</span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"GBK"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">), </span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"ISO8859-1"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">);</span></p>
</td>
</tr></tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">How should this be read? It should not be understood as simply converting GBK to ISO8859-1. Rather, the string sql is first encoded with GBK into a byte array, and those bytes are then decoded as ISO8859-1, so that each byte becomes exactly one character of the new string. The new string carries the GBK bytes unchanged.</span></span></p>
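<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">The sketch below illustrates that reading. Because ISO8859-1 maps every byte value to exactly one character, the GBK bytes survive the trip and can be recovered by reversing the two steps. RoundTripDemo, wrap and unwrap are hypothetical names used only for this illustration, and GBK is assumed to be available in the JDK.</span></span></p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Encode the text with the given charset, then view each byte as one ISO8859-1 char.
    static String wrap(String text, Charset charset) {
        return new String(text.getBytes(charset), StandardCharsets.ISO_8859_1);
    }

    // Reverse of wrap: turn the chars back into bytes, then decode with the original charset.
    static String unwrap(String carrier, Charset charset) {
        return new String(carrier.getBytes(StandardCharsets.ISO_8859_1), charset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587"; // two Chinese characters, written as escapes

        String carrier = wrap(original, gbk);
        System.out.println(carrier.length());                      // 4: one char per GBK byte
        System.out.println(original.equals(unwrap(carrier, gbk))); // true: nothing was lost
    }
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">The carrier string looks like garbage if printed, but that does not matter: it is only a container for bytes on their way to a single-byte destination.</span></span></p>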
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="" lang="EN-US"><span style="font-size: small;"></span></span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-size: small;">3.</span><span style='font: 7pt "Times New Roman";'> </span></span></span><span style="font-size: small;"><span style="">Database encoding</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">This one is a bit more complicated. We only discuss Oracle character sets here, which are split into a server side and a client side.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">On the server side, us7ascii is the earliest encoding scheme Oracle supported, and it is a single-byte encoding. Nowadays a Unicode encoding is generally used to support internationalization.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 42pt;"><span style="font-size: small;"><span style="" lang="EN-US">On the client side, the encoding is specified through NLS_LANG. Any data sent from or to the client is encoded in the character set defined for the client.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><strong><span style="">Part 2: Practice</span></strong><strong></strong></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US"> </span><span style="">Now, back to our problem:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US">1. Why do we get garbled text when we insert data with dbunit into a database whose encoding is us7ascii?</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US">Because: 1. Our xml data file is encoded in UTF-8: &lt;?xml version="1.0" encoding="UTF-8"?&gt;</span></span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-size: small;">2.</span><span style='font: 7pt "Times New Roman";'> </span></span></span><span style="font-size: small;"><span style="" lang="EN-US">The JVM default encoding is GBK on our machines. It may differ on other systems because it follows the operating system settings. Note that GBK is a double-byte encoding for Chinese characters, not a Unicode encoding. You can check the default with Charset.defaultCharset().</span></span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-size: small;">3.</span><span style='font: 7pt "Times New Roman";'> </span></span></span><span style="font-size: small;"><span style="" lang="EN-US">The IM database server encoding is us7ascii, and the client has to use us7ascii as well.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 25.5pt;"><span style="font-size: small;"><span style="" lang="EN-US">So dbunit reads a utf-8 file, by default decodes it as double-byte gbk, and then inserts the result into a database that uses a single-byte encoding. Garbled text is the inevitable outcome.</span></span></p>
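<p class="MsoNormal" style="margin: 0cm 0cm 0pt 25.5pt;"><span style="font-size: small;"><span style="" lang="EN-US">The two failure points in that chain can be reproduced without a database. The sketch below is only an approximation of what happens inside dbunit and the Oracle driver, not their actual code path: it writes text as bytes in one charset and reads the bytes back in another. MojibakeDemo and reinterpret are hypothetical names, and US-ASCII stands in for us7ascii as an assumption.</span></span></p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text with one charset and decodes the resulting bytes with another.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587"; // two Chinese characters, written as escapes

        // The charset the JVM falls back to when none is named explicitly.
        System.out.println(Charset.defaultCharset());

        // Step 1 of the failure: UTF-8 bytes decoded as GBK no longer match the original.
        String misread = reinterpret(original, StandardCharsets.UTF_8, gbk);
        System.out.println(original.equals(misread)); // false

        // Step 2 of the failure: a 7-bit charset cannot represent Chinese characters,
        // so each one is replaced by '?' when encoded.
        String flattened = reinterpret(original, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);
        System.out.println(flattened); // ??
    }
}
```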
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-size: small;">2.</span><span style='font: 7pt "Times New Roman";'> </span></span></span><span style="font-size: small;"><span style="">How do we fix it?</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 21pt;"><span style="font-size: small;"><span style="" lang="EN-US">Ideally all the encodings would be the same, but we cannot change the database encoding. What we can change is the xml file: &lt;?xml version="1.0" encoding="GBK"?&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 21pt;"><span style="font-size: small;"><span style="" lang="EN-US">OK, now the first two steps use the same encoding. What remains is to decode the data in the java code, before it is inserted into the database, with the charset that corresponds to us7ascii, which is "ISO8859-1". Our Dbunit code needs a small improvement here. 陈洪, please take a look and update the code.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 21pt;"><span style="font-size: small;"><span style="">The code is as follows:</span></span></p>
<p>
</p>
<table class="MsoNormalTable" style="border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0"><tbody><tr style="">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 426.1pt; padding-top: 0cm; background-color: transparent;" width="568" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> DbUnit db = </span><strong><span style='font-size: 10pt; color: #7f0055; font-family: "Courier New";' lang="EN-US">new</span></strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> DbUnit(</span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"oracle.jdbc.driver.OracleDriver"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">,</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> </span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"jdbc:oracle:thin:@10.2.225.81:1521:asoft"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">, </span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"aliim"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">,</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> </span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"aliim"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">);</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> db.setSchema(</span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"ALIIM"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">);</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> String path = </span><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"util//z.xml"</span><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">;</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> FileInputStream stream = </span><strong><span style='font-size: 10pt; color: #7f0055; font-family: "Courier New";' lang="EN-US">new</span></strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> FileInputStream(path);</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> // Use the InputStreamReader constructor to decode the byte stream as </span></strong><strong><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">ISO8859-1</span></strong><strong></strong></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> InputStreamReader reader = </span></strong><strong><span style='font-size: 10pt; color: #7f0055; font-family: "Courier New";' lang="EN-US">new</span></strong><strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> InputStreamReader(stream, </span></strong><strong><span style='font-size: 10pt; color: #2a00ff; font-family: "Courier New";' lang="EN-US">"ISO8859-1"</span></strong><strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">);</span></strong></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; font-family: "Courier New";' lang="EN-US"></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> FlatXmlDataSet dataSet = </span><strong><span style='font-size: 10pt; color: #7f0055; font-family: "Courier New";' lang="EN-US">new</span></strong><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> FlatXmlDataSet(reader);</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US"> DatabaseOperation.</span><em><span style='font-size: 10pt; color: #0000c0; font-family: "Courier New";' lang="EN-US">INSERT</span></em><span style='font-size: 10pt; color: black; font-family: "Courier New";' lang="EN-US">.execute(db.getConnection(), dataSet);</span></p>
</td>
</tr></tbody></table>