A Summary of Garbled-Text Problems in Web Applications (Original)
===============================================
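Summary:
Garbled pages in a web application are actually simple to diagnose; the cause is almost always in one of the following areas:
File encoding
Transcoding in the program
Database encoding
Database content
Encoding specified by the web container
Encoding specified by the web application
Transcoding in network transmission
Other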
//
[Common encodings] ISO-8859-1, GBK, GB2312, Big5, Unicode, UTF-8, UTF-16, etc.
1. File Encoding
===================
EditPlus: "Save As" offers Default/Unicode/UTF-8
Eclipse: the encoding can be set per project in the IDE
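Why the encoding chosen at "Save As" matters can be sketched in a few lines of Java. This is only an illustration: the class name FileEncodingDemo and the helper saveThenRead are made up for this note, and it assumes the JDK ships the GBK charset (standard full JDKs do).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileEncodingDemo {
    // Saves text to a temp file using one charset, then reads it back using another.
    static String saveThenRead(String text, Charset saveAs, Charset readAs) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(file, text.getBytes(saveAs));
                return new String(Files.readAllBytes(file), readAs);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+4E2D U+56FD ("China"), written as escapes so this source file's
        // own encoding cannot interfere with the demonstration.
        String text = "\u4e2d\u56fd";
        Charset gbk = Charset.forName("GBK");
        // Saved and read with the same charset: the text survives.
        System.out.println(saveThenRead(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text));
        // Saved as UTF-8 but read as GBK: the text comes back garbled.
        System.out.println(saveThenRead(text, StandardCharsets.UTF_8, gbk).equals(text));
    }
}
```

The first line printed is true and the second is false: the bytes on disk are identical in both cases, only the charset used to interpret them differs.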
2. Transcoding in the Program
===================
JAVA:
String str = new String("...".getBytes("ISO-8859-1"), "GBK");
...
System.out.println(java.net.URLEncoder.encode("This string has spaces","UTF-8"));
System.out.println(java.net.URLDecoder.decode(input, "UTF-8"));
//http://www.java3z.com/cwbwebhome/article/article2/2414.html?id=1101
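The getBytes/new String idiom above only helps when the text was GBK bytes that got decoded as ISO-8859-1 somewhere along the way. A minimal sketch of that situation follows; the class TranscodeDemo and its two methods are hypothetical names used for illustration, and the GBK charset is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeDemo {
    private static final Charset GBK = Charset.forName("GBK");

    // Simulates the bug: GBK bytes wrongly decoded as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // The repair: recover the raw bytes via ISO-8859-1, then decode them as GBK.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u56fd"; // U+4E2D U+56FD, written as escapes
        String garbled = misdecode(original);
        System.out.println(garbled.equals(original));         // false: the text is garbled
        System.out.println(repair(garbled).equals(original)); // true: the repair restores it
    }
}
```

The repair works because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can be recovered without loss. If the wrong decoding was done with some other charset, the bytes may already be lost and this trick will not bring them back.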
JavaScript:
<script type="text/javascript">
var str = encodeURI("你好"); // "%E4%BD%A0%E5%A5%BD"
alert(str);
alert(decodeURI(str)); // "你好"
</script>
//Escape/Unescape
//Encoding/Decoding
//http://scriptasylum.com/tutorials/encdec/encode-decode.html
//http://www.xunlu.net/small-technique/Encoder-Decoder-html.htm
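3. Database Encoding
===================
Must match the application's settings
Make sure the data written to the database is not garbled
4. Database Content
===================
Make sure the data written to the database is not garbled,
whether it was entered through the program or introduced during a database import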
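One way to catch bad data before it reaches the database is to check the text in the program first. The sketch below is an assumption of how such a check might look, not part of any database driver: the class DbTextCheck and both helpers are made-up names, and the charset name passed in has to be whatever the database or column is actually configured with.

```java
import java.nio.charset.Charset;

class DbTextCheck {
    // True if the text contains U+FFFD, the replacement character that
    // decoders emit for bytes they could not interpret.
    static boolean looksDamaged(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }

    // True if every character of the text can be represented in the given charset.
    static boolean fitsCharset(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u56fd"; // U+4E2D U+56FD, written as escapes
        System.out.println(fitsCharset(text, "GBK"));        // true
        System.out.println(fitsCharset(text, "ISO-8859-1")); // false
        System.out.println(looksDamaged("a\uFFFDb"));        // true
    }
}
```

A check like this only detects text that is already damaged or cannot be stored; it does not tell you where the damage happened.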
5. Encoding Specified by the Web Container
===================
[***********************Example: Java, Tomcat 6************************]
conf/server.xml
...
<Connector port="8080" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8" />
...
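URIEncoding tells the connector which charset to use when decoding percent-encoded bytes in the request URI; in Tomcat 6 the default is ISO-8859-1. The effect can be imitated with java.net.URLDecoder. This is only an illustration, not Tomcat's own code, and the class UriDecodingDemo is a made-up name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UriDecodingDemo {
    // Percent-decodes the input, interpreting the resulting bytes in the given charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 percent-encoding of U+4E2D U+56FD, as a browser would send it.
        String encoded = "%E4%B8%AD%E5%9B%BD";
        // Decoded as UTF-8: the two original characters.
        System.out.println(decode(encoded, "UTF-8").length());      // 2
        // Decoded as ISO-8859-1: six unrelated Latin-1 characters, i.e. garbled text.
        System.out.println(decode(encoded, "ISO-8859-1").length()); // 6
    }
}
```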
ROOT/WEB-INF/web.xml
...
<filter>
<filter-name>encodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>utf-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
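<!-- 2009-2-27: On one occasion the encoding was set in apache httpd.conf, web.xml had the settings above, and server.xml was configured as well, but in Tomcat ...
     whereas .jsp pages showed no garbled text. Adding the code below fixed it. The application lives under WEB-INF/, which is probably the cause: that directory cannot be accessed directly from outside, so the settings above had no effect. -->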
<jsp-config>
<jsp-property-group>
<description>
Special property group for JSP Configuration JSP
example.
</description>
<display-name>JSPConfiguration</display-name>
<url-pattern>*.html</url-pattern>
<el-ignored>true</el-ignored>
<page-encoding>UTF-8</page-encoding>
<scripting-invalid>false</scripting-invalid>
<include-prelude></include-prelude>
<include-coda></include-coda>
</jsp-property-group>
</jsp-config>
<welcome-file-list>
<welcome-file>index.jsp</welcome-file>
</welcome-file-list>
...
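<!-- Another application: only the code below was added to WEB-INF/web.xml; nothing was configured in Tomcat or Apache, and no problems occurred. start -->
<!-- Spring's web filter that sets the encoding of request parameters -->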
<filter>
<filter-name>encodingFilter</filter-name>
<filter-class>
org.springframework.web.filter.CharacterEncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
...
<filter-mapping>
<filter-name>encodingFilter</filter-name>
<url-pattern>*.do</url-pattern>
</filter-mapping>
<!-- Another application: only the code above was added to WEB-INF/web.xml; nothing was configured in Tomcat or Apache, and no problems occurred. end -->
[***********************Example: Apache 2.2************************]
AddDefaultCharset UTF-8
#AddDefaultCharset GB2312
#AddDefaultCharset EUC-KR
#
# Commonly used filename extensions to character sets. You probably
# want to avoid clashes with the language extensions, unless you
# are good at carefully testing your setup after each change.
# See http://www.iana.org/assignments/character-sets for the
# official list of charset names and their respective RFCs
#
AddCharset ISO-8859-1 .iso8859-1 .latin1
AddCharset ISO-8859-2 .iso8859-2 .latin2 .cen
AddCharset ISO-8859-3 .iso8859-3 .latin3
AddCharset ISO-8859-4 .iso8859-4 .latin4
AddCharset ISO-8859-5 .iso8859-5 .latin5 .cyr .iso-ru
AddCharset ISO-8859-6 .iso8859-6 .latin6 .arb
AddCharset ISO-8859-7 .iso8859-7 .latin7 .grk
AddCharset ISO-8859-8 .iso8859-8 .latin8 .heb
AddCharset ISO-8859-9 .iso8859-9 .latin9 .trk
AddCharset ISO-2022-JP .iso2022-jp .jis
AddCharset ISO-2022-KR .iso2022-kr .kis
AddCharset ISO-2022-CN .iso2022-cn .cis
AddCharset Big5 .Big5 .big5
# For russian, more than one charset is used (depends on client, mostly):
AddCharset WINDOWS-1251 .cp-1251 .win-1251
AddCharset CP866 .cp866
AddCharset KOI8-r .koi8-r .koi8-ru
AddCharset KOI8-ru .koi8-uk .ua
AddCharset ISO-10646-UCS-2 .ucs2
AddCharset ISO-10646-UCS-4 .ucs4
AddCharset UTF-8 .utf8
# The set below does not map to a specific (iso) standard
# but works on a fairly wide range of browsers. Note that
# capitalization actually matters (it should not, but it
# does for some browsers).
#
# See http://www.iana.org/assignments/character-sets
# for a list of sorts. But browsers support few.
#
AddCharset GB2312 .gb2312 .gb
AddCharset utf-7 .utf7
AddCharset utf-8 .utf8
AddCharset big5 .big5 .b5
AddCharset EUC-TW .euc-tw
AddCharset EUC-JP .euc-jp
AddCharset EUC-KR .euc-kr
AddCharset shift_jis .sjis
6. Encoding Specified by the Web Application
===================
[***********************JAVA************************]
java:
String str = new String("...".getBytes("ISO-8859-1"), "GBK");
...
System.out.println(java.net.URLEncoder.encode("This string has spaces", "UTF-8"));
System.out.println(java.net.URLDecoder.decode(input, "UTF-8"));
//http://www.java3z.com/cwbwebhome/article/article2/2414.html?id=1101
...
jsp:
<%@ page contentType="text/html; charset=UTF-8" %>
or (the two can coexist)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
...
servlet:
response.setCharacterEncoding("utf-8");
response.setContentType("text/html;charset=utf-8");
[***********************JAVASCRIPT************************]
Method 1:
document.charset = "gb2312"
Method 2:
<script id="script1"></script>
document.getElementById('script1').charset = "gb2312"
Syntax:
object.charset [ = sCharSet ]
Method 3:
Setting the encoding for a JavaScript calendar control:
<script src="../Script/Calendar.js" type="text/javascript" charset="gb2312"></script>
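7. Transcoding in Network Transmission
===================
HTTP/HTTPS traffic is interpreted as ISO-8859-1 by default (this is the default charset both in HTTP/1.1 for text content and in servlet containers such as Tomcat 6 for request parameters), so non-ASCII text has to be encoded before sending and decoded on receipt.
Conversion:
java:
String mytext = java.net.URLEncoder.encode("中国", "utf-8");
String mytext2 = java.net.URLDecoder.decode(mytext, "utf-8");
The result is:
mytext: %E4%B8%AD%E5%9B%BD
mytext2: 中国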
8. Other
===================
To be added...
===============================================