Summary of Garbled-Text Problems in Web Applications (Original)
===============================================
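Abstract:
Garbled pages in a web application are usually simple to track down. The cause almost always lies in one of the following areas:
 Encoding of the file itself
 Transcoding in program code
 Database encoding
 Database content
 Encoding specified by the web container
 Encoding specified by the web application
 Transcoding during network transmission
 Other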
//


[Common encodings] iso-8859-1, gbk, gb2312, big5, unicode, utf-8, utf-16, etc.


1. Encoding of the File Itself
===================
 EditPlus: "Save As" offers Default/Unicode/UTF-8
 Eclipse: the encoding can be set per project in the IDE
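
The effect of a file's own encoding can be reproduced in a few lines. The following is only a minimal sketch: the class name `FileEncodingDemo` and the helper `writeThenRead` are hypothetical names made up for illustration. It writes text to a temporary file in one charset and reads it back in another, which is what happens when an editor saves as UTF-8 and a tool later assumes a different encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileEncodingDemo {

    // Writes the text to a temporary file using one charset,
    // then reads the same bytes back using another charset.
    public static String writeThenRead(String text, Charset writeAs, Charset readAs)
            throws IOException {
        Path tmp = Files.createTempFile("encoding-demo", ".txt");
        try {
            Files.write(tmp, text.getBytes(writeAs));
            return new String(Files.readAllBytes(tmp), readAs);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two Chinese characters written as Unicode escapes, so the
        // encoding of this source file itself cannot affect the demo.
        String text = "\u4f60\u597d";

        String same = writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String wrong = writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("read with matching charset equals original: " + text.equals(same));
        System.out.println("read with wrong charset equals original:    " + text.equals(wrong));
    }
}
```

Reading with the matching charset returns the original text; reading with the wrong one returns one character per byte, which is the typical garbled output.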


2. Transcoding in Program Code
===================
JAVA:
 String str = new String("...".getBytes("iso-8859-1"),"GBK");
 ...
 System.out.println(java.net.URLEncoder.encode("This string has spaces","UTF-8"));
 System.out.println(java.net.URLDecoder.decode(input, "UTF-8"));
 //http://www.java3z.com/cwbwebhome/article/article2/2414.html?id=1101
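
Why the `getBytes("iso-8859-1")` trick works can be shown with a small sketch. It assumes the text was originally sent as UTF-8 and was wrongly decoded as iso-8859-1; the class and method names (`TranscodeDemo`, `misdecode`, `repair`) are hypothetical. UTF-8 is used instead of GBK only because `StandardCharsets` guarantees it on every JVM; for a GBK page, `Charset.forName("GBK")` would take its place.

```java
import java.nio.charset.StandardCharsets;

class TranscodeDemo {

    // Simulates a component that received UTF-8 bytes but decoded them
    // as ISO-8859-1, producing garbled text.
    public static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undoes the wrong decoding: ISO-8859-1 maps every byte to exactly one
    // char, so getBytes(ISO_8859_1) recovers the original bytes unchanged,
    // which can then be decoded with the charset that was really used.
    public static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u56fd"; // two Chinese characters, as escapes
        String garbled = misdecode(original);
        System.out.println("garbled length: " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(original));
    }
}
```

The repair is only lossless because iso-8859-1 maps all 256 byte values. If the wrong decoding had been done with a charset that replaces invalid bytes (UTF-8 or GBK, for example), the original bytes could already be lost.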

JavaScript:
 <script language="javascript">
 alert(str=encodeURI("你好"))
 alert(decodeURI(str))
 </script>
 //Escape/Unescape
 //Encoding/Decoding
 //http://scriptasylum.com/tutorials/encdec/encode-decode.html

 //http://www.xunlu.net/small-technique/Encoder-Decoder-html.htm


3. Database Encoding
===================
 It must match the application's settings
 Make sure the data written to the database is not already garbled


4. Database Content
===================
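 Make sure the data written to the database is not already garbled
 Garbled data can come from program input or from a database import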
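
One way to catch bad data before it is stored is a simple pre-insert check. This is a sketch under assumptions, not something the original setup used: the class `DbTextCheck`, the method `looksSafeToStore` and the idea of passing the column charset in as a parameter are all hypothetical. It rejects text that already contains the Unicode replacement character (a sign of an earlier failed decode) or that cannot be represented in the target charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DbTextCheck {

    // Returns false when the text already shows decoding damage (U+FFFD)
    // or contains characters the target charset cannot encode.
    public static boolean looksSafeToStore(String text, Charset columnCharset) {
        if (text.indexOf('\uFFFD') >= 0) {
            return false; // an earlier decode replaced invalid bytes
        }
        return columnCharset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u56fd";
        System.out.println(looksSafeToStore(chinese, StandardCharsets.UTF_8));
        System.out.println(looksSafeToStore(chinese, StandardCharsets.ISO_8859_1));
        System.out.println(looksSafeToStore("abc\uFFFD", StandardCharsets.UTF_8));
    }
}
```

A check like this only detects some kinds of damage. Text that was wrongly decoded with a lossless charset such as iso-8859-1 still passes, so it complements rather than replaces consistent encoding settings.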


5. Encoding Specified by the Web Container
===================
 [***********************Example: Java, Tomcat 6************************]
 conf/server.xml
 ...
     <Connector port="8080" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8" />
 ...

 ROOT/WEB-INF/web.xml
 ...
 <filter>
  <filter-name>encodingFilter</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
      <param-name>encoding</param-name>
      <param-value>utf-8</param-value>
  </init-param>
 </filter>
 <filter-mapping>
  <filter-name>encodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
 </filter-mapping>

 


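  <!-- 2009-2-27: On one occasion the encoding was set in Apache httpd.conf, web.xml had the settings above, and server.xml was configured as well, yet .html pages served through Tomcat still came out garbled, -->
  <!-- while .jsp pages were fine. Adding the following configuration fixed it. The application files were placed under WEB-INF/, which is probably the cause: content in that directory cannot be accessed directly from outside, so the settings above behaved as if they had never been made. -->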
 <jsp-config>
      <jsp-property-group>
   <description>
       Special property group for JSP Configuration JSP
       example.
   </description>
   <display-name>JSPConfiguration</display-name>
   <url-pattern>*.html</url-pattern>
   <el-ignored>true</el-ignored>
   <page-encoding>UTF-8</page-encoding>
   <scripting-invalid>false</scripting-invalid>
   <include-prelude></include-prelude>
   <include-coda></include-coda>
      </jsp-property-group>
 </jsp-config>
 <welcome-file-list>
  <welcome-file>index.jsp</welcome-file>
 </welcome-file-list>
 ...

 

 

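 <!-- Another application: only the following was added to WEB-INF/web.xml; nothing was changed in Tomcat or Apache, and no problems occurred. start -->
 <!-- Spring's web filter that sets the encoding of request parameters -->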
 <filter>
  <filter-name>encodingFilter</filter-name>
  <filter-class>
   org.springframework.web.filter.CharacterEncodingFilter
  </filter-class>
  <init-param>
   <param-name>encoding</param-name>
   <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
   <param-name>forceEncoding</param-name>
   <param-value>true</param-value>
  </init-param>
 </filter>

...
 <filter-mapping>
  <filter-name>encodingFilter</filter-name>
  <url-pattern>*.do</url-pattern>
 </filter-mapping>
 <!-- Another application: only the above was added to WEB-INF/web.xml; nothing was changed in Tomcat or Apache, and no problems occurred. end -->

 
 [***********************Example: Apache 2.2************************]
  AddDefaultCharset UTF-8
 #AddDefaultCharset GB2312
 #AddDefaultCharset EUC-KR

 #
 # Commonly used filename extensions to character sets. You probably
 # want to avoid clashes with the language extensions, unless you
 # are good at carefully testing your setup after each change.
 # See http://www.iana.org/assignments/character-sets for the
 # official list of charset names and their respective RFCs
 #
 AddCharset ISO-8859-1  .iso8859-1  .latin1
 AddCharset ISO-8859-2  .iso8859-2  .latin2 .cen
 AddCharset ISO-8859-3  .iso8859-3  .latin3
 AddCharset ISO-8859-4  .iso8859-4  .latin4
 AddCharset ISO-8859-5  .iso8859-5  .latin5 .cyr .iso-ru
 AddCharset ISO-8859-6  .iso8859-6  .latin6 .arb
 AddCharset ISO-8859-7  .iso8859-7  .latin7 .grk
 AddCharset ISO-8859-8  .iso8859-8  .latin8 .heb
 AddCharset ISO-8859-9  .iso8859-9  .latin9 .trk
 AddCharset ISO-2022-JP .iso2022-jp .jis
 AddCharset ISO-2022-KR .iso2022-kr .kis
 AddCharset ISO-2022-CN .iso2022-cn .cis
 AddCharset Big5        .Big5       .big5
 # For russian, more than one charset is used (depends on client, mostly):
 AddCharset WINDOWS-1251 .cp-1251   .win-1251
 AddCharset CP866       .cp866
 AddCharset KOI8-r      .koi8-r .koi8-ru
 AddCharset KOI8-ru     .koi8-uk .ua
 AddCharset ISO-10646-UCS-2 .ucs2
 AddCharset ISO-10646-UCS-4 .ucs4
 AddCharset UTF-8       .utf8

 # The set below does not map to a specific (iso) standard
 # but works on a fairly wide range of browsers. Note that
 # capitalization actually matters (it should not, but it
 # does for some browsers).
 #
 # See http://www.iana.org/assignments/character-sets
 # for a list of sorts. But browsers support few.
 #
 AddCharset GB2312      .gb2312 .gb
 AddCharset utf-7       .utf7
 AddCharset utf-8       .utf8
 AddCharset big5        .big5 .b5
 AddCharset EUC-TW      .euc-tw
 AddCharset EUC-JP      .euc-jp
 AddCharset EUC-KR      .euc-kr
 AddCharset shift_jis   .sjis


6. Encoding Specified by the Web Application
===================
 [***********************JAVA************************]
  java:
         String str = new String("...".getBytes("iso-8859-1"),"GBK");
         ...
         System.out.println(java.net.URLEncoder.encode("This string has spaces","UTF-8"));
         System.out.println(java.net.URLDecoder.decode(input, "UTF-8"));
         //http://www.java3z.com/cwbwebhome/article/article2/2414.html?id=1101
 ...

 jsp:
         <%@page contentType="text/html; charset=UTF-8"%>
 or (the two can coexist)
         <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
         <html xmlns="http://www.w3.org/1999/xhtml">
         <head>
          <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
         ...

 servlet:
        response.setCharacterEncoding("utf-8");
        response.setContentType("text/html;charset=utf-8");
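
The `charset` parameter in the content type is what tells the receiving side how to decode the response body. As an illustration only, here is a sketch of how a client might extract it; the class `ContentTypeCharset`, the method `charsetOf` and the fallback parameter are hypothetical, and real browsers and HTTP libraries apply more elaborate rules.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html;charset=utf-8". Returns the fallback when the parameter
    // is missing or names a charset this JVM does not know.
    public static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // covers both illegal and unsupported charset names
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

When the header carries no charset, the receiver has to guess or fall back to a default, which is one common source of garbled pages.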

 
 [***********************JAVASCRIPT************************]
 Method 1:
  document.charset = "gb2312"
 Method 2:
  <script id="script1"></script>
  document.getElementById('script1').charset = "gb2312"

  Syntax
  object.charset [ = sCharSet ]
 Method 3:
  Setting the encoding for a JavaScript calendar widget
  <script src="../Script/Calendar.js" type="text/javascript" charset="gb2312"></script>


7. Transcoding During Network Transmission
===================
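Over http/https only bytes are transmitted. When no charset is declared, servlet containers have traditionally decoded them as iso-8859-1, and a URL may contain only ASCII, so non-ASCII text has to be percent-encoded.
Conversion: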
java:
 String mytext   =   java.net.URLEncoder.encode("中国",   "utf-8");  
 String mytext2  =   java.net.URLDecoder.decode(mytext,   "utf-8");
 Result:
 mytext:%E4%B8%AD%E5%9B%BD
 mytext2:中国
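
The conversion above only round-trips when both sides use the same charset. The following sketch shows what happens when they do not; the class `UrlCodecDemo` and its two thin wrapper methods are hypothetical names added for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlCodecDemo {

    // Percent-encodes the text using the bytes of the given charset.
    public static String encode(String text, String charset)
            throws UnsupportedEncodingException {
        return URLEncoder.encode(text, charset);
    }

    // Decodes a percent-encoded string, interpreting the bytes in the given charset.
    public static String decode(String encoded, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "\u4e2d\u56fd"; // the same two characters as in the text, as escapes
        String encoded = encode(original, "UTF-8");
        System.out.println(encoded); // %E4%B8%AD%E5%9B%BD, as shown in the result above

        // Same charset on both sides: the text survives.
        System.out.println(decode(encoded, "UTF-8").equals(original));

        // Decoder assumes ISO-8859-1: each byte becomes its own character.
        String wrong = decode(encoded, "ISO-8859-1");
        System.out.println(wrong.equals(original) + ", length " + wrong.length());
    }
}
```

This is the situation the `URIEncoding="UTF-8"` Connector attribute is meant to prevent: the browser encodes the URL as UTF-8, and the container must decode it the same way.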


8. Other
===================
 To be added...


=============================================== 

 

 
