How to get UTF-8 working in Java webapps?

Translated from: How to get UTF-8 working in Java webapps?

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ЦжФ for special cases.

My setup is the following:

  • Development environment: Windows XP
  • Production environment: Debian

Database used: MySQL 5.x

Users mainly use Firefox 2, but Opera 9.x, FF3, IE7 and Google Chrome are also used to access the site.

How to achieve this?


Reply #1

Reference: https://stackoom.com/question/a96/如何让UTF-在Java-webapps中运行


Reply #2

In my case of displaying Unicode characters from message bundles, I don't need to apply the "JSP page encoding" section to display Unicode on my JSP page. All I need is the "CharsetFilter" section.


Reply #3

Answering myself as the FAQ of this site encourages it. This works for me:

Mostly the characters äåö are not problematic, as the default character set used by browsers and by Tomcat/Java for webapps is latin1, i.e. ISO-8859-1, which "understands" those characters.

To get UTF-8 working under Java + Tomcat + Linux/Windows + MySQL requires the following:

Configuring Tomcat's server.xml

The connector must be configured to use UTF-8 for URL (GET request) parameters:

<Connector port="8080" maxHttpHeaderSize="8192"
 maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
 enableLookups="false" redirectPort="8443" acceptCount="100"
 connectionTimeout="20000" disableUploadTimeout="true" 
 compression="on" 
 compressionMinSize="128" 
 noCompressionUserAgents="gozilla, traviata" 
 compressableMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/x-javascript,application/javascript"
 URIEncoding="UTF-8"
/>

The key part is URIEncoding="UTF-8" in the above example. This guarantees that Tomcat handles all incoming GET parameters as UTF-8 encoded. As a result, when the user writes the following to the address bar of the browser:

 https://localhost:8443/ID/Users?action=search&name=*ж*

the character ж is handled as UTF-8 and is percent-encoded (usually by the browser, before the request even reaches the server) as %D0%B6.

POST requests are not affected by this.
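To make the %D0%B6 value concrete, here is a minimal sketch using only the JDK's URLEncoder and URLDecoder. The class and method names are made up for illustration, and the decoding step only approximates what Tomcat is expected to do once URIEncoding="UTF-8" is set; it is not Tomcat's own code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: shows how the Cyrillic letter "zhe" (U+0436)
// is percent-encoded and decoded when UTF-8 is used on both sides.
public class Utf8QueryParamDemo {

    // Percent-encode a query parameter value as UTF-8.
    public static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Decode a percent-encoded value as UTF-8.
    public static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String zhe = "\u0436"; // the character ж, written as an escape
        String encoded = encodeUtf8(zhe);
        System.out.println(encoded); // prints %D0%B6
        System.out.println(decodeUtf8(encoded).equals(zhe)); // prints true
    }
}
```

The character is written as a Unicode escape so that the example does not depend on the encoding of the source file itself.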

CharsetFilter

Then it's time to force the Java webapp to handle all requests and responses as UTF-8 encoded. This requires that we define a character set filter like the following:

package fi.foo.filters;

import javax.servlet.*;
import java.io.IOException;

public class CharsetFilter implements Filter {

    private String encoding;

    public void init(FilterConfig config) throws ServletException {
        encoding = config.getInitParameter("requestEncoding");
        if (encoding == null) encoding = "UTF-8";
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException {
        // Respect the client-specified character encoding
        // (see HTTP specification section 3.4.1)
        if (null == request.getCharacterEncoding()) {
            request.setCharacterEncoding(encoding);
        }

        // Set the default response content type and encoding
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");

        next.doFilter(request, response);
    }

    public void destroy() {
    }
}

This filter makes sure that if the browser hasn't set the encoding used in the request, it is set to UTF-8.

The other thing done by this filter is to set the default response encoding, i.e. the encoding of the returned HTML or other content. The alternative is to set the response encoding etc. in each controller of the application.

This filter has to be added to web.xml, the deployment descriptor of the webapp:

 <!--CharsetFilter start--> 

  <filter>
    <filter-name>CharsetFilter</filter-name>
    <filter-class>fi.foo.filters.CharsetFilter</filter-class>
      <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
      </init-param>
  </filter>

  <filter-mapping>
    <filter-name>CharsetFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

The instructions for making this filter are found at the Tomcat wiki (http://wiki.apache.org/tomcat/Tomcat/UTF-8).

JSP page encoding

In your web.xml, add the following:

<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
</jsp-config>

Alternatively, all JSP pages of the webapp would need to have the following at the top:

 <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

If some kind of layout with different JSP fragments is used, then this is needed in all of them.

HTML meta tags

JSP page encoding tells the JVM to handle the characters in the JSP page in the correct encoding. Then it's time to tell the browser which encoding the HTML page is in:

This is done with the following at the top of each XHTML page produced by the webapp:

   <?xml version="1.0" encoding="UTF-8"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fi">
   <head>
   <meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />
   ...

JDBC connection

When using a database, the connection must be defined to use UTF-8 encoding. This is done in context.xml, or wherever the JDBC connection is defined, as follows:

      <Resource name="jdbc/AppDB" 
        auth="Container"
        type="javax.sql.DataSource"
        maxActive="20" maxIdle="10" maxWait="10000"
        username="foo"
        password="bar"
        driverClassName="com.mysql.jdbc.Driver"
        url="jdbc:mysql://localhost:3306/ID_development?useUnicode=true&amp;characterEncoding=UTF-8"
    />
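One detail that is easy to trip over is that the ampersand between the URL parameters is written as &amp; inside an XML attribute but as a plain & in Java code. The sketch below illustrates this with two hypothetical helpers (buildMysqlUrl and escapeForXmlAttribute are not part of any library). It assumes the MySQL Connector/J property names useUnicode and characterEncoding.

```java
// Hypothetical demo class: builds a MySQL JDBC URL that requests UTF-8
// and shows the form the same URL takes inside an XML attribute.
public class JdbcUrlDemo {

    // Build a JDBC URL with the Connector/J properties useUnicode and
    // characterEncoding set so that the connection uses UTF-8.
    public static String buildMysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Escape ampersands so the URL can be pasted into an XML attribute,
    // for example in context.xml.
    public static String escapeForXmlAttribute(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = buildMysqlUrl("localhost", 3306, "ID_development");
        System.out.println(url);                        // form used in Java code
        System.out.println(escapeForXmlAttribute(url)); // form used in XML
    }
}
```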

MySQL database and tables

The database must use UTF-8 encoding. This is achieved by creating the database with the following:

   CREATE DATABASE `ID_development` 
   /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_swedish_ci */;

Then, all of the tables need to be in UTF-8 as well:

   CREATE TABLE  `Users` (
    `id` int(10) unsigned NOT NULL auto_increment,
    `name` varchar(30) collate utf8_swedish_ci default NULL,
    PRIMARY KEY  (`id`)
   ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci ROW_FORMAT=DYNAMIC;

The key part is CHARSET=utf8.

MySQL server configuration

The MySQL server has to be configured as well. Typically this is done on Windows by modifying the my.ini file and on Linux by configuring the my.cnf file. In those files it should be defined that all clients connected to the server use utf8 as the default character set and that the default charset used by the server is also utf8.

   [client]
   port=3306
   default-character-set=utf8

   [mysql]
   default-character-set=utf8

MySQL procedures and functions

These also need to have the character set defined. For example:

   DELIMITER $$

   DROP FUNCTION IF EXISTS `pathToNode` $$
   CREATE FUNCTION `pathToNode` (ryhma_id INT) RETURNS TEXT CHARACTER SET utf8
   READS SQL DATA
   BEGIN

    DECLARE path VARCHAR(255) CHARACTER SET utf8;

   SET path = NULL;

   ...

   RETURN path;

   END $$

   DELIMITER ;

GET requests: latin1 and UTF-8

If and when it's defined in Tomcat's server.xml that GET request parameters are encoded in UTF-8, the following GET requests are handled properly:

   https://localhost:8443/ID/Users?action=search&name=Petteri
   https://localhost:8443/ID/Users?action=search&name=ж

Because ASCII characters are encoded in the same way in both latin1 and UTF-8, the string "Petteri" is handled correctly.

The Cyrillic character ж is not understood at all in latin1. Because Tomcat is instructed to handle request parameters as UTF-8, it correctly handles that character, encoded as %D0%B6.

If and when browsers are instructed to read the pages in UTF-8 encoding (with response headers and the HTML meta tag), at least Firefox 2/3 and other browsers from this period all encode the character themselves as %D0%B6.

The end result is that all users with the name "Petteri" are found, and all users with the name "ж" are found as well.

But what about äåö?

The HTTP specification defines that by default URLs are encoded as latin1. This results in Firefox 2, Firefox 3 etc. encoding the following

    https://localhost:8443/ID/Users?action=search&name=*Päivi*

into the encoded version

    https://localhost:8443/ID/Users?action=search&name=*P%E4ivi*

In latin1 the character ä is encoded as %E4, even though the page/request/everything is defined to use UTF-8. The UTF-8 encoded version of ä is %C3%A4.

The result of this is that it's quite impossible for the webapp to correctly handle the request parameters from GET requests, as some characters are encoded in latin1 and others in UTF-8. Notice: POST requests do work, as browsers encode all request parameters from forms completely in UTF-8 if the page is defined as being UTF-8.
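The mismatch can be reproduced with the JDK alone. The sketch below (hypothetical class and method names) encodes the name from the example in both character sets, then decodes the latin1 form as UTF-8, which is roughly what a server configured for UTF-8 would attempt with such a request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: the same name percent-encoded as latin1 and
// as UTF-8, and what happens when the two are mixed up.
public class Latin1VersusUtf8Demo {

    public static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "P\u00E4ivi"; // Päivi, with ä written as an escape

        System.out.println(encode(name, "ISO-8859-1")); // prints P%E4ivi
        System.out.println(encode(name, "UTF-8"));      // prints P%C3%A4ivi

        // ASCII-only names look the same in both encodings.
        System.out.println(encode("Petteri", "ISO-8859-1")
                .equals(encode("Petteri", "UTF-8")));   // prints true

        // Decoding the latin1 form as UTF-8 does not give the name back,
        // because the lone byte E4 is not a valid UTF-8 sequence.
        System.out.println(decode("P%E4ivi", "UTF-8").equals(name)); // prints false
    }
}
```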

Stuff to read

A very big thank you to the writers of the following for giving the answers to my problem:

  • http://tagunov.tripod.com/i18n/i18n.html
  • http://wiki.apache.org/tomcat/Tomcat/UTF-8
  • http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
  • http://dev.mysql.com/doc/refman/5.0/en/charset-syntax.html
  • http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-tomcat-jsp-etc.html
  • http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-for-mysql-tomcat.html
  • http://jeppesn.dk/utf-8.html
  • http://www.nabble.com/request-parameters-mishandle-utf-8-encoding-td18720039.html
  • http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
  • http://www.utf8-chartable.de/

Important Note

MySQL supports the Basic Multilingual Plane using 3-byte UTF-8 characters. If you need to go outside of that (certain alphabets require more than 3 bytes of UTF-8), then you either need to use a flavor of VARBINARY column type or use the utf8mb4 character set (which requires MySQL 5.5.3 or later). Just be aware that using the utf8 character set in MySQL won't work 100% of the time.
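To check whether a given string stays within the 3-byte limit, the number of UTF-8 bytes per character can be inspected from Java. This is a minimal sketch; the class and its two helpers are hypothetical names, and the check relies on the fact that any character outside the Basic Multilingual Plane needs 4 bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: counts UTF-8 bytes and detects characters
// outside the Basic Multilingual Plane (BMP).
public class Utf8ByteLengthDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when every code point is inside the BMP, i.e. each character
    // needs at most 3 bytes in UTF-8.
    public static boolean fitsInThreeByteUtf8(String s) {
        return s.codePoints().allMatch(Character::isBmpCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("\u00E4"));       // ä: prints 2
        System.out.println(utf8Length("\u0436"));       // ж: prints 2
        System.out.println(utf8Length("\u20AC"));       // euro sign: prints 3
        System.out.println(utf8Length("\uD83D\uDE00")); // emoji U+1F600: prints 4

        System.out.println(fitsInThreeByteUtf8("P\u00E4ivi"));   // prints true
        System.out.println(fitsInThreeByteUtf8("\uD83D\uDE00")); // prints false
    }
}
```

A string for which fitsInThreeByteUtf8 returns false is the kind of input that would need utf8mb4 rather than utf8 on the MySQL side.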

Tomcat with Apache

One more thing: if you are using Apache + Tomcat + the mod_JK connector, then you also need to make the following changes:

  1. Add URIEncoding="UTF-8" to the 8009 connector in Tomcat's server.xml file; it is used by the mod_JK connector: <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8"/>
  2. Go to your Apache folder, i.e. /etc/httpd/conf, and add AddDefaultCharset utf-8 to the httpd.conf file. Note: first check whether the directive already exists. If it does, update it with this line. You can also add this line at the bottom.

Reply #4

I think you summed it up quite well in your own answer.

In the process of UTF-8-ing(?) from end to end you might also want to make sure Java itself is using UTF-8. Use -Dfile.encoding=utf-8 as a parameter to the JVM (can be configured in catalina.bat).
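A quick way to see what the JVM is actually using is to print its default charset. The sketch below (hypothetical class name) does that, and also shows the complementary habit of passing the charset explicitly when converting between strings and bytes, so that the conversion does not depend on the JVM default at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: inspects the JVM default charset and converts
// strings with an explicit charset instead of relying on the default.
public class DefaultCharsetDemo {

    // Encode with an explicit charset; the result does not depend on
    // -Dfile.encoding or the operating system locale.
    public static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with an explicit charset.
    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // These two values vary by platform and JVM options.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String text = "\u00E4\u00F6\u00E5 \u0436"; // äöå ж
        System.out.println(fromUtf8(toUtf8(text)).equals(text)); // prints true
    }
}
```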


Reply #5

This is for Greek encoding in MySQL tables when we want to access them using Java:

Use the following connection setup in your JBoss connection pool (mysql-ds.xml):

<connection-url>jdbc:mysql://192.168.10.123:3308/mydatabase</connection-url>
<driver-class>com.mysql.jdbc.Driver</driver-class>
<user-name>nts</user-name>
<password>xaxaxa!</password>
<connection-property name="useUnicode">true</connection-property>
<connection-property name="characterEncoding">greek</connection-property>

If you don't want to put this in a JNDI connection pool, you can configure it as a JDBC URL, as the next line illustrates:

jdbc:mysql://192.168.10.123:3308/mydatabase?characterEncoding=greek

For me and Nick, so we never forget it and waste time again.


Reply #6

If you have specified this in the connection pool (mysql-ds.xml), you can open the connection in your Java code as follows:

DriverManager.registerDriver(new com.mysql.jdbc.Driver());
Connection conn = DriverManager.getConnection(
    "jdbc:mysql://192.168.1.12:3308/mydb?characterEncoding=greek",
    "Myuser", "mypass");