HTML : Form does not send UTF-8 format inputs - Stack Overflow

Warm up

Let me start with a fact we all know: a computer understands nothing but bits, 0's and 1's.

When you submit an HTML form over HTTP, the values travel over the wire to the destination server as nothing more than a stream of bits.

Before sending data to the server, the HTTP client (a browser, curl, etc.) encodes it using some encoding scheme and expects the server to decode it using the same scheme, so that the server knows exactly what the client sent.

Before sending the response back, the server likewise encodes it using some encoding scheme and expects the client to decode it using the same scheme, so that the client knows exactly what the server sent.

An analogy: I send you a letter and tell you whether it is written in English, French or Dutch, so that you get exactly the message I intended. When you reply, you in turn tell me in which language I should read your letter.

The important takeaway is that data is encoded when it leaves the client and decoded on the server side, and vice versa. If you do not specify anything, the form content is sent as application/x-www-form-urlencoded, and the browser picks the character encoding itself, normally the encoding of the page that contains the form.
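To make the warm up concrete, here is a minimal sketch of how the same form value turns into different bytes on the wire depending on the character encoding the client uses. The class name FormEncodingDemo and its two helper methods are made up for illustration; the sketch assumes Java 10 or later, where URLEncoder and URLDecoder accept a Charset.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class FormEncodingDemo {

    // Percent-encode one form value the way application/x-www-form-urlencoded does,
    // using the given character encoding to turn characters into bytes first.
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    // Reverse the percent-encoding, interpreting the bytes with the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "café"
        System.out.println(encode(value, StandardCharsets.UTF_8));      // caf%C3%A9
        System.out.println(encode(value, StandardCharsets.ISO_8859_1)); // caf%E9
    }
}
```

The two outputs differ in the bytes used for "é", which is why the server has to know which encoding the client chose before it can decode the value.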

Core concept

Reading the warm up is important. There are a couple of things you need to make sure of to get the expected results:

Set the correct encoding before sending data from client to server.

Set the correct decoding and encoding on the server side to read the request and write the response back to the client (this was the reason why you were not getting the expected results).

Use the same encoding scheme everywhere. It must not happen that the client encodes using ISO-8859-1 while the server decodes using UTF-8, otherwise the data gets garbled (in terms of my analogy, I write to you in English and you read it as French).

Set the correct encoding for your log viewer if you verify the values through logs in the Windows command line, the Eclipse console, etc. (this contributed to your issue but was not the primary reason, because the data read from the request object was not correctly decoded in the first place; the encoding of Windows cmd or the Eclipse log viewer matters as well).
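For the log viewer point, one option on the Java side is to write console output through a stream with an explicit encoding instead of relying on the platform default. The following is only a sketch under that assumption: Utf8ConsoleDemo, utf8Stream and printedBytes are illustrative names, and the PrintStream constructor taking a Charset requires Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8ConsoleDemo {

    // Wrap any output stream so that text written to it is encoded as UTF-8,
    // regardless of the JVM's default charset. Auto-flush is enabled.
    static PrintStream utf8Stream(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8);
    }

    // Print text through a UTF-8 stream into memory and return the raw bytes,
    // which makes the effect of the encoding visible.
    static byte[] printedBytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = utf8Stream(buffer);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("caf\u00e9 \u4f60\u597d"); // "café 你好"
    }
}
```

This only fixes the bytes the JVM emits. The terminal or IDE console still has to be configured to interpret those bytes as UTF-8, otherwise correct data will look wrong in the log.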

Having correct encoding set before sending data from client to server

To ensure this, several approaches are discussed, but I recommend the accept-charset attribute on the HTML form element. As per your code snippet you are already using it, and using it correctly, so you are fine on that front.

Some people will say not to use it, or that it is not implemented, but I would very humbly disagree. Keep in mind that two similarly named things exist: the accept-charset form attribute, which tells the browser which encoding to use for the submitted form data, and the Accept-Charset request-header field of the HTTP 1.1 specification, which tells the server which charsets the client accepts in the response. Some may also suggest using a "charset" attribute on the Accept request-header field, but no such attribute is described there; check the definition of the Accept request-header field in the specification.

I am giving you data and facts, not just words, but if you are still not convinced, run the following test in different browsers.

Set accept-charset="ISO-8859-1" on your HTML form and POST/GET the form with Chinese or accented French characters to the server.

On the server, decode the data using UTF-8.

Now repeat the same test with the client and server encodings swapped.

You will see that in neither case do the expected characters arrive at the server. If you use the same encoding scheme on both sides, you will see the expected characters. So browsers do implement accept-charset, and it does take effect.
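The outcome of that test can be approximated without a browser. The sketch below models only the byte-level mismatch: text is turned into bytes with a "client" charset and read back with a "server" charset. MismatchDemo and transmit are hypothetical names, and a real browser may treat characters that the form charset cannot represent differently from String.getBytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class MismatchDemo {

    // Encode with the "client" charset, then decode the same bytes
    // with the "server" charset, as happens across an HTTP request.
    static String transmit(String text, Charset clientCharset, Charset serverCharset) {
        byte[] wire = text.getBytes(clientCharset);
        return new String(wire, serverCharset);
    }

    public static void main(String[] args) {
        String original = "\u00e9t\u00e9"; // "été"
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // Matching encodings on both sides preserve the text.
        System.out.println(original.equals(transmit(original, utf8, utf8)));     // true
        System.out.println(original.equals(transmit(original, latin1, latin1))); // true

        // Mismatched encodings corrupt it, in either direction.
        System.out.println(original.equals(transmit(original, utf8, latin1)));   // false
        System.out.println(original.equals(transmit(original, latin1, utf8)));   // false
    }
}
```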

Having correct decoding and encoding set at server side to read request and write response back to client

Many ways to achieve this are discussed (some scenarios may need additional configuration, but the options below solve 95% of cases and hold good for your case as well). For example:

Use a character encoding filter to set the encoding on request and response.

Use setCharacterEncoding on request and response.

Configure the web or application server for the correct character encoding, for example with -Dfile.encoding=utf8.

Etc.

My favorite is the first one, the "Character Encoding Filter", and it will solve your problem as well, for the following reasons:

All your encoding handling logic is in one place.

You have full control through configuration: change it in one place and everyone is happy.

You need not worry that some other code reads the request stream or flushes the response stream before you could set the character encoding.

1. Character encoding filter

You can implement your own character encoding filter as follows. If you are using a framework such as Spring, you need not write your own class; just do the configuration in web.xml.

The core logic below is very similar to what Spring does, minus the dependency and bean-awareness machinery.

web.xml (configuration)

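<filter>
    <filter-name>EncodingFilter</filter-name>
    <filter-class>com.sks.hagrawal.EncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>EncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>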

EncodingFilter (character encoding implementation class)

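import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Sets the request (and optionally the response) character encoding for every request.
public class EncodingFilter implements Filter {

    private String encoding = "UTF-8";
    private boolean forceEncoding = false;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain filterChain) throws IOException, ServletException {
        // Must run before anything reads request parameters or the request body.
        request.setCharacterEncoding(encoding);
        if (forceEncoding) {
            // forceEncoding means the response stream is written with the same encoding as well.
            response.setCharacterEncoding(encoding);
        }
        filterChain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // Read the init-params from web.xml; keep the defaults when they are absent.
        String encodingParam = filterConfig.getInitParameter("encoding");
        String forceEncodingParam = filterConfig.getInitParameter("forceEncoding");
        if (encodingParam != null) {
            encoding = encodingParam;
        }
        if (forceEncodingParam != null) {
            forceEncoding = Boolean.parseBoolean(forceEncodingParam);
        }
    }

    @Override
    public void destroy() {
        // Nothing to clean up.
    }
}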
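To see why the request encoding matters, it helps to look at what decoding form parameters amounts to. The sketch below is not how any particular servlet container is implemented; it is a simplified, assumed model in which a form body is split into pairs and each name and value is percent-decoded with a chosen charset. FormBodyParser and parse are hypothetical names, and URLDecoder.decode with a Charset requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of server-side form decoding.
final class FormBodyParser {

    // Split "a=1&b=2" into pairs and percent-decode every name and value
    // with the given charset. Later duplicates overwrite earlier ones.
    static Map<String, String> parse(String body, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return params;
        }
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        // Body as a browser would send it for a UTF-8 form ("Zürich", "你好").
        String body = "city=Z%C3%BCrich&greeting=%E4%BD%A0%E5%A5%BD";
        System.out.println(parse(body, StandardCharsets.UTF_8));      // decoded as intended
        System.out.println(parse(body, StandardCharsets.ISO_8859_1)); // garbled values
    }
}
```

The bytes on the wire are identical in both calls; only the charset used for decoding changes, which is the setting the filter controls through setCharacterEncoding.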

2. ServletRequest.setCharacterEncoding()

This is essentially the same code as in the character encoding filter, but instead of doing it in a filter you do it in your servlet or controller class.

The idea is again to call request.setCharacterEncoding("UTF-8"); to set the encoding of the HTTP request stream before you start reading from it.

Try the code below. If you are not using some sort of filter to set the encoding on the request object, the first log will print null while the second will print "UTF-8".

System.out.println("CharacterEncoding = " + request.getCharacterEncoding());

request.setCharacterEncoding("UTF-8");

System.out.println("CharacterEncoding = " + request.getCharacterEncoding());

Below is an important excerpt from the setCharacterEncoding Javadoc. Another thing to note is that you must provide a valid encoding name, otherwise you will get an UnsupportedEncodingException.

Overrides the name of the character encoding used in the body of this request. This method must be called prior to reading request parameters or reading input using getReader(). Otherwise, it has no effect.
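Since an invalid name leads to an UnsupportedEncodingException, one way to guard against it is to check a configured name before handing it to setCharacterEncoding. This is only a sketch of that idea using the JDK's Charset class; EncodingNames and isUsable are hypothetical names and not part of the servlet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper class, for illustration only.
final class EncodingNames {

    // Returns true when this JVM knows a charset with the given name.
    // Null and syntactically illegal names are reported as unusable
    // instead of propagating an exception.
    static boolean isUsable(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsable("UTF-8"));           // true
        System.out.println(isUsable("no-such-charset")); // false
    }
}
```

A filter could run such a check once in init and fall back to a default when the configured value is not usable.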

Wherever needed I have tried my best to provide official links or accepted, bounty-awarded StackOverflow answers, so that you can trust the information.
