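Why is encoding needed?
Differences between operating systems cause problems for URLs. For example, some systems allow spaces in filenames and others do not. Most operating systems allow # in a filename, but in a URL # has a special meaning: it marks the end of the filename and is followed by a fragment identifier. Other special characters and non-alphanumeric characters cause similar problems, either in URLs or on some operating systems. In addition, Unicode was not yet in widespread use when the Web was invented, so not all systems could handle characters such as Chinese characters (汉字).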
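To solve these problems, the characters used in a URL must come from a fixed subset of ASCII:
the upper- and lowercase letters a-z and A-Z; the digits 0-9; and the punctuation characters - _ . ! ~ * ' (and ,). The characters : / & ? @ # ; $ + = and % may also be used, but only for their special purposes; if any of them appears as part of a path or query string, it must be encoded.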
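Encoding is simple. Any character that is not an ASCII letter, a digit, or one of the punctuation characters listed above is converted to bytes, and each byte is written as a percent sign (%) followed by two hexadecimal digits. A single character may therefore produce several %XX groups, depending on the character set. The URL class does not encode or decode automatically; the programmer must do it explicitly.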
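The byte-by-byte rule described above can be sketched by hand. This is only an illustration, not how the JDK implements it: the class name `PercentEncodeSketch`, the method `percentEncode`, and the choice of `-_.!~*'()` as the punctuation left untouched are assumptions made for this example, and UTF-8 is assumed as the character set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of percent-encoding; not part of the JDK.
public class PercentEncodeSketch {
    // Assumption: punctuation left unencoded in this sketch.
    private static final String SAFE_MARKS = "-_.!~*'()";

    static String percentEncode(String s) {
        StringBuilder out = new StringBuilder();
        // Convert the text to bytes first; one character may yield several bytes.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as an unsigned value 0..255
            boolean alnum = (v >= 'a' && v <= 'z')
                    || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9');
            if (alnum || (v < 0x80 && SAFE_MARKS.indexOf(v) >= 0)) {
                out.append((char) v);
            } else {
                // Every other byte becomes % followed by two hex digits.
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("I/O"));    // I%2FO
        System.out.println(percentEncode("\u00e9")); // %C3%A9 (two bytes in UTF-8)
        System.out.println(percentEncode("\u6c49")); // %E6%B1%89 (three bytes in UTF-8)
    }
}
```

Note that this sketch writes a space as %20, whereas URLEncoder (described next) writes it as a plus sign.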
URLEncoder
String encoded = URLEncoder.encode("This*string*has*asterisks", "UTF-8");
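URLEncoder.encode() leaves ASCII letters, digits, and the punctuation characters _ - . * unchanged, converts each space to a plus sign (+), and percent-encodes everything else. Even ~ ' ! ( ) are encoded.
Although you can choose the character set, it is best to always use "UTF-8": it is the most widely compatible choice, and using anything else may introduce incompatibilities.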
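To see why the character set matters, the following sketch encodes the same character with two different character sets. It assumes Java 10 or later, where URLEncoder.encode also accepts a Charset and no longer forces you to catch UnsupportedEncodingException; the class and method names are made up for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; assumes Java 10+ for the Charset overload.
public class CharsetOverloadSketch {
    static String encodeUtf8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String encodeLatin1(String s) {
        return URLEncoder.encode(s, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // the character é
        System.out.println(encodeUtf8(s));   // %C3%A9 (two bytes)
        System.out.println(encodeLatin1(s)); // %E9 (one byte)
    }
}
```

A server that expects UTF-8 would misread the second form, which is the kind of incompatibility that sticking to UTF-8 avoids.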
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncoderTest {
    public static void main(String[] args) {
        try {
            // Each call encodes one sample string and prints the result.
            System.out.println(URLEncoder.encode("This string has spaces", "UTF-8"));
            System.out.println(URLEncoder.encode("This*string*has*asterisks", "UTF-8"));
            System.out.println(URLEncoder.encode("This%string%has%percent%signs", "UTF-8"));
            System.out.println(URLEncoder.encode("This+string+has+pluses", "UTF-8"));
            System.out.println(URLEncoder.encode("This/string/has/slashes", "UTF-8"));
            System.out.println(URLEncoder.encode("This\"string\"has\"quote\"marks", "UTF-8"));
            System.out.println(URLEncoder.encode("This:string:has:colons", "UTF-8"));
            System.out.println(URLEncoder.encode("This~string~has~tildes", "UTF-8"));
            System.out.println(URLEncoder.encode("This(string)has(parentheses)", "UTF-8"));
            System.out.println(URLEncoder.encode("This.string.has.periods", "UTF-8"));
            System.out.println(URLEncoder.encode("This=string=has=equals=signs", "UTF-8"));
            System.out.println(URLEncoder.encode("This&string&has&ampersands", "UTF-8"));
            System.out.println(URLEncoder.encode("Thiséstringéhasénon-ASCII characters", "UTF-8"));
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 support is required of every Java implementation.
            throw new RuntimeException("Broken VM does not support UTF-8");
        }
    }
}
Output:
This+string+has+spaces
This*string*has*asterisks
This%25string%25has%25percent%25signs
This%2Bstring%2Bhas%2Bpluses
This%2Fstring%2Fhas%2Fslashes
This%22string%22has%22quote%22marks
This%3Astring%3Ahas%3Acolons
This%7Estring%7Ehas%7Etildes
This%28string%29has%28parentheses%29
This.string.has.periods
This%3Dstring%3Dhas%3Dequals%3Dsigns
This%26string%26has%26ampersands
This%C3%A9string%C3%A9has%C3%A9non-ASCII+characters
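Note that encode() also encodes / & : and =, without regard to the special role these characters play in a URL. You should therefore encode a URL piece by piece rather than encoding the entire URL in a single method call.
For example, encoding the whole URL at once produces a string that is no longer usable as a URL: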
String query = URLEncoder.encode(
"https://www.google.com/search?hl=en&as_q=Java&as_epq=I/O", "UTF-8");
System.out.println(query);
output:
https%3A%2F%2Fwww.google.com%2Fsearch%3Fhl%3Den%26as_q%3DJava%26as_epq%3DI%2FO
Instead, encode each name and value separately:
String url = "https://www.google.com/search?";
url += URLEncoder.encode("hl", "UTF-8");
url += "=";
url += URLEncoder.encode("en", "UTF-8");
url += "&";
url += URLEncoder.encode("as_q", "UTF-8");
url += "=";
url += URLEncoder.encode("Java", "UTF-8");
url += "&";
url += URLEncoder.encode("as_epq", "UTF-8");
url += "=";
url += URLEncoder.encode("I/O", "UTF-8");
System.out.println(url);
output:
https://www.google.com/search?hl=en&as_q=Java&as_epq=I%2FO
The following class can be used to encode a query string:
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryString {
    private StringBuilder query = new StringBuilder();

    public QueryString() {
    }

    public synchronized void add(String name, String value) {
        // Separate pairs with '&', but do not put a stray '&' before the first pair.
        if (query.length() > 0) {
            query.append('&');
        }
        encode(name, value);
    }

    // Appends name=value, encoding the name and the value separately.
    private synchronized void encode(String name, String value) {
        try {
            query.append(URLEncoder.encode(name, "UTF-8"));
            query.append('=');
            query.append(URLEncoder.encode(value, "UTF-8"));
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("Broken VM does not support UTF-8");
        }
    }

    public synchronized String getQuery() {
        return query.toString();
    }

    @Override
    public String toString() {
        return getQuery();
    }
}
Usage:
QueryString qs = new QueryString();
qs.add("hl", "en");
qs.add("as_q", "Java");
qs.add("as_epq", "I/O");
String url = "http://www.google.com/search?" + qs;
System.out.println(url);
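When the parameters are already held in a map, the same piece-by-piece encoding can be written with streams. This is a sketch under assumptions, not a replacement for the class above: `QueryFromMapSketch` and `toQuery` are hypothetical names, and the Charset overload of URLEncoder.encode requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; assumes Java 10+ for URLEncoder.encode(String, Charset).
public class QueryFromMapSketch {
    static String toQuery(Map<String, String> params) {
        return params.entrySet().stream()
                // Encode each name and each value on its own ...
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                // ... and only then join the pairs with a literal '&'.
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("hl", "en");
        params.put("as_q", "Java");
        params.put("as_epq", "I/O");
        System.out.println("https://www.google.com/search?" + toQuery(params));
        // prints https://www.google.com/search?hl=en&as_q=Java&as_epq=I%2FO
    }
}
```

Because the separators = and & are added after encoding, they keep their special meaning, while a / or & inside a value is safely escaped.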
URLDecoder
public static String decode(String s, String encoding)
throws UnsupportedEncodingException
String input = "https://www.google.com/" +
"search?hl=en&as_q=Java&as_epq=I%2FO";
String output = URLDecoder.decode(input, "UTF-8");
System.out.println(output);
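Decoding has the mirror-image pitfall of encoding: if a whole query string is decoded first, an encoded & or = inside a value becomes indistinguishable from a real separator. A safer order is to split on the separators first and decode each piece afterwards. The sketch below illustrates this; `QueryParseSketch` and `parseQuery` are hypothetical names, and the Charset overload of URLDecoder.decode assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; assumes Java 10+ for URLDecoder.decode(String, Charset).
public class QueryParseSketch {
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> result = new LinkedHashMap<>();
        // Split on the literal separators while the text is still encoded.
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as those produced by "&&"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Decode each piece only after it has been isolated.
            result.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = parseQuery("hl=en&as_q=Java&as_epq=I%2FO");
        System.out.println(params.get("as_epq"));             // I/O
        System.out.println(parseQuery("q=a%26b+c").get("q")); // a&b c
    }
}
```

If a name appears more than once, this sketch keeps only the last value; a real parser would probably collect the values into a list.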