The following code uses the JDK's built-in HttpURLConnection to fetch Baidu's search results for the keyword "angry birds".
URL baiduSearch = new URL("http://www.baidu.com/s?wd=angry birds");
HttpURLConnection connection = (HttpURLConnection) baiduSearch.openConnection();
connection.setRequestMethod("POST");
connection.setDoInput(true);
... ...
connection.setRequestProperty("Host", "www.baidu.com");
connection.getOutputStream().flush();
InputStream inputStream = connection.getInputStream();
However, a Wireshark capture shows that the HTTP request is not what was expected.
The request line (Request-Line) is POST /s?wd=angry birds HTTP/1.1.
RFC 2616 defines it as
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
so the expected result is
Method | Request-URI | HTTP-Version |
---|---|---|
POST | /s?wd=angry birds | HTTP/1.1 |
In practice, because of the space in the query parameter "angry birds", the request is parsed as
Method | Request-URI | HTTP-Version |
---|---|---|
POST | /s?wd=angry | birds HTTP/1.1 |
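
The split shown in the two tables can be reproduced with a small sketch. It assumes a naive parser that cuts the Request-Line at the first two SP characters. That is only an illustration of the RFC 2616 grammar, not how any particular server is implemented, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only: splits a Request-Line the way
// the RFC 2616 grammar reads it (Method SP Request-URI SP HTTP-Version).
class RequestLineSketch {

    // Cut at the first two spaces; whatever remains is treated as HTTP-Version.
    static String[] parse(String requestLine) {
        return requestLine.split(" ", 3);
    }

    public static void main(String[] args) {
        // Unencoded space: the third field absorbs part of the query value.
        System.out.println(Arrays.toString(parse("POST /s?wd=angry birds HTTP/1.1")));
        // Encoded space: three clean fields.
        System.out.println(Arrays.toString(parse("POST /s?wd=angry+birds HTTP/1.1")));
    }
}
```

With the raw space, the third field is no longer a valid HTTP-Version, which matches what the capture shows.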
See the content in the red box in the figure below.
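Therefore the "s?wd=angry birds" part needs to be encoded when constructing the URL, and the JDK's java.net.URLEncoder can be used for this. Note that URLEncoder should be applied only to the parameter value ("angry birds"): applied to the whole string, it would also escape the "?" and "=" separators.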
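
A minimal sketch of this fix, assuming UTF-8 as the encoding and encoding only the value of the wd parameter. The class and method names are hypothetical and not part of the original code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only.
class QueryEncodingSketch {

    // Builds the search URL with the keyword encoded so that it contains no raw space.
    static String buildSearchUrl(String keyword) {
        try {
            // Encode only the parameter value, not the "s?wd=" prefix.
            return "http://www.baidu.com/s?wd=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("angry birds"));
    }
}
```

URLEncoder produces application/x-www-form-urlencoded output, so the space becomes "+" rather than "%20". Either form keeps the Request-URI free of raw spaces, so the request line splits into exactly three fields.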