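When I used HttpClient to simulate an HTTP request to a page, the returned content was garbled (Chinese, English and digits alike), even though the page itself displayed normally. After several attempts, and with help from information found online, I found the cause: when setting the httpGet headers, I had declared gzip as an accepted compression type but never decompressed the response. The fix is either to decompress the body before building the string, or to leave the accepted encoding unset (i.e., no compression, which is less efficient).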
httpGet.setHeader("Accept",
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
httpGet.setHeader("Accept-Language", "zh-cn,zh;q=0.5");
// httpGet.setHeader("Accept-Encoding", "gzip"); // commenting out this line fixes the garbling!
httpGet.setHeader("Connection", "keep-alive");
HttpResponse response = null;
try {
    response = httpClient.execute(httpGet);
    int statusCode = response.getStatusLine().getStatusCode();
    // Treat anything other than 200 OK or 302 Found as a failure
    if (statusCode != HttpStatus.SC_OK && statusCode != HttpStatus.SC_MOVED_TEMPORARILY)
        throw new NullInfoException();
} catch (ClientProtocolException e) {
    // HTTP protocol error (e.g. a malformed response)
    e.printStackTrace();
} catch (IOException e) {
    // Network or other I/O failure
    e.printStackTrace();
}
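If you would rather keep requesting gzip, the other fix is to decompress the body before turning it into a string. The sketch below uses only the JDK's `java.util.zip.GZIPInputStream`; the class and method names are hypothetical, and it assumes the page text is UTF-8. It decompresses when the `Content-Encoding` value says gzip or when the body starts with the gzip magic bytes `0x1f 0x8b`, and it simulates the server's compressed response locally so it runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBodyDemo {

    // True if the bytes start with the gzip magic number 0x1f 0x8b.
    static boolean looksGzipped(byte[] body) {
        return body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    // Decompresses the body if it is gzip-encoded, then decodes it as UTF-8
    // (UTF-8 is an assumption; use the page's real charset).
    static String decodeBody(byte[] body, String contentEncoding) {
        boolean gzip = "gzip".equalsIgnoreCase(contentEncoding) || looksGzipped(body);
        if (!gzip) {
            return new String(body, StandardCharsets.UTF_8);
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates what a server sends when the request carried "Accept-Encoding: gzip".
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = gzip("<html>中文 English 123</html>");
        // Decoding the compressed bytes directly garbles every character, ASCII included.
        System.out.println(new String(body, StandardCharsets.UTF_8));
        // Decompressing first recovers the page text.
        System.out.println(decodeBody(body, "gzip"));
    }
}
```

With HttpClient 4.x, the raw bytes and header would presumably come from `EntityUtils.toByteArray(response.getEntity())` and `response.getFirstHeader("Content-Encoding")` (null-check the header before calling `getValue()`); treat that wiring as an untested sketch.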
Reference: http://my.oschina.net/u/590607/blog/163911
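If the garbling is caused by a character-encoding mismatch instead, English letters and digits are not garbled (only non-ASCII text such as Chinese is); in that case transcoding the string is enough. See http://blog.sina.com.cn/s/blog_59929ec30100a7ty.html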
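A common transcoding case, sketched below as an illustration (the class and method names are hypothetical), is UTF-8 bytes that were decoded as ISO-8859-1. Because ISO-8859-1 maps every byte to exactly one character, re-encoding with ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. The simulated garbled string also shows why ASCII letters and digits survive an encoding mismatch while Chinese does not.

```java
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {

    // Repairs text whose UTF-8 bytes were wrongly decoded as ISO-8859-1.
    // ISO-8859-1 is a 1:1 byte-to-char mapping, so this round trip is lossless.
    static String repairLatin1Misdecode(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "中文 English 123";
        // Simulate the bug: UTF-8 bytes decoded with the wrong charset.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(garbled);                        // Chinese garbled, "English 123" intact
        System.out.println(repairLatin1Misdecode(garbled)); // original text restored
    }
}
```

This repair only works for the ISO-8859-1 case; if the bytes were decoded with a charset that replaces invalid sequences (for example GBK bytes decoded as UTF-8), information is lost, and the safer fix is to decode with the page's real charset from the start.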