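Recently I ran into garbled Chinese characters when reading pages with HttpClient. Having solved the problem, I wrote this article in the hope that it helps others.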

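Problem: the pages read by HttpClient are UTF-8 encoded. After reading them with the method below, the Chinese text comes out garbled, and converting the result to another encoding afterwards does not help.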


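import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public static String getHttpResponse(String url)
{
    String result = null;
    HttpClient httpClient = new HttpClient();
    GetMethod getMethod = new GetMethod(url);
    // Retry on recoverable I/O errors (RETRY_HANDLER is the "http.method.retry-handler" key)
    getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
    try {
        int statusCode = httpClient.executeMethod(getMethod);
        if (statusCode == HttpStatus.SC_OK)
        {
            StringBuilder temp = new StringBuilder();
            InputStream in = getMethod.getResponseBodyAsStream();
            // The problem line: no charset is given, so InputStreamReader
            // decodes the UTF-8 bytes with the platform default encoding
            BufferedReader buffer = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = buffer.readLine()) != null)
                temp.append(line); // readLine() strips the line breaks
            buffer.close(); // also closes the underlying InputStream
            result = temp.toString().trim();
        } else
        {
            System.err.println("Can't get page:" + url + "#" + getMethod.getStatusLine());
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Release the connection back to the connection manager
        getMethod.releaseConnection();
    }
    return result;
}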
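The claim that converting the garbled result afterwards does not help can be checked with the JDK alone. The sketch below is a hypothetical demo (WrongCharsetDemo and its methods are illustrative names, not part of HttpClient): it takes the UTF-8 bytes a page would contain, decodes them the way InputStreamReader would under a non-UTF-8 default, assumed here to be US-ASCII, and then tries the usual re-encode-and-decode fix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Two CJK characters (U+4E2D U+6587) written as Unicode escapes so the
    // demo does not depend on the encoding javac uses to read this file
    public static final String ORIGINAL = "\u4e2d\u6587";

    // Simulates InputStreamReader decoding the page's UTF-8 bytes with charset cs
    public static String decode(Charset cs) {
        byte[] pageBytes = ORIGINAL.getBytes(StandardCharsets.UTF_8);
        return new String(pageBytes, cs);
    }

    // Simulates the after-the-fact "fix": re-encode the garbled string with
    // the charset that decoded it, then decode those bytes as UTF-8
    public static String transcode(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = decode(StandardCharsets.US_ASCII);
        System.out.println(garbled.equals(ORIGINAL));                       // false
        System.out.println(transcode(garbled, StandardCharsets.US_ASCII));  // ??????
        System.out.println(decode(StandardCharsets.UTF_8).equals(ORIGINAL)); // true
        String latin1 = decode(StandardCharsets.ISO_8859_1);
        System.out.println(transcode(latin1, StandardCharsets.ISO_8859_1).equals(ORIGINAL)); // true
    }
}
```

With US-ASCII every non-ASCII byte is replaced by U+FFFD during decoding, so the original bytes are gone and the conversion yields only question marks. ISO-8859-1 maps every byte to a character, so the same round trip happens to work there; whether an after-the-fact conversion succeeds depends on the default charset, which is why it is unreliable.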


A common fix for garbled HttpClient output is to set the charset used for reading the content, for example:

httpClient.getParams().setParameter(HttpMethodParams.HTTP_CONTENT_CHARSET, "UTF-8");

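However, if the platform's default encoding is not UTF-8, the text is still garbled. HTTP_CONTENT_CHARSET only tells HttpClient which charset to use when it decodes the body itself (for example in getResponseBodyAsString()) and the response does not declare one. getResponseBodyAsStream() returns raw bytes, and the decoding happens in the line BufferedReader buffer = new BufferedReader(new InputStreamReader(in)); where InputStreamReader decodes those bytes with the platform default encoding. The content is therefore garbled the moment it is read, and no later conversion is likely to recover the original text. The fix in this case is to give InputStreamReader the charset explicitly: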

BufferedReader buffer = new BufferedReader(new InputStreamReader(in,"UTF-8"));
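A minimal sketch of the corrected reading step, assuming the page is known to be UTF-8. BodyReader and readBody are hypothetical names; a ByteArrayInputStream stands in for getMethod.getResponseBodyAsStream() so the example runs without a server.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BodyReader {
    // Reads the whole stream as text with an explicit charset instead of the
    // platform default. Line breaks are dropped, as in the original method.
    public static String readBody(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the wrapped stream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in response body: UTF-8 bytes of four CJK characters on two lines
        byte[] body = "\u4e2d\u6587\n\u6d4b\u8bd5".getBytes(StandardCharsets.UTF_8);
        String text = readBody(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println(text.equals("\u4e2d\u6587\u6d4b\u8bd5")); // true
    }
}
```

Against a live response the call would be readBody(getMethod.getResponseBodyAsStream(), StandardCharsets.UTF_8). In HttpClient 3.x, getMethod.getResponseCharSet() should return the charset declared in the Content-Type header and fall back to the HTTP_CONTENT_CHARSET parameter, so Charset.forName(getMethod.getResponseCharSet()) avoids hard-coding UTF-8 when the server declares its encoding. Since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms, so the original code mostly breaks on earlier JDKs; passing the charset explicitly keeps the behavior independent of the JDK version.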

Source: http://blog.csdn.net/roseey/article/details/5740279