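While writing a Java web crawler to fetch a page's HTML, I found that the plain ASCII characters in the result displayed fine, but the Chinese characters came out as a pile of garbled text.
Broken code: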
if (responseCode == HttpURLConnection.HTTP_OK) {
    // Get the response stream
    InputStream inputStream = connection.getInputStream();
    // Read the response (no charset is given, so the platform default is used)
    BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
    String returnStr = "";
    String line;
    while ((line = reader.readLine()) != null) {
        returnStr += line + "\r\n";
    }
    reader.close();
    inputStream.close();
    connection.disconnect();
    System.out.println(returnStr);
}
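Fix: pass "UTF-8" as a second argument when constructing the InputStreamReader object. Without it, InputStreamReader decodes the bytes with the platform default charset. If that default differs from the page's encoding (presumably UTF-8 here, since this fix worked), multi-byte characters such as Chinese are decoded wrongly, while ASCII characters still come out right.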
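To make the cause concrete, here is a minimal sketch that decodes the same UTF-8 bytes twice, once with the matching charset and once with a mismatched one. The class name CharsetDemo and the helper readFirstLine are made up for illustration, the byte array stands in for a real response body, and ISO-8859-1 is only a stand-in for "whatever the wrong default happens to be" on a given machine.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: reads the first line of the bytes, decoded with the given charset.
    public static String readFirstLine(byte[] bytes, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a response body: "abc" followed by two Chinese characters, encoded as UTF-8.
        byte[] body = "abc\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // Matching charset: the text round-trips unchanged.
        System.out.println(readFirstLine(body, StandardCharsets.UTF_8));

        // Mismatched charset: "abc" survives, but each byte of the Chinese
        // characters is decoded as a separate, unrelated character.
        System.out.println(readFirstLine(body, StandardCharsets.ISO_8859_1));
    }
}
```

What the console actually shows also depends on the console's own encoding, so the reliable check is to compare the decoded strings rather than eyeball the output.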
Corrected code:
if (responseCode == HttpURLConnection.HTTP_OK) {
    // Get the response stream
    InputStream inputStream = connection.getInputStream();
    // Read the response, decoding the bytes explicitly as UTF-8
    BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
    String returnStr = "";
    String line;
    while ((line = reader.readLine()) != null) {
        returnStr += line + "\r\n";
    }
    reader.close();
    inputStream.close();
    connection.disconnect();
    System.out.println(returnStr);
}
After that everything was OK, and the Chinese characters displayed correctly.
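Hardcoding "UTF-8" works when the page really is UTF-8. As a possible refinement (not something the code above does), the charset could be taken from the response's Content-Type header when the server sends one. The sketch below assumes a hypothetical helper class ResponseCharset. It falls back to UTF-8 when the header is missing or names a charset the JVM does not know, and it does not look at any meta tag inside the HTML itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseCharset {
    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // value such as "text/html; charset=UTF-8". Falls back to UTF-8.
    public static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive check for a "charset=" prefix.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Malformed or unsupported charset name: use the fallback.
                        break;
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

With the variables from the snippet above, the reader could then be built as new InputStreamReader(inputStream, ResponseCharset.fromContentType(connection.getContentType())).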