The two approaches below use HttpClient and URLConnection respectively, and both handle the garbled-character (encoding) problem. In tests on a real device, the HttpClient approach appears more stable and usually succeeds, whereas the URLConnection approach frequently fails to fetch data over an EDGE network.
The HttpClient approach:
public String getHtml(String url) throws IOException, URISyntaxException {
    URI u = new URI(url);
    DefaultHttpClient httpclient = new DefaultHttpClient();
    HttpGet httpget = new HttpGet(u);
    ResponseHandler<String> responseHandler = new BasicResponseHandler();
    String content = httpclient.execute(httpget, responseHandler);
    // The target page is UTF-8. Without this re-decode the result is garbled,
    // because the response body gets decoded as ISO-8859-1 by default when the
    // server sends no charset in its headers.
    content = new String(content.getBytes("ISO-8859-1"), "UTF-8");
    return content;
}
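The ISO-8859-1 round trip above works because ISO-8859-1 maps every byte value 0–255 to a character one-to-one, so no information is lost: re-encoding the mis-decoded string recovers the original UTF-8 bytes. A minimal, self-contained sketch of that mechanism (the class and method names `CharsetFix.fix` are mine, for illustration only):

```java
import java.io.UnsupportedEncodingException;

public class CharsetFix {
    // Undo a wrong ISO-8859-1 decode of UTF-8 bytes: re-encoding to
    // ISO-8859-1 restores the original bytes, which are then decoded as UTF-8.
    public static String fix(String garbled) throws UnsupportedEncodingException {
        return new String(garbled.getBytes("ISO-8859-1"), "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "中文";                        // UTF-8 page content
        byte[] wire = original.getBytes("UTF-8");         // bytes on the wire
        String garbled = new String(wire, "ISO-8859-1");  // wrong default decode
        System.out.println(fix(garbled).equals(original)); // prints "true"
    }
}
```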
The URLConnection approach:
public String getHTML(String url) {
    try {
        URL newUrl = new URL(url);
        URLConnection connect = newUrl.openConnection();
        connect.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        // The target page is UTF-8, so decode the stream explicitly.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(connect.getInputStream(), "UTF-8"));
        StringBuilder html = new StringBuilder();
        String readLine;
        while ((readLine = in.readLine()) != null) {
            html.append(readLine);
        }
        in.close();
        return html.toString();
    } catch (MalformedURLException me) {
        // bad URL: fall through and return null
    } catch (IOException ioe) {
        // network failure: fall through and return null
    }
    return null;
}