在使用Jsoup抓取网页时,时常发生SocketTimeoutException的异常:
代码:
doc = Jsoup.connect(url).timeout(5000).userAgent("Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 5.0)").get();
异常:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1535)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:656)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:629)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:261)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:250)
at getdt.GetData.getDate(GetData.java:133)
at getdt.GetData.main(GetData.java:77)
出错后,用浏览器还能正常打开。于是乎,厚脸皮一下。
doc = getDate(url);
public static Document getDate(String url) {
Document doc = null;
boolean flag = true;
while (flag) {
try {
doc = Jsoup.connect(url).timeout(5000).userAgent("Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 5.0)").get();
flag = false;
} catch (IOException e) {
// e.printStackTrace();
}
}
return doc;
}
既然好用了,吾谁与归?