java.io.IOException: Server returned HTTP response code: 403 for URL: http://blog.csdn.net/w1242245
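While scraping pages with my crawler today, I hit one page whose content could not be fetched the usual way; the request failed with the 403 error above. Adding the lines shown below fixed it.
Original: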
private Parser parser;

public TestUrl(String htmlAddress) throws ParserException, IOException {
    // The parser opens the URL itself
    parser = new Parser(htmlAddress);
    parser.setEncoding("gb2312");
}
After the change:
private Parser parser;

// Needs java.net.URL and java.net.HttpURLConnection in addition to the existing imports
public TestUrl(String htmlAddress) throws ParserException, IOException {
    URL url = new URL(htmlAddress);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    // Send a browser-like User-Agent before the request goes out
    connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
    parser = new Parser(connection);
    parser.setEncoding("gb2312");
}
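Problem solved! Presumably the server rejects requests that do not carry a browser-like User-Agent header.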
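
For reuse across several crawler classes, the same fix could be pulled into a small helper. This is only a sketch using the standard java.net API: the class name BrowserLikeConnection, the method open, and the constant DEFAULT_USER_AGENT are made up for illustration and are not part of HTMLParser. It wraps IOException in an unchecked exception purely to keep call sites short, which is a design choice rather than a requirement.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; the names here are illustrative only.
final class BrowserLikeConnection {
    // The same User-Agent string used in the post; any browser-like value could be substituted.
    static final String DEFAULT_USER_AGENT =
            "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";

    private BrowserLikeConnection() {}

    /** Prepares an HTTP connection with the given User-Agent; no request is sent yet. */
    static HttpURLConnection open(String address, String userAgent) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            // Request headers have to be set before the connection is actually made.
            connection.setRequestProperty("User-Agent", userAgent);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned connection can then be passed to new Parser(connection), exactly as in the modified constructor.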