How to fix java.io.IOException: Server returned HTTP response code: 403 for URL: http://


Author: 你是我的绝笔


Article link: https://blog.csdn.net/w1242245/article/details/21374977


    Exception in thread "main" org.htmlparser.util.ParserException: Exception getting input stream from http://blog.csdn.net/w1242245(Server returned HTTP response code: 403 for URL: http://blog.csdn.net/w1242245).;
    java.io.IOException: Server returned HTTP response code: 403 for URL: http://blog.csdn.net/w1242245

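While crawling web pages today, I found one page whose content I couldn't fetch the way I usually do: the server answered with HTTP 403 (Forbidden). Opening the connection myself and sending a browser-like User-Agent header fixed it: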

Original:

    private Parser parser;

    public TestUrl(String htmlAddress) throws ParserException, IOException {
        parser = new Parser(htmlAddress);
        parser.setEncoding("gb2312");
    }

After the change:

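    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.htmlparser.Parser;
    import org.htmlparser.util.ParserException;

    // Inside class TestUrl:
    private Parser parser;

    public TestUrl(String htmlAddress) throws ParserException, IOException {
        URL url = new URL(htmlAddress);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        // Send a browser-like User-Agent; the server answered 403 to the default request
        connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        parser = new Parser(connection); // hand the pre-configured connection to htmlparser
        parser.setEncoding("gb2312");
    }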

Problem solved!
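For readers who don't use htmlparser, the same idea can be expressed with the JDK alone. The following is a minimal sketch, not the method used above: it assumes the page can be read directly as text, reuses the User-Agent string from the fix, and decodes the body with a caller-supplied charset (for example "GB2312", matching the original `setEncoding` call). The class `PageFetcher` and the methods `openWithUserAgent` and `fetch` are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;

// Hypothetical helper class: a JDK-only sketch of "send a browser-like User-Agent".
class PageFetcher {
    // User-Agent string taken from the fix above; other realistic browser strings may work too.
    static final String BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";

    // Creates the connection object and sets the header; no network I/O happens yet.
    static HttpURLConnection openWithUserAgent(String address, String userAgent) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    // Connects, fails loudly on a non-2xx status (e.g. 403), and returns the decoded body.
    static String fetch(String address, String charsetName) throws IOException {
        HttpURLConnection connection = openWithUserAgent(address, BROWSER_UA);
        try {
            int status = connection.getResponseCode(); // this call performs the request
            if (status / 100 != 2) {
                throw new IOException("HTTP " + status + " for " + address);
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    connection.getInputStream(), Charset.forName(charsetName)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } finally {
            connection.disconnect();
        }
    }
}
```

Usage would look like `String html = PageFetcher.fetch("http://blog.csdn.net/w1242245", "GB2312");`. Checking `getResponseCode()` explicitly makes a 403 show up as a clear status instead of a generic stream error. Whether a given site accepts this particular User-Agent is not guaranteed.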