Web Crawler IP Proxy Server [Sample Program]


HFUT_qianyang

Article link: https://blog.csdn.net/qy20115549/article/details/54945974


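Crawlers sometimes get their IP banned. When that happens, you can find a proxy-listing site, scrape a set of proxy IPs from it, and rotate through them dynamically, which usually gets around the ban. Alternatively, you can use a ready-made third-party IP proxy platform such as crawlera, which uses a pool of proxy IP addresses to distribute downloads. [For details, see this blog post: http://blog.csdn.net/djd1234567/article/details/51741557] The sample program below fetches a page through a single HTTP proxy.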

package daili;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.InetSocketAddress;
import java.net.MalformedURLException;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
/*
 * Author: Qian Yang, School of Management, Hefei University of Technology
 * 1563178220@qq.com
 * Blog: http://blog.csdn.net/qy20115549/
 */
public class GetHtml {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Pass in the proxy IP, the proxy port, and the URL to crawl
        gethtml("183.136.217.74", 8080, "http://club.autohome.com.cn/bbs/forum-c-2533-1.html?orderby=dateline&qaType=-1");
    }

    public static String gethtml(String ip, int port, String url) throws UnsupportedEncodingException {
        URL url1;
        try {
            url1 = new URL(url);
        } catch (MalformedURLException e1) {
            e1.printStackTrace();
            return ""; // nothing to fetch if the URL is malformed
        }
        // IP and port of the proxy server
        InetSocketAddress addr = new InetSocketAddress(ip, port);
        Proxy proxy = new Proxy(Proxy.Type.HTTP, addr); // HTTP proxy
        InputStream in = null;
        try {
            URLConnection conn = url1.openConnection(proxy);
            conn.setConnectTimeout(3000); // give up connecting after 3 seconds
            in = conn.getInputStream();
        } catch (Exception e) {
            // The proxy is unreachable or refused the request
            System.out.println("ip " + ip + " is not available");
        }

        // Returns an empty string if the connection failed
        String s = convertStreamToString(in);
        System.out.println(s);
        return s;
    }

    public static String convertStreamToString(InputStream is) throws UnsupportedEncodingException {
        if (is == null)
            return "";
        // Decode the response as GB2312, the encoding this example uses for the target page
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, "gb2312"));
        StringBuilder sb = new StringBuilder();
        String line = null;
        try {
            while ((line = reader.readLine()) != null) {
                sb.append(line).append("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    }
}

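As shown in the figure below, the program fetches the HTML content of the given URL.

[Figure: screenshot of the fetched HTML output]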
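
The sample program uses one hard-coded proxy, while the introduction suggests rotating through several scraped proxy IPs. A minimal sketch of that rotation might look like the following. The class name `ProxyRotator`, its `next()` method, and the `"host:port"` input format are hypothetical and not part of the original program, and the second address in `main` is a made-up placeholder.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not from the original post): hands out proxies
// from a fixed list in round-robin order.
final class ProxyRotator {
    private final List<Proxy> proxies = new ArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger(0);

    // Each entry is assumed to look like "ip:port", e.g. "183.136.217.74:8080".
    ProxyRotator(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        for (String hostPort : hostPorts) {
            int colon = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            proxies.add(new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port)));
        }
    }

    // Returns the next proxy, wrapping back to the first after the last one.
    Proxy next() {
        int index = Math.floorMod(cursor.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }

    public static void main(String[] args) {
        // The second address is a placeholder for illustration only.
        ProxyRotator rotator = new ProxyRotator(
                Arrays.asList("183.136.217.74:8080", "10.0.0.2:3128"));
        for (int i = 0; i < 4; i++) {
            System.out.println(rotator.next().address());
        }
    }
}
```

Each value returned by `next()` could be passed to `url1.openConnection(proxy)` in place of the single proxy built inside `gethtml`, so that a request which fails through one proxy is retried through the next.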
