Scraping a Simple Free Proxy Pool

MingrenChen

Category: python  Tags: free proxies for web scraping

Article link: https://blog.csdn.net/m0_37665900/article/details/78008169

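A few days ago, while working on a Scrapy crawler, I got banned by a website, so I wrote a small program that scrapes a pool of proxies. For some reason every proxy from xici raised errors, so I switched to a free proxy site hosted outside China. The URL is

https://free-proxy-list.net/

Here is the code.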

# coding:utf-8

import queue
import threading

import requests
from bs4 import BeautifulSoup


class ProxyGetter:
    def __init__(self, num=300):
        # num is the number of proxy addresses to scrape; the default of 300 scrapes all of them.
        self.num = num
        self.url = "https://free-proxy-list.net/"
        # Spoof the request header so the request looks like it comes from a browser.
        self.header = {"User-Agent": 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'}
        self.q = queue.Queue()
        self.Lock = threading.Lock()

    def get_ips(self):
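        # Store the scraped proxy addresses in list.txt in the current directory.
        with open("list.txt", "w") as l:
            res = requests.get(self.url, headers=self.header)
            # ... (the rest of the listing is truncated in the source)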
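Since the listing stops right after the page is fetched, here is a minimal sketch of what the remaining steps could look like. It is not the author's code. It assumes the proxy list sits in the first HTML table on the page, with the IP address in the first column and the port in the second. The names `parse_proxies`, `is_alive`, and `filter_proxies` are hypothetical, and the test URL `https://httpbin.org/ip` is only one possible choice of endpoint for checking a proxy.

```python
import queue
import threading

import requests
from bs4 import BeautifulSoup


def parse_proxies(html, limit=300):
    """Extract "ip:port" strings from the first table in the page HTML.

    Assumption: column 0 holds the IP address and column 1 holds the port.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    proxies = []
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        if len(cells) >= 2 and cells[1].isdigit():
            proxies.append(f"{cells[0]}:{cells[1]}")
        if len(proxies) >= limit:
            break
    return proxies


def is_alive(proxy, test_url="https://httpbin.org/ip", timeout=5):
    """Return True if a request routed through the proxy succeeds."""
    proxy_map = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    try:
        res = requests.get(test_url, proxies=proxy_map, timeout=timeout)
        return res.status_code == 200
    except requests.RequestException:
        return False


def filter_proxies(proxies, check=is_alive, workers=20):
    """Check proxies concurrently and return the ones that pass, sorted."""
    tasks = queue.Queue()
    for proxy in proxies:
        tasks.put(proxy)
    good = []
    lock = threading.Lock()

    def worker():
        # Each worker drains the shared queue until it is empty.
        while True:
            try:
                proxy = tasks.get_nowait()
            except queue.Empty:
                return
            if check(proxy):
                with lock:
                    good.append(proxy)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(good)
```

A typical call would be `filter_proxies(parse_proxies(res.text))`, with the surviving addresses written to list.txt one per line. The queue and lock mirror the `queue.Queue()` and `threading.Lock()` that the class sets up in `__init__`, though how the original uses them is not visible in the truncated listing.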
