How do you get a large number of cheap, reliable proxy IP addresses?

Latest recommended article published on 2024-06-11 14:01:56

u010013838

Article link: https://blog.csdn.net/u010013838/article/details/102523914

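The hard part is getting at least 5000 IP addresses. As an ordinary user, I prefer free proxies, since they cost nothing. If you want free proxies, you can build a proxy pool.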

Free proxy sites:

Proxy name	Website
Kuaidaili (快代理)	https://www.kuaidaili.com/ops/
66 Free Proxy (66免费代理)	http://www.66ip.cn/
IP Hai (IP海)	http://www.iphai.com/
Domestic high-anonymity IPs (国内高匿IP)	https://www.xicidaili.com/nn/
Youdaili (优代理)	http://www.data5u.com/free/gngn/index.shtml

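Free proxies have one drawback: many of the IPs you get were published long ago, so most of them have already expired. Collecting 5000 IP addresses is not a problem, but very few of them actually work. When I tested, only about 10 out of 300 were usable.

So, to speed things up, I reluctantly paid three times, one day each, with unlimited IPs. Honestly, at the start most of those IPs crawled successfully and the speed was great. The catch with this task is that every crawl gets an IP banned, so things slow down somewhat later on. Wait until your code passes testing with no bugs before buying another day: for the price of a meal you save several hours.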

Proxy name	Website	Price
66 Proxy (66代理)	http://www.66daili.cn/UserManage/	8 yuan per day, unlimited
Xigua Proxy (西瓜代理)	http://www.xiguadaili.com/	9 yuan per day, unlimited
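
Since only a small share of free proxies work (about 10 out of 300 in my test), it helps to filter a list before using it. Below is a minimal sketch of such a check, not the code I actually ran: `TEST_URL`, `is_alive` and `filter_working` are names made up for illustration, and the test endpoint is an assumption that can be swapped for any stable page.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed test endpoint; replace it with any stable URL you are allowed to hit.
TEST_URL = "http://httpbin.org/ip"


def is_alive(proxy, timeout=5):
    """Return True if an HTTP request through `proxy` ("ip:port") succeeds."""
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(TEST_URL, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Timeouts, refused connections and bad responses all count as dead.
        return False


def filter_working(proxies, check=is_alive, workers=50):
    """Check proxies concurrently and return the ones that pass, in input order."""
    proxies = list(proxies)
    if not proxies:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, proxies))
    return [p for p, ok in zip(proxies, results) if ok]
```

Calling `filter_working(candidates)` on a scraped list would then return only the entries that answered within the timeout. The checks run in threads because almost all of the time is spent waiting on dead proxies.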

66 Proxy: extraction through a web interface.
You can use selenium to automate the scraping (1000+ IPs per request).

Xigua Proxy: provides a web interface as well as a txt file download (100+ IPs per request).
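
Whichever way the list is delivered (web page or txt file), the raw text has to be turned into clean `ip:port` entries. Here is a small sketch of that step, assuming one proxy per line in `ip:port` form; `parse_proxies` is a hypothetical helper name, not something either site provides.

```python
import re

# Matches "a.b.c.d:port"; the numeric ranges are validated separately below.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def parse_proxies(text):
    """Extract unique "ip:port" strings from raw text, keeping first-seen order."""
    seen = set()
    result = []
    for ip, port in PROXY_RE.findall(text):
        # Reject impossible octets and ports such as 999.1.1.1 or :70000.
        if any(int(part) > 255 for part in ip.split(".")):
            continue
        if not 0 < int(port) <= 65535:
            continue
        proxy = ip + ":" + port
        if proxy not in seen:
            seen.add(proxy)
            result.append(proxy)
    return result
```

For a downloaded file this could be used as `parse_proxies(open("id.txt").read())`. Error messages or stray markup in the text simply produce no match.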

Here is the code I use to scrape the data with selenium and save it to a redis database:

import re
import time

import redis
from selenium import webdriver
from selenium.webdriver.common.by import By


class Redis_Client():
    """Redis database client."""

    def __init__(self):
        self.db = redis.StrictRedis(host='localhost', port=6379, db=4, decode_responses=True)

    def add(self, proxy, score=100):
        """
        Add a proxy with the initial score, unless it is already stored.
        :param proxy: proxy address
        :param score: initial score
        :return: result of the add, or None if the proxy already exists
        """
        # zscore returns None for a missing member; compare explicitly so
        # that an existing proxy with score 0 is not added again.
        if self.db.zscore("proxies", proxy) is None:
            return self.db.zadd("proxies", {proxy: score})


def spider_ip(url, redis_client):
    """
    Scrape about once a minute and add the results to the redis database.
    :param url: Xigua Proxy (西瓜代理) url
    :param redis_client: redis database client
    :return: None
    """
    count = 0
    while count < 1500:
        browser = None
        try:
            chrome_options = webdriver.ChromeOptions()
            chrome_options.add_argument("--headless")
            browser = webdriver.Chrome(options=chrome_options)
            browser.maximize_window()
            browser.get(url)
            # Fill in the order number and pick the proxy category.
            # Selenium 4 removed find_element_by_*; use find_element(By...).
            order_id = browser.find_element(By.ID, "tid")
            order_id.clear()
            order_id.send_keys("555050574900674")
            check = browser.find_element(By.XPATH, "//input[@name='category'][3]")
            check.click()
            time.sleep(3)
            submit_click = browser.find_element(By.ID, "submit_button")
            submit_click.click()
            # The result opens in a second window
            browser.switch_to.window(browser.window_handles[1])
            # Extract the data
            response = browser.page_source
            text = re.findall(r'pre-wrap;">(.*?)</pre', response, re.S)[0]
            # The site returns this message when no matching IP is found
            if text == "ERROR|没有找到符合条件的IP":
                time.sleep(20)
                continue
            # Keep a local copy of the latest batch
            with open("id.txt", "w") as fd:
                fd.write(text)
            for line in text.splitlines():
                proxy = re.sub(r"\s", "", line)
                if not proxy:
                    continue
                print(proxy)
                redis_client.add(proxy=proxy)
            count += 1
            time.sleep(4)
        except Exception as e:
            print(e)
            time.sleep(10)
            continue
        finally:
            # Always close the browser, even when an error occurred
            if browser is not None:
                browser.quit()
        time.sleep(40)


def main():
    url = "http://www.xiguadaili.com/web"
    rc = Redis_Client()
    # Start scraping
    spider_ip(url, rc)


if __name__ == '__main__':
    main()
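
Once the proxies sit in the `proxies` sorted set with an initial score of 100, the crawler still needs a way to pick one and to penalise the ones that get banned. The sketch below shows one possible scheme and is not part of the code above: `random_proxy`, `mark_failed`, `MIN_SCORE` and the penalty of 1 are my own assumptions. The functions take the redis connection as `db` and only use the standard redis-py sorted-set calls `zrangebyscore`, `zrevrange`, `zincrby` and `zrem`.

```python
import random

KEY = "proxies"    # same sorted-set key the scraper writes to
MAX_SCORE = 100    # initial score used when a proxy is added
MIN_SCORE = 0      # assumed cut-off; a proxy at or below it is dropped


def random_proxy(db):
    """Pick a random proxy, preferring those still at the maximum score."""
    best = db.zrangebyscore(KEY, MAX_SCORE, MAX_SCORE)
    if best:
        return random.choice(best)
    # Otherwise fall back to the 100 highest-scored entries.
    top = db.zrevrange(KEY, 0, 99)
    if top:
        return random.choice(top)
    raise LookupError("proxy pool is empty")


def mark_failed(db, proxy, penalty=1):
    """Lower a proxy's score after a failed request; remove it at MIN_SCORE."""
    score = db.zincrby(KEY, -penalty, proxy)
    if score <= MIN_SCORE:
        db.zrem(KEY, proxy)
    return score
```

With the client from the scraper, usage would look like `proxy = random_proxy(Redis_Client().db)`, followed by `mark_failed(...)` whenever a request through that proxy fails. Lowering the score gradually instead of deleting at once gives a proxy a few chances before it is removed.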