Scraping proxy IPs with Python and saving them locally

Latest recommended article published on 2024-04-11 18:47:32

healthy_T

Column: Web scraping | Tags: python, xpath

Article link: https://blog.csdn.net/weixin_51211600/article/details/109581491

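Proxy IPs play an important role in scraping projects: when crawling a site, your IP frequently gets banned, so you often need to switch to a different one. Paying for proxies isn't really worth it when, like us, you only need them for testing, so free ones will have to do. Today I'd like to introduce a website that offers free proxy IPs: https://www.kuaidaili.com/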
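Switching IPs in practice means sending successive requests through different proxies. Here is a minimal sketch of round-robin rotation, assuming you already have a list of working proxy URLs. The `proxy_rotation` helper is a hypothetical name, not part of requests, and the 203.0.113.x addresses are placeholders from a documentation range, not real proxies:

```python
from itertools import cycle


def proxy_rotation(proxy_urls):
    """Endlessly yield requests-style proxies dicts, one proxy URL after another."""
    for url in cycle(proxy_urls):
        # Route both plain-HTTP and HTTPS requests through the same proxy
        yield {'http': url, 'https': url}


if __name__ == '__main__':
    # Placeholder addresses (TEST-NET-3 documentation range), not working proxies
    rotation = proxy_rotation(['http://203.0.113.1:8080', 'http://203.0.113.2:3128'])
    for _ in range(3):
        # prints http://203.0.113.1:8080, http://203.0.113.2:3128, http://203.0.113.1:8080
        print(next(rotation)['http'])
```

Each yielded dict can be passed as `requests.get(url, proxies=next(rotation), timeout=5)`. The `'https'` key matters: requests chooses the proxy by the target URL's scheme, so a dict containing only `'http'` leaves HTTPS requests unproxied.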

Scraping it is easy because it's a static site, so I'll skip the preamble and go straight to the code:

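import os

import requests
import parsel


def get_ip(page):
    target = f'https://www.kuaidaili.com/free/inha/{page}/'

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3760.400 QQBrowser/10.5.4083.400',
    }
    # Fetch the listing page; without a timeout a stalled request could hang forever
    html = requests.get(target, headers=headers, timeout=10).text
    res = parsel.Selector(text=html)

    # Make sure the output folder exists, otherwise open() below raises FileNotFoundError
    os.makedirs('./images', exist_ok=True)

    # Browser DevTools show <tbody> even when the raw HTML omits it;
    # if no rows match, try the XPath without /tbody
    for each in res.xpath('//*[@id="list"]/table/tbody/tr'):
        # Match the IP (first column)
        ip = each.xpath('./td[1]/text()').get()
        # Match the port number (second column)
        port = each.xpath('./td[2]/text()').get()
        if not ip or not port:
            continue  # skip rows missing either cell instead of crashing on None

        # Join the IP and port into a proxy URL
        ip_port = 'http://' + ip.strip() + ':' + port.strip()
        # print(ip_port)

        # Map both schemes: with only 'http', the https:// test URL below would bypass the proxy
        proxies = {'http': ip_port, 'https': ip_port}

        try:
            # Test whether the proxy works (a 0.1 s timeout rejects almost every proxy)
            test = requests.get('https://www.taobao.com/', timeout=3, proxies=proxies)
            if test.status_code == 200:
                print('Proxy OK:', ip_port)
                with open('./images/ip池.txt', 'a', encoding='utf-8') as f:
                    f.write(ip_port)
                    f.write('\n')
        except requests.RequestException:
            # Connection errors, timeouts, and proxy errors all land here
            print('Proxy failed!')


if __name__ == "__main__":
    for page in range(1, 11):
        get_ip(page)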
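Because the script opens `ip池.txt` in append mode (`'a'`), running it again writes the same proxies a second time. A minimal sketch for reading the pool back, assuming one proxy URL per line as the script writes them; `load_pool` is a hypothetical helper name:

```python
def load_pool(path):
    """Return the proxy URLs in `path`, skipping blank lines and repeats (first occurrence wins)."""
    with open(path, encoding='utf-8') as f:
        stripped = (line.strip() for line in f)
        # dict.fromkeys keeps insertion order, so this de-duplicates without reordering
        return list(dict.fromkeys(url for url in stripped if url))
```

For example, `random.choice(load_pool('./images/ip池.txt'))` picks one saved proxy for the next request. Free proxies tend to go stale quickly, so it is worth re-testing an entry before relying on it.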