Python web scraping: obtaining working IP proxies


明啊明啊明


Article link: https://blog.csdn.net/H__ello_world/article/details/102986979

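When scraping, websites often throttle or block IPs that send requests too frequently, so an IP proxy is needed to keep crawling. Where do such proxies come from? The site Xici (西刺, xicidaili.com) lists many free IP proxies. Not all of them actually work, so they have to be filtered first.
(In my experience, Xici's proxies are not very reliable.)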
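The scraped list can contain malformed rows and duplicates. A cheap first pass can discard them before any network request is spent on them. The sketch below is an added step, not part of the original script. `is_well_formed` and `prefilter` are hypothetical helper names, and it assumes every entry is an IPv4 `ip:port` string, which is the format the scraper below produces.

```python
import ipaddress


def is_well_formed(candidate: str) -> bool:
    """True if candidate looks like 'IPv4:port' with a port in 1..65535 (IPv4-only assumption)."""
    host, sep, port = candidate.strip().rpartition(":")
    if not sep or not (port.isascii() and port.isdigit()):
        return False
    try:
        ipaddress.IPv4Address(host)  # raises ValueError for e.g. '999.1.1.1'
    except ValueError:
        return False
    return 1 <= int(port) <= 65535


def prefilter(candidates):
    """Drop malformed entries and duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for c in candidates:
        c = c.strip()
        if is_well_formed(c) and c not in seen:
            seen.add(c)
            result.append(c)
    return result


print(prefilter(["1.2.3.4:8080", "1.2.3.4:8080", "999.1.1.1:80", "5.6.7.8:", "9.9.9.9:3128"]))
# → ['1.2.3.4:8080', '9.9.9.9:3128']
```

This step only rejects entries that can never work. Whether a well-formed proxy is actually alive still has to be tested over the network.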

# @Time :2019/11/9 13:17
# @Author :Ming
# @Software: PyCharm
import requests
from lxml import etree
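def get_proxy():
    """Scrape 'ip:port' strings from list pages 1-4 of xicidaili.com/nn/."""
    proxy = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
    }
    for i in range(1, 5):  # pages 1 to 4
        url = 'https://www.xicidaili.com/nn/' + str(i)
        response = requests.get(url=url, headers=headers, timeout=10)
        html = etree.HTML(response.text)
        # In each table row, column 2 holds the IP and column 3 the port
        ip = html.xpath('//tr/td[2]/text()')
        port = html.xpath('//tr/td[3]/text()')
        for ip_addr, port_num in zip(ip, port):
            proxy.append(ip_addr.strip() + ':' + port_num.strip())
    # Return only after every page has been collected
    return proxy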

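def check(proxy_list):
    """Request Baidu through each 'ip:port' proxy and append the working ones to 2.txt."""
    url = 'http://www.baidu.com'
    for addr in proxy_list:
        # An HTTP proxy also tunnels HTTPS (via CONNECT), so both schemes use http://
        proxies = {
            'http': 'http://' + addr,
            'https': 'http://' + addr,
        }
        try:
            # Without a timeout, a dead proxy can stall the loop for a very long time
            response = requests.get(url=url, proxies=proxies, timeout=5)
            if response.status_code == 200:
                with open('2.txt', 'a', encoding='utf-8') as f:
                    f.write(addr + '\n')
        except requests.exceptions.RequestException as e:
            # Covers connection errors, proxy errors and timeouts
            print('error', addr, e)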
if __name__ == '__main__':
    candidates = get_proxy()
    check(candidates)
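`check()` above tests one proxy at a time, and each dead proxy can cost up to the full timeout, so a long list takes a while. The work is I/O-bound, so a thread pool is a reasonable way to run the probes in parallel. The sketch below is one way to do that, not the original method. `requests_probe`, `filter_working`, the 5-second timeout and `max_workers=20` are all illustrative assumptions. The probe is passed in as a parameter so the filtering logic can be tried with a fake probe.

```python
from concurrent.futures import ThreadPoolExecutor


def requests_probe(addr, url="http://www.baidu.com", timeout=5):
    """True if a GET through proxy addr ('ip:port') returns 200 within timeout seconds."""
    # Imported here so filter_working() can be exercised with a fake probe
    # even where requests is not installed.
    import requests

    proxies = {"http": "http://" + addr, "https": "http://" + addr}
    try:
        resp = requests.get(url, proxies=proxies, timeout=timeout)
    except requests.exceptions.RequestException:
        return False
    return resp.status_code == 200


def filter_working(candidates, probe=requests_probe, max_workers=20):
    """Run probe on every candidate in a thread pool; keep those that pass, in input order."""
    candidates = list(candidates)  # materialize so it can be zipped with the results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(probe, candidates))  # map() preserves input order
    return [addr for addr, ok in zip(candidates, results) if ok]
```

With the script's own functions, `good = filter_working(get_proxy())` could take the place of `check(candidates)`, and `good` could then be written to `2.txt` in one go.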

After running for a while, only two working proxies ended up in the file.
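Once `2.txt` holds the surviving proxies, they can be spread across requests so that no single IP hits the target site too often, which was the original motivation. The sketch below is a minimal version of that idea. `load_proxies` and `proxy_rotation` are hypothetical helpers, and the file is assumed to contain one `ip:port` per line, as `check()` writes it. With only a couple of survivors, each proxy still carries a large share of the traffic. Free proxies also tend to die, so it is worth re-checking the list periodically.

```python
import itertools


def load_proxies(path="2.txt"):
    """Read one 'ip:port' per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def proxy_rotation(addrs):
    """Return an endless iterator of requests-style proxies dicts, cycling through addrs."""
    if not addrs:
        raise ValueError("no proxies to rotate through")
    return ({"http": "http://" + a, "https": "http://" + a} for a in itertools.cycle(addrs))
```

Each request then takes the next proxy, e.g. `requests.get(url, headers=headers, proxies=next(rot), timeout=5)` with `rot = proxy_rotation(load_proxies())`.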
