Web Crawlers: Building an IP Proxy Pool

Latest recommended article published on 2023-11-28 11:14:59

running+snail

Category: Crawlers
Tags: python

Article link: https://blog.csdn.net/qq_45626019/article/details/106599990

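Building an IP proxy pool (can be called automatically)
When crawling the same site repeatedly, switching to a different proxy IP between requests helps prevent failed or blocked requests. The code below can be imported and used directly.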

import requests
from lxml import etree
import time
import random
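# Pick a random proxy from the pool
def getip(ipstock1):
	# ipstock1 maps IP -> port; random.choice raises IndexError if it is empty
	key1=random.choice(list(ipstock1))
	# The scraped free proxies are assumed to be plain HTTP proxies, so the proxy URL uses http://
	proxies2="http://"+key1+":"+ipstock1[key1]
	# Route both http and https requests through the proxy
	proxies3={"http":proxies2,"https":proxies2}
	return proxies3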
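# Validate proxies: drop every proxy that cannot fetch the test page
def USEFUL(ipstock):
	url="http://www.baidu.com/"
	for key in list(ipstock.keys()):
		try:
			proxies="http://"+key+":"+ipstock[key]
			proxies1={"http":proxies,"https":proxies}
			# proxies must be passed by keyword: the second positional argument of requests.get is params
			# The 5-second timeout is an arbitrary example value
			res=requests.get(url,proxies=proxies1,timeout=5)
			res.raise_for_status()
			time.sleep(1)
		except Exception as e:
			print(e)
			ipstock.pop(key)
			continue
	return ipstock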
# Parse a listing page and extract the IPs and ports
def parse_content(url):
	headers={
	"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3100.0 Safari/537.36",
	}
	r=requests.get(url,headers=headers,timeout=10)
	tree=etree.HTML(r.text)
	# The first and second table columns hold the IP and the port
	IP=[s.strip() for s in tree.xpath('//table/tbody/tr/td[1]/text()')]
	PORT=[s.strip() for s in tree.xpath('//table/tbody/tr/td[2]/text()')]
	return IP,PORT
	
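def main(x=1,y=2,X=3,Y=4):
	# The start page is drawn from [x,y] and the end page from [X,Y].
	# The defaults are example values; keep y<=X so that start_page<=end_page.
	IP2=[]
	PORT2=[]
	start_page=random.randint(x,y)
	end_page=random.randint(X,Y)
	#start_page=int(input("Enter the first page to crawl: "))
	#end_page=int(input("Enter the last page to crawl: "))
	# Base URL of the listing to crawl; page 1 is https://www.kuaidaili.com/free/inha/1/
	url="https://www.kuaidaili.com/free/inha/"
	for i in range(start_page,end_page+1):
		link=url+str(i)+"/"
		print(link)
		IP1,PORT1=parse_content(link)
		IP2.extend(IP1)
		PORT2.extend(PORT1)
		time.sleep(3)
	ipstock=dict(zip(IP2,PORT2))
	ipstock1=USEFUL(ipstock)
	# Every proxy may have failed validation; getip would raise on an empty pool
	if not ipstock1:
		print("No usable proxy found")
		return None
	terminal_proxies=getip(ipstock1)
	print(terminal_proxies)
	return terminal_proxies

if __name__=="__main__":
	main()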
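
To illustrate the "switch IP when a request fails" idea from the introduction, here is a minimal sketch of how a validated pool might be used by a crawler. It is not part of the original code: `build_proxies`, `fetch_with_rotation`, `max_tries` and the `fetch` parameter are hypothetical names introduced for this example, and the proxies are assumed to be plain HTTP proxies stored as `{ip: port}`.

```python
import random


def build_proxies(ip, port):
    """Build a requests-style proxies dict for one ip/port pair (assumes an HTTP proxy)."""
    proxy_url = "http://" + ip + ":" + port
    return {"http": proxy_url, "https": proxy_url}


def fetch_with_rotation(url, pool, max_tries=3, fetch=None):
    """Try up to max_tries random proxies from pool ({ip: port}).

    A proxy that fails is removed from pool. Returns the first successful
    result, or None if every attempt failed or the pool ran empty.
    """
    if fetch is None:
        # Imported lazily so the function can be exercised offline with a stub.
        import requests

        def fetch(target, proxies):
            resp = requests.get(target, proxies=proxies, timeout=5)
            resp.raise_for_status()
            return resp

    for _ in range(max_tries):
        if not pool:
            break
        ip = random.choice(list(pool))
        try:
            return fetch(url, build_proxies(ip, pool[ip]))
        except Exception as exc:
            print("proxy", ip, "failed:", exc)
            pool.pop(ip)
    return None
```

Passing a custom `fetch` callable makes it possible to try the rotation logic without touching the network; in real use the default performs the request through the chosen proxy.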

I am a beginner, so the code is fairly simple and contains quite a bit of redundancy. Suggestions are welcome.
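
One possible simplification: USEFUL checks the proxies one at a time, so validation time grows with the size of the pool. The sketch below checks them in parallel with a thread pool instead. It is only an illustration under stated assumptions: `check_proxy`, `filter_pool`, `workers` and the 5-second timeout are hypothetical names and values, and the proxies are again assumed to be plain HTTP proxies.

```python
from concurrent.futures import ThreadPoolExecutor


def check_proxy(ip, port, test_url="http://www.baidu.com/", timeout=5):
    """Return True if a request through the proxy succeeds (assumes an HTTP proxy)."""
    import requests  # imported lazily so filter_pool can be tested with a stub

    proxy_url = "http://" + ip + ":" + port
    proxies = {"http": proxy_url, "https": proxy_url}
    try:
        return requests.get(test_url, proxies=proxies, timeout=timeout).ok
    except requests.RequestException:
        return False


def filter_pool(pool, check=check_proxy, workers=10):
    """Return a new {ip: port} dict containing only the proxies that pass check."""
    items = list(pool.items())
    if not items:
        return {}
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map keeps results in the same order as items
        results = list(executor.map(lambda item: check(item[0], item[1]), items))
    return {ip: port for (ip, port), ok in zip(items, results) if ok}
```

Unlike USEFUL, this version leaves the input dict untouched and returns a filtered copy, which avoids mutating a dict while worker threads are reading from it.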
