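When you crawl the web with python3, you inevitably send a large number of requests to the same site. A burst of requests in a short time may be noticed by the site's administrators, who can then limit your access rate and request count, or even ban your machine's IP. So it is worth disguising the crawler before you start. This article explains how to set a proxy IP (for how to obtain working proxy IPs, see the link in my earlier article).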
First, import the required modules and define the proxy IPs:
import random
import requests
# Working proxy IPs. requests matches each key against the lowercase
# URL scheme, so the key must be 'https', not 'HTTPS'.
ip_list = [
    {'https': 'http://120.83.103.87:9999'},
    {'https': 'http://120.83.109.33:9999'},
    {'https': 'http://1.199.30.247:9999'},
    {'https': 'http://58.253.155.189:9999'},
    {'https': 'http://120.84.101.75:9999'},
    {'https': 'http://163.204.241.125:9999'},
    {'https': 'http://175.155.137.30:1133'},
    {'https': 'http://58.253.158.156:9999'},
    {'https': 'http://58.253.156.8:9999'},
    {'https': 'http://112.85.164.168:9999'},
    {'https': 'http://120.83.109.113:9999'},
    {'https': 'http://1.198.73.43:9999'},
    {'https': 'http://163.204.242.153:9999'},
    {'https': 'http://1.197.204.143:9999'},
    {'https': 'http://117.91.130.15:9999'},
    {'https': 'http://171.11.179.158:9999'},
]
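A note on the format of each entry: requests looks up the proxies dictionary by the scheme of the URL being fetched, in lowercase ('http' or 'https'), and the value is normally a full proxy URL including its own scheme. If your proxy list arrives as plain ip:port strings, a small helper can build the dictionaries for you. The sketch below is only an illustration: the name build_proxies is my own, and it assumes every entry is a plain HTTP proxy.

```python
def build_proxies(address):
    """Turn an 'ip:port' string into a proxies dict usable by requests.

    Hypothetical helper: it assumes the proxy itself speaks plain HTTP,
    which is why both values start with 'http://'.
    """
    proxy_url = 'http://' + address
    # Keys are the scheme of the *target* URL, always lowercase
    return {'http': proxy_url, 'https': proxy_url}


raw_addresses = ['120.83.103.87:9999', '120.83.109.33:9999']
proxy_pool = [build_proxies(address) for address in raw_addresses]
```

Each item of proxy_pool can then be passed directly as the proxies argument of requests.get.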
Next, use the proxies when making the request (you will need to refresh the proxy pool regularly), as shown here:
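# Set the url to whichever site you want to crawl
url = 'https://www.baidu.com/'
tag = True
while tag and ip_list:
    # Pick one proxy at random from the pool
    IP = random.choice(ip_list)
    try:
        response = requests.get(url, proxies=IP, timeout=10)
    except requests.RequestException:
        # The proxy is dead or unreachable: drop it and try another
        ip_list.remove(IP)
        continue
    if response.status_code == 200:
        print(response.text)
        tag = False
    else:
        # The proxy answered but the request failed: drop it and retry
        ip_list.remove(IP)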
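If you need the same retry logic in several places, the loop can be wrapped in a function. What follows is a rough sketch rather than a finished implementation: fetch_with_rotation, default_fetch and max_attempts are names I made up for illustration. The fetch step is passed in as a parameter so the rotation logic can be tried out without touching the network.

```python
import random


def default_fetch(url, proxies):
    """Fetch url through the given proxy with requests; raises on failure."""
    import requests  # imported here so the rotation logic works without it

    response = requests.get(url, proxies=proxies, timeout=10)
    response.raise_for_status()  # turn 4xx/5xx answers into exceptions
    return response.text


def fetch_with_rotation(url, proxy_pool, fetch=default_fetch, max_attempts=5):
    """Try random proxies from proxy_pool until one of them works.

    Failing proxies are removed from proxy_pool in place.
    Returns the page text, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        if not proxy_pool:
            break  # nothing left to try
        proxies = random.choice(proxy_pool)
        try:
            return fetch(url, proxies)
        except OSError:
            # requests.RequestException is a subclass of OSError (IOError)
            proxy_pool.remove(proxies)
    return None
```

With the list from earlier in this article, a call would look like html = fetch_with_rotation('https://www.baidu.com/', ip_list). A return value of None means the attempts ran out or the pool is empty, which is a good moment to refresh the proxy list.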
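That's all the code. With this in place, requests to the site go through a proxy, so your own machine's IP stays hidden. If you like my articles, feel free to follow this blog. I am 活动的笑脸.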