Using proxy IPs with requests in Python 3 for web scraping

Why write an article about something that barely counts as technique? Because not long ago I failed to use proxy IPs properly, wasted a fair amount of money, and only recently figured out how to use them correctly in Python 3.

First, the code I originally wrote (the target url is shown here as a placeholder):

#encoding:utf-8
import requests
import sys
import io

# Re-wrap stdout so Chinese text prints correctly on a GBK Windows console
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='GB18030')

url = "http://www.xxx.com/"  # placeholder target URL (never defined in the original post)
proxie = {"http": "140.143.156.166:1080"}
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Host": "www.xxx.com",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate, br",
    "DNT": "1",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Cache-Control": "max-age=0, no-cache",
    "Pragma": "no-cache"
}

res = requests.get(url, headers=header, proxies=proxie)
res.encoding = "utf-8"
print(res.status_code)
print(res.text)
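
A quick way to check whether a request really goes out through the proxy is to ask an IP-echo service which address it sees. This is a minimal sketch of my own (not from the original post), assuming httpbin.org is reachable from your network:

import requests

# Same proxy as above; httpbin echoes back the address it sees the request coming from.
proxie = {"http": "140.143.156.166:1080"}
res = requests.get("http://httpbin.org/ip", proxies=proxie, timeout=10)
print(res.json())  # should show the proxy's IP, not your own, if the proxy is in effect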

Written this way, the proxy IP is never actually used. The page still loads, but the request goes out from my own IP address, not through the proxy: requests picks a proxy entry by the scheme of the target URL, so with only an "http" key in the dict, every https:// request bypasses the proxy entirely. That makes it very easy to get your IP banned, which is why that project took me 10 days to finish crawling, wasting both time and money. After digging through a lot of material online, the code turned into this:

#encoding:utf-8
import requests
import sys
import io

# Re-wrap stdout so Chinese text prints correctly on a GBK Windows console
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='GB18030')

url = "https://www.xxx.com/"  # placeholder target URL (never defined in the original post)
proxie = {"http": "140.143.156.166:1080", "https": "140.143.156.166:1080"}
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Host": "www.xxx.com",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate, br",
    "DNT": "1",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Cache-Control": "max-age=0, no-cache",
    "Pragma": "no-cache"
}

res = requests.get(url, headers=header, proxies=proxie)
res.encoding = "utf-8"
print(res.status_code)
print(res.text)
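
A side note not in the original post: proxy values in the dict are conventionally written with an explicit scheme, plus credentials if the provider requires them. A minimal sketch with hypothetical credentials:

# Hypothetical values; substitute your own proxy host, port and credentials.
proxies = {
    "http":  "http://user:password@140.143.156.166:1080",
    "https": "http://user:password@140.143.156.166:1080",  # HTTPS traffic is tunneled through the same HTTP proxy
}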

This version adds an https entry to the proxies dict, but the remote connection kept being refused, which led to the final version below that uses proxies correctly:

#encoding:utf-8
import requests
import sys
import io

# Re-wrap stdout so Chinese text prints correctly on a GBK Windows console
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='GB18030')

url = "https://www.xxx.com/"  # placeholder target URL (never defined in the original post)
proxie = {"http": "140.143.156.166:1080", "https": "140.143.156.166:1080"}
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Host": "www.xxx.com",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate, br",
    "DNT": "1",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Cache-Control": "max-age=0, no-cache",
    "Pragma": "no-cache"
}

# verify=False skips SSL certificate verification
res = requests.get(url, verify=False, headers=header, proxies=proxie)
res.encoding = "utf-8"
print(res.status_code)
print(res.text)

Adding verify=False, which skips SSL certificate verification, fixed it. Now the crawler really does reach the site through the proxy IP address!!!
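
One thing worth adding (not in the original post): with verify=False, urllib3 emits an InsecureRequestWarning for every request. If that clutters the output, it can be silenced like this:

import urllib3

# Suppress the InsecureRequestWarning triggered by verify=False.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)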

Time to set off on a happy scraping journey~~~
