Python3中request模块访问网页以及客户端伪装

在python3中我们使用request模块访问一个网页,可以选择对文件的读写或者urllib.request.urlretrieve()方法将我们浏览的页面保存到本地。
方法1:
url_list=["http://www.bundcredit.com","http://www.baidu.com","http://www.winnerlook.com","http://www.winnertoke.com"]
for urlinfo in url_list:
file=urllib.request.urlopen(urlinfo)
data=file.read()
with open(str(urlinfo).split(".")[1]+".html","wb") as fileinfo:
fileinfo.write(data)

方法2:
filename=urllib.request.urlretrieve("http://www.cniao5.com/course/sz.html",filename=str(fileline)
检查Web服务器Nginx的访问日志:
IP地址 时间 访问方法 访问协议 访问状态等
180.156.222.228 - - [26/Nov/2017:20:02:02 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
模拟浏览器-Headers属性1:
import urllib.request
import re
url="http://www.bundcredit.com"
headers = ("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0")
opener = urllib.request.build_opener()
opener.addheaders=[headers]
data=opener.open(url).read()
with open( "1.html", "wb") as fileinfo:
fileinfo.write(data)

伪装后的请求:
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"

模拟浏览器—Headers属性2
url="http://www.bundcredit.com"
req=urllib.request.Request(url)
req.add_header("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0")
data=urllib.request.urlopen(req).read()
print(data)

转载于:https://blog.51cto.com/dreamlinux/2044474

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值