The request.Request class
To add request headers to a request, use request.Request. For example, to add a User-Agent:
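A minimal sketch of the idea first (example.com is a placeholder URL; no network request is made, the Request object is only constructed and inspected):

```python
from urllib import request

# Attach a User-Agent header via request.Request.
req = request.Request("http://example.com",
                      headers={"User-Agent": "Mozilla/5.0"})

# urllib stores header names in capitalized form ("User-agent").
print(req.get_header("User-agent"))
# With no data, the default request method is GET.
print(req.get_method())
```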
On Lagou, the job listings are not in the main page itself: they are fetched from a separate URL and then embedded into the page's HTML via JavaScript. The actual job data comes back as JSON (it can be inspected by pasting it into json.cn), and the endpoint must be requested with the POST method.
from urllib import request
url="https://www.lagou.com/jobs/positionAjax.json?px=default&gx=%E5%AE%9E%E4%B9%A0&city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false&isSchoolJob=1"
# set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'Referer': "https://www.lagou.com/jobs/list_python%E7%88%AC%E8%99%AB/p-city_0?px=default&gx=%E5%85%A8%E8%81%8C&gj=&xl=%E6%9C%AC%E7%A7%91&isSchoolJob=1",
}
data = {'first': 'true', 'pn': 1, 'kd': 'python'}
req = request.Request(url, headers=headers, data=data, method='POST')
resp = request.urlopen(req)
print(resp.read())
An error is raised because the type of data is wrong: it still has to be encoded. Import the urlencode function from urllib.parse:
from urllib import request,parse
url="https://www.lagou.com/jobs/positionAjax.json?px=default&gx=%E5%AE%9E%E4%B9%A0&city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false&isSchoolJob=1"
# set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'Referer': "https://www.lagou.com/jobs/list_python%E7%88%AC%E8%99%AB/p-city_0?px=default&gx=%E5%85%A8%E8%81%8C&gj=&xl=%E6%9C%AC%E7%A7%91&isSchoolJob=1",
}
data = {'first': 'true', 'pn': 1, 'kd': 'python'}
req = request.Request(url, headers=headers, data=parse.urlencode(data), method='POST')
resp = request.urlopen(req)
print(resp.read())
POST data must be bytes. urlencode returns a str (in Python 3, strings are Unicode str by default), so convert it to utf-8 bytes with encode():
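The difference can be checked offline: urlencode turns the dict into a query string (a str), and encode('utf-8') turns that string into bytes:

```python
from urllib import parse

data = {'first': 'true', 'pn': 1, 'kd': 'python'}
encoded = parse.urlencode(data)    # 'first=true&pn=1&kd=python', a str
payload = encoded.encode('utf-8')  # b'first=true&pn=1&kd=python', bytes
print(type(encoded).__name__, type(payload).__name__)
```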
from urllib import request,parse
url="https://www.lagou.com/jobs/positionAjax.json?px=default&gx=%E5%AE%9E%E4%B9%A0&city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false&isSchoolJob=1"
# set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'Referer': "https://www.lagou.com/jobs/list_python%E7%88%AC%E8%99%AB/p-city_0?px=default&gx=%E5%85%A8%E8%81%8C&gj=&xl=%E6%9C%AC%E7%A7%91&isSchoolJob=1",
}
data = {'first': 'true', 'pn': 1, 'kd': 'python'}
req = request.Request(url, headers=headers, data=parse.urlencode(data).encode('utf-8'), method='POST')
resp = request.urlopen(req)
print(resp.read())
An error message is returned, prefixed with b: the b marks a bytes object, which decode() converts into a utf-8 str:
# decode the bytes response (this replaces the earlier print; resp.read() can only be consumed once)
print(resp.read().decode('utf-8'))
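The decoded string is JSON, so instead of pasting it into json.cn it can also be parsed directly with the standard json module. A sketch with a made-up body standing in for resp.read(); the real positionAjax.json payload has many more fields:

```python
import json

# Hypothetical response body standing in for resp.read().
body = b'{"success": true, "content": {"positionResult": {"totalCount": 2}}}'

# decode() turns bytes into a str; json.loads turns the str into a dict.
data = json.loads(body.decode('utf-8'))
print(data['content']['positionResult']['totalCount'])
```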
Since requesting the same page from a browser returns the data normally, the server has evidently detected that the request is coming from a crawler program.
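One commonly reported workaround (an unverified sketch, not guaranteed to pass Lagou's checks) is to carry the session cookies a browser would have: first GET the listing page through a cookie-aware opener, then send the POST through that same opener so both requests share cookies:

```python
from urllib import request
from http import cookiejar

# Build an opener that remembers cookies across requests.
cj = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(cj))

# Sketch of the two-step flow (requests commented out to stay offline;
# listing_url, headers, and req are the objects built earlier):
# opener.open(request.Request(listing_url, headers=headers))  # GET: collect cookies
# resp = opener.open(req)                                     # POST with those cookies
print(len(cj))  # no requests made yet, so the jar is still empty
```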
Search within a page: Ctrl+F
Show a function's source code: Ctrl+B