The request.Request class
To add request headers to a request, use request.Request. For example, to add a User-Agent:
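A minimal sketch of the idea first (example.com is a placeholder URL; no network request is made, the Request object is only constructed and inspected):

```python
from urllib import request

# Attach a User-Agent header via request.Request.
req = request.Request("http://example.com",
                      headers={"User-Agent": "Mozilla/5.0"})

# urllib stores header names in capitalized form ("User-agent").
print(req.get_header("User-agent"))
# With no data, the default request method is GET.
print(req.get_method())
```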
On Lagou, the job listings are not in the main page itself: they are fetched from a separate URL and then embedded into the page's HTML via JavaScript. The actual job data comes back as JSON (it can be inspected by pasting it into json.cn), and the endpoint must be requested with the POST method.
from urllib import request
url="https://www.lagou.com/jobs/positionAjax.json?px=default&gx=%E5%AE%9E%E4%B9%A0&city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false&isSchoolJob=1"
# set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'Referer': "https://www.lagou.com/jobs/list_python%E7%88%AC%E8%99%AB/p-city_0?px=default&gx=%E5%85%A8%E8%81%8C&gj=&xl=%E6%9C%AC%E7%A7%91&isSchoolJob=1",
}
data = {'first': 'true', 'pn': 1, 'kd': 'python'}
req = request.Request(url, headers=headers, data=data, method='POST')
resp = request.urlopen(req)
print(resp.read())
An error is raised because the type of data is wrong: it still has to be encoded. Import the urlencode function from urllib.parse:
from urllib import request,parse
url="https://www.lagou.com/jobs/positionAjax.json?px=default&gx=%E5%AE%9E%E4%B9%A0&city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false&isSchoolJob=1"
# set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'Referer': "https://www.lagou.com/jobs/list_python%E7%88%AC%E8%99%AB/p-city_0?px=default&gx=%E5%85%A8%E8%81%8C&gj=&xl=%E6%9C%AC%E7%A7%91&isSchoolJob=1",
}
data = {'first': 'true', 'pn': 1, 'kd': 'python'}
req = request.Request(url, headers=headers, data=parse.urlencode(data), method='POST')
resp = request.urlopen(req)
print(resp.read())
POST data must be bytes. urlencode returns a str (in Python 3, strings are Unicode str by default), so convert it to utf-8 bytes with encode():
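The difference can be checked offline: urlencode turns the dict into a query string (a str), and encode('utf-8') turns that string into bytes:

```python
from urllib import parse

data = {'first': 'true', 'pn': 1, 'kd': 'python'}
encoded = parse.urlencode(data)    # 'first=true&pn=1&kd=python', a str
payload = encoded.encode('utf-8')  # b'first=true&pn=1&kd=python', bytes
print(type(encoded).__name__, type(payload).__name__)
```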
from urllib import request,parse
url="https://www.lagou.com/jobs/positionAjax.json?px=default&gx=%E5%AE%9E%E4%B9%A0&city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false&isSchoolJob=1"
# set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'Referer': "https://www.lagou.com/jobs/list_python%E7%88%AC%E8%99%AB/p-city_0?px=default&gx=%E5%85%A8%E8%81%8C&gj=&xl=%E6%9C%AC%E7%A7%91&isSchoolJob=1",
}
data = {'first': 'true', 'pn': 1, 'kd': 'python'}
req = request.Request(url, headers=headers, data=parse.urlencode(data).encode('utf-8'), method='POST')
resp = request.urlopen(req)
print(resp.read())
An error message is returned, prefixed with b: the b marks a bytes object, which decode() converts into a utf-8 str:
# decode the bytes response (this replaces the earlier print; resp.read() can only be consumed once)
print(resp.read().decode('utf-8'))
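The decoded string is JSON, so instead of pasting it into json.cn it can also be parsed directly with the standard json module. A sketch with a made-up body standing in for resp.read(); the real positionAjax.json payload has many more fields:

```python
import json

# Hypothetical response body standing in for resp.read().
body = b'{"success": true, "content": {"positionResult": {"totalCount": 2}}}'

# decode() turns bytes into a str; json.loads turns the str into a dict.
data = json.loads(body.decode('utf-8'))
print(data['content']['positionResult']['totalCount'])
```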
Since requesting the same page from a browser returns the data normally, the server has evidently detected that the request is coming from a crawler program.
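One commonly reported workaround (an unverified sketch, not guaranteed to pass Lagou's checks) is to carry the session cookies a browser would have: first GET the listing page through a cookie-aware opener, then send the POST through that same opener so both requests share cookies:

```python
from urllib import request
from http import cookiejar

# Build an opener that remembers cookies across requests.
cj = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(cj))

# Sketch of the two-step flow (requests commented out to stay offline;
# listing_url, headers, and req are the objects built earlier):
# opener.open(request.Request(listing_url, headers=headers))  # GET: collect cookies
# resp = opener.open(req)                                     # POST with those cookies
print(len(cj))  # no requests made yet, so the jar is still empty
```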
Search within a page: Ctrl+F
Show a function's source code: Ctrl+B