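Background
In real-world production work, crawler engineers inspecting captured traffic will run into pages whose data is fetched with a POST request rather than a GET request.
However, copying the parameters shown in the browser directly into the request still does not solve the problem.
Example scenario
Hebei Public Resources Trading website (河北公共资源交易网, http://ggzy.hebei.gov.cn)
The list pages for trading announcements and award announcements are loaded by sending a POST request.
A first attempt at reproducing the request: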
```python
import requests

url = 'http://ggzy.hebei.gov.cn/inteligentsearch/rest/inteligentSearch/getFullTextDataNew'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36',
}

# The request payload copied from the browser is a single string
# (raw string so the backslashes stay and the text remains valid JSON)
browser_payload = r"""
{"token":"","pn":10,"rn":10,"sdt":"","edt":"","wd":"","inc_wd":"","exc_wd":"","fields":"title","cnum":"001","sort":"{\"showdate\":\"0\"}","ssort":"title","cl":200,"terminal":"","condition":[{"fieldName":"categorynum","isLike":true,"likeType":2,"equal":"003005"}],"time":null,"highlights":"title","statistics":null,"unionCondition":null,"accuracy":"","noParticiple":"0","searchRange":null,"isBusiness":1}
"""

# Rewritten by hand as a Python dict (some values, e.g. pn and cnum, differ from the copied string)
post_data = {
    "token": "",
    "pn": 20,
    "rn": 10,
    "sdt": "",
    "edt": "",
    "wd": "",
    "inc_wd": "",
    "exc_wd": "",
    "fields": "title",
    "cnum": "002",
    "sort": "{\"showdate\":\"0\"}",
    "ssort": "title",
    "cl": 200,
    "terminal": "",
    "condition": [],
    "time": "",
    "highlights": "title",
    "statistics": "",
    "unionCondition": [],
    "accuracy": "",
    "noParticiple": "0",
    "searchRange": "",
    "isBusiness": 1,
}

# Passing the dict as data= makes requests send it form-encoded
res = requests.post(url, data=post_data, headers=header)
print(res.text)
```
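The body that requests actually builds explains why this attempt fails. The sketch below uses `requests.Request(...).prepare()` to inspect a request without sending it. The URL is a placeholder, and `params` is a hypothetical, trimmed-down subset of the Hebei payload.

The sketch shows two things:
- A dict passed as `data=` is form-encoded (`key=value&key=value`), and the empty `condition` list disappears entirely.
- A JSON string passed as `data=` is used verbatim as the body, and requests sets no `Content-Type` header in that case.

```python
import json

import requests

# Placeholder URL: prepare() only builds the request, nothing is sent
url = "http://example.invalid/search"
# Hypothetical subset of the Hebei payload
params = {"pn": 20, "rn": 10, "condition": [], "sort": '{"showdate":"0"}'}

# data=<dict>: requests form-encodes the dict
form_req = requests.Request("POST", url, data=params).prepare()

# data=<str>: the JSON text becomes the raw request body
json_req = requests.Request("POST", url, data=json.dumps(params)).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # form-encoded; the empty "condition" list is gone
print(json_req.body)                     # JSON text, structure preserved
print("Content-Type" in json_req.headers)  # False
```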
However, the request still fails.
Attempts and solution
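```python
# Debugging the page's JavaScript shows that its ajax call contains
#     data: JSON.stringify(param),
# so the request parameters must also be serialized to a JSON string
import json

post_data = json.dumps(post_data)
# Re-send the request with the serialized body
res = requests.post(url, data=post_data, headers=header)
print(res.text)
```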
With the parameters serialized, the request finally succeeds.
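requests can also do the serialization itself. If you pass the dict through the `json=` keyword, requests encodes it as JSON and sets `Content-Type: application/json`.

The sketch below only prepares the request against a placeholder URL. It also parses a shortened copy of the browser string with `json.loads` instead of converting it by hand, which turns `null` into `None` rather than `''`. The original post did not test this form against the Hebei endpoint, so treat it as an alternative to try, not as a confirmed fix.

```python
import json

import requests

# Placeholder URL: the request is only prepared, not sent
url = "http://example.invalid/search"

# Shortened copy of the payload string from the browser (raw string keeps the backslashes)
raw_payload = r'{"token":"","pn":10,"rn":10,"sort":"{\"showdate\":\"0\"}","time":null,"isBusiness":1}'

# json.loads gives the dict directly: null -> None, numbers stay numbers
post_data = json.loads(raw_payload)

# json= serializes the dict and sets the Content-Type header
req = requests.Request("POST", url, json=post_data).prepare()

print(req.headers["Content-Type"])  # application/json
print(post_data["time"] is None)    # True
```

This also keeps the body and its `Content-Type` header in one place instead of serializing and setting headers separately.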
Other cases
```python
# A similar case: here the payload has to be sent directly as a string,
# and serializing it with json.dumps actually does not work.
# Example: Sichuan Public Resources Trading website (四川公共资源交易网)
import requests

url = 'http://ggzyjy.sc.gov.cn/inteligentsearch/rest/inteligentSearch/getFullTextData'
header = {
    # Cookie copied from the author's browser session; it is session-specific
    'Cookie': 'JSESSIONID=14FD03DFD67F128373A4E81751FCC2F7; UM_distinctid=16b2bb3332b7d-008cb3bb11904f-3e385d05-1fa400-16b2bb3332cc; userGuid=-254005787; __SDID=e0f1358fd1199801; CNZZDATA1276636503=813357370-1559802761-%7C1566808634',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36',
}
# Payload string copied from the browser, sent unchanged as the request body
text = """{"token":"",
"pn":300,"rn":12,
"sdt":"2019-7-26 00:00:00","edt":"2019-8-26 23:59:59",
"wd":"","inc_wd":"","exc_wd":"",
"fields":"title","cnum":"","sort":"{'webdate':'0'}",
"ssort":"title","cl":500,"terminal":"",
"condition":[{"fieldName":"categorynum","equal":"002001","notEqual":null,"equalList":null,"notEqualList":null,"isLike":true,"likeType":2}],
"time":null,"highlights":"","statistics":null,"unionCondition":null,"accuracy":"","noParticiple":"0","searchRange":null,"isBusiness":"1"}"""
rs = requests.post(url, data=text, headers=header)
print(rs.text)
```
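The post does not say why serializing failed for the Sichuan site. One difference worth checking, offered only as an assumption, is whitespace. The browser's `JSON.stringify` emits compact JSON with no spaces, while Python's `json.dumps` by default puts a space after `,` and `:` and escapes non-ASCII text as `\uXXXX`.

Two arguments bring the output closer to what the browser sends:
- `separators=(",", ":")` makes the output compact.
- `ensure_ascii=False` keeps Chinese text as-is.

The helper name `to_browser_json` and the `"wd"` keyword value below are hypothetical.

```python
import json


def to_browser_json(obj):
    """Hypothetical helper: serialize obj compactly, roughly like JavaScript's JSON.stringify."""
    # separators=(",", ":") removes the spaces json.dumps adds by default;
    # ensure_ascii=False keeps non-ASCII text instead of writing \uXXXX escapes
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


# The "wd" (search keyword) value is a hypothetical example
params = {"wd": "招标", "pn": 300, "rn": 12, "time": None}

print(json.dumps(params))       # spaces after "," and ":", Chinese escaped as \uXXXX
print(to_browser_json(params))  # {"wd":"招标","pn":300,"rn":12,"time":null}
```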
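When the browser shows the request's Content-Type as application/json, the parameters must be sent as a JSON string, not as form data. Either serialize the dict with json.dumps(), or send the JSON text copied from the browser unchanged.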
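The rule above can be sketched as a small helper that picks the body encoding from the Content-Type observed in the browser's developer tools. `encode_body` is a hypothetical name, and it handles only JSON and URL-encoded form bodies.

It also returns the matching header, which is a design choice: the original code never set Content-Type explicitly. The JSON body is returned as UTF-8 bytes so that non-ASCII keywords are not left to the HTTP library's default string encoding.

```python
import json
from urllib.parse import urlencode


def encode_body(params, content_type):
    """Hypothetical helper: encode params to match the Content-Type seen in the browser.

    Returns (body, headers) to pass to requests.post as data= and headers=.
    """
    # Compare only the media type, ignoring parameters such as ";charset=UTF-8"
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        # Compact JSON text, encoded as UTF-8 bytes
        body = json.dumps(params, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    elif media_type == "application/x-www-form-urlencoded":
        # key=value&key=value, similar to what requests does with a plain dict
        body = urlencode(params)
    else:
        raise ValueError(f"unsupported Content-Type: {content_type}")
    return body, {"Content-Type": content_type}


body, headers = encode_body({"pn": 20, "rn": 10}, "application/json;charset=UTF-8")
print(body)     # b'{"pn":20,"rn":10}'
print(headers)  # {'Content-Type': 'application/json;charset=UTF-8'}
```

For the Hebei example, the result could be sent with `requests.post(url, data=body, headers={**header, **headers})`, which merges the User-Agent header with the Content-Type.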
This is a problem I ran into in practice; I hope it helps anyone who runs into the same issue.