Python Anti-Scraping: Handling POST Parameters

Background

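In real production work, a crawler engineer capturing traffic will often find that a site has switched from GET requests to POST requests.
Even then, copying the parameters straight from the browser does not necessarily solve the problem.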

Example scenario

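Hebei Public Resources Trading site (河北公共资源交易网): ggzy.hebei.gov.cn
The list pages for trading announcements and deal announcements are fetched with a POST request.
A first test: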

import requests

url = 'http://ggzy.hebei.gov.cn/inteligentsearch/rest/inteligentSearch/getFullTextDataNew'

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36',
}
# Request payload copied from the browser; it is a single string
"""
{"token":"","pn":10,"rn":10,"sdt":"","edt":"","wd":"","inc_wd":"","exc_wd":"","fields":"title","cnum":"001","sort":"{\"showdate\":\"0\"}","ssort":"title","cl":200,"terminal":"","condition":[{"fieldName":"categorynum","isLike":true,"likeType":2,"equal":"003005"}],"time":null,"highlights":"title","statistics":null,"unionCondition":null,"accuracy":"","noParticiple":"0","searchRange":null,"isBusiness":1}
"""
# Rewritten as a Python dict (null values replaced with '' or []), it looks like this
post_data={"token":"",
      "pn":20,
      "rn":10,
      "sdt":"",
      "edt":"",
      "wd":"",
      "inc_wd":"",
      "exc_wd":"",
      "fields":"title",
      "cnum":"002",
      "sort":"{\"showdate\":\"0\"}",
      "ssort":"title",
      "cl":200,
      "terminal":"",
      "condition":[],
      "time":'',
      "highlights":"title",
      "statistics":'',
      "unionCondition":[],
      "accuracy":"",
      "noParticiple":"0",
      "searchRange":'',
      "isBusiness":1}
      
# Passing a dict to data= sends it form-encoded, not as JSON
res = requests.post(url, data=post_data, headers=header)
print(res.text)

However, the request still does not succeed.
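One way to see what may be going wrong is to compare the two body formats side by side. The sketch below uses only the standard library and a shortened, hypothetical payload (not the full parameter set above). `urlencode` is used here as an approximation of what an HTTP client does with a plain dict; it is not the exact code path inside `requests`.

```python
import json
from urllib.parse import urlencode

# Hypothetical, shortened payload for illustration only.
payload = {"token": "", "pn": 20, "rn": 10, "cnum": "002"}

# Roughly what goes on the wire when a plain dict is passed as the body:
# a form-encoded string of key=value pairs.
form_body = urlencode(payload)

# What the browser produces with JSON.stringify(param): a JSON document.
json_body = json.dumps(payload, separators=(",", ":"))

print(form_body)
print(json_body)
```

A server that parses the body as JSON would probably reject the first form, which would match the failure seen above.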

Attempts and solution

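# Debugging the page shows that the ajax code contains
#   data: JSON.stringify(param),
# so the request parameters must be serialized to a JSON string before sending
import json
post_data = json.dumps(post_data)
# Send the request again with the serialized body
res = requests.post(url, data=post_data, headers=header)
print(res.text)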

With this change the request succeeds.
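The fix can be wrapped in a small helper. This is a minimal sketch, not the author's code: `build_json_post` is a hypothetical name, the `Content-Type` header is an assumption (the original request only sets `User-Agent`), and `urllib.request` is used so the request can be built and inspected without sending anything.

```python
import json
import urllib.request

URL = "http://ggzy.hebei.gov.cn/inteligentsearch/rest/inteligentSearch/getFullTextDataNew"


def build_json_post(url, payload, user_agent="Mozilla/5.0"):
    """Build (but do not send) a POST request whose body is JSON text."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "User-Agent": user_agent,
        # Assumption: declaring the body as JSON; the original post omits this.
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Shortened, hypothetical payload for illustration.
req = build_json_post(URL, {"token": "", "pn": 20, "rn": 10})
print(req.get_method())
print(req.data.decode("utf-8"))
```

With `requests`, the same effect comes from `requests.post(url, data=json.dumps(post_data), headers=header)` as shown above, or from `requests.post(url, json=post_data, headers=header)`, which serializes the dict and sets the JSON content type itself.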

Other cases

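# A similar case: here the payload has to be sent directly as a string, and serializing it does not work
# Example: Sichuan Public Resources Trading site (四川公共资源交易网)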
import requests
url='http://ggzyjy.sc.gov.cn/inteligentsearch/rest/inteligentSearch/getFullTextData'

header={
'Cookie': 'JSESSIONID=14FD03DFD67F128373A4E81751FCC2F7; UM_distinctid=16b2bb3332b7d-008cb3bb11904f-3e385d05-1fa400-16b2bb3332cc; userGuid=-254005787; __SDID=e0f1358fd1199801; CNZZDATA1276636503=813357370-1559802761-%7C1566808634',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36',
}

text="""{"token":"",
"pn":300,"rn":12,
"sdt":"2019-7-26 00:00:00","edt":"2019-8-26 23:59:59",
"wd":"","inc_wd":"","exc_wd":"",
"fields":"title","cnum":"","sort":"{'webdate':'0'}",
"ssort":"title","cl":500,"terminal":"",
"condition":[{"fieldName":"categorynum","equal":"002001","notEqual":null,"equalList":null,"notEqualList":null,"isLike":true,"likeType":2}],
"time":null,"highlights":"","statistics":null,"unionCondition":null,"accuracy":"","noParticiple":"0","searchRange":null,"isBusiness":"1"}"""

rs=requests.post(url,data=text,headers=header)
print(rs.text)
When the request's Content-Type header is application/json, the request parameters need to be processed with json.dumps().
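One likely reason serialization fails in the Sichuan example is that the payload is already JSON text, so calling `json.dumps()` on it again wraps it in an extra layer of quotes. The sketch below illustrates this with a hypothetical helper, `to_json_body`, which serializes only when it is given a dict or list. The payload is a shortened stand-in, not the real parameter set.

```python
import json


def to_json_body(params):
    """Return a JSON body, serializing only when params is not already text."""
    if isinstance(params, (str, bytes)):
        # Already text copied from the browser: send it unchanged.
        return params
    return json.dumps(params, ensure_ascii=False)


# Shortened, hypothetical payloads for illustration.
raw = '{"pn": 300, "rn": 12}'
as_dict = {"pn": 300, "rn": 12}

# Serializing a string a second time yields a JSON *string*, not an object.
double_encoded = json.dumps(raw)
print(type(json.loads(raw)).__name__)             # prints: dict
print(type(json.loads(double_encoded)).__name__)  # prints: str

print(to_json_body(raw))
print(to_json_body(as_dict))
```

The result of `to_json_body` can be passed as `data=` to `requests.post` in either scenario.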

This is a small problem I ran into in practice; I hope it helps anyone who runs into the same issue.
