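Python Crawler (3): Pitfalls of Scrapy POST Requests

1 Problem Description

2 Solution in Detail

After some testing, the errors raised by Scrapy POST requests turned out to come mainly from two sources: headers and formdata.

2.1 headers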

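The relevant headers can usually be copied straight from the Network panel of the browser, but some are easy to overlook. To see the headers that requests actually sends, inspect requests.post(url, json={}).request.headers. The bare .headers attribute of the response holds the response headers, not the request headers.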
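As a minimal sketch, assuming the requests library is installed: preparing a request through a Session, without sending it, shows the headers requests would put on the wire. The URL is a placeholder and not taken from the original post.

```python
import requests

# Hypothetical endpoint, used only to build the request; nothing is sent.
url = "https://example.com/api/search"

session = requests.Session()
prepared = session.prepare_request(requests.Request("POST", url, json={}))

# Headers requests adds on its own (User-Agent, Accept, Content-Type, ...).
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Any header listed here that is missing from the Scrapy request is a candidate for DEFAULT_REQUEST_HEADERS.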

DEFAULT_REQUEST_HEADERS = {
  'Accept': 'text/html,application/xhtml+xml,application/xml,application/json;q=0.9,*/*;q=0.8',
  'Accept-Language': 'zh-CN,zh;q=0.9',
  'Content-Type': 'application/json;charset=UTF-8', # charset=UTF-8 is often the crux of the problem
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
}

2.2 formdata

There are two aspects to formdata: how its values are set, and how it is passed to the request.

2.2.1 formdata values

Unlike requests, Scrapy requires every formdata value to be a str or bytes object. Otherwise it raises TypeError: to_bytes must receive a str or bytes object, got int, or TypeError: to_bytes must receive a str or bytes object, got NoneType (see the worked example in section 3).
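The conversion can be sketched with a small helper. The name to_form_fields is hypothetical, and the choices of mapping None to an empty string and JSON-encoding nested values and booleans are assumptions; what the server actually expects has to be checked against the real endpoint.

```python
import json


def to_form_fields(payload):
    """Return a copy of payload whose values are all str."""
    fields = {}
    for key, value in payload.items():
        if value is None:
            fields[key] = ""                  # None -> empty string (assumption)
        elif isinstance(value, str):
            fields[key] = value               # already a str, keep as is
        elif isinstance(value, (list, dict, bool)):
            fields[key] = json.dumps(value)   # nested data / bool -> JSON text
        else:
            fields[key] = str(value)          # int, float -> decimal text
    return fields


print(to_form_fields({"pn": 0, "time": None, "wd": "", "condition": [{"isLike": True}]}))
```

The result can be passed as formdata without triggering the to_bytes TypeError, since every value is now a str.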

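2.2.2 Passing formdata

# Option 1: FormRequest (body is form-encoded; every value must be str)
yield scrapy.FormRequest(url=linkurl, formdata=payload, callback=self.parse)
# Option 2: Request with a JSON body (needs `import json`; values keep their native types)
yield scrapy.Request(url=linkurl, callback=self.parse, method='POST', body=json.dumps(payload))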

3 Worked Example

3.1 origin (raw JSON payload as captured from the browser)

payload = {
    "token": "",
    "pn": 0,
    "rn": 10,
    "sdt": "",
    "edt": "",
    "wd": "",
    "inc_wd": "",
    "exc_wd": "",
    "fields": "title,projectnum",
    "cnum": "001",
    "sort": "{\"infodatepx\":\"0\",\"infoid\":\"1\"}",
    "ssort": "title",
    "cl": 200,
    "terminal": "",
    "condition": [{"fieldName": "categorynum", "isLike": true, "likeType": 2, "equal": "001004003"}],
    "time": null,
    "highlights": "title",
    "statistics": null,
    "unionCondition": null,
    "accuracy": "100",
    "noParticiple": "0",
    "searchRange": null,
    "isBusiness": 1
}
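The captured payload is JSON, so it uses true and null, which are not valid Python names. A small sketch of converting it with the standard json module rather than editing by hand; the raw string here is a shortened copy of the payload above.

```python
import json

# Shortened copy of the captured request payload (raw JSON text).
raw = '{"pn": 0, "time": null, "condition": [{"isLike": true, "likeType": 2}]}'

payload = json.loads(raw)   # true -> True, null -> None
print(payload)
```

The resulting dict uses True and None and can be passed directly to requests.post(url, json=payload).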

3.2 requests

payload = {
    "token": "",
    "pn": 0,
    "rn": 10,
    "sdt": "",
    "edt": "",
    "wd": "",
    "inc_wd": "",
    "exc_wd": "",
    "fields": "title,projectnum",
    "cnum": "001",
    "sort": "{\"infodatepx\":\"0\",\"infoid\":\"1\"}",
    "ssort": "title",
    "cl": 200,
    "terminal": "",
    "condition": [{"fieldName": "categorynum", "isLike": True, "likeType": 2, "equal": "001004003"}],
    "time": None,
    "highlights": "title",
    "statistics": None,
    "unionCondition": None,
    "accuracy": "100",
    "noParticiple": "0",
    "searchRange": None,
    "isBusiness": 1
}

3.3 scrapy

payload = {
    "token": "",
    "pn": "0",
    "rn": "10",
    "sdt": "",
    "edt": "",
    "wd": "",
    "inc_wd": "",
    "exc_wd": "",
    "fields": "title,projectnum",
    "cnum": "001",
    "sort": "{\"infodatepx\":\"0\",\"infoid\":\"1\"}",
    "ssort": "title",
    "cl": "200",
    "terminal": "",
    # The list is passed as a JSON string, so the boolean is lowercase true
    "condition": '[{"fieldName": "categorynum", "isLike": true, "likeType": 2, "equal": "001004003"}]',
    "time": "",
    "highlights": "title",
    "statistics": "",
    "unionCondition": "",
    "accuracy": "100",
    "noParticiple": "0",
    "searchRange": "",
    "isBusiness": "1"
}