1 Problem description
2 Solution details
After some testing, I found that errors in Scrapy POST requests come mainly from two sources: headers and formdata.
2.1 headers
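The relevant headers can usually be copied straight from the browser's Network panel, but some of them are not shown there. To see the headers that requests actually sent, inspect requests.post(url, json={}).request.headers (the .headers attribute of the response object holds the response headers, not the request headers).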
DEFAULT_REQUEST_HEADERS = {
'Accept': 'text/html,application/xhtml+xml,application/xml,application/json;q=0.9,*/*;q=0.8',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Content-Type': 'application/json;charset=UTF-8', # the charset=UTF-8 part is often what makes or breaks the request
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
}
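Headers set on an individual request take precedence over DEFAULT_REQUEST_HEADERS. Our understanding is that Scrapy's DefaultHeadersMiddleware only fills in each default with setdefault, and that FormRequest sets its own Content-Type: application/x-www-form-urlencoded, so the JSON Content-Type above never reaches a FormRequest. The stdlib-only sketch below mimics that merge; apply_default_headers is a hypothetical helper, not Scrapy API, and the defaults are trimmed to two entries.

```python
# Sketch: how per-request headers and DEFAULT_REQUEST_HEADERS combine.
# apply_default_headers is a hypothetical helper, not part of Scrapy.

DEFAULT_REQUEST_HEADERS = {
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Content-Type": "application/json;charset=UTF-8",
}


def apply_default_headers(request_headers: dict, defaults: dict) -> dict:
    """Add each default header only if the request has not already set it.

    Header names are compared case-insensitively, as HTTP header names are.
    """
    merged = dict(request_headers)
    present = {name.lower() for name in merged}
    for name, value in defaults.items():
        if name.lower() not in present:
            merged[name] = value
    return merged


# A FormRequest-style request already carries a form-encoded Content-Type,
# so the JSON default does not replace it.
form_headers = apply_default_headers(
    {"Content-Type": "application/x-www-form-urlencoded"}, DEFAULT_REQUEST_HEADERS
)
# A plain Request without its own Content-Type picks up the JSON default.
json_headers = apply_default_headers({}, DEFAULT_REQUEST_HEADERS)

print(form_headers["Content-Type"])  # application/x-www-form-urlencoded
print(json_headers["Content-Type"])  # application/json;charset=UTF-8
```

In practice this means that if the server expects JSON, switching headers alone will not rescue a FormRequest; the request type has to change as well.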
2.2 formdata
The formdata issues have two aspects: how the formdata values are set, and how formdata is passed to the request.
2.2.1 formdata parameters
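Unlike requests, Scrapy's FormRequest requires every formdata value to be a str (or bytes) object. Otherwise it raises errors such as
TypeError: to_bytes must receive a str or bytes object, got int
or
TypeError: to_bytes must receive a str or bytes object, got NoneType
(see the worked example in section 3).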
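Before building a FormRequest, it can help to list exactly which values will trigger the to_bytes error. The sketch below is a hypothetical pre-flight check (find_non_str_values is not Scrapy API); it assumes, based on our reading of FormRequest, that list values are sent as repeated keys, so it checks list elements individually. The payload is a made-up subset of the one in section 3.

```python
# Sketch: list formdata values that FormRequest would reject.
# find_non_str_values is a hypothetical helper, not part of Scrapy.


def find_non_str_values(formdata: dict) -> dict:
    """Return {key: type name} for values that are not str or bytes.

    list/tuple values are checked element by element, assuming FormRequest
    sends them as repeated keys.
    """
    problems = {}
    for key, value in formdata.items():
        items = value if isinstance(value, (list, tuple)) else [value]
        for item in items:
            if not isinstance(item, (str, bytes)):
                problems[key] = type(item).__name__
                break
    return problems


payload = {"token": "", "pn": 0, "time": None, "condition": [{"isLike": True}]}
print(find_non_str_values(payload))
# {'pn': 'int', 'time': 'NoneType', 'condition': 'dict'}
```

The type names in the result line up with the "got int" and "got NoneType" endings of the error messages above.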
2.2.2 Passing formdata
# Type 1: FormRequest (form-encoded body; every formdata value must be a str)
yield scrapy.FormRequest(url=linkurl, formdata=payload, callback=self.parse)
# Type 2: Request with a JSON body (Content-Type taken from DEFAULT_REQUEST_HEADERS)
yield scrapy.Request(url=linkurl, callback=self.parse, method='POST', body=json.dumps(payload))
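To see what the two styles actually put in the request body, the stdlib sketch below builds both for a small, made-up subset of the payload. The form-encoded body approximates what FormRequest produces (we assume its encoding matches urllib.parse.urlencode for plain ASCII str values); the JSON body is exactly json.dumps(payload). Recent Scrapy versions also provide scrapy.http.JsonRequest, which serialises its data= argument to JSON and sets a JSON Content-Type, e.g. JsonRequest(url=linkurl, data=payload, callback=self.parse).

```python
import json
from urllib.parse import parse_qs, urlencode

# A made-up subset of the payload, already converted to str values
payload = {"pn": "0", "rn": "10", "fields": "title,projectnum"}

# Roughly what FormRequest sends (application/x-www-form-urlencoded)
form_body = urlencode(payload)
# What Request(body=json.dumps(payload)) sends; Content-Type must come from headers
json_body = json.dumps(payload)

print(form_body)  # pn=0&rn=10&fields=title%2Cprojectnum
print(json_body)  # {"pn": "0", "rn": "10", "fields": "title,projectnum"}
```

The two bodies are not interchangeable: an API that parses JSON will reject the form-encoded one, and vice versa, which is why the choice between the two request types matters as much as the headers.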
3 Worked example
3.1 origin (payload as shown in the browser)
payload = {
"token":"",
"pn":0,
"rn":10,
"sdt":"",
"edt":"",
"wd":"",
"inc_wd":"",
"exc_wd":"",
"fields":"title,projectnum",
"cnum":"001",
"sort":"{\"infodatepx\":\"0\",\"infoid\":\"1\"}",
"ssort":"title",
"cl":200,
"terminal":"",
"condition":[{"fieldName":"categorynum","isLike":true,"likeType":2,"equal":"001004003"}],
"time":null,
"highlights":"title",
"statistics":null,
"unionCondition":null,
"accuracy":"100",
"noParticiple":"0",
"searchRange":null,
"isBusiness":1
}
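The origin payload is JSON, not Python: true and null are not Python names, so pasting it into a .py file as-is fails with NameError. A minimal sketch, assuming the payload text is copied verbatim from the browser (only a fragment is used here), lets json.loads produce the Python form used in 3.2 below.

```python
import json

# A fragment of the origin payload, copied as JSON text
origin = """{
  "pn": 0,
  "condition": [{"fieldName": "categorynum", "isLike": true, "likeType": 2, "equal": "001004003"}],
  "time": null,
  "statistics": null
}"""

# json.loads maps true -> True and null -> None
payload = json.loads(origin)
print(payload["condition"][0]["isLike"], payload["time"])  # True None
```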
3.2 requests (the same payload as a Python dict for the requests library)
payload = {
"token": "",
"pn": 0,
"rn": 10,
"sdt": "",
"edt": "",
"wd": "",
"inc_wd": "",
"exc_wd": "",
"fields": "title,projectnum",
"cnum": "001",
"sort": "{\"infodatepx\":\"0\",\"infoid\":\"1\"}",
"ssort": "title",
"cl": 200,
"terminal": "",
"condition": [{"fieldName": "categorynum", "isLike": True, "likeType": 2, "equal": "001004003"}],
"time": None,
"highlights": "title",
"statistics": None,
"unionCondition": None,
"accuracy": "100",
"noParticiple": "0",
"searchRange": None,
"isBusiness": 1
}
3.3 scrapy (every value converted to str for FormRequest)
payload = {
"token": "",
"pn": "0",
"rn": "10",
"sdt": "",
"edt": "",
"wd": "",
"inc_wd": "",
"exc_wd": "",
"fields": "title,projectnum",
"cnum": "001",
"sort": "{\"infodatepx\":\"0\",\"infoid\":\"1\"}",
"ssort": "title",
"cl": "200",
"terminal": "",
"condition": '[{"fieldName": "categorynum", "isLike": true, "likeType": 2, "equal": "001004003"}]',
"time": "",
"highlights": "title",
"statistics": "",
"unionCondition": "",
"accuracy": "100",
"noParticiple": "0",
"searchRange": "",
"isBusiness": "1"
}
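Converting 3.2 into 3.3 by hand is error-prone; note that a nested value must become JSON text (true, not Python's True). A minimal sketch of the conversion, using a hypothetical helper stringify_formdata and assuming, as in 3.3, that None should become an empty string (whether the server treats "" like null is an assumption worth checking):

```python
import json
from typing import Any


def stringify_formdata(payload: dict) -> dict:
    """Hypothetical helper: turn a requests-style payload into all-str formdata."""
    result = {}
    for key, value in payload.items():
        if value is None:
            result[key] = ""                  # None -> "" as in the 3.3 payload
        elif isinstance(value, str):
            result[key] = value               # already a str
        elif isinstance(value, (bool, list, dict)):
            result[key] = json.dumps(value)   # JSON text: true/false, [...], {...}
        else:
            result[key] = str(value)          # int/float -> "0", "200", ...
    return result


payload: dict[str, Any] = {
    "pn": 0,
    "cl": 200,
    "condition": [{"fieldName": "categorynum", "isLike": True, "likeType": 2, "equal": "001004003"}],
    "time": None,
    "accuracy": "100",
}
result = stringify_formdata(payload)
print(result)
```

bool is checked before the generic str() fallback because bool is a subclass of int and str(True) would give "True" rather than the JSON spelling.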