Scrapy 1.8.0 returns error 500, but Python code returns success 200...

Rainfall若

Article link: https://blog.csdn.net/weixin_28876287/article/details/113679222

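This article describes a 500 error encountered with Scrapy 1.8.0, caused by Scrapy capitalizing HTTP header field names. Updating the case mappings of Twisted's Headers class prevents that conversion, so the Scrapy request gets the same status 200 result as the Python requests library.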

I can easily download this page with the Python requests library:

import requests

headers = {
    'x-client': 'EXMOOR',
    'x-product': 'CITIZENPORTAL',
    'x-service': 'PA',
}
url = 'https://planningapi.agileapplications.co.uk/api/application/search?reference=GDO+19%2F12'
resp = requests.get(url, headers=headers)

Or I can easily download the page with cURL:

curl 'https://planningapi.agileapplications.co.uk/api/application/search?reference=GDO+19%2F12' -H 'x-product: CITIZENPORTAL' -H 'x-service: PA' -H 'x-client: EXMOOR'

Both return a status 200 result:

{"total":1,"results":[{"id":18468,"reference":"GDO 19/12","proposal":"Prior notification for excavations to bury tanks and trenches to lay water pipes","location":"Land North West of North and South Ley, Exford, Minehead, Somerset.","username":"","applicantSurname":"Mr & Mrs M Burnett","agentName":"JCH Planning Limited","decisionText":null,"registrationDate":"2019-10-04","decisionDate":"2019-10-30","finalGrantDate":null,"appealLodgedDate":null,"appealDecisionDate":null,"areaId":[],"wardId":[],"parishId":[3],"responded":null,"lastLetterDate":null,"targetResponseDate":null}]}

But Scrapy returns a status 500 error:

formdata = {'reference': 'GDO 19/12'}
headers = {
    'x-client': 'EXMOOR',
    'x-product': 'CITIZENPORTAL',
    'x-service': 'PA',
}
fr = scrapy.FormRequest(
    url='https://planningapi.agileapplications.co.uk/api/application/search',
    method='GET',
    meta=response.meta,
    headers=headers,
    formdata=formdata,
    dont_filter=True,
    callback=self.ref_result_2,
)
yield fr

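Perhaps this is because Scrapy capitalizes the header keys (I tried to undo the capitalization, but then Twisted does the same thing and capitalizes them again), or perhaps there is some other reason.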
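One way to narrow this down is to rule out the URL first. The sketch below assumes that a FormRequest with method='GET' url-encodes formdata into the query string in the same way as the standard library's urlencode; under that assumption, the Scrapy request targets the same URL that requests and cURL use, which leaves the headers as the remaining difference. The names BASE_URL, scrapy_style_url, and requests_url are illustrative only.

```python
from urllib.parse import urlencode

BASE_URL = "https://planningapi.agileapplications.co.uk/api/application/search"
formdata = {"reference": "GDO 19/12"}

# urlencode() uses quote_plus by default: a space becomes '+', '/' becomes '%2F'.
query = urlencode(formdata)

# Assumption: this mirrors how a GET FormRequest appends formdata to the URL.
scrapy_style_url = BASE_URL + "?" + query

# The URL passed to requests.get() and cURL in the examples above.
requests_url = BASE_URL + "?reference=GDO+19%2F12"

print(scrapy_style_url == requests_url)
```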

How can I adjust the Scrapy 1.8.0 code so that it gets the same result as Python requests?

Solution

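It is indeed caused by Scrapy capitalizing the header field names. If you use the capitalized names in the cURL command, you get the same error as with Scrapy (you can test this in Scrapy by setting handle_httpstatus_list on the Spider class and printing response.text in the parse method). As you already said, Twisted does the same thing, so overriding scrapy.http.Headers is not a solution.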

However, you can use a trick, described in a related question, that stops Twisted from capitalizing specific headers:

# -*- coding: utf-8 -*-
from pprint import pprint

import scrapy
from twisted.web.http_headers import Headers as TwistedHeaders

TwistedHeaders._caseMappings.update({
    b'x-client': b'x-client',
    b'x-product': b'x-product',
    b'x-service': b'x-service',
})


class Foo(scrapy.Spider):
    name = 'foo'
    handle_httpstatus_list = [500]

    def start_requests(self):
        formdata = {'reference': 'GDO 19/12'}
        headers = {
            'x-client': 'EXMOOR',
            'x-product': 'CITIZENPORTAL',
            'x-service': 'PA'
        }
        yield scrapy.FormRequest(
            'https://planningapi.agileapplications.co.uk/api/application/search',
            method='GET', headers=headers, formdata=formdata, callback=self.parse)

    def parse(self, response):
        pprint(response.text)

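Now you will get the result. That said, according to RFC 7230, section 3.2, header field names are case-insensitive, so a server should not depend on their case.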
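To illustrate why adding entries to the case mappings helps, here is a simplified model of header-name canonicalization. It is not Twisted's actual code: CASE_MAPPINGS and canonical_name are hypothetical stand-ins, and the sketch assumes the real logic looks up a per-name override first and otherwise capitalizes each dash-separated word.

```python
# Simplified model only; CASE_MAPPINGS is a hypothetical stand-in for the
# override table that TwistedHeaders._caseMappings.update(...) modifies.
CASE_MAPPINGS = {}


def canonical_name(name: bytes) -> bytes:
    """Return the on-the-wire spelling of a header name in this model."""
    lowered = name.lower()
    # An explicit override wins over the default capitalization.
    if lowered in CASE_MAPPINGS:
        return CASE_MAPPINGS[lowered]
    # Default: capitalize each dash-separated word (x-client -> X-Client).
    return b"-".join(part.capitalize() for part in lowered.split(b"-"))


before = canonical_name(b"x-client")      # no override registered yet
CASE_MAPPINGS[b"x-client"] = b"x-client"  # register a lowercase override
after = canonical_name(b"x-client")       # override is now used verbatim

print(before, after)
```

In this model the header is sent as X-Client until the override is registered, after which it stays lowercase. That matches the behavior the answer relies on.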
