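Sending POST requests with the Scrapy framework is not really recommended, because the setup is cumbersome. If you do need it (for example, when the data volume is large), it can be done with the code below.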
Method 1: override Scrapy's start_requests method
By default Scrapy sends GET requests, so to send a POST request you override start_requests(self).
import scrapy
class FySpider(scrapy.Spider):
    name = 'fy'
    # allowed_domains = ['www.baidu.com']
    start_urls = ['https://fanyi.baidu.com/sug']
    def start_requests(self):
        # Form fields to send in the POST body
        data = {
            'kw': "beautiful"
        }
        for url in self.start_urls:
            # FormRequest uses POST automatically when formdata is given
            yield scrapy.FormRequest(url=url, formdata=data, callback=self.parse)
    def parse(self, response):
        print(response.text)
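To see what Method 1 actually puts on the wire without running a crawl, here is a minimal sketch that builds an equivalent POST with the standard library's urllib instead of Scrapy. This substitution is an assumption made for illustration, and nothing is sent. The body is the same form-encoded string FormRequest builds from data={'kw': 'beautiful'}.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same URL and form data as the FySpider example
url = "https://fanyi.baidu.com/sug"
data = {"kw": "beautiful"}

# Form-encode the dict, which is what FormRequest does with formdata
body = urlencode(data).encode("utf-8")

# Build (but do not send) an equivalent POST request with the stdlib
req = Request(url, data=body, method="POST")
req.add_header("Content-Type", "application/x-www-form-urlencoded")

print(req.get_method())  # POST
print(req.data)          # b'kw=beautiful'
```

Note that urllib stores header names via str.capitalize(), so the header is read back as "Content-type".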
Method 2: build the URL yourself (outside start_urls) and send the request manually
You can write:
scrapy.FormRequest(url=url, formdata=data, callback=self.parse)
or, if the server expects a JSON body rather than form data (this also requires import json):
scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'})
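The two spellings are not interchangeable: FormRequest sends an application/x-www-form-urlencoded body, while the Request version sends JSON, and which one works depends on what the endpoint accepts. The stdlib-only sketch below shows the two bodies side by side for a hypothetical payload. Recent Scrapy releases also provide scrapy.http.JsonRequest, which serializes a data argument with json.dumps and sets the JSON Content-Type for you. Check that your installed version includes it.

```python
import json
from urllib.parse import urlencode

# Hypothetical payload used only for illustration
payload = {"kw": "beautiful", "page": 1}

# Body that FormRequest(formdata=...) would send (form-encoded).
# FormRequest expects string values, hence the str(v) conversion.
form_body = urlencode({k: str(v) for k, v in payload.items()})

# Body that Request(body=json.dumps(payload), ...) would send (JSON)
json_body = json.dumps(payload)

print(form_body)  # kw=beautiful&page=1
print(json_body)  # {"kw": "beautiful", "page": 1}
```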
# -*- coding: utf-8 -*-
import scrapy
from video.items import VideoItem
class MvSpider(scrapy.Spider):
    name = 'mv'
    # allowed_domains = ['www.piaohua.com/']
    start_urls = ['http://www.88ys.cc/dianying/1.html']
    def detail_parse(self, response):
        # Pick up the partially filled item handed over by parse()
        item = response.meta['item']
        year = response.xpath('//div[@class="ct-c"]/dl/dd[3]/text()').extract_first()
        country = response.xpath('//div[@class="ct-c"]/dl/dd[2]/text()').extract_first()
        item['year'] = year
        item['country'] = country
        yield item
    def parse(self, response):
        li_list = response.xpath('//div[@class="index-area clearfix"]/ul/li/a')
        for li in li_list:
            # Create a fresh item per movie; one shared item would be
            # overwritten by later iterations before detail_parse runs
            item = VideoItem()
            m_url = 'http://www.88ys.cc' + li.xpath('./@href').extract_first()
            name = li.xpath('./@title').extract_first()
            item['name'] = name
            # Send the detail-page request manually, passing the item via meta
            yield scrapy.Request(url=m_url, callback=self.detail_parse, meta={'item': item})
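The item has to be created inside the loop because parse() yields one request per movie, while detail_parse usually runs only later, once each response arrives. The plain-Python model below (no Scrapy, hypothetical titles) imitates this by collecting the items first and reading them after the loop, which is roughly when the callbacks see them.

```python
# Plain-Python model of the parse() loop: each appended item stands in for
# meta={'item': item}, and reading it after the loop stands in for detail_parse.
names = ["Movie A", "Movie B", "Movie C"]  # hypothetical titles

def schedule(shared_item):
    pending = []
    item = {}  # a single object reused on every iteration (the buggy pattern)
    for name in names:
        target = item if shared_item else {}  # fresh dict per movie when not shared
        target["name"] = name
        pending.append(target)
    # "Callbacks" run only after the loop has finished
    return [it["name"] for it in pending]

print(schedule(shared_item=True))   # ['Movie C', 'Movie C', 'Movie C']
print(schedule(shared_item=False))  # ['Movie A', 'Movie B', 'Movie C']
```

Since Scrapy 1.7, Request also accepts cb_kwargs, which passes values to the callback as keyword arguments. It is often a tidier channel than meta for this kind of hand-off.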
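Difference between FormRequest and Request
The official documentation describes it as follows. From the documentation alone, you can hardly see any difference: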
The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.
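Put simply, FormRequest adds one new parameter, formdata, which accepts a dict or an iterable of (key, value) tuples containing form data. It URL-encodes that data and uses it as the request body for POST requests, or appends it to the URL as a query string for other methods. FormRequest is a subclass of Request: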
# Excerpt from Scrapy's source (older release); imports omitted
class FormRequest(Request):
    def __init__(self, *args, **kwargs):
        formdata = kwargs.pop('formdata', None)
        # Default to POST only when form data is given and no method was passed
        if formdata and kwargs.get('method') is None:
            kwargs['method'] = 'POST'
        super(FormRequest, self).__init__(*args, **kwargs)
        if formdata:
            items = formdata.items() if isinstance(formdata, dict) else formdata
            querystr = _urlencode(items, self.encoding)
            if self.method == 'POST':
                # POST: form data goes into the body
                self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                self._set_body(querystr)
            else:
                # Any other method: form data is appended to the URL's query string
                self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)
# Module-level helper used by FormRequest.__init__
def _urlencode(seq, enc):
    # List-like values expand into repeated keys, e.g. kw=a&kw=b
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)
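To make the excerpt concrete, here is a stdlib-only sketch that mimics the two decisions FormRequest.__init__ and _urlencode make. It flattens a dict or (key, value) pairs, turning list values into repeated keys, and it chooses between a POST body and a query string appended to the URL. The helper names form_urlencode and apply_formdata are hypothetical, and _is_listlike is only a rough stand-in for Scrapy's is_listlike. This is not Scrapy's code.

```python
from urllib.parse import urlencode

def _is_listlike(x):
    # Rough stand-in for Scrapy's is_listlike (lists, tuples, sets only)
    return isinstance(x, (list, tuple, set))

def form_urlencode(formdata, enc="utf-8"):
    """Mimic _urlencode: accept a dict or (key, value) pairs, expand list values."""
    items = formdata.items() if isinstance(formdata, dict) else formdata
    values = [(k.encode(enc), v.encode(enc))
              for k, vs in items
              for v in (vs if _is_listlike(vs) else [vs])]
    return urlencode(values, doseq=True)

def apply_formdata(url, formdata=None, method=None):
    """Return (method, url, body) following FormRequest.__init__'s logic."""
    if formdata and method is None:
        method = "POST"  # formdata without an explicit method implies POST
    method = (method or "GET").upper()  # Request's default is GET
    body = b""
    if formdata:
        querystr = form_urlencode(formdata)
        if method == "POST":
            body = querystr.encode("ascii")  # percent-encoded, so ASCII-safe
        else:
            url = url + ("&" if "?" in url else "?") + querystr
    return method, url, body

print(apply_formdata("https://fanyi.baidu.com/sug", {"kw": "beautiful"}))
# ('POST', 'https://fanyi.baidu.com/sug', b'kw=beautiful')
```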
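In the end, the {'key': 'value', 'k': 'v'} we pass is converted to 'key=value&k=v', and when formdata is given without an explicit method, the method defaults to POST. Now let's look at Request.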
class Request(object_ref):
    def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                 cookies=None, meta=None, encoding='utf-8', priority=0,
                 dont_filter=False, errback=None, flags=None):
        self._encoding = encoding  # this one has to be set first
        self.method = str(method).upper()