Carrying Cookies with the Request
```python
def start_requests(self):
    # Attach cookies when sending the initial request.
    # The raw value is copied straight from the browser's Cookie header;
    # double quotes are used because the rpdid value contains a single quote.
    cookies = "_uuid=AF1BDDAC-262D-B735-E263-8B18B08AA29127233infoc; buvid3=4AF29BB0-2171-4B3D-ABCC-9B3CE12D3CA9190968infoc; LIVE_BUVID=AUTO7215682589272245; sid=jy11c07h; CURRENT_FNVAL=16; stardustvideo=1; rpdid=|(k))JRkl|R)0J'ulY)kl)mlm; UM_distinctid=16daeb9bc981ae-089f2041d12bd6-e343166-1fa400-16daeb9bc997f0; DedeUserID=15089189; DedeUserID__ckMd5=a16d9f0c333118e7; SESSDATA=1fd4d5c3%2C1573274075%2C49615fa1; bili_jct=e5e5393c12980b466c0a6063d523ed72; CURRENT_QUALITY=80; bsource=seo_baidu; CNZZDATA2724999=cnzz_eid%3D1939017832-1570592144-https%253A%252F%252Fsearch.bilibili.com%252F%26ntime%3D1571623586"
    # Convert the raw Cookie header string into the dict scrapy.Request expects.
    # maxsplit=1 keeps values that themselves contain '=' intact.
    cookies = {i.split('=', 1)[0]: i.split('=', 1)[1] for i in cookies.split('; ')}
    yield scrapy.Request(self.start_urls[0], cookies=cookies)
```
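The cookie-string conversion can be pulled out and checked on its own. The sketch below wraps it in a hypothetical helper (`cookies_str_to_dict` is not part of Scrapy); splitting with `maxsplit=1` matters because cookie values may themselves contain `=`:

```python
def cookies_str_to_dict(cookies_str):
    """Split a raw Cookie header string into a dict Scrapy's Request accepts."""
    return {pair.split('=', 1)[0]: pair.split('=', 1)[1]
            for pair in cookies_str.split('; ')}

# A shortened sample of the browser cookie string used above.
raw = 'sid=jy11c07h; CURRENT_FNVAL=16; CURRENT_QUALITY=80'
print(cookies_str_to_dict(raw))
# → {'sid': 'jy11c07h', 'CURRENT_FNVAL': '16', 'CURRENT_QUALITY': '80'}
```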
Sending a POST Request to Simulate Login
scrapy.FormRequest.from_response() is simpler and more convenient to use: we usually only need to supply the user credentials (username and password), and scrapy.FormRequest.from_response() will fill in the remaining form fields for us, as if the form had been submitted by clicking in a browser.
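To see why this matters, consider what a real login page contains besides the credential fields: hidden inputs such as a CSRF token that the server expects back. A rough, stdlib-only illustration of the pre-filling step (the `HiddenFieldParser` class and the sample form are invented for this sketch; from_response() itself does considerably more):

```python
from html.parser import HTMLParser

class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> fields in a form."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            a = dict(attrs)
            if a.get('type') == 'hidden' and 'name' in a:
                self.fields[a['name']] = a.get('value', '')

# A miniature login form, similar in shape to what from_response() parses.
html = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
'''
parser = HiddenFieldParser()
parser.feed(html)

# Merge the hidden fields with the user-supplied formdata, hidden fields first.
form_data = {**parser.fields, 'login': 'user', 'password': 'pass'}
print(form_data)
# → {'authenticity_token': 'abc123', 'login': 'user', 'password': 'pass'}
```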
Example code that uses scrapy.FormRequest.from_response() to log in to GitHub:
```python
# -*- coding: utf-8 -*-
import re

import scrapy


class GithubLogin2Spider(scrapy.Spider):
    name = 'github_login2'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Send a POST request to log in and obtain the session cookies
        form_data = {
            'login': 'pengjunlee@163.com',
            'password': '123456'
        }
        yield scrapy.FormRequest.from_response(
            response,
            formdata=form_data,
            callback=self.after_login
        )

    def after_login(self, response):
        # Check whether the login request succeeded
        print(re.findall('Learn Git and GitHub without any code!',
                         response.body.decode()))
```
**Logging in produces session information, so every subsequent request must carry that session.**