scrapy: passing parameters between requests

Latest recommended article published on 2023-04-12 10:29:07

缦旋律



Article link: https://blog.csdn.net/weixin_41391619/article/details/111823998


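Use case: the data you need cannot all be obtained in a single parse callback. For example, on the start page the parse1 callback gets the archive's name and the URL of its detail page (detail_url), but the archive's download link (download_url) only appears on the detail page, so you have to request detail_url and parse that response in a second callback, parse2.
Method: when sending the request for detail_url with scrapy.Request, add a meta argument, meta={'item': item}. Scrapy attaches this dict to the request and hands it to the request's callback (parse2), which reads the half-filled item back as response.meta['item'], fills in the remaining fields, and only then yields the completed item.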
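To make the hand-off concrete without network access or a Scrapy install, here is a toy model of the idea. FakeRequest, FakeResponse, run_crawl and the PAGES data are hypothetical stand-ins invented for illustration, not Scrapy classes; the only behaviour they mimic is that the meta dict attached to a request comes back as response.meta in that request's callback.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: a URL, a callback and a meta dict."""
    url: str
    callback: Callable[["FakeResponse"], Iterator[Any]]
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a downloaded page; meta is the request's meta dict."""
    url: str
    body: Dict[str, str]
    meta: Dict[str, Any]


# Fake "web": page URL -> already-parsed content (invented data for the demo).
PAGES = {
    "list": {"a.htm": "Resume A", "b.htm": "Resume B"},
    "a.htm": {"download": "https://example.com/a.rar"},
    "b.htm": {"download": "https://example.com/b.rar"},
}


def parse(response: FakeResponse) -> Iterator[Any]:
    # Like parse1: we know the name and the detail URL, but not the download link yet.
    for detail_url, name in response.body.items():
        item = {"resume_name": name}  # a fresh item per listing entry
        yield FakeRequest(detail_url, callback=parse_detail, meta={"item": item})


def parse_detail(response: FakeResponse) -> Iterator[Any]:
    # Like parse2: pick up the half-filled item from meta and complete it.
    item = response.meta["item"]
    item["download_url"] = response.body["download"]
    yield item


def run_crawl(start_url: str, start_callback) -> List[Dict[str, Any]]:
    """Toy scheduler: 'download' each request and feed the result to its callback."""
    queue = [FakeRequest(start_url, callback=start_callback)]
    items = []
    while queue:
        request = queue.pop(0)
        response = FakeResponse(request.url, PAGES[request.url], request.meta)
        for result in request.callback(response):
            if isinstance(result, FakeRequest):
                queue.append(result)  # follow-up request, e.g. to a detail page
            else:
                items.append(result)  # a finished item
    return items


print(run_crawl("list", parse))
```

In real Scrapy (1.7 and later) the same hand-off can also be written with cb_kwargs, which the Scrapy documentation recommends for user data while leaving meta to middlewares and extensions: `yield scrapy.Request(detail_url, callback=self.parse_detail, cb_kwargs={'item': item})` together with `def parse_detail(self, response, item):`.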
Code in the spider file:

import scrapy
from resume1.items import Resume1Item

class ResumeSpiderSpider(scrapy.Spider):
    name = 'resume_spider'
    # allowed_domains expects bare domain names, not URLs
    # allowed_domains = ['sc.chinaz.com']
    start_urls = ['https://sc.chinaz.com/jianli/free.html']

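    def parse_detail(self, response):  # plays the role of parse2 in the example above
        # Retrieve the half-filled item that parse() attached to this request
        item = response.meta['item']
        download_url = response.xpath('//*[@id="down"]/div[2]/ul/li[1]/a/@href').extract_first()
        item['download_url'] = download_url

        # Now the item holds both resume_name and its download_url,
        # so only at this point is it handed to the item pipelines
        yield item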

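    def parse(self, response):  # plays the role of parse1 in the example above
        info_nodes = response.xpath('//*[@id="container"]/div')
        for node in info_nodes:
            # Create a fresh item for every entry so the requests don't share state
            item = Resume1Item()
            item['resume_name'] = node.xpath('./p/a//text()').extract_first()
            href = node.xpath('./p/a/@href').extract_first()
            if href is None:
                continue  # no detail link in this node; skip it
            # Resolve the (protocol-relative) href against the page URL
            detail_url = response.urljoin(href)
            # Request detail_url and let parse_detail finish the item
            yield scrapy.Request(detail_url, callback=self.parse_detail, meta={'item': item})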
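Building detail_url by prefixing 'https:' to the scraped href only works when every href is protocol-relative (//host/path), and it raises a TypeError when extract_first() returns None. Resolving the href against the page URL, which is what Scrapy's response.urljoin() does (with the response URL as the base), also handles root-relative and already-absolute links. The standard-library urljoin shows the difference; the three href values are hypothetical examples, not scraped from the site.

```python
from urllib.parse import urljoin

page_url = "https://sc.chinaz.com/jianli/free.html"

# Hypothetical href values in the three common shapes.
hrefs = [
    "//sc.chinaz.com/jianli/example.htm",          # protocol-relative
    "/jianli/example.htm",                         # root-relative
    "https://sc.chinaz.com/jianli/example.htm",    # already absolute
]

for href in hrefs:
    naive = "https:" + href               # string-prefix approach
    resolved = urljoin(page_url, href)    # resolve against the page URL
    print(f"{href!r}\n  naive:    {naive}\n  resolved: {resolved}")
```

Only the first href survives the string prefix intact, while urljoin yields the same absolute URL for all three.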

Code in items.py:

import scrapy

class Resume1Item(scrapy.Item):
    resume_name = scrapy.Field()   # filled in parse()
    download_url = scrapy.Field()  # filled in parse_detail()
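The spider creates a new Resume1Item inside the for loop, once per listing entry, and that placement matters. The meta dict holds a reference to the item, not a copy, and the callbacks typically run after the loop has already moved on, so an item created once before the loop and reused would be shared by every pending request. The toy below is a hypothetical simplification of Scrapy's asynchronous scheduling (plain dicts plus a list of deferred callbacks, invented names and URLs) that shows the effect.

```python
from typing import Callable, Dict, List

names = ["Resume A", "Resume B", "Resume C"]  # invented listing data


def crawl(fresh_item_per_entry: bool) -> List[Dict[str, str]]:
    """Toy model: queue one deferred callback per entry, run them after the loop."""
    pending: List[Callable[[], Dict[str, str]]] = []
    shared: Dict[str, str] = {}
    for i, name in enumerate(names):
        item = {} if fresh_item_per_entry else shared
        item["resume_name"] = name
        meta = {"item": item}  # holds a reference to the item, not a copy

        def callback(meta=meta, i=i):
            # Stands in for parse_detail running once the download completes.
            result = meta["item"]
            result["download_url"] = f"https://example.com/{i}.rar"
            return dict(result)  # snapshot of the item at the moment it is yielded

        pending.append(callback)
    return [cb() for cb in pending]


print(crawl(fresh_item_per_entry=True))   # each name paired with its own link
print(crawl(fresh_item_per_entry=False))  # every item reports the last name
```

With a shared item, the first two download links end up paired with "Resume C", because the name was overwritten before their callbacks ran.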
