Scrapy: Passing Arguments and Starting a Spider

start_requests

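Scrapy lets you customize how a spider starts by defining a start_requests method, for example to start from particular links or to attach specific values to the initial requests.
If no customization is needed and the spider only has to start from a fixed set of links, you can omit start_requests and list the links in start_urls instead.

from typing import Any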

import scrapy
from scrapy.http import Response

# Using the start_requests method
class BaiDuApi(scrapy.Spider):
    name = 'baiduapi'

    def start_requests(self):
        urls = [
            'https://httpbin.org/get?params=1',
            'https://httpbin.org/get?params=2'
        ]
        for url in urls:
            yield scrapy.Request(
                url
            )

    def parse(self, response: Response, **kwargs: Any) -> Any:
        pass

from typing import Any

import scrapy
from scrapy.http import Response

# Using start_urls; the effect is the same as the code above
class BaiDuApi(scrapy.Spider):
    name = 'baiduapi'

    start_urls = [
        'https://httpbin.org/get?params=1',
        'https://httpbin.org/get?params=2'
    ]

    def parse(self, response: Response, **kwargs: Any) -> Any:
        pass

Passing arguments to a spider

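When launching a crawl, you can add the -a option to pass arguments to the spider being run. Each argument becomes an attribute on the spider instance, which is why the example below reads it with getattr.

import scrapy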


class WangYiNew(scrapy.Spider):
    name = 'wangyinews'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
    }

    def start_requests(self):
        article = getattr(self, "article", None)
        if article is not None:
            base_url = "https://www.163.com/news/article/" + article + '.html'
            yield scrapy.Request(
                base_url,
                headers=self.headers
            )

    def parse(self, response):
        items = {
            'title':response.xpath("//h1[@class='post_title']/text()").get(),
            'content':''.join(response.xpath("//div[@class='post_body']//p//text()").getall()),
            'pubtime':''.join(response.xpath("//div[@class='post_info']/text()").getall())
        }
        self.log(items)
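        yield items

Run the spider from the command line, passing the article ID with -a. The argument must be written as NAME=VALUE with no spaces around the equals sign:

scrapy crawl wangyinews -a article=IFJ1RHSS000189FH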
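
To make the -a mechanism more concrete, here is a simplified stand-in written in plain Python. It is not Scrapy's actual code: parse_spider_args and FakeSpider are hypothetical names, and the sketch only imitates the observable behavior, namely that each NAME=VALUE pair ends up as a string attribute on the spider.

```python
def parse_spider_args(tokens):
    """Turn ['article=IFJ1RHSS000189FH', ...] into a dict, one entry per -a option."""
    args = {}
    for token in tokens:
        if "=" not in token:
            # With spaces around "=", the shell splits the argument into
            # separate tokens, so the first token has no "=" at all
            raise ValueError(f"expected NAME=VALUE, got {token!r}")
        name, value = token.split("=", 1)
        args[name] = value  # values are always strings
    return args


class FakeSpider:
    """Hypothetical stand-in for a spider: keyword arguments become attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def start_url(self):
        # Same lookup pattern as start_requests in the spider above
        article = getattr(self, "article", None)
        if article is None:
            return None
        return "https://www.163.com/news/article/" + article + ".html"


spider = FakeSpider(**parse_spider_args(["article=IFJ1RHSS000189FH"]))
print(spider.start_url())
```

Because the values arrive as strings, a numeric argument such as a page count would need an explicit int() conversion inside the spider.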