Notes: scrapy signal
1. scrapy signal
1.1. Signal mechanism
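Scrapy's signal mechanism is implemented mainly by three modules:
- signals.py: defines the signals
- signalmanager.py: manages them (the SignalManager class)
- utils/signal.py: does the actual work (send_catch_log, send_catch_log_deferred, disconnect_all)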
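Scrapy ships with a number of built-in signals, defined in signals.py (this list comes from the Scrapy version these notes were written against; newer releases add more):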
engine_started = object()
engine_stopped = object()
spider_opened = object()
spider_idle = object()
spider_closed = object()
spider_error = object()
request_scheduled = object()
request_dropped = object()
response_received = object()
response_downloaded = object()
item_scraped = object()
item_dropped = object()
# for backwards compatibility
stats_spider_opened = spider_opened
stats_spider_closing = spider_closed
stats_spider_closed = spider_closed
item_passed = item_scraped
request_received = request_scheduled
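Because each signal is just a unique `object()` sentinel, a signal is identified purely by object identity, and every backwards-compatibility alias above is the very same object as the name it points to. The stdlib-only sketch below illustrates this; the `receivers` dict, the `connect` function and `my_custom_signal` are hypothetical stand-ins (the real registry lives inside pydispatch), not Scrapy's API.

```python
# Signals are plain sentinels: only their identity matters.
spider_opened = object()
spider_closed = object()
item_scraped = object()

# Backwards-compatible aliases bind another name to the same object.
stats_spider_closing = spider_closed
item_passed = item_scraped

# A custom signal can be defined the same way (hypothetical name).
my_custom_signal = object()

# Hypothetical registry: signal object -> list of receivers.
receivers = {}


def connect(receiver, signal):
    """Register receiver for signal (toy stand-in for a real dispatcher)."""
    receivers.setdefault(signal, []).append(receiver)


def on_item(item):
    return item


# Connecting through the alias is the same as connecting to item_scraped.
connect(on_item, item_passed)

print(item_passed is item_scraped)            # True
print(receivers[item_scraped] == [on_item])   # True
print(spider_opened is spider_closed)         # False: distinct sentinels
```

Since `object()` instances hash and compare by identity, they work directly as dictionary keys, which is all a dispatcher needs from a signal.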
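Scrapy defines these signals and sends each one at the relevant moment. For example, the engine fires engine_started when it starts (in ExecutionEngine.start()):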
yield self.signals.send_catch_log_deferred(signal=signals.engine_started)
For what each signal means and exactly when it is fired, see the documentation: https://docs.scrapy.org/en/latest/topics/signals.html
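To make "fired at the relevant moment" concrete, here is a deliberately simplified, stdlib-only model: handlers are connected before the engine runs, and the engine sends engine_started and spider_opened at the corresponding points. `ToySignals` and `ToyEngine` are hypothetical classes; Scrapy's real ExecutionEngine sends these signals through the crawler's SignalManager rather than anything like this.

```python
from collections import defaultdict

engine_started = object()
spider_opened = object()


class ToySignals:
    """Hypothetical minimal signal hub: signal -> list of receivers."""

    def __init__(self):
        self._receivers = defaultdict(list)

    def connect(self, receiver, signal):
        self._receivers[signal].append(receiver)

    def send(self, signal, **kwargs):
        # Call every receiver connected to this exact signal object.
        return [(r, r(**kwargs)) for r in self._receivers[signal]]


class ToyEngine:
    """Hypothetical engine that fires signals at fixed points."""

    def __init__(self, signals):
        self.signals = signals

    def start(self):
        self.signals.send(engine_started)

    def open_spider(self, spider):
        self.signals.send(spider_opened, spider=spider)


events = []
hub = ToySignals()
hub.connect(lambda: events.append("engine started"), engine_started)
hub.connect(lambda spider: events.append(f"opened {spider}"), spider_opened)

engine = ToyEngine(hub)
engine.start()
engine.open_spider("quotes")
print(events)  # ['engine started', 'opened quotes']
```

The point of the design is decoupling: the engine only knows *when* to send a signal, while the components that care about that moment register themselves independently.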
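1.2. Using scrapy signals
Scrapy already defines the commonly used signals; developers can connect handlers to them in extensions, spiders, or item pipelines.
Below is an example of using a signal in an extension class, spider_open_s.py: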
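#coding:utf-8
import logging
from scrapy import signals

logger = logging.getLogger(__name__)


class spider_open(object):
    """Extension that runs a handler when the spider_opened signal fires."""

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds the extension through from_crawler(); connect the
        # handler to spider_opened via the crawler's SignalManager.
        ext = cls()
        crawler.signals.connect(ext.spider_open_log, signal=signals.spider_opened)
        return ext

    def spider_open_log(self, spider):
        # Called when spider_opened is sent; `spider` is the spider being opened.
        logger.info('spider is opened!')
        # input() blocks the whole process, Twisted reactor included,
        # until a line is entered.
        input('input a number to go on:')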
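It is very simple: to print a message or do something once the spider has been opened, the extension connects the spider_opened signal to the handler function. When Scrapy opens the spider, it fires spider_opened, and the connected function is called with the spider as its argument.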
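An extension class like the one above only runs once it is listed in the EXTENSIONS setting. A minimal settings.py fragment, assuming the file is saved as a hypothetical module myproject/extensions/spider_open_s.py (adjust the dotted path to your own project layout); the value is the extension's load order, here an arbitrary 500:

```python
# settings.py (fragment)
# Key: dotted import path of the extension class (hypothetical module path).
# Value: order relative to other extensions; extensions rarely depend on
# each other, so the exact number usually does not matter.
EXTENSIONS = {
    "myproject.extensions.spider_open_s.spider_open": 500,
}
```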
1.3. Signals in depth
Under the hood, Scrapy's signal handling is built on the PyDispatcher package (imported as pydispatch):
from pydispatch import dispatcher
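One useful detail of the pydispatch layer: Scrapy's send_catch_log calls each receiver through pydispatch's robustApply, which passes a receiver only the keyword arguments it declares (unless it accepts **kwargs). That is why a handler like spider_open_log(self, spider) works even though signal and sender are sent as well. The sketch below approximates that filtering with inspect.signature; `call_with_accepted_kwargs` and `Ext` are hypothetical, not pydispatch's actual code.

```python
import inspect


def call_with_accepted_kwargs(receiver, **named):
    """Call receiver with only the keyword arguments it can accept.

    Rough approximation of pydispatch's robustApply (hypothetical helper):
    if the receiver takes **kwargs, everything is passed through.
    """
    params = inspect.signature(receiver).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return receiver(**named)
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return receiver(**{k: v for k, v in named.items() if k in accepted})


class Ext:
    def spider_open_log(self, spider):  # declares only `spider`
        return f"opened {spider}"


def log_everything(**kwargs):  # accepts anything
    return sorted(kwargs)


spider_opened = object()
sent = {"signal": spider_opened, "sender": "engine", "spider": "quotes"}
print(call_with_accepted_kwargs(Ext().spider_open_log, **sent))  # opened quotes
print(call_with_accepted_kwargs(log_everything, **sent))  # ['sender', 'signal', 'spider']
```

For a bound method, inspect.signature already drops `self`, so only `spider` survives the filter.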
For finer-grained control over signals, Scrapy also exposes an API: signals are handled through the SignalManager class:
class scrapy.signalmanager.SignalManager(sender=_Anonymous)
Common methods
- connect(receiver, signal, **kwargs)
Connect a receiver function to a signal.
The signal can be any object, although Scrapy comes with some predefined signals that are documented in the Signals section.
Parameters: receiver (callable): the function to be connected; signal (object): the signal to connect to
- disconnect(receiver, signal, **kwargs)
Disconnect a receiver function from a signal. This has the opposite effect of the connect() method, and the arguments are the same.
- disconnect_all(signal, **kwargs)
Disconnect all receivers from the given signal.
Parameters: signal (object): the signal to disconnect from
- send_catch_log(signal, **kwargs)
Send a signal, catch exceptions and log them.
The keyword arguments are passed to the signal handlers (connected through the connect() method).
- send_catch_log_deferred(signal, **kwargs)
Like send_catch_log() but supports returning deferreds from signal handlers.
Returns a Deferred that gets fired once all signal handlers' deferreds have fired. Send a signal, catch exceptions and log them.
The keyword arguments are passed to the signal handlers (connected through the connect() method).
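The behaviour of connect / disconnect / disconnect_all / send_catch_log can be modelled in a few lines of plain Python. This is a simplified sketch, not Scrapy's implementation: `ToySignalManager` is hypothetical, calls receivers directly instead of going through pydispatch, and does not handle Deferreds. What it does show is the "catch and log" part: a failing handler is logged and does not stop the others, and the result is a list of (receiver, response) pairs, the same shape Scrapy's send_catch_log returns (Scrapy stores a Twisted Failure where this sketch stores the exception).

```python
import logging

logger = logging.getLogger(__name__)


class ToySignalManager:
    """Hypothetical, simplified model of SignalManager (no pydispatch, no Deferreds)."""

    def __init__(self):
        self._receivers = {}  # signal object -> list of receivers

    def connect(self, receiver, signal):
        receivers = self._receivers.setdefault(signal, [])
        if receiver not in receivers:  # ignore duplicate connections
            receivers.append(receiver)

    def disconnect(self, receiver, signal):
        receivers = self._receivers.get(signal, [])
        if receiver in receivers:
            receivers.remove(receiver)

    def disconnect_all(self, signal):
        self._receivers.pop(signal, None)

    def send_catch_log(self, signal, **kwargs):
        """Call every receiver; log exceptions instead of propagating them."""
        responses = []
        for receiver in list(self._receivers.get(signal, [])):
            try:
                result = receiver(**kwargs)
            except Exception as exc:
                logger.error("Error caught on signal handler: %r", receiver, exc_info=True)
                result = exc
            responses.append((receiver, result))
        return responses


item_scraped = object()
manager = ToySignalManager()


def ok(item):
    return item["name"]


def broken(item):
    raise ValueError("boom")


manager.connect(ok, item_scraped)
manager.connect(broken, item_scraped)
responses = manager.send_catch_log(item_scraped, item={"name": "book"})
print([type(r).__name__ for _, r in responses])  # ['str', 'ValueError']

manager.disconnect(broken, item_scraped)
after_disconnect = manager.send_catch_log(item_scraped, item={"name": "pen"})
print(len(after_disconnect))  # 1

manager.disconnect_all(item_scraped)
after_all = manager.send_catch_log(item_scraped, item={})
print(after_all)  # []
```

Catching per receiver is the key design choice: one buggy extension or pipeline handler cannot break the engine or prevent other handlers from seeing the signal.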