Scrapy's stats collector records crawler state data in real time; by default, the collected stats are dumped to the log when the spider finishes:
C:\Anaconda2\Lib\site-packages\scrapy\statscollectors.py
class StatsCollector(object):

    def __init__(self, crawler):
        self._dump = crawler.settings.getbool('STATS_DUMP')
        self._stats = {}

    ......

    def close_spider(self, spider, reason):
        if self._dump:
            logger.info("Dumping Scrapy stats:\n" + pprint.pformat(self._stats),
                        extra={'spider': spider})
        self._persist_stats(self._stats, spider)

    def _persist_stats(self, stats, spider):
        pass
The source above is the stats collector; you can see that close_spider dumps self._stats to the log. The information collected by default looks like this when the spider closes:
{'downloader/request_bytes': 20646,
'downloader/request_count': 47,
'downloader/request_method_count/POST': 47,
'downloader/response_bytes': 673679,
'downloader/response_count':
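The collector exposes a small key/value API (set_value, inc_value, get_stats, and similar methods) that spider code reaches through self.crawler.stats. As a rough, standard-library-only sketch of how that dict-based behavior works (an illustrative re-implementation, not Scrapy's actual class; the constructor argument dump stands in for the STATS_DUMP setting):

```python
import pprint

class StatsCollector:
    """Minimal sketch of a Scrapy-style stats collector."""

    def __init__(self, dump=True):
        self._dump = dump    # stands in for crawler.settings.getbool('STATS_DUMP')
        self._stats = {}

    def set_value(self, key, value):
        # Overwrite (or create) a stat entry.
        self._stats[key] = value

    def inc_value(self, key, count=1, start=0):
        # Increment a counter, initializing it to `start` on first use.
        self._stats[key] = self._stats.setdefault(key, start) + count

    def get_stats(self):
        return self._stats

    def close_spider(self, spider, reason):
        # Mirrors the dump-on-close behavior seen in the source above.
        if self._dump:
            print("Dumping Scrapy stats:\n" + pprint.pformat(self._stats))

stats = StatsCollector()
stats.inc_value('downloader/request_count')
stats.inc_value('downloader/request_count', count=46)
stats.set_value('finish_reason', 'finished')
stats.close_spider(spider=None, reason='finished')
```

Running the snippet prints the accumulated stats dict in pprint format, just as the real collector logs them at spider close.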