Remotely retrieving data from Scrapy's stats collector

Scrapy's stats collector records crawler state data in real time and, by default, dumps it when the spider finishes:

C:\Anaconda2\Lib\site-packages\scrapy\statscollectors.py
class StatsCollector(object):

    def __init__(self, crawler):
        self._dump = crawler.settings.getbool('STATS_DUMP')
        self._stats = {}

    # ... (other methods omitted)

    def close_spider(self, spider, reason):
        if self._dump:
            logger.info("Dumping Scrapy stats:\n" + pprint.pformat(self._stats),
                        extra={'spider': spider})
        self._persist_stats(self._stats, spider)

    def _persist_stats(self, stats, spider):
        pass

The code above is the stats collector's source. You can see that close_spider dumps self._stats; the information collected by default looks like this.

On finish:
{'downloader/request_bytes': 20646,
 'downloader/request_count': 47,
 'downloader/request_method_count/POST': 47,
 'downloader/response_bytes': 673679,
 'downloader/response_count': 47,
 'downloader/response_status_count/200': 47,
 'finish_reason': 'shutdown',
 'finish_time': datetime.datetime(2018, 7, 24, 6, 31, 1, 84791),
 'item_scraped_count': 460,
 'log_count/CRITICAL': 4,
 'log_count/DEBUG': 510,
 'log_count/ERROR': 1,
 'log_count/INFO': 74,
 'login_faild': False,
 'request_depth_max': 46,
 'response_received_count': 47,
 'scheduler/dequeued': 47,
 'scheduler/dequeued/memory': 47,
 'scheduler/enqueued': 48,
 'scheduler/enqueued/memory': 48,
 'spider_exceptions/KeyError': 1,
 'start_time': datetime.datetime(2018, 7, 24, 6, 12, 27, 74073)}

finish_reason and finish_time only appear once the spider has finished; they are not present while it is still running.
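
The empty _persist_stats hook shown in the source above is also a natural extension point: a StatsCollector subclass can write the final stats wherever you like and be enabled through the STATS_CLASS setting. A minimal sketch, assuming a hypothetical myproject/statscollectors.py module:

# myproject/statscollectors.py  (module and class names are just examples)
import json

from scrapy.statscollectors import StatsCollector


class JsonFileStatsCollector(StatsCollector):
    def _persist_stats(self, stats, spider):
        # Write the final stats to a JSON file named after the spider;
        # default=str takes care of the datetime values in the dict.
        with open('%s_stats.json' % spider.name, 'w') as f:
            json.dump(stats, f, default=str)

# settings.py
# STATS_CLASS = 'myproject.statscollectors.JsonFileStatsCollector'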

The stats collector also lets developers record custom values:

In a spider:

self.crawler.stats.set_value("login_faild", False)

In a middleware or pipeline:

spider.crawler.stats.set_value("login_faild", False)
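
Besides set_value, the collector exposes inc_value, max_value, min_value and get_value, which are convenient for counting things as items flow through a pipeline. A small illustrative sketch (the pipeline class and stat names are made up):

class StatsTrackingPipeline(object):
    def process_item(self, item, spider):
        stats = spider.crawler.stats
        # Count every item that passes through this pipeline.
        stats.inc_value('custom/items_seen')
        # Keep track of the longest title seen so far.
        stats.max_value('custom/max_title_length', len(item.get('title', '')))
        # Read a value back; the second argument is the default.
        if stats.get_value('login_faild', False):
            spider.logger.warning('login appears to have failed')
        return item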

Now to the main topic: how to fetch the collector's data remotely.

Method 1: persist, then fetch. A downloader middleware writes the stats into Redis, and they are read back from there:

import json
import time

import redis


class StatCollectorMiddleware(object):
    """Snapshot the crawler stats into Redis on every outgoing request."""

    def __init__(self):
        self.r = redis.Redis(host='localhost', port=6379, db=0)
        self.time = lambda: time.strftime('%Y-%m-%d %H:%M:%S')

    def process_request(self, request, spider):
        stats = spider.crawler.stats.get_stats()
        for key, value in stats.items():
            # Store a timestamped snapshot of each stat under its own key.
            snapshot = {"value": [self.time(), value]}
            self.insert2redis(key, snapshot)

    def insert2redis(self, key, value):
        # Redis only stores strings/bytes, so serialize the snapshot;
        # default=str handles the datetime objects in the stats.
        self.r.rpush(key, json.dumps(value, default=str))
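
For the middleware to run it has to be registered in DOWNLOADER_MIDDLEWARES; the snapshots can then be read back from Redis by any machine that can reach the server. A sketch assuming the middleware lives in a hypothetical myproject/middlewares.py (module path, priority and key are illustrative):

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.StatCollectorMiddleware': 543,
}

# reader.py -- run anywhere that can reach the Redis server
import json

import redis

r = redis.Redis(host='localhost', port=6379, db=0)
# Each stats key holds a list of timestamped snapshots; grab the newest one.
latest = r.lrange('item_scraped_count', -1, -1)
if latest:
    print(json.loads(latest[0]))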

Method 2: use the Telnet Console. First configure the settings so the console is reachable from other machines, and remember to open port 6023 in the firewall:

TELNETCONSOLE_HOST = '0.0.0.0'
TELNETCONSOLE_PORT = [6023, 6073]

The defaults are:

TELNETCONSOLE_HOST = '127.0.0.1'
TELNETCONSOLE_PORT = [6023, 6073]

Then connect with:

import telnetlib
tn = telnetlib.Telnet('192.168.2.89', port=6023, timeout=10)
tn.write('stats.get_stats()'+'\n')
tn.read_very_eager() 

Result:

In [1]: import telnetlib

In [2]: tn = telnetlib.Telnet('192.168.2.89', port=6023, timeout=10)

In [3]: tn.write('stats.get_stats()'+'\n')

In [4]: stat = tn.read_very_eager() 

In [5]: print stat

>>> stats.get_stats() 
{'log_count/INFO': 45, 'start_time': datetime.datetime(2018, 7, 24, 6, 49, 26, 572021), 'log_count/DEBUG': 394, 'login_faild': False, 'scheduler/enqueued/memory': 37, 'scheduler/enqueued': 37, 'scheduler/dequeued/memory': 37, 'scheduler/dequeued': 37, 'downloader/request_count': 37, 'downloader/request_method_count/POST': 37, 'downloader/request_bytes': 16286, 'downloader/response_count': 37, 'downloader/response_status_count/200': 37, 'downloader/response_bytes': 531739, 'response_received_count': 37, 'item_scraped_count': 354, 'request_depth_max': 35, 'log_count/ERROR': 1, 'spider_exceptions/KeyError': 1, 'log_count/CRITICAL': 4}
>>>  
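
For use inside another script, the interaction above can be wrapped in a small helper that sends the command and turns the reply back into a dict. A rough sketch; the prompt handling and parsing are assumptions based on the output format shown above, and newer Scrapy releases protect the telnet console with a username/password (see TELNETCONSOLE_USERNAME and TELNETCONSOLE_PASSWORD), so a login step may have to be added:

import datetime
import re
import telnetlib


def fetch_scrapy_stats(host, port=6023, timeout=10):
    tn = telnetlib.Telnet(host, port=port, timeout=timeout)
    tn.read_until(b'>>>', timeout)            # consume the initial prompt
    tn.write(b'stats.get_stats()\n')
    reply = tn.read_until(b'>>>', timeout)    # echoed command, dict, next prompt
    tn.close()
    match = re.search(r'\{.*\}', reply.decode('utf-8', 'replace'), re.S)
    if not match:
        return {}
    # The repr contains datetime.datetime(...) calls, so evaluate it with
    # datetime in scope; only do this against a console you trust.
    return eval(match.group(0), {'datetime': datetime})


# print(fetch_scrapy_stats('192.168.2.89', 6023))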

When several spiders run at the same time, the port can be set per spider in custom_settings to avoid conflicts:

custom_settings = {
    "TELNETCONSOLE_PORT": [6029, ]
}

With that in place the stats can be fetched remotely. The test above was done on a LAN, but it works the same way over the public internet.
