The stats collector is accessed through the crawler's stats attribute.
In this example we use stats collection to count how many quotes on the Quotes to Scrape site (http://quotes.toscrape.com/) carry the tag "love".
1. Create the project
>>> scrapy startproject tagcount
2. Create the spider
>>> scrapy genspider tags quotes.toscrape.com
3. Write the items.py file
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy


class TagcountItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    author = scrapy.Field()
    content = scrapy.Field()
    tag = scrapy.Field()
4. Write the tags.py file
import scrapy
from scrapy import Request

from tagcount.items import TagcountItem


class TagsSpider(scrapy.Spider):
    name = 'tags'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        quotes = response.css('.quote')
        for quote in quotes:
            item = TagcountItem()
            item['author'] = quote.css('.author::text').extract_first()
            item['content'] = quote.css('.text::text').extract_first()
            item['tag'] = quote.css('.tag::text').extract()
            if 'love' in item['tag']:
                # if 'love' appears among the extracted tags,
                # increment the 'love' counter in the stats collector
                self.crawler.stats.inc_value('love')
            yield item
        next_page = response.css('.next>a::attr(href)').extract_first()
        if next_page is not None:
            yield Request(response.urljoin(next_page), callback=self.parse)
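The counting logic inside parse() can be sketched on its own, with a made-up list of quotes standing in for the scraped pages and a plain dict playing the role of crawler.stats:

```python
# Minimal sketch of the tag-counting logic, without Scrapy.
# The quotes list is hypothetical sample data; the real spider
# extracts it from http://quotes.toscrape.com/.
stats = {}  # stands in for self.crawler.stats

def inc_value(key, count=1):
    """Mimics StatsCollector.inc_value: add count to a named counter."""
    stats[key] = stats.get(key, 0) + count

quotes = [
    {'author': 'A', 'tag': ['change', 'world']},
    {'author': 'B', 'tag': ['love']},
    {'author': 'C', 'tag': ['life', 'love']},
]
for item in quotes:
    if 'love' in item['tag']:  # same membership test as in parse()
        inc_value('love')

print(stats)  # {'love': 2}
```

Because item['tag'] is a list, the `in` test matches whole tags only, so a tag such as "lovely" would not be counted.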
5. Run the spider with scrapy crawl tags; the output is as follows
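The custom 'love' counter is dumped together with Scrapy's built-in stats at the end of the crawl. If you prefer to report it explicitly, Scrapy also calls a spider's closed(reason) method when the run ends, where self.crawler.stats.get_value('love') returns the count. The sketch below stubs the Scrapy objects (FakeStats, FakeCrawler are stand-ins, not Scrapy classes) so it runs on its own; in the real project you would only add the closed() method to TagsSpider:

```python
# Hedged sketch: reporting the counter when the spider closes.
# FakeStats/FakeCrawler only exist so this snippet runs without
# Scrapy; inc_value/get_value mirror the real StatsCollector methods.
class FakeStats:
    def __init__(self):
        self._stats = {}

    def inc_value(self, key, count=1):
        self._stats[key] = self._stats.get(key, 0) + count

    def get_value(self, key, default=None):
        return self._stats.get(key, default)

class FakeCrawler:
    def __init__(self):
        self.stats = FakeStats()

class TagsSpiderSketch:
    def __init__(self, crawler):
        self.crawler = crawler

    # In a real spider, Scrapy calls closed(reason) automatically
    # when the crawl finishes.
    def closed(self, reason):
        count = self.crawler.stats.get_value('love', 0)
        print(f'{count} quotes tagged "love"')
        return count

crawler = FakeCrawler()
crawler.stats.inc_value('love')
crawler.stats.inc_value('love')
spider = TagsSpiderSketch(crawler)
spider.closed('finished')  # prints: 2 quotes tagged "love"
```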