1. Create the Scrapy project
In a command prompt (DOS window), enter:
scrapy startproject quote
cd quote
2. Write the items.py file (it acts as a template: the data to be scraped is defined here)
import scrapy


class QuoteItem(scrapy.Item):
    # Each Field() declares one piece of data to scrape
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()
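To show what "template" means here, below is a minimal sketch of the same three fields written as a plain Python dataclass. Recent Scrapy versions (2.2 and later) also accept dataclass objects as items, but the QuoteItem above is the version this tutorial uses; the sample values are hypothetical. As with scrapy.Item, only the declared fields can be set.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class QuoteItem:
    # Same three fields as the scrapy.Item version above
    text: str = ""
    author: str = ""
    tags: List[str] = field(default_factory=list)  # one list per item, not shared


# Hypothetical sample values, for illustration only
item = QuoteItem(text="An example quote.", author="Someone", tags=["example"])
print(asdict(item))

# Undeclared fields are rejected (scrapy.Item raises KeyError in the same situation)
try:
    QuoteItem(foo="x")
except TypeError as exc:
    print("rejected:", exc)
```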
3. Create the spider file
In the command prompt (DOS window), inside the project directory, enter:
scrapy genspider myspider quotes.toscrape.com
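Before filling in the generated spider, it can help to try the extraction logic offline. The following is a minimal sketch that uses the standard-library ElementTree (not Scrapy) on a hand-written fragment. It assumes each quote sits in a div with class "quote" containing span.text, small.author and a.tag elements; treat that markup and the helper name extract_quotes as hypothetical. ElementTree only parses well-formed markup; inside the spider, Scrapy's response.xpath handles the real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment mimicking the assumed page structure
SAMPLE = """
<html><body>
  <div class="quote">
    <span class="text">Quote one.</span>
    <span>by <small class="author">Author A</small></span>
    <div class="tags">
      <a class="tag" href="/tag/x/">x</a>
      <a class="tag" href="/tag/y/">y</a>
    </div>
  </div>
  <div class="quote">
    <span class="text">Quote two.</span>
    <span>by <small class="author">Author B</small></span>
    <div class="tags"></div>
  </div>
</body></html>
"""


def extract_quotes(html: str) -> list:
    """Return one dict per quote block, mirroring the QuoteItem fields."""
    root = ET.fromstring(html.strip())
    results = []
    # ElementTree supports a limited XPath subset, including [@attr='value']
    for each in root.iterfind(".//div[@class='quote']"):
        results.append({
            "text": each.findtext("span[@class='text']"),
            "author": each.findtext(".//small[@class='author']"),
            "tags": [a.text for a in each.iterfind(".//a[@class='tag']")],
        })
    return results


for quote in extract_quotes(SAMPLE):
    print(quote)
```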
4. Write the myspider.py file, created under quote/spiders/ (it receives the responses and processes the data)
# -*- coding: utf-8 -*-
import scrapy
from quote.items import QuoteItem
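class MyspiderSpider(scrapy.Spider):
    name = 'myspider'  # spider name, used with: scrapy crawl myspider
    allowed_domains = ['quotes.toscrape.com']  # requests to other domains are filtered out
    start_urls = ['http://quotes.toscrape.com/']  # first page(s) to download

    def parse(self, response):  # default callback for each downloaded response
        for each in response.xpath('//div[@class="quote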