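I got started with Scrapy a while ago by using the framework to crawl the Douban Top 250. I now plan to learn scrapy-redis for distributed crawling, so before that I'm going over Scrapy again. I've condensed a lot in this summary; for fuller explanations, see my earlier doubanmovie write-up.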
Hands-on example
Open CMD and run scrapy startproject maoyan
import scrapy


class MaoyanItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    movie_name = scrapy.Field()
    movie_ename = scrapy.Field()
    movie_type = scrapy.Field()
    movie_publish = scrapy.Field()
    movie_time = scrapy.Field()
    movie_star = scrapy.Field()
    movie_total_price = scrapy.Field()
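First, import Scrapy.
Next, create a class that inherits from scrapy.Item. It is the container that holds the scraped data, and it is declared much like an ORM model.
What we want to record: the movie's name, rating, release date, genre, and English name (the Item above also declares movie_time and movie_total_price).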
Fetching the page data
OK, at this point we edit the spider
from scrapy.spiders import Rule, CrawlSpider
fro