cd DouBanTop
scrapy genspider TouTiao movie.douban.com
3-Open the project you just created in PyCharm or VS Code
(1) In settings.py, change ROBOTSTXT_OBEY = True to False
ROBOTSTXT_OBEY = False

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3  # download delay of 3 seconds
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 16  # maximum concurrent requests per domain
#CONCURRENT_REQUESTS_PER_IP = 16
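As a rough illustration of what DOWNLOAD_DELAY = 3 means in practice: with Scrapy's default RANDOMIZE_DOWNLOAD_DELAY = True, the wait between requests to the same site is a random value between 0.5 and 1.5 times DOWNLOAD_DELAY. The helper name delay_bounds below is hypothetical and is not part of Scrapy; it only does the arithmetic.

```python
def delay_bounds(download_delay, randomize=True):
    """Return the (min, max) wait in seconds between requests to one site.

    Hypothetical helper, not a Scrapy API. It mirrors the documented rule
    that RANDOMIZE_DOWNLOAD_DELAY scales the delay by a factor in [0.5, 1.5].
    """
    if randomize:
        return 0.5 * download_delay, 1.5 * download_delay
    return float(download_delay), float(download_delay)


low, high = delay_bounds(3)
print(f"wait between requests: {low}s to {high}s")
```

So with the setting above, each request to movie.douban.com should be spaced roughly 1.5 to 4.5 seconds from the previous one, unless randomization is turned off in settings.py.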
(2) In items.py, define the fields you want to scrape from Douban Movies, such as the movie title, the rating, and so on
import scrapy


class DoubanItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()  # movie title
    score = scrapy.Field()  # movie rating
    count = scrapy.Field()  # number of ratings
    # introduce = scrapy.Field()  # movie synopsis
    director = scrapy.Field()  # movie director