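I. Create the project with `scrapy startproject <project_name>` and change into the project directory. Then create a spider with `scrapy genspider <spider_name> <domain_to_crawl>`.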
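With the names used in this article's code (project `mongotest`, spider `test1`, domain `movie.douban.com`), step I comes down to two commands. The sketch below builds them as argument lists and can optionally run them with `subprocess.run`. The helper name `scaffold` is hypothetical, and running it with `run=True` assumes the `scrapy` executable is on your PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def scaffold(project: str, spider: str, domain: str, run: bool = False) -> List[List[str]]:
    """Hypothetical helper: return (and optionally run) the commands for step I."""
    commands = [
        ["scrapy", "startproject", project],
        ["scrapy", "genspider", spider, domain],
    ]
    if run:
        # startproject runs in the current directory and creates ./<project>.
        subprocess.run(commands[0], check=True)
        # genspider must run inside the new project directory.
        subprocess.run(commands[1], check=True, cwd=Path(project))
    return commands


if __name__ == "__main__":
    for cmd in scaffold("mongotest", "test1", "movie.douban.com"):
        print(" ".join(cmd))
```

Typing the two printed commands in a terminal, with `cd mongotest` between them, does the same thing by hand.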
II. Write the code in PyCharm
1. The item file: the fields to capture are the title, the cast and crew information, the rating, and the synopsis.
```python
import scrapy


class MongotestItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()    # movie title
    info = scrapy.Field()     # cast and crew information
    content = scrapy.Field()  # synopsis
    scores = scrapy.Field()   # rating
```
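Since Scrapy 2.2, spiders can also yield plain dataclass instances as items (Scrapy handles them through the itemadapter library). The sketch below is an alternative to the author's `scrapy.Item` class, not a copy of it. It declares the same four fields with the standard library only. The class name `MovieItem` is hypothetical, and the field meanings (`content` as the synopsis, `scores` as the rating) are inferred from the field list above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MovieItem:
    # Hypothetical dataclass counterpart of MongotestItem, same field names.
    title: Optional[str] = None    # movie title
    info: Optional[str] = None     # cast and crew information
    content: Optional[str] = None  # synopsis
    scores: Optional[str] = None   # rating, kept as the scraped text


if __name__ == "__main__":
    item = MovieItem(title="Example title", scores="9.0")
    # asdict() gives the plain dict form of the item; unset fields stay None.
    print(asdict(item))
```

Unlike `scrapy.Item`, a dataclass fixes the field names at class-definition time and gives each one a default. Either style works with Scrapy's item pipelines.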
2. The spider file
```python
import scrapy
from mongotest.items import MongotestItem


class Test1Spider(scrapy.Spider):
    name = 'test1'
    allowed_domains = ['movie.douban.com']
    off_set = 0
    url=
```
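The class attributes `off_set = 0` and `url` suggest a paginated crawl: an offset is appended to a base URL and advanced page by page. The real `url` value is cut off in the source. For that reason, this sketch uses a hypothetical placeholder base URL (`https://example.com/list?start=`) and a hypothetical page size of 25. It only shows how offset-based page URLs can be generated. It is not the author's spider.

```python
from typing import Iterator

# Hypothetical values: the real base URL and page size are not given in the source.
BASE_URL = "https://example.com/list?start="
PAGE_SIZE = 25


def page_urls(base_url: str, page_size: int, pages: int, off_set: int = 0) -> Iterator[str]:
    """Yield `pages` URLs, advancing the offset by `page_size` after each one."""
    for _ in range(pages):
        yield f"{base_url}{off_set}"
        off_set += page_size


if __name__ == "__main__":
    for page in page_urls(BASE_URL, PAGE_SIZE, pages=3):
        print(page)
```

Inside the spider, the same arithmetic would presumably sit in `parse`. There it would increase `self.off_set` and yield `scrapy.Request(self.url + str(self.off_set), callback=self.parse)` until the last page. This is an assumption about the truncated code.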