1. Introduction to the Scrapy Framework
This section covers Scrapy's main components: the spiders, the engine, the scheduler, the downloader, and the item pipeline.
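To make the roles of these components concrete, here is a toy model of how a request travels through them. It is only a sketch and not Scrapy's actual implementation: real Scrapy is asynchronous and far more complete. The `PAGES` dictionary and the functions `download`, `parse`, and `run` are hypothetical names invented for this illustration.

```python
from collections import deque

# Hypothetical fake "web": URL -> (titles on the page, next page URL or None).
PAGES = {
    "page1": (["A", "B"], "page2"),
    "page2": (["C"], None),
}


def download(url):
    """Downloader: fetch the response for a request (here, a dict lookup)."""
    return PAGES[url]


def parse(response):
    """Spider: yield items (dicts) and follow-up requests (URL strings)."""
    titles, next_url = response
    for title in titles:
        yield {"movieName": title}
    if next_url is not None:
        yield next_url


def run(start_url):
    """Engine: move requests and items between the other components."""
    scheduler = deque([start_url])  # Scheduler: queue of pending requests
    stored = []                     # stands in for the item pipeline's output
    while scheduler:
        url = scheduler.popleft()
        response = download(url)
        for result in parse(response):
            if isinstance(result, dict):
                stored.append(result)     # item goes to the item pipeline
            else:
                scheduler.append(result)  # request goes back to the scheduler
    return stored


items = run("page1")
```

The engine never parses or stores anything itself. It only routes: requests go to the scheduler and then the downloader, responses go to the spider, and items go to the pipeline.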
Common Scrapy commands are as follows:
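Inside a Scrapy project the layout is simple: you add the spider files yourself, while the framework generates the items, pipelines, and settings files for you.
The items file declares the names of the fields to scrape. The pipelines file holds the data-flow operations applied to each scraped item, such as writing it to a file or loading it into a database. The spider file itself sets the domain, gives the XPath expression for each field, and fills in each field's value for every page it parses.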
import scrapy

# The class name DoubanItem is an assumption; the source shows only the fields.
class DoubanItem(scrapy.Item):
    movieRank = scrapy.Field()
    movieName = scrapy.Field()
    Director = scrapy.Field()
    movieDesc = scrapy.Field()
    movieRate = scrapy.Field()
    peopleCount = scrapy.Field()
    movieDate = scrapy.Field()
    movieCountry = scrapy.Field()
    movieCategory = scrapy.Field()
    moviePost = scrapy.Field()
import json

class DoubanPipeline(object):
    def __init__(self):
        self.f = open("douban.json","w",encoding='utf-8')

    def proces
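As a minimal sketch of the "write to a file" case, a pipeline could serialize each item as one JSON object per line. This is not the author's pipeline: the class name `JsonLinesPipeline` and its `path` parameter are hypothetical. The method names `open_spider`, `process_item`, and `close_spider` are the hooks Scrapy calls on a pipeline object.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: writes each item as one JSON object per line."""

    def __init__(self, path="douban.json"):
        self.path = path
        self.f = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.f = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for scrapy.Item objects.
        # ensure_ascii=False keeps Chinese text readable in the file.
        line = json.dumps(dict(item), ensure_ascii=False)
        self.f.write(line + "\n")
        # Return the item so that any later pipeline still receives it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the file handle.
        self.f.close()
```

For Scrapy to use a pipeline like this, it has to be listed in the `ITEM_PIPELINES` setting of the project. Opening the file in `open_spider` rather than `__init__` pairs it with the `close_spider` call, so the file is always closed when the crawl ends.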