Using the Scrapy Framework

Latest recommended article published 2020-04-13 20:33:43

LINPAOMO



Tags: python, web crawler

Original article: https://www.jianshu.com/p/8d81aa6cfc76


1. Install Scrapy

pip3 install scrapy

2. Create the project

(1) Create the project folder

scrapy startproject projectname

(2) Change into the projectname folder

cd projectname

(3) Generate the spider file (the second argument is the domain of the site to crawl)

scrapy genspider fear_and_greed 'domain'

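The resulting directory structure looks roughly like the figure below: the standard startproject layout, with scrapy.cfg at the top and a projectname package containing items.py, middlewares.py, pipelines.py, settings.py and a spiders folder, where genspider creates fear_and_greed.py.
[Figure: project directory structure]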
3. Write the Scrapy spider project
(1) Write items.py

Define the data fields we want to scrape:

import scrapy


class ExponentItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    timestamp = scrapy.Field()
    value = scrapy.Field()
    # ... (more fields as needed)

(2) Write fear_and_greed.py (in the spiders folder)

This file holds the spider's main logic, above all the parsing of downloaded pages.

import scrapy


class FearAndGreedSpider(scrapy.Spider):
    name = 'fear_and_greed'
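    # allowed_domains takes bare domain names, without the http:// scheme
    allowed_domains = ['xxxxxxxxxx.com']
    start_urls = ['http://xxxxxxxxxx.com']

    def parse(self, response):
        # Called with the response for each URL in start_urls:
        # extract data here and yield items (or follow-up requests)
        pass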

(3) Write pipelines.py

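This is the pipeline file: every item the spider yields from parse is passed here for further processing, such as storing it in a database or writing it to a file.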
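A minimal sketch of such a pipeline that writes each item to a file as one JSON object per line. The class name JsonLinesPipeline and the default file name items.jl are hypothetical; the method names open_spider, close_spider and process_item are the hooks Scrapy calls on item pipelines.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: writes each item as one JSON object per line."""

    def __init__(self, path="items.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes
        self.file.close()

    def process_item(self, item, spider):
        # Called for every yielded item; dict(item) works for
        # plain dicts as well as scrapy.Item objects
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        # Return the item so that later pipelines (if any) receive it too
        return item
```

Returning the item matters: the value returned by process_item is what the next pipeline in the chain receives.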

(4) Open settings.py and change the configuration

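# Change ROBOTSTXT_OBEY from True to False.
# In a newly generated project it is True: the crawler obeys robots.txt
# and only crawls what that file allows.
# Setting it to False makes the crawler ignore robots.txt.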
ROBOTSTXT_OBEY = False
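To illustrate what obeying robots.txt means, the sketch below uses Python's standard urllib.robotparser (not Scrapy's own robots.txt parser) on a hypothetical robots.txt for the placeholder site. With ROBOTSTXT_OBEY = True, Scrapy performs a comparable check and drops requests to disallowed URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for the placeholder site
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="*"):
    # True if robots.txt permits user_agent to fetch url
    return parser.can_fetch(user_agent, url)
```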

(5) Enable the pipeline (also in settings.py)
[Figure: enabling the pipeline in settings.py]
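A sketch of what enabling the pipeline typically looks like, assuming the project is named projectname and uses the default pipeline class that startproject generates in pipelines.py. The generated settings.py already contains this block commented out, so enabling it usually means uncommenting it.

```python
# settings.py (excerpt)
ITEM_PIPELINES = {
    # Key: import path of the pipeline class.
    # Value: run order in the range 0-1000; lower numbers run first.
    "projectname.pipelines.ProjectnamePipeline": 300,
}
```

Listing several pipeline classes with different numbers chains them in that order.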

4. Run the Scrapy project

scrapy crawl fear_and_greed
