- Creating a new Scrapy project
- Writing a spider to crawl a site and extract data
- Exporting the scraped data using the command line
- Changing spider to recursively follow links
- Using spider arguments
1. Create a Scrapy project
scrapy startproject tutorial
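Running the command above generates a project skeleton roughly like the following (exact files vary slightly by Scrapy version):

```
tutorial/
    scrapy.cfg            # deploy configuration file
    tutorial/             # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # project middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # directory where your spiders live
            __init__.py
```

Spiders such as quotes_spider.py go into the spiders/ directory.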
2. Write a spider to crawl the site and extract data
quotes_spider.py
3. Export the scraped data from the command line (-o writes the yielded items to a feed file such as quotes.json)
scrapy crawl quotes -o quotes.json
4. Change the spider to recursively follow links: parse() extracts the "Next" page's href and yields response.follow(next_page, self.parse), so the crawl continues page by page until there is no next link.
5. Use spider arguments: pass an argument with -a on the command line; the spider reads it via getattr(self, 'tag', None) and crawls only that tag's quotes:
scrapy crawl quotes -a tag=humor
To try out selectors interactively before writing them into the spider, open a Scrapy shell on a page:
scrapy shell 'http://quotes.toscrape.com/page/1/'
Spider code:
```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        url = 'http://quotes.toscrape.com/'
        # Read the optional -a tag=... spider argument (step 5).
        tag = getattr(self, 'tag', None)
        if tag is not None:
            url = url + 'tag/' + tag
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        # Extract the text and author of every quote on the page.
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
            }
        # Recursively follow the "Next" pagination link (step 4).
        next_page = response.css('li.next a::attr(href)').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
```
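To see what the CSS selectors in parse() are pulling out without running a crawl, here is a standard-library-only sketch that mimics the text/author extraction with html.parser on a hypothetical HTML snippet (Scrapy itself uses parsel CSS/XPath selectors, not this parser):

```python
from html.parser import HTMLParser

# Hypothetical markup mimicking two div.quote blocks on quotes.toscrape.com.
HTML = '''
<div class="quote">
  <span class="text">"Quote one."</span>
  <small class="author">Author A</small>
</div>
<div class="quote">
  <span class="text">"Quote two."</span>
  <small class="author">Author B</small>
</div>
'''


class QuoteExtractor(HTMLParser):
    """Collects {'text', 'author'} dicts, like the items parse() yields."""

    def __init__(self):
        super().__init__()
        self.quotes = []
        self._field = None  # which key the next text node belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get('class', '')
        if tag == 'div' and cls == 'quote':
            self.quotes.append({})      # start a new item, like div.quote
        elif tag == 'span' and cls == 'text':
            self._field = 'text'        # like span.text::text
        elif tag == 'small' and cls == 'author':
            self._field = 'author'      # like small.author::text

    def handle_data(self, data):
        if self._field and self.quotes:
            self.quotes[-1][self._field] = data
            self._field = None


parser = QuoteExtractor()
parser.feed(HTML)
print(parser.quotes)
```

This only illustrates the shape of the extracted items; the real spider also handles pagination and runs inside Scrapy's asynchronous engine.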