Scrapy Study Notes 001

Latest recommended article published on 2021-03-24 00:05:08

梦晓时分

Categories: web crawlers, python

Article link: https://blog.csdn.net/weixin_44274895/article/details/103506188

Exercise: crawling the novel index of Biquge (笔趣阁)

Install scrapy and create the project:
scrapy startproject tutorial
Define the item structure in items.py

import scrapy

class Biquge(scrapy.Item):
    # One entry of the novel index: the title and the URL of the novel's page.
    title = scrapy.Field()
    href = scrapy.Field()

Create the spider:
scrapy genspider biquge www.xbiquge.la/xiaoshuodaquan
Edit the spider

# -*- coding: utf-8 -*-
import scrapy

from tutorial.items import Biquge

class BiqugeSpider(scrapy.Spider):
    name = 'biquge'
    # allowed_domains takes bare domain names, not URLs or paths.
    allowed_domains = ['www.xbiquge.la']
    start_urls = ['http://www.xbiquge.la/xiaoshuodaquan/']

    def parse(self, response):
        # Each <li> under div.novellist holds one link to a novel.
        for sel in response.xpath('//div[@class="novellist"]/ul/li'):
            item = Biquge()
            # default='' avoids calling strip() on None when an <li> has no link.
            item['title'] = sel.xpath('a/text()').extract_first(default='').strip()
            item['href'] = sel.xpath('a/@href').extract_first(default='').strip()
            yield item

Run the crawl and write the results to a JSON file:
scrapy crawl biquge -o biquge.json
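
By default, Scrapy's JSON feed export writes non-ASCII characters as \uXXXX escape sequences, so the Chinese titles may not be readable in biquge.json. One possible fix, assuming the project's settings.py does not already set this option, is the following line:

```python
# settings.py of the tutorial project
# Write feed exports such as biquge.json as UTF-8 text instead of \uXXXX escapes.
FEED_EXPORT_ENCODING = 'utf-8'
```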
Result

[
{"title": "牧神记", "href": "http://www.xbiquge.la/15/15409/"},
{"title": "终极斗罗", "href": "http://www.xbiquge.la/7/7931/"},
.
.
.
{"title": "废土巫师", "href": "http://www.xbiquge.la/0/874/"},
{"title": "我的玉雕不正常", "href": "http://www.xbiquge.la/25/25679/"}
]
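
The exported file can be sanity-checked with a few lines of plain Python. This is only a sketch: it assumes the crawl has finished and that biquge.json holds a single valid JSON array. load_catalog is a hypothetical helper name, not part of Scrapy.

```python
import json


def load_catalog(path='biquge.json'):
    """Load the exported entries and return (entries, incomplete)."""
    with open(path, encoding='utf-8') as f:
        entries = json.load(f)
    # An entry with an empty title or href hints that the XPath
    # did not match that <li> the way the spider expected.
    incomplete = [e for e in entries if not e.get('title') or not e.get('href')]
    return entries, incomplete
```

Calling entries, incomplete = load_catalog() from the directory that contains biquge.json gives the number of novels as len(entries). Because -o appends to an existing file, running the crawl twice with the same file name can leave invalid JSON, which json.load will reject.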

Next goal: write the results to MongoDB
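
As a rough idea of what that next step might look like, here is a minimal item pipeline sketch. It is not the author's implementation. It assumes pymongo is installed and a MongoDB server is reachable at mongodb://localhost:27017. The class name MongoPipeline, the database name tutorial and the collection name biquge are all made up for illustration.

```python
class MongoPipeline:
    """Hypothetical pipeline that stores each scraped item in MongoDB."""

    def __init__(self, mongo_uri='mongodb://localhost:27017', mongo_db='tutorial'):
        # Assumed connection settings; adjust them to the actual server.
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    def open_spider(self, spider):
        # Imported here so the module loads even where pymongo is missing.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db]['biquge']

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert keyed on href so repeated crawls do not create duplicates.
        self.collection.update_one({'href': doc['href']}, {'$set': doc}, upsert=True)
        return item
```

If the class lives in the project's pipelines.py, it would be enabled in settings.py with ITEM_PIPELINES = {'tutorial.pipelines.MongoPipeline': 300}.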
