Scrapy basics: crawling a single novel with the basic template

Latest recommended article published on 2024-01-21 22:33:19

wtftx


Category: scrapy framework; tags: scrapy basic

Article link: https://blog.csdn.net/wtftx/article/details/89970900


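This article walks through basic web crawler development with Python 3.7 and Scrapy 1.6.0 on Windows 10. Using the crawl of a single novel as the example, it covers defining the items, setting up the pipelines, writing the spider, and configuring settings.py, and finally checks the results returned by Scrapy, an asynchronous crawling framework.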

Summary generated automatically by CSDN.

Python 3.7
Scrapy 1.6.0
Windows 10

Crawling a single novel (novel URL)

Defining the items

import scrapy

class Novel1Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    chapter_name = scrapy.Field()
    content = scrapy.Field()

Setting up the pipelines

import codecs
import json


class Novel1Pipeline(object):

    def __init__(self):
        print('starting')
        self.file = codecs.open('text_novel1.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        json_text = json.dumps(dict
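
The pipeline listing above breaks off in the middle of the `json.dumps(` call. The following is a minimal sketch of how such a pipeline is commonly completed, not the author's original code: everything after `dict` is an assumption, including writing one JSON object per line, passing `ensure_ascii=False`, and closing the file in `close_spider`. The class name, file name, and encoding are taken from the listing.

```python
import codecs
import json


class Novel1Pipeline(object):
    """Write each scraped item to text_novel1.json, one JSON object per line."""

    def __init__(self):
        # Same output file and encoding as in the original listing.
        self.file = codecs.open('text_novel1.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # ASSUMPTION: the original statement is truncated after "dict";
        # this completion is a guess. ensure_ascii=False keeps Chinese
        # text readable in the file instead of \uXXXX escapes.
        json_text = json.dumps(dict(item), ensure_ascii=False) + '\n'
        self.file.write(json_text)
        # Return the item so that any later pipeline still receives it.
        return item

    def close_spider(self, spider):
        # ASSUMPTION: not in the visible source. Scrapy calls this hook
        # when the spider finishes, which is a natural place to close the file.
        self.file.close()
```

For Scrapy to call the pipeline it would also need to be enabled in settings.py through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {'novel1.pipelines.Novel1Pipeline': 300}`, where the project package name `novel1` is assumed here for illustration.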
