Notes on scraping a novel

weixin_33895604

Tags: python

Original link: https://my.oschina.net/longfirst/blog/1549294

Every time I read a novel online, the page is full of pop-up ads. They are annoying, and they waste mobile data.

So I used Scrapy to crawl the novel into a plain text file:

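# coding=utf-8
import scrapy


class UuxsSpider(scrapy.Spider):
    name = "xiaoshuo"
    start_urls = [
        'http://www.xiaoshuo.net/book/0/34/19322.html',
    ]

    def parse(self, response):
        # Chapter title; fall back to an empty string if the selector misses.
        title = response.css('h1#BookTitle::text').extract_first(default='')
        # The chapter body may be split into several text nodes (for example
        # by <br> tags), so collect all of them rather than only the first.
        paragraphs = response.css('div#BookText::text').extract()
        content = '\n'.join(p.strip() for p in paragraphs if p.strip())
        self.log('Start downloading %s' % title)
        # Append every chapter to the same UTF-8 text file (Python 3).
        with open('小说.txt', 'a', encoding='utf-8') as f:
            f.write(title + '\n')
            f.write(content + '\n')

        # Follow the "next chapter" link until the page no longer has one.
        next_page = response.css('a#book-next::attr(href)').extract_first()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)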
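
The last two lines of the spider rely on `response.follow` accepting a relative `href` and resolving it against the URL of the current page. Below is a minimal standard-library sketch of that resolution step. The `href` values are hypothetical, since the real site's markup is not shown here, and `resolve_next` is a helper name made up for illustration.

```python
from urllib.parse import urljoin

# URL of the chapter currently being parsed (taken from start_urls).
current_url = 'http://www.xiaoshuo.net/book/0/34/19322.html'


def resolve_next(base_url, href):
    """Resolve a possibly relative href against the page it appeared on."""
    return urljoin(base_url, href)


# Hypothetical href values, for illustration only.
relative = resolve_next(current_url, '19323.html')
absolute_path = resolve_next(current_url, '/book/0/34/19323.html')

print(relative)
print(absolute_path)
```

Both calls should give the same absolute chapter URL, which is why the spider can pass the raw `href` straight to `response.follow` without building the full URL by hand.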

Reposted from: https://my.oschina.net/longfirst/blog/1549294