Scrapy: yield hangs when crawling multiple pages

rt95

Category: Programming

Article link: https://blog.csdn.net/Charlieputh95/article/details/104283619

1@ Preface

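The winter break is long and I needed something to do. I had just switched operating systems, so I decided to move all my notes from Shimo Docs to Evernote (Yinxiang Biji). While reviewing the Scrapy framework I ran into a problem: to crawl multiple pages, the spider has to yield a request for the next page, but after the first page was crawled it simply hung. There was no error, just a stall. It took me some time to sort out, so I am recording it here.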

2@ Process

First, here is the code that caused the problem:

import scrapy


class PoemSpider(scrapy.Spider):
    name = 'poem'
    allowed_domains = ['gushiwen.org']
    start_urls = ['https://www.gushiwen.org/default_1.aspx']
    base_url = "https://gushiwen.org"

    # For more response methods, step into the Scrapy source code
    def parse(self, response):
        ......
        # Emit one item per poem on the current page
        for x in range(10):
            print("@" * 100)
            print(titles[x])
            item["title"] = titles[x]
            item["author"] = authors[x]
            item["dynasty"] = dynasties[x]
            item["contents"] = contents[x]
            encoder.encode(item)
            yield item

        # Look for the "next page" link and schedule it with the same callback
        next_url = response.xpath("//a[@id='amore']/@href").get()
        print("@" * 100)
        print(next_url)
        if not next_url:
            return
        else:
            print("test2")
            yield scrapy.Request(url=self.base_url + next_url, callback=self.parse, dont_filter=True)
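
One thing worth checking in code like this is the URL that actually gets requested, since it is built by plain string concatenation of base_url and the extracted href. The sketch below uses only the standard library and is not a diagnosis of the hang. The href values and the helper name next_page_url are hypothetical and only serve to show how concatenation and urljoin differ.

```python
from urllib.parse import urljoin

# Values taken from the spider above.
page_url = "https://www.gushiwen.org/default_1.aspx"
base_url = "https://gushiwen.org"


def next_page_url(current_url: str, href: str) -> str:
    """Resolve an href against the URL of the page it was found on."""
    return urljoin(current_url, href)


# Hypothetical root-relative href: both approaches give a usable URL,
# but the hosts differ (gushiwen.org vs www.gushiwen.org).
relative = "/default_2.aspx"
print(base_url + relative)
print(next_page_url(page_url, relative))

# Hypothetical absolute href: concatenation glues two URLs together,
# while urljoin returns the absolute URL unchanged.
absolute = "https://www.gushiwen.org/default_2.aspx"
print(base_url + absolute)
print(next_page_url(page_url, absolute))
```

Inside a spider callback, response.urljoin(next_url) performs the same kind of resolution against the URL of the current response, so the request could be built without a hard-coded base_url.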