When crawling a list page, we usually need to paginate. The simplest way to implement pagination is a recursive call; in pseudocode:
def crawl_list(url):
    next_url = crawl(url)  # process the HTML and extract the next-page URL
    if next_url is not None:
        crawl_list(next_url)
This approach has two problems:
1. Once the recursion gets deep enough, Python raises "maximum recursion depth exceeded" (a RuntimeError in Python 2, RecursionError in Python 3).
2. Every pending call keeps its stack frame alive, so memory use grows with the number of pages.
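A minimal, self-contained sketch of the failure mode. Here crawl is a dummy stand-in (not the real fetcher) that always "finds" a next page, so the recursion never bottoms out and hits the interpreter's recursion limit:

```python
def crawl(url):
    # Dummy stand-in for the real fetcher: every page claims to have
    # a next page, so the recursion never terminates.
    return url + "+"

def crawl_list(url):
    next_url = crawl(url)
    if next_url is not None:
        crawl_list(next_url)  # one more stack frame per page

try:
    crawl_list("http://example.com/list?page=1")
except RecursionError:  # surfaced as RuntimeError in Python 2
    print("maximum recursion depth exceeded")
```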
Improved code:
def crawl_list(urls):
    for start_url in urls:
        queue = [start_url]
        while queue:
            url = queue.pop(0)     # dequeue the next page to fetch
            next_url = crawl(url)  # process it and extract the next-page URL
            if next_url is not None:
                queue.append(next_url)
By maintaining a FIFO queue in a list, the recursion becomes a loop, which eliminates both problems above.
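One caveat: list.pop(0) shifts every remaining element, so it costs O(n) per dequeue. For longer queues, collections.deque gives O(1) popleft with the same FIFO semantics. A runnable sketch, where crawl is again a hypothetical stand-in that stops after page 3:

```python
from collections import deque

def crawl(url):
    # Hypothetical stand-in for the real fetcher: read the page number
    # from the URL and return the next-page URL, or None after page 3.
    base, page = url.rsplit("=", 1)
    return base + "=" + str(int(page) + 1) if int(page) < 3 else None

def crawl_list(urls):
    visited = []
    for start_url in urls:
        queue = deque([start_url])  # FIFO queue, O(1) at both ends
        while queue:
            url = queue.popleft()
            visited.append(url)
            next_url = crawl(url)
            if next_url is not None:
                queue.append(next_url)
    return visited

print(crawl_list(["http://example.com/list?page=1"]))
# → pages 1 through 3, visited iteratively with a flat call stack
```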