Python crawler: multithreaded crawling with a Queue task queue

Latest recommended article published on 2023-05-27 14:45:00

houyanhua1

Category: Python+  Tags: Python, crawler, Queue, multithreading

Article link: https://blog.csdn.net/houyanhua1/article/details/86494738

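This article presents example code for a multithreaded crawler built with Python's threading module and the Queue class, showing how task queues can manage and schedule crawl work to improve crawling throughput.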

Summary generated automatically by CSDN.
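The queue-based scheduling described in the summary can be sketched in isolation before reading the full demo. The following is a minimal illustration, not the author's code: `run_pipeline` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP request so that the sketch runs without network access. Worker threads pull URLs from a shared Queue, and `Queue.join()` blocks until every queued task has been marked done.

```python
import threading
from queue import Queue


def run_pipeline(urls, fetch, num_workers=3):
    """Fetch every URL with a pool of daemon worker threads; return (url, result) pairs."""
    url_queue = Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            url = url_queue.get()            # blocks until a task is available
            try:
                outcome = fetch(url)
            except Exception as exc:         # keep the worker alive if one fetch fails
                outcome = exc
            with results_lock:
                results.append((url, outcome))
            url_queue.task_done()            # one task_done() per successful get()

    # Daemon threads exit with the main thread, so the infinite loop needs no stop signal.
    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    for url in urls:
        url_queue.put(url)

    url_queue.join()                         # returns once every put() has a matching task_done()
    return results


def fake_fetch(url):
    # Hypothetical stand-in for a real request such as requests.get(url).content
    return "<html>{}</html>".format(url)


if __name__ == "__main__":
    pages = run_pipeline(["page/{}".format(i) for i in range(1, 4)], fake_fetch)
    print(sorted(url for url, _ in pages))
```

Results arrive in whatever order the workers finish, so callers that need page order should sort by URL or page number afterwards.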

demo.py (multithreaded crawler):

# coding=utf-8
import requests
from lxml import etree
import threading
from queue import Queue

class QiubaiSpdier:
    def __init__(self):
        # URL template; {} is filled in with the page number
        self.url_temp = "https://www.qiushibaike.com/8hr/page/{}/"
        self.headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"}
        # Three task queues, named for the URLs, HTML, and content they hold
        self.url_queue = Queue()
        self.html_queue = Queue()
        self.content_queue = Queue()

    def get_url_list(self):
        # Enqueue the URLs for pages 1 through 13
        for i in range(1, 14):
            self.url_queue.put(self.url_temp.format(i))

    def parse_url(self):
        while True:
            url = self.url_queue.