Hand-written myscrapy (Part 8)

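Project repository: https://gitee.com/wyu_001/myscrapy
Next, let's look at how to run multiple spider scripts with multiple threads.
The project root contains a batch.py file, the script for running several spiders in one batch. It uses a thread pool to run the spider classes under spider concurrently. You can also specify which spider files to run in setting.py: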

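#batch
# By default, a batch run executes every subclass of MySpider found under spider
# Batch-run parameter: number of concurrent threads per batch

BATCH_THREADS = 10

# batch run files in list
# Script files under spider to run; leave the list empty to run them all
BATCH_FILES = ['dxyqueryhospital.py',
               'haodfqueryhospital.py'
               ]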
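The way batch.py combines this setting with the files found on disk can be sketched as follows. This is a minimal illustration, not code from the project: the helper name select_spider_files and the file other.py are hypothetical, and the result is sorted only to make the output deterministic (batch.py itself does not sort).

```python
def select_spider_files(available, batch_files):
    """Return the spider files to run.

    Hypothetical helper mirroring the selection logic in batch.py:
    an empty batch_files list means "run everything found under spider".
    """
    if batch_files:
        # Keep only the requested files that actually exist on disk
        return sorted(set(available) & set(batch_files))
    return sorted(available)


# "other.py" and "missing.py" are made-up names used only for this example
available = ["__init__.py", "dxyqueryhospital.py", "haodfqueryhospital.py", "other.py"]

print(select_spider_files(available, ["dxyqueryhospital.py", "missing.py"]))
print(select_spider_files(available, []))
```

A file listed in BATCH_FILES but absent from spider is silently dropped by the intersection, so a typo in the list shows up as a spider that never runs rather than as an error.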

Here is the code for batch.py:

import inspect
from os import listdir,getcwd
from os.path import isfile,join
import importlib

from config.setting import BATCH_THREADS
from config.setting import BATCH_FILES

from concurrent.futures import ThreadPoolExecutor,as_completed

crawls = []

# Collect every regular file in the spider directory
lib_dir = "spider"
file_path = join(getcwd(), lib_dir)
crawl_files = [f for f in listdir(file_path) if isfile(join(file_path, f))]

crawls_sets = set(crawl_files)
batch_sets = set(BATCH_FILES)

# If BATCH_FILES is non-empty, run only the listed files that actually exist
if batch_sets:
    crawl_files = list(crawls_sets.intersection(batch_sets))

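for file in crawl_files:

    # Skip the package marker and anything that is not a Python source file
    if file != "__init__.py" and file.endswith(".py"):
        # Relative module name, e.g. ".dxyqueryhospital", resolved against the spider package
        module_name = f'.{file.split(".")[0]}'
        module = importlib.import_module(module_name, lib_dir)

        # Instantiate every class whose direct base class is MySpider
        for name, obj in inspect.getmembers(module, inspect.isclass):
            if obj.__base__.__name__ == "MySpider":
                crawls.append(obj())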

thread_num = 0

tasks = []

with ThreadPoolExecutor(max_workers=BATCH_THREADS) as tp:

    while crawls:
        # Submit each spider's start_request method to the pool
        task = tp.submit(crawls.pop().start_request)
        tasks.append(task)

        thread_num += 1
        if thread_num >= BATCH_THREADS:
            # A full batch is in flight: wait for it before submitting more
            for future in as_completed(tasks):
                future.result()

            thread_num = 0

    # Wait for the remaining tasks; result() re-raises any exception from a spider
    for future in as_completed(tasks):
        future.result()
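Since ThreadPoolExecutor never runs more than max_workers tasks at once, the same cap on concurrency can also be had by submitting everything up front and waiting once. Below is a minimal sketch of that variant, assuming a stand-in DemoSpider class (hypothetical, not part of myscrapy) whose start_request simply returns the spider's name; run_all is likewise a name chosen for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


class DemoSpider:
    """Hypothetical stand-in for a MySpider subclass."""

    def __init__(self, name):
        self.name = name

    def start_request(self):
        time.sleep(0.05)  # pretend to crawl
        return self.name


def run_all(spiders, max_workers):
    """Run every spider's start_request in a thread pool and collect the results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The pool queues the extra tasks; at most max_workers run at a time
        futures = [pool.submit(s.start_request) for s in spiders]
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the spider
            results.append(future.result())
    return results


finished = run_all([DemoSpider(f"spider{i}") for i in range(5)], max_workers=2)
print(sorted(finished))
```

The difference from the batched loop is that a free worker picks up the next spider immediately instead of waiting for the whole batch to finish, which may matter when spiders take very different amounts of time.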
