A Guide to Parallel Programming in Python

小北的北

Article link: https://blog.csdn.net/weixin_38739735/article/details/113409760

Introduction

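When writing code, running tasks sequentially is not always the best approach. If a task does not depend on the result of the previous one, making it wait for that task wastes time and leaves CPU capacity idle.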

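In this article we look at how to run tasks in parallel in Python with the concurrent.futures library, and use a hands-on example to illustrate the idea: fetching data from multiple API endpoints.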

Problem Description

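The test task is to connect to six endpoints of https://jsonplaceholder.typicode.com/ and fetch their data as JSON. The workload is small and Python would likely finish it in about a second, which does not make a convincing demonstration of parallel processing, so we add some artificial work.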

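Besides fetching the API data, the program sleeps for one second after each request. With six endpoints, that adds six seconds during which the program does nothing but wait.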

Let's first measure the execution time without any parallelism.

Test 1: Running the tasks sequentially

The code is as follows:

import time
import requests


# The six jsonplaceholder endpoints to fetch
URLS = [
    'https://jsonplaceholder.typicode.com/posts',
    'https://jsonplaceholder.typicode.com/comments',
    'https://jsonplaceholder.typicode.com/albums',
    'https://jsonplaceholder.typicode.com/photos',
    'https://jsonplaceholder.typicode.com/todos',
    'https://jsonplaceholder.typicode.com/users'
]


def fetch_single(url: str) -> None:
    print(f'Fetching: {url}...')
    # Send the GET request (the response body is not used in this demo);
    # the timeout keeps a stalled connection from hanging the program
    requests.get(url, timeout=10)
    # Artificial extra work: idle for one second
    time.sleep(1)
    print(f'Fetched {url}!')


if __name__ == '__main__':
    time_start = time.time()

    # Run the tasks one after another
    for url in URLS:
        fetch_single(url)

    time_end = time.time()
    print(f'\nAll done! Took {round(time_end - time_start, 2)} seconds')

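The URLS variable holds the list of API endpoints to fetch data from. The fetch_single() function sends a GET request to the given URL, sleeps for one second, and prints a message when it starts and when it finishes.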
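As written, fetch_single() sends the request but discards the response, even though the stated goal is to fetch JSON. A minimal sketch of a variant that actually returns the decoded data is below. It is an assumption, not the author's code: it swaps in the standard-library urllib.request for requests so it has no third-party dependency, and the name fetch_json and the 10-second default timeout are hypothetical choices.

```python
import json
import urllib.request
from typing import Any


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Hypothetical helper: GET `url` and return the decoded JSON body."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    # urllib.error.URLError (or a timeout error) for network failures.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # json.loads accepts bytes and detects the UTF-8/16/32 encoding itself.
        return json.loads(response.read())
```

For example, fetch_json('https://jsonplaceholder.typicode.com/users') should return a list of dictionaries, one per user, which a real program could then process instead of throwing away.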

The program records the start and end times to compute the total execution time; in between, it calls fetch_single() on each URL in URLS.
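Since both tests time their work the same way, the start/end bookkeeping can be factored into a small context manager. This is a minimal sketch, assuming a hypothetical helper named timed(); it uses time.perf_counter(), a monotonic clock intended for measuring intervals, rather than time.time(), which can jump if the system clock is adjusted.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Timing:
    """Holds the measured duration; filled in when the with-block exits."""
    elapsed: float = 0.0


@contextmanager
def timed() -> Iterator[Timing]:
    """Hypothetical helper: measure how long the body of a with-block takes."""
    timing = Timing()
    start = time.perf_counter()  # monotonic, unaffected by system clock changes
    try:
        yield timing
    finally:
        # Record the duration even if the block raised an exception.
        timing.elapsed = time.perf_counter() - start


if __name__ == '__main__':
    with timed() as t:
        time.sleep(0.2)  # stand-in for the real work, e.g. the fetch loop
    print(f'\nAll done! Took {round(t.elapsed, 2)} seconds')
```

The Timing object is yielded before the work starts and filled in afterwards, so the caller can read t.elapsed once the with block has finished.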

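Running the program shows that sequential execution takes roughly 7 seconds: about 6 seconds of sleeping plus the time spent on the six HTTP requests.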
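The roughly 7-second result is what the arithmetic predicts: in a sequential loop the total time is the sum of every task's duration, so the six 1-second sleeps alone contribute 6 seconds and the six HTTP round trips add the rest. The sketch below reproduces this without touching the network. simulated_fetch() is a hypothetical stand-in that sleeps for a fixed "latency" and then a fixed "pause", and the durations are scaled-down assumptions, not measurements.

```python
import time

# Hypothetical, scaled-down durations in seconds (not measured values).
LATENCY = 0.05  # stands in for one HTTP round trip
PAUSE = 0.1     # stands in for the article's time.sleep(1)
N_TASKS = 6     # one task per endpoint


def simulated_fetch(task_id: int) -> int:
    """Pretend to fetch one endpoint: wait for the 'request', then pause."""
    time.sleep(LATENCY)
    time.sleep(PAUSE)
    return task_id


def run_sequentially(n: int) -> float:
    """Run n simulated fetches one after another and return the wall time."""
    start = time.perf_counter()
    for task_id in range(n):
        simulated_fetch(task_id)
    return time.perf_counter() - start


if __name__ == '__main__':
    elapsed = run_sequentially(N_TASKS)
    expected = N_TASKS * (LATENCY + PAUSE)  # sum of all task durations
    print(f'Sequential: {elapsed:.2f}s (predicted at least {expected:.2f}s)')
```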

Next, let's see how parallel processing can reduce this time.

Test 2: Running the tasks in parallel

The code is as follows:

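import time
import requests
import concurrent.futures


# The six jsonplaceholder endpoints to fetch
URLS = [
    'https://jsonplaceholder.typicode.com/posts',
    'https://jsonplaceholder.typicode.com/comments',
    'https://jsonplaceholder.typicode.com/albums',
    'https://jsonplaceholder.typicode.com/photos',
    'https://jsonplaceholder.typicode.com/todos',
    'https://jsonplaceholder.typicode.com/users'
]


def fetch_single(url: str) -> None:
    print(f'Fetching: {url}...')
    # Send the GET request; the timeout avoids hanging on a stalled connection
    requests.get(url, timeout=10)
    # Artificial extra work: idle for one second
    time.sleep(1)
    print(f'Fetched {url}!')


if __name__ == '__main__':
    # The __main__ guard is required: where workers are started with "spawn"
    # (Windows, macOS), each worker process re-imports this module
    time_start = time.time()

    # Leaving the with block waits until every submitted task has finished
    with concurrent.futures.ProcessPoolExecutor() as ppe:
        futures = [ppe.submit(fetch_single, url) for url in URLS]

    # result() re-raises any exception raised in a worker (e.g. a failed request)
    for future in futures:
        future.result()

    time_end = time.time()
    print(f'\nAll done! Took {round(time_end - time_start, 2)} seconds')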

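The concurrent.futures library provides a high-level interface for running callables asynchronously, using either threads or processes. For process-based parallelism, as here, we use its ProcessPoolExecutor class, which executes calls asynchronously using a pool of worker processes.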
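This workload is I/O-bound: the time goes to waiting on the network and on sleep. CPython releases the GIL while a thread blocks on I/O or time.sleep(), so concurrent.futures.ThreadPoolExecutor can usually overlap these waits just as well, without the cost of starting processes and pickling arguments. Note that this is a swapped-in technique, not what the article uses. Pool size also matters: ProcessPoolExecutor defaults max_workers to the number of CPUs, so on a machine with fewer than six cores the six tasks run in more than one round. The sketch below sets max_workers to the number of tasks; simulated_fetch() and PAUSE are hypothetical stand-ins for the real requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

PAUSE = 0.2  # hypothetical stand-in for request latency plus time.sleep(1)


def simulated_fetch(task_id: int) -> int:
    """Pretend to fetch one endpoint; sleeping releases the GIL."""
    time.sleep(PAUSE)
    return task_id


def run_in_threads(n: int) -> tuple[list[int], float]:
    """Run n simulated fetches on a thread pool; return sorted ids and wall time."""
    start = time.perf_counter()
    # One worker per task so that all the waits can overlap.
    with ThreadPoolExecutor(max_workers=n) as executor:
        futures = [executor.submit(simulated_fetch, i) for i in range(n)]
        # as_completed yields futures in completion order, not submission order.
        done = [future.result() for future in as_completed(futures)]
    return sorted(done), time.perf_counter() - start


if __name__ == '__main__':
    ids, elapsed = run_in_threads(6)
    print(f'Finished {ids} in {elapsed:.2f}s '
          f'(sequential would take about {6 * PAUSE:.1f}s)')
```

With the real fetch_single(), replacing ProcessPoolExecutor() with ThreadPoolExecutor(max_workers=len(URLS)) would be the only change needed, because both classes implement the same Executor interface.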

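The with statement ensures everything is cleaned up properly: on exit it calls the executor's shutdown(wait=True), which waits for all submitted tasks to finish and then releases the worker processes.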
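Written out by hand, the with block corresponds to a try/finally that calls shutdown(wait=True), which blocks until every pending task has finished and then frees the pool's workers. The sketch below shows that explicit form, assuming a thread pool and a hypothetical task square() so it runs anywhere; the same pattern applies to ProcessPoolExecutor.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def square(x: int) -> int:
    """Hypothetical task used only for illustration."""
    return x * x


def run_with_explicit_shutdown(values: list[int]) -> list[Future]:
    executor = ThreadPoolExecutor(max_workers=4)
    try:
        futures = [executor.submit(square, v) for v in values]
    finally:
        # What the with statement does on exit: wait for all pending tasks,
        # then free the workers. Any later submit() raises RuntimeError.
        executor.shutdown(wait=True)
    return futures


if __name__ == '__main__':
    futures = run_with_explicit_shutdown([1, 2, 3])
    print([f.done() for f in futures])    # [True, True, True]
    print([f.result() for f in futures])  # [1, 4, 9]
```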

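Tasks are handed to the pool with the submit() method. Its first argument is the callable to run (the function object fetch_single itself, not its name), and any further arguments, here the URL, are passed on to that callable. submit() returns immediately with a Future object representing the pending call.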
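submit(fn, /, *args, **kwargs) schedules fn(*args, **kwargs) and returns a Future right away. If the futures are discarded, an exception raised inside a task (a failed request, for instance) is silently lost. This minimal sketch keeps the futures and calls result(), which returns the task's value or re-raises its exception. risky_task() is a hypothetical task, and a thread pool is assumed so the example runs without process start-up concerns.

```python
from concurrent.futures import ThreadPoolExecutor


def risky_task(x: int, *, fail: bool = False) -> int:
    """Hypothetical task: doubles x, or raises to mimic a failed request."""
    if fail:
        raise ValueError(f'task {x} failed')
    return 2 * x


def collect(values: list[int], failing: set[int]) -> tuple[dict[int, int], dict[int, str]]:
    """Submit one task per value; split the outcomes into results and errors."""
    results: dict[int, int] = {}
    errors: dict[int, str] = {}
    with ThreadPoolExecutor() as executor:
        # Positional and keyword arguments after the callable are forwarded to it.
        futures = {v: executor.submit(risky_task, v, fail=v in failing) for v in values}
    # Every future is done here, because leaving the with block waited for them.
    for v, future in futures.items():
        try:
            results[v] = future.result()  # re-raises the task's exception, if any
        except ValueError as exc:
            errors[v] = str(exc)
    return results, errors


if __name__ == '__main__':
    ok, failed = collect([1, 2, 3], failing={2})
    print(ok)      # {1: 2, 3: 6}
    print(failed)  # {2: 'task 2 failed'}
```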

Running the program produces the following output: