ThreadPoolExecutor thread pools in Python

Original article, last modified 2022-03-09 17:11:59

License: CC 4.0 BY-SA

Tags: #python #multithreading #thread-synchronization

First published 2021-05-17 11:43:12

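This article uses a concrete example to show how to read and process a file with ThreadPoolExecutor from the Python standard library module concurrent.futures. It covers creating a thread pool and using multiple threads to speed up file processing.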

ThreadPoolExecutor

Features of ThreadPoolExecutor, the commonly used thread pool in the Python 3 standard library module concurrent.futures:

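The main thread can query the state of any submitted task and obtain its return value (through the Future object returned by submit).
Thread synchronization is handled for you: the pool queues the work and waits for results internally.
The coding interface is the same for multithreading and multiprocessing (ThreadPoolExecutor and ProcessPoolExecutor share the Executor API).
Simple and straightforward to use.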
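
The first point can be sketched as follows. The task function square is a hypothetical example, not part of the original article: submit returns a Future, through which the main thread can wait for the return value and check whether the task has finished.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def square(x):
    # Hypothetical task used only for illustration.
    time.sleep(0.1)
    return x * x


with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(square, 7)  # schedule the call, get a Future back
    value = future.result()              # block until the task finishes
    finished = future.done()             # True once the result is available

print(value, finished)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor from the same module leaves the submit and result calls unchanged, which is what the third point refers to (on platforms that spawn worker processes, the pool code additionally needs an if __name__ == "__main__": guard).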

Hands-on practice

We will use a ThreadPoolExecutor thread pool to read a file and append the text _我是吊车尾 (roughly "_I'm dead last") to the end of every line.

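Step 1: suppose there is a file with 20000 lines, where the first line is "1" and each following line increments by one. (We simply generate it with code.)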

# Generate test1.txt with 20000 lines: 1, 2, ..., 20000.
# Mode "w" (not "a") so that re-running the script does not append duplicate lines.
with open("./test1.txt", "w") as file:
    for number in range(1, 20001):
        file.write(str(number) + "\n")

Step 2: read the file generated in step 1 into a list, then use ThreadPoolExecutor to append "_我是吊车尾" to the end of each line across multiple threads.

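from concurrent.futures import ThreadPoolExecutor

# Read every line of the generated file into a list, dropping the trailing newline.
file_line_list = []
with open("./test1.txt", "r") as file:
    for line in file:
        file_line_list.append(line.strip('\n'))


def exec_function(line):
    # Runs in a worker thread: append the suffix to one line and return it.
    print("Appending", line + "_我是吊车尾")
    return line + "_我是吊车尾"


with ThreadPoolExecutor() as executor:  # "with" shuts the pool down on exit
    # map() schedules one call per line and yields results in input order.
    res = executor.map(exec_function, file_line_list, timeout=5)
print("Appending finished!")

# encoding="utf-8" so the non-ASCII suffix is written correctly on any platform.
with open("./test2.txt", "w", encoding="utf-8") as file:
    for line in res:
        file.write(line + "\n")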
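
Writing the results back line by line relies on executor.map returning results in the same order as its input, even when tasks finish in a different order. A minimal sketch of that behaviour, using a hypothetical slow_echo task that is not part of the original example:

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_echo(delay):
    # Hypothetical task: sleep for `delay` seconds, then return the delay.
    time.sleep(delay)
    return delay


delays = [0.3, 0.1, 0.2]
with ThreadPoolExecutor(max_workers=3) as executor:
    # The 0.1 s task finishes first, but map() still yields in input order.
    ordered = list(executor.map(slow_echo, delays))

print(ordered)
```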

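PS: prefer with ThreadPoolExecutor() as executor. If you create the pool without a with statement (for example executor = ThreadPoolExecutor()), remember to call executor.shutdown() yourself when you are done. Leaving the with block calls shutdown(wait=True), which waits for pending tasks to finish and then releases the pool's threads automatically, so no resources are wasted.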
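
For comparison, a minimal sketch of managing the pool without a with statement. The helper add_suffix and its sample input are assumptions made for illustration; the try/finally ensures shutdown is called even if a task raises.

```python
from concurrent.futures import ThreadPoolExecutor


def add_suffix(line):
    # Hypothetical helper mirroring the article's task.
    return line + "_suffix"


executor = ThreadPoolExecutor(max_workers=4)
try:
    results = list(executor.map(add_suffix, ["1", "2", "3"]))
finally:
    # Without "with", the pool must be shut down explicitly.
    executor.shutdown(wait=True)

print(results)
```

After shutdown, submitting new work to the same executor raises RuntimeError, so a fresh pool is needed for further tasks.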

Step 3: check the result by opening test2.txt. Each line should now end with the suffix, for example 1_我是吊车尾.

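
The result can also be checked programmatically. The sketch below uses a hypothetical helper, check_output, and to stay self-contained it writes a small sample file in a temporary directory instead of relying on test2.txt.

```python
import os
import tempfile

SUFFIX = "_我是吊车尾"


def check_output(path, expected_count):
    # Hypothetical helper: every line must be "<n>" + SUFFIX for n = 1..expected_count.
    with open(path, "r", encoding="utf-8") as file:
        lines = [line.rstrip("\n") for line in file]
    expected = [str(n) + SUFFIX for n in range(1, expected_count + 1)]
    return lines == expected


# Small stand-in for test2.txt so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w", encoding="utf-8") as file:
    for n in range(1, 6):
        file.write(str(n) + SUFFIX + "\n")

ok = check_output(sample_path, 5)
print(ok)
```

For the real output, the call would be check_output("./test2.txt", 20000), assuming the file was written with UTF-8 encoding.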
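For reference, here is the __init__ method of the ThreadPoolExecutor class as it appears in the Python 3.6 standard library. Later versions add initializer and initargs parameters, and since Python 3.8 the default max_workers is min(32, os.cpu_count() + 4):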

class ThreadPoolExecutor(_base.Executor):

    # Used to assign unique thread names when thread_name_prefix is not supplied.
    _counter = itertools.count().__next__

    def __init__(self, max_workers=None, thread_name_prefix=''):
        """Initializes a new ThreadPoolExecutor instance.

        Args:
            max_workers: The maximum number of threads that can be used to
                execute the given calls.
            thread_name_prefix: An optional name prefix to give our threads.
        """
        if max_workers is None:
            # Use this number because ThreadPoolExecutor is often
            # used to overlap I/O instead of CPU work.
            max_workers = (os.cpu_count() or 1) * 5
        if max_workers <= 0:
            raise ValueError("max_workers must be greater than 0")

        self._max_workers = max_workers
        self._work_queue = queue.Queue()
        self._threads = set()
        self._shutdown = False
        self._shutdown_lock = threading.Lock()
        self._thread_name_prefix = (thread_name_prefix or
                                    ("ThreadPoolExecutor-%d" % self._counter()))
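
The two constructor parameters above can be tried out with a short sketch. The task worker_name and the prefix "demo" are assumptions chosen for illustration; each task simply reports the name of the pool thread it runs on.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def worker_name(_):
    # Return the name of the pool thread that runs this task.
    return threading.current_thread().name


# At most 2 worker threads, each named with the "demo" prefix.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="demo") as executor:
    names = set(executor.map(worker_name, range(10)))

print(sorted(names))
```

Because max_workers is 2, at most two distinct thread names appear no matter how many tasks are submitted. Passing max_workers=0 raises ValueError, matching the check in __init__.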