ThreadPoolExecutor
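Features of ThreadPoolExecutor, the commonly used thread pool in the Python 3 standard library module concurrent.futures:
- The main thread can query the state of any submitted task and get its return value.
- Synchronization is built in: the main thread can wait for tasks to finish instead of managing locks and joins by hand.
- It gives multithreaded and multiprocess code the same interface (ThreadPoolExecutor and ProcessPoolExecutor share the Executor API).
Simple and direct.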
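The first point can be illustrated with submit(), which returns a Future that the main thread can poll. This is a minimal sketch, not code from the original article: the helper wait_then_square and the Event named gate are made up for the illustration, and the Event is only there so the "not done yet" state is observable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical gate that keeps the worker busy until the main thread opens it.
gate = threading.Event()

def wait_then_square(x):
    gate.wait()          # block until the main thread calls gate.set()
    return x * x

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(wait_then_square, 7)
    done_before = future.done()        # task is still blocked on the gate
    gate.set()                         # let the worker continue
    value = future.result(timeout=5)   # wait for and fetch the return value
    done_after = future.done()         # the task has finished

print(done_before, value, done_after)
```

future.done() reports the state without blocking, while future.result() blocks until the return value is available (or the timeout expires).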
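Hands-on practice
We will use a ThreadPoolExecutor thread pool to process a file: read it, then append the suffix _我是吊车尾 ("I am the straggler") to the end of every line.
Step 1: assume a file of 20,000 lines in which the first line is "1" and each following line increases by one. (We simply generate it with code.)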
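from concurrent.futures import ThreadPoolExecutor

# Build the numbers 1..20000, one per line.
index = 1
line_list = []
for i in range(20000):
    line_list.append(index)
    index += 1

# Open with "w" so that rerunning the script does not append a second copy.
with open("./test1.txt", "w") as file:
    for line in line_list:
        file.write(str(line) + "\n")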
Step 2: read the file generated in step 1 into a list, then use ThreadPoolExecutor to append "_我是吊车尾" to the end of every line with multiple threads.
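from concurrent.futures import ThreadPoolExecutor

# Read the file from step 1 into a list, dropping the trailing newline.
file_line_list = []
with open("./test1.txt", "r") as file:
    for line in file:
        file_line_list.append(line.strip("\n"))

def exec_function(opera_list):
    # Despite its name, opera_list is a single line of text.
    print("append", opera_list + "_我是吊车尾")
    return opera_list + "_我是吊车尾"

with ThreadPoolExecutor() as executor:  # the with block shuts the pool down on exit
    res = executor.map(exec_function, file_line_list, timeout=5)
# Leaving the with block waits for every task, so all lines are done here.
print("Append complete!")

# map() yields results in input order, so the line order is preserved.
with open("./test2.txt", "w", encoding="utf-8") as file:
    for line in res:
        file.write(line + "\n")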
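Writing the results straight to the output file relies on executor.map() returning results in the order of its inputs, even when later tasks finish first. The sketch below checks this on four lines. The function append_suffix, the "_tail" suffix, and the artificial sleep are assumptions made up for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SUFFIX = "_tail"  # hypothetical suffix, standing in for the one used above

def append_suffix(line):
    # Delay the first item so that the later items finish before it.
    if line == "1":
        time.sleep(0.05)
    return line + SUFFIX

lines = ["1", "2", "3", "4"]
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in input order, not completion order.
    results = list(executor.map(append_suffix, lines))

print(results)
```

If completion order is wanted instead, concurrent.futures.as_completed() over futures returned by submit() is the usual alternative.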
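PS: Prefer with ThreadPoolExecutor(). If you create the executor without a with statement, remember to call executor.shutdown() yourself when you are done. (On exit, the with block calls shutdown(wait=True): it waits for pending tasks and then closes the thread pool automatically, so no resources are wasted.)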
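For the case without a with statement, a try/finally block is one way to make sure shutdown() always runs. This is a sketch under assumptions: add_suffix and the sample inputs are made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def add_suffix(line):
    return line + "_done"

executor = ThreadPoolExecutor(max_workers=2)
try:
    futures = [executor.submit(add_suffix, s) for s in ["a", "b"]]
    results = [f.result() for f in futures]
finally:
    # Equivalent to what the with statement does on exit.
    executor.shutdown(wait=True)

# Once shut down, the pool refuses new work.
try:
    executor.submit(add_suffix, "c")
    rejected = False
except RuntimeError:
    rejected = True

print(results, rejected)
```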
Step 3: check the result:
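One way to check the output programmatically is sketched below. The helper check_output is hypothetical (it is not part of the original article), and the demo runs against a small temporary file so that it is self-contained.

```python
import os
import tempfile

def check_output(path, expected_count, suffix):
    """Return True if line i (1-based) equals str(i) + suffix for every line."""
    with open(path, "r", encoding="utf-8") as file:
        lines = [line.rstrip("\n") for line in file]
    if len(lines) != expected_count:
        return False
    return all(line == f"{i}{suffix}" for i, line in enumerate(lines, start=1))

# Demo on a small temporary file shaped like test2.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as file:
        for i in range(1, 6):
            file.write(f"{i}_我是吊车尾\n")
    demo_ok = check_output(demo_path, 5, "_我是吊车尾")   # right count and content
    demo_bad = check_output(demo_path, 6, "_我是吊车尾")  # wrong expected count

print(demo_ok, demo_bad)
```

For the file from step 2, the call would be check_output("./test2.txt", 20000, "_我是吊车尾"), assuming the file was written as UTF-8.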


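For reference, here is the __init__ method of the ThreadPoolExecutor class, taken from an older Python 3 release. (Python 3.7 added the initializer and initargs parameters, and since Python 3.8 the default max_workers is min(32, os.cpu_count() + 4).)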
class ThreadPoolExecutor(_base.Executor):

    # Used to assign unique thread names when thread_name_prefix is not supplied.
    _counter = itertools.count().__next__

    def __init__(self, max_workers=None, thread_name_prefix=''):
        """Initializes a new ThreadPoolExecutor instance.

        Args:
            max_workers: The maximum number of threads that can be used to
                execute the given calls.
            thread_name_prefix: An optional name prefix to give our threads.
        """
        if max_workers is None:
            # Use this number because ThreadPoolExecutor is often
            # used to overlap I/O instead of CPU work.
            max_workers = (os.cpu_count() or 1) * 5
        if max_workers <= 0:
            raise ValueError("max_workers must be greater than 0")

        self._max_workers = max_workers
        self._work_queue = queue.Queue()
        self._threads = set()
        self._shutdown = False
        self._shutdown_lock = threading.Lock()
        self._thread_name_prefix = (thread_name_prefix or
                                    ("ThreadPoolExecutor-%d" % self._counter()))
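The two constructor arguments shown above can be observed from the outside. The sketch below is an illustration, not code from the article: it passes a made-up prefix "worker", collects the names of the threads that ran the tasks, and confirms that max_workers=0 is rejected with the ValueError raised in __init__.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def current_thread_name(_):
    # Report which worker thread ran this task.
    return threading.current_thread().name

with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as executor:
    names = set(executor.map(current_thread_name, range(8)))

# max_workers must be greater than 0.
try:
    ThreadPoolExecutor(max_workers=0)
    rejected_zero = False
except ValueError:
    rejected_zero = True

print(sorted(names), rejected_zero)
```

At most two distinct names appear, since the pool never creates more than max_workers threads.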
Other notes
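This article covers only the most basic usage. The Python constructor itself takes few parameters: max_workers and thread_name_prefix, plus initializer and initargs since Python 3.7. The parameters below are often listed for ThreadPoolExecutor, but they belong to Java's java.util.concurrent.ThreadPoolExecutor and have no direct counterpart in the Python class. They are kept here for readers who want to compare the two designs:
corePoolSize: the number of core threads in the pool
maximumPoolSize: the maximum number of threads in the pool
keepAliveTime: how long an idle worker thread is kept alive
unit: the time unit of keepAliveTime
workQueue: the blocking queue used to hold pending tasks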
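On the Python side, the constructor in Python 3.7 and later accepts max_workers, thread_name_prefix, initializer, and initargs. A minimal sketch of the last two follows. The names init_worker, append_suffix, and the "_tail" suffix are made up for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread storage, filled in once by the initializer of each worker.
local = threading.local()

def init_worker(suffix):
    # Runs once at the start of every worker thread.
    local.suffix = suffix

def append_suffix(line):
    return line + local.suffix

with ThreadPoolExecutor(max_workers=2,
                        initializer=init_worker,
                        initargs=("_tail",)) as executor:
    results = list(executor.map(append_suffix, ["1", "2", "3"]))

print(results)
```

This pattern is mainly useful when each worker needs its own setup, such as a connection or a buffer, that should not be rebuilt for every task.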
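Summary: this article used a concrete example to show how to read and process a file with ThreadPoolExecutor from the Python standard library module concurrent.futures, covering how to create a thread pool and how to hand the per-line work to multiple threads.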



