Multithreading Summary

chzenable

Categories: python, linux. Tags: multithreading, multiprocessing, inter-process communication, process pool

Article link: https://blog.csdn.net/zhuanju6759/article/details/97236543

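Notes on optimizing Python with threads, processes, and the process pool, Pool, which is extremely handy. (On a server with 36*2 CPUs the scaling was very close to linear: with multiple processes, the speedup is basically proportional to the number of processes.)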

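Because of the GIL (global interpreter lock), Python handles threads poorly: only one thread can execute Python bytecode at a time, so threads do not speed up CPU-bound code. Python's threading module still gives a large improvement for I/O-heavy workloads, because a thread that blocks on I/O releases the GIL so that other threads can run, and then reacquires it to continue once the resource is available. (The interpreter also asks a running thread to give up the GIL at a fixed switch interval.) The biggest problem, in my view, is that this mechanism is not transparent to the user: the actual execution order is unknown. When a thread sleeps, for example, you know the GIL will pass to another thread, but not which one.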
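The I/O case can be illustrated with a minimal sketch. Here `time.sleep` stands in for a blocking I/O call (an assumption made for the demo; the helper names `fake_io` and `run_threads` are hypothetical), and the point is only that the waits overlap instead of adding up.

```python
import time
from threading import Thread

def fake_io(seconds):
    # time.sleep stands in for a blocking I/O call; the GIL is released while waiting.
    time.sleep(seconds)

def run_threads(n_threads, seconds):
    """Start n_threads waiting threads and return the total wall-clock time."""
    threads = [Thread(target=fake_io, args=(seconds,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    elapsed = run_threads(4, 0.5)
    # The four 0.5 s waits overlap, so this is far below the 2 s a sequential run needs.
    print('4 threads, 0.5 s each:', round(elapsed, 2), 'seconds')
```

The CPU-bound benchmarks that follow show the opposite situation, where threads bring no gain.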

# The GIL (global interpreter lock)
# multi.py: with Pool, you can see several forked worker processes in the background
from multiprocessing import Pool
import time
COUNT = 50000000
def countdown(n):
    while n>0:
        n -= 1
if __name__ == '__main__':
    pool = Pool(processes=2)
    start = time.time()
    r1 = pool.apply_async(countdown, [COUNT//2])
    r2 = pool.apply_async(countdown, [COUNT//2])
    pool.close()
    pool.join()
    end = time.time()
    print('Time taken in seconds -', end - start)

$ python multi.py
Time taken in seconds - 1.874999761581421

# Single-threaded
# single_threaded.py
import time
COUNT = 50000000
def countdown(n):
    while n>0:
        n -= 1
start = time.time()
countdown(COUNT)
end = time.time()
print('Time taken in seconds -', end - start)
>>> Time taken in seconds - 2.78125
 
# Multi-threaded
# multi_threaded.py
import time
from threading import Thread
COUNT = 50000000
def countdown(n):
    while n>0:
        n -= 1
t1 = Thread(target=countdown, args=(COUNT//2,))
t2 = Thread(target=countdown, args=(COUNT//2,))
start = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
end = time.time()
print('Time taken in seconds -', end - start)
>>> Time taken in seconds - 2.859374761581421

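Using multiple processes makes shared variables harder to handle, because each process has its own memory. Python's multiprocessing library supports many ways of sharing state, including shared memory.
The most common, and quite convenient, option is a queue that is both thread-safe and process-safe. If a put or get cannot proceed immediately, it may block briefly, as if the queue had a built-in mutex.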
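For the shared-memory route mentioned above, a small sketch using `multiprocessing.Value` might look like this. The counter, the worker function `add_many`, and the iteration counts are illustrative assumptions, not part of the original notes.

```python
from multiprocessing import Process, Value

def add_many(counter, times):
    # counter lives in shared memory; get_lock() guards the read-modify-write.
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)   # a shared C int with initial value 0
    workers = [Process(target=add_many, args=(counter, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)      # 4000, because every increment holds the lock
```

Without the lock, `counter.value += 1` is not atomic across processes and increments can be lost.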

For processes created with Process, use Queue().
For communication between processes created by a process pool, use Manager().Queue().
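The two queue variants can be sketched side by side. The helper names `produce` and `drain` are hypothetical; the underlying reason for the split is that a plain `multiprocessing.Queue` can only be handed to a child through inheritance, while a `Manager().Queue()` proxy can be pickled into pool tasks.

```python
from multiprocessing import Manager, Pool, Process, Queue

def produce(q, items):
    # Works with any queue-like object that has a put() method.
    for item in items:
        q.put(item)

def drain(q, count):
    # Block until `count` items have been received.
    return [q.get() for _ in range(count)]

if __name__ == '__main__':
    # 1) Process + multiprocessing.Queue
    q = Queue()
    p = Process(target=produce, args=(q, [1, 2, 3]))
    p.start()
    print(drain(q, 3))            # single producer, FIFO order: [1, 2, 3]
    p.join()

    # 2) Pool + Manager().Queue(): the proxy object can be sent to pool workers
    with Manager() as manager:
        mq = manager.Queue()
        with Pool(processes=2) as pool:
            pool.starmap(produce, [(mq, ['a']), (mq, ['b'])])
        print(sorted(drain(mq, 2)))   # arrival order may vary, hence sorted
```

Draining the queue before `p.join()` is deliberate: a child that still has buffered items in a `Queue` may not exit until they are consumed.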

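Below is the common way of using a process pool. The big pitfall with a pool is copy-on-write: forked workers initially share the parent's memory, but a global variable is copied into the worker as soon as it is modified, and with large globals and many workers this can end in an out-of-memory (OOM) failure. Prefer passing data as arguments, and pass only the part each task actually uses, to keep memory overhead down.

Pool workers are daemon processes by default, so they are terminated when the main process exits and you do not need to worry about children being left behind as zombies. If you want workers that are independent of the main process, you have to subclass the Pool class from multiprocessing.pool and override some of its methods.

There is also a thread-based "dummy" pool, multiprocessing.dummy.Pool, which exposes the same API, so you can switch quickly between threads and processes just by changing the import. The daemon flag on threads follows the same idea: it decides whether child threads are stopped together with the main thread, or whether the program waits for them to finish before exiting.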

from multiprocessing import Pool, TimeoutError
import os
import time

def f(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes

    # map over an iterable, which feels a bit like MapReduce
    # prints "[0, 1, 4,..., 81]"
    print(pool.map(f, range(10)))

    # unordered map: prints the same numbers in arbitrary order
    for i in pool.imap_unordered(f, range(10)):
        print(i)

    # evaluate "f(20)" asynchronously
    res = pool.apply_async(f, (20,))      # runs in *only* one process
    print(res.get(timeout=1))             # prints "400"

    # evaluate "os.getpid()" asynchronously
    res = pool.apply_async(os.getpid, ()) # runs in *only* one process
    print(res.get(timeout=1))             # prints the PID of that process

    # The most common and most useful pattern: extract the work into a function,
    # split the input into chunks, hand them to different workers, collect the results.
    # Launching multiple evaluations asynchronously *may* use more processes.
    multiple_results = [pool.apply_async(os.getpid, ()) for i in range(4)]
    print([res.get(timeout=1) for res in multiple_results])

    # make a single worker sleep for 10 secs
    res = pool.apply_async(time.sleep, (10,))
    try:
        print(res.get(timeout=1))
    except TimeoutError:
        print("We lacked patience and got a multiprocessing.TimeoutError")

    pool.terminate()                      # stop the workers, including the sleeping one
    pool.join()
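As a sketch of the advice about passing only what each task needs, the example below splits the input in the parent and sends each worker just its own slice. The helpers `split` and `chunk_sum` are hypothetical names; the commented-out import shows the thread-based pool with the same API.

```python
from multiprocessing import Pool            # process-based workers
# from multiprocessing.dummy import Pool    # thread-based workers, same API

def chunk_sum(chunk):
    # The worker receives only its own slice, not the whole data set.
    return sum(chunk)

def split(data, n_chunks):
    """Split non-empty data into contiguous slices of nearly equal size."""
    size = (len(data) + n_chunks - 1) // n_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == '__main__':
    data = list(range(1000))
    with Pool(processes=4) as pool:
        partial_sums = pool.map(chunk_sum, split(data, 4))
    print(sum(partial_sums))   # 499500
```

Swapping the import is enough to rerun the same code on threads, which is useful for comparing I/O-bound and CPU-bound behaviour.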

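Python also supports communication over pipes, on the same principle as the '|' pipe in Linux, except that pipes in the multiprocessing library are bidirectional by default.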

from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # prints "[42, None, 'hello']"
    p.join()

Note from the official documentation: Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time. In other words, several processes or threads competing for the same end of a pipe can corrupt the data.
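One way to respect that note, sketched here under the assumption that several writers must share a single sending end, is to guard the end with a `multiprocessing.Lock`. The helper `safe_send` is a hypothetical name.

```python
from multiprocessing import Lock, Pipe, Process

def safe_send(conn, lock, message):
    # Hold the lock so only one writer uses this end of the pipe at a time.
    with lock:
        conn.send(message)

if __name__ == '__main__':
    # One-way pipe: recv_end can only receive, send_end can only send.
    recv_end, send_end = Pipe(duplex=False)
    lock = Lock()
    writers = [Process(target=safe_send, args=(send_end, lock, i)) for i in range(3)]
    for w in writers:
        w.start()
    # Arrival order depends on scheduling, so sort before printing: [0, 1, 2]
    print(sorted(recv_end.recv() for _ in range(3)))
    for w in writers:
        w.join()
```

A queue already does this locking internally, which is one reason it is usually the more convenient choice when there are many producers.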

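The multiprocessing library covers the vast majority of multi-process operations in Python. Calling system commands and connecting to their standard input and output pipes is handled by the subprocess module: Popen can run a command directly through the shell, but this has to be enabled with the shell=True argument.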
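A minimal sketch of launching a command and reading its standard output through a pipe, using the standard library's `subprocess.Popen`. The wrapper `run_and_capture` and the commands it runs are illustrative assumptions.

```python
import subprocess
import sys

def run_and_capture(args, shell=False):
    """Run a command, wait for it, and return (exit_code, stdout_text)."""
    proc = subprocess.Popen(args, shell=shell, stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()   # reads stdout until EOF, then waits for exit
    return proc.returncode, out

if __name__ == '__main__':
    # Argument-list form: no shell is involved.
    print(run_and_capture([sys.executable, '-c', 'print("hello")']))
    # String form through the system shell; only do this with trusted input.
    print(run_and_capture('echo hello', shell=True))
```

The list form avoids shell quoting problems; `shell=True` is only needed for shell features such as pipes or wildcards.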

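When using multiple processes, plain print calls compete for the same output pipe, so output can be lost or interleaved. You can use a logger with queue-based record handling instead, at some cost in performance.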
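A sketch of queue-based logging with the standard library's `logging.handlers.QueueHandler` and `QueueListener`: workers only put records on a queue, and a single listener thread in the main process does the actual writing. The logger name and the helpers `make_queue_logger` and `worker` are hypothetical.

```python
import logging
import logging.handlers
from multiprocessing import Process, Queue

def make_queue_logger(name, log_queue):
    """Return a logger that puts its records on log_queue instead of writing them."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        logger.addHandler(logging.handlers.QueueHandler(log_queue))
    return logger

def worker(log_queue, worker_id):
    make_queue_logger('worker', log_queue).info('hello from worker %d', worker_id)

if __name__ == '__main__':
    log_queue = Queue()
    # The listener thread in the main process is the only writer to stderr.
    listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
    listener.start()
    procs = [Process(target=worker, args=(log_queue, i)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()   # processes the remaining records, then stops the thread
```

Each record is pickled and passed through the queue, which is where the performance cost mentioned above comes from.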
