Multithreading and Multiprocessing in Python

Author: 还是转转

Category: python. Tags: Python multithreading, Python multiprocessing

Article link: https://blog.csdn.net/xiaoyi52/article/details/100187811

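A process is the basic unit of resource allocation in an operating system, while a thread is the basic unit of task scheduling and execution.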

In Java the focus is usually on multithreading, whereas in Python multiprocessing tends to get more attention than multithreading.

1 GIL

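Outside Python, a single core can run only one task at a time, and multiple cores can run several threads simultaneously. In CPython, however, no matter how many cores are available, only one thread executes Python bytecode at any given moment. The reason is the GIL.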

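GIL stands for Global Interpreter Lock. It dates back to Python's original design and was introduced to keep the interpreter's data safe. A thread must acquire the GIL before it can run. The GIL can be thought of as a "pass", and each Python process has exactly one. A thread that does not hold the pass is not allowed to execute Python code on a CPU.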

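The GIL is a property of the CPython implementation, not of the Python language. CPython threads are native operating system threads, and the GIL guarantees that only one of them at a time touches the interpreter's internal state, such as reference counts. Jython and IronPython have no GIL, while PyPy does have one. CPython is the implementation most people use.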

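This leads to a conclusion: Python multithreading suits I/O-bound code and does not suit CPU-bound code.
The reason is that I/O-bound code spends most of its time waiting on I/O with low CPU usage, and a thread releases the GIL while it waits, so other threads can use the CPU in the meantime. CPU-bound code already keeps the CPU busy; adding threads only makes them compete for the single GIL, so nothing runs in parallel and the cost of thread switching outweighs any gain.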
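
The I/O-bound case can be illustrated with a minimal Python 3 sketch. Here time.sleep stands in for a blocking I/O call, which is an assumption made purely for illustration, and the names fake_io, timed, run_serial and run_threaded are hypothetical. Because a sleeping thread releases the GIL, the four waits overlap when they run in threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds):
    # time.sleep stands in for a blocking I/O call; the GIL is released while waiting
    time.sleep(seconds)
    return seconds


def timed(func):
    # Return (result, elapsed seconds) for a zero-argument callable
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def run_serial():
    return [fake_io(0.2) for _ in range(4)]


def run_threaded():
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fake_io, [0.2] * 4))


serial_result, serial_time = timed(run_serial)
threaded_result, threaded_time = timed(run_threaded)
print("serial:   %.2fs" % serial_time)
print("threaded: %.2fs" % threaded_time)
```

The serial run should take roughly the sum of the waits, while the threaded run should take roughly the length of a single wait. Replacing the sleep with a pure-Python computation would be expected to remove that advantage under the GIL.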

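To make full use of a multi-core CPU in Python, use multiple processes. Each process has its own interpreter and its own GIL, so processes do not interfere with one another and can run truly in parallel. For CPU-bound work on a multi-core machine, multiprocessing therefore outperforms multithreading.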

2 多线程

Python creates thread objects through the threading.Thread class.

import threading

# Thread task: add 1 to the global variable num
def run(n):
    global num
    num += 1


num = 0
t_obj = []

for i in range(20000):
    t = threading.Thread(target=run, args=("t-%s" % i,))
    t.start()
    t_obj.append(t)

for t in t_obj:
    t.join()

print("num: %s" % num)

The main thread methods are described below:

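t = threading.Thread(target=your_function, args=(arg,), name="any-name"): creates a thread object and assigns it to t; the thread will run the function given as target. args must be a tuple, so a single argument needs a trailing comma.
t.start(): starts the thread.
t.join(): blocks the calling thread (here, the main thread) until t has finished.
t.setDaemon(True): marks t as a daemon thread; it must be called before t.start(). When the main thread finishes, the program exits whether or not daemon threads are done. Joining a daemon thread makes the main thread wait for it anyway, which defeats the purpose. In Python 3 the t.daemon = True attribute is preferred, and setDaemon() is deprecated since Python 3.10.
threading.current_thread().name: the name of the currently running thread.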
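
The methods listed above can be tried out in a short Python 3 sketch. The names worker, results and background are hypothetical, chosen only for this illustration.

```python
import threading
import time

results = []


def worker(label):
    # current_thread().name reports the name given when the thread was created
    time.sleep(0.05)
    results.append((threading.current_thread().name, label))


t = threading.Thread(target=worker, args=("job-1",), name="worker-1")
t.start()   # begin running worker("job-1") in a new thread
t.join()    # block until the thread has finished
print(results)

# A daemon thread does not keep the program alive; the flag is set before start()
background = threading.Thread(target=time.sleep, args=(0.01,), daemon=True)
background.start()
print(background.daemon)
```

After join() returns, the list holds a single (thread name, label) pair, so the worker's result can be read safely from the main thread.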

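As in Java, thread safety is a concern. Run under Python 2.x, the final value of num may be 20000, but it may also be 19999, 19998, 19997, and so on. The statement num += 1 is a read, an add and a write rather than one atomic step, so two threads can read the same old value and one update is lost. In other words, this sample code is not thread-safe.
The remedy is similar to Java's: the most common approach is blocking synchronization, that is, locking.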

# encoding: utf8
import threading


# Thread task: add 1 to the global variable num
def run(n):
    global num
    if lock.acquire():
        print("thread %s prints %s\n" % (n, num))
        num += 1
        lock.release()


num = 0
t_obj = []
lock = threading.Lock()

for i in range(20000):
    t = threading.Thread(target=run, args=("t-%s" % i,))
    t.start()
    t_obj.append(t)

for t in t_obj:
    t.join()

print("num: %s" % num)
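
A lock can also be used as a context manager, which releases it even if the protected code raises an exception. The following Python 3 sketch is a variation on the example above and not the original code: the names counter and add_many are hypothetical, and it uses 8 threads with 10000 increments each instead of 20000 threads.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # "with lock" acquires on entry and releases on exit, even on an exception
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("counter:", counter)
```

Because every increment happens while the lock is held, the final count is exactly 8 * 10000 = 80000.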

Another way to create threads is to subclass threading.Thread, as follows:

# encoding: utf8
import threading


class Task(threading.Thread):
    def __init__(self, n):
        super(Task, self).__init__()
        self.n = n

    # Thread task: add 1 to the global variable num
    def run(self):
        global num
        lock.acquire()
        print("thread %s prints %s\n" % (self.n, num))
        num += 1
        lock.release()


num = 0
t_obj = []
lock = threading.Lock()

for i in range(20000):
    t = Task(i)
    t.start()
    t_obj.append(t)

for t in t_obj:
    t.join()

2.1 Main thread and child threads

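In Python, a process gets a main thread by default when it starts. When multithreading is used, the main thread creates the child threads. By default child threads start as non-daemon threads: when the main thread finishes its own work, the child threads keep running until they complete, and only then does the process exit.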

If the child threads are created as daemon threads, they are terminated when the main thread ends. For example:

# encoding: utf8
import threading
import time


# Thread task: add 1 to the global variable num
def run(n):
    global num
    if lock.acquire():
        print("thread %s prints %s\n" % (n, num))
        time.sleep(0.1)
        num += 1
        lock.release()


num = 0
t_obj = []
lock = threading.Lock()

for i in range(2000):
    t = threading.Thread(target=run, args=("t-%s" % i,))
    t.daemon = True  # same effect as t.setDaemon(True), which is deprecated since Python 3.10
    t.start()
    t_obj.append(t)


print("num: %s" % num)

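Output:

thread t-0 prints 0

thread t-1 prints 1

thread t-2 prints 2

num: 2

The main thread reaches its final print after only a few child threads have run; the remaining daemon threads are terminated when the program exits.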

2.2 Thread communication

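The examples above all use the threading module for multithreading.
Python in fact provides several modules for multithreaded programming, including thread (renamed _thread in Python 3), threading and queue (named Queue in Python 2). thread offers basic thread and lock support but is rarely used directly because of its limited functionality. threading provides higher-level, more capable thread management.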
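
The queue module is mentioned above but not demonstrated, so here is a minimal Python 3 sketch of two threads communicating through a queue.Queue. The names tasks, squares, STOP, producer and consumer are hypothetical, and using None as a stop sentinel is just one common convention.

```python
import queue
import threading

tasks = queue.Queue()
squares = []
STOP = None  # sentinel telling the consumer to exit


def producer():
    for n in range(5):
        tasks.put(n)
    tasks.put(STOP)


def consumer():
    while True:
        item = tasks.get()  # blocks until an item is available
        if item is STOP:
            break
        squares.append(item * item)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(squares)
```

With a single consumer and a FIFO queue, items are processed in the order they were produced, so this prints [0, 1, 4, 9, 16]. The queue does its own locking, so no explicit Lock is needed for the hand-off.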

The threading module provides not only the Thread class but also a number of handy synchronization mechanisms:

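Lock: a primitive lock object.
RLock: a reentrant lock object.
Condition: a condition variable that lets a thread pause until some condition is met.
Event: a general-purpose condition flag. Multiple threads can wait for an event, and all of them are woken when it occurs.
Semaphore: a counter-based lock that admits a limited number of threads at once and acts as a "waiting room" for the rest.
BoundedSemaphore: like Semaphore, except that its counter is never allowed to exceed the initial value.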
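
Semaphores are not demonstrated elsewhere in this article, so the following Python 3 sketch shows a BoundedSemaphore capping concurrency at two threads. The names slots, limited_task, active, max_active and rejected are hypothetical, and the sleep stands in for real work.

```python
import threading
import time

slots = threading.BoundedSemaphore(2)  # at most 2 threads inside at once
state_lock = threading.Lock()
active = 0
max_active = 0
rejected = False


def limited_task():
    global active, max_active
    with slots:
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)  # simulated work
        with state_lock:
            active -= 1


threads = [threading.Thread(target=limited_task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("max concurrent:", max_active)

# Releasing a BoundedSemaphore more often than it was acquired raises ValueError
try:
    slots.release()
except ValueError:
    rejected = True
print("extra release rejected:", rejected)
```

The recorded maximum never exceeds 2, whatever the scheduling. A plain Semaphore would accept the extra release() silently and end up admitting three threads.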

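Lock and RLock
Lock was used in the examples above. It is a synchronization primitive whose state is either locked or unlocked, and it is locked and unlocked with the acquire and release methods.
RLock is a synchronization primitive similar to Lock, except that it is reentrant: the thread that holds it can acquire it again without blocking, and must release it once for every acquire.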
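
The difference shows up when a function that holds the lock calls another function that takes the same lock. A minimal Python 3 sketch, with the hypothetical names rlock, inner, outer, log, plain and second:

```python
import threading

rlock = threading.RLock()
log = []


def inner():
    with rlock:  # second acquisition by the same thread succeeds
        log.append("inner")


def outer():
    with rlock:
        log.append("outer")
        inner()  # with a plain Lock this call would block forever


t = threading.Thread(target=outer)
t.start()
t.join()
print(log)

# A plain Lock is not reentrant: a non-blocking second acquire fails
plain = threading.Lock()
plain.acquire()
second = plain.acquire(False)  # False means "do not block"
plain.release()
print(second)
```

This prints ['outer', 'inner'] followed by False.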

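Event
An event is a simple thread synchronization object built around an internal flag. While the flag is false, event.wait() blocks; once it is true, wait() returns immediately. Its main methods are:

clear(): sets the flag to false.
set(): sets the flag to true.
is_set(): returns whether the flag is set.
wait(): blocks until the flag is set.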

A demo:

# encoding: utf-8
import threading
import time

event = threading.Event()


def lighter():
    count = 0
    event.set()   # set the flag: the light starts green
    while True:
        if 5 < count <= 10:
            # red light: clear the flag
            event.clear()
            print("red light is on...")
        elif count > 10:
            # back to green: set the flag and restart the cycle
            event.set()
            count = 0
        else:
            print("green light is on...")

        time.sleep(1)
        count += 1


def car(name):
    while True:
        if event.is_set():
            print("%s running..." % name)
            time.sleep(3)
        else:
            print("%s sees red light, waiting..." % name)
            event.wait()
            print("%s sees green light is on, starting going..." % name)


light = threading.Thread(target=lighter,)
light.start()

# Use a separate name so the thread object does not shadow the car() function
car_thread = threading.Thread(target=car, args=('MINI',))
car_thread.start()
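
The demo above loops forever, which makes it awkward to experiment with. A shorter Python 3 sketch that terminates, using the hypothetical names ready, waiter and seen, shows the same methods together with the optional timeout of wait():

```python
import threading

ready = threading.Event()
seen = []


def waiter():
    # wait(timeout) returns True if the flag was set, False if the timeout expired
    seen.append(ready.wait(timeout=5))


t = threading.Thread(target=waiter)
t.start()
ready.set()  # release every thread blocked in wait()
t.join()

print(seen)
print(ready.is_set())
ready.clear()
print(ready.wait(timeout=0.01))
```

This prints [True], then True, then False: the last wait() times out because the flag has been cleared again.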

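Condition
A Condition, also called a condition lock, is another synchronization primitive. It is used when threads need to watch for a particular state change or event. It is locked and unlocked with acquire and release. Its main methods are:

wait([timeout]): puts the thread into the condition's waiting pool until it is notified, and releases the lock while it waits. The thread must hold the lock before calling it, otherwise an exception is raised.
notify(): picks one thread from the waiting pool and notifies it. The notified thread automatically tries to re-acquire the lock; the other threads stay in the waiting pool. The caller must hold the lock, otherwise an exception is raised.
notifyAll(): notifies every thread in the waiting pool. The caller must hold the lock, otherwise an exception is raised. In Python 3 the preferred spelling is notify_all().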

Example:

# encoding: utf-8
import time
from threading import Condition, current_thread, Thread

"""
Two threads take turns printing 0-99
"""
con = Condition()
i = 0


def tc1():
    global i
    with con:
        while i < 100:
            print("%s %s" % (current_thread().name, i))
            time.sleep(0.3)
            i += 1
            if i % 2 == 1:
                con.notify()
                con.wait()
        con.notify()


def tc2():
    global i
    with con:
        while i < 100:
            print("%s %s" % (current_thread().name, i))
            time.sleep(0.3)
            i += 1
            if i % 2 == 0:
                con.notify()
                con.wait()
        con.notify()


Thread(target=tc1).start()
Thread(target=tc2).start()

As the example shows, these Condition methods are used in just the same way as the thread synchronization methods in Java:

public class ThreadTest {
   private static final Object obj = new Object();
   private static int num = 0;  
   public static void main(String[] args) {
       new Thread(() -> {
           synchronized (obj) {
               while (num < 100) {
                   System.out.println(Thread.currentThread().getName() + " prints " + num);
                   num += 1;
                   if (num % 2 == 1) {
                       obj.notify();
                       try {
                           obj.wait();
                       } catch (InterruptedException e) {
                           e.printStackTrace();
                       }
                   }
               }
           }
       }).start();

       new Thread(() -> {
           synchronized (obj) {
               while (num < 100) {
                   System.out.println(Thread.currentThread().getName() + " prints " + num);
                   num += 1;
                   if (num % 2 == 0) {
                       obj.notify();
                       try {
                           obj.wait();
                       } catch (InterruptedException e) {
                           e.printStackTrace();
                       }
                   }
               }
           }
       }).start();
   }
}
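
Back in Python, a Condition is usually paired with a predicate that the waiting thread re-checks in a loop, because wait() can return before the awaited state actually holds. The following Python 3 sketch of a one-item hand-off assumes that pattern; the names cond, items, received, consumer and producer are hypothetical.

```python
import threading

cond = threading.Condition()
items = []
received = []


def consumer():
    with cond:
        # Re-check the predicate in a loop: wait() may return before an item is there
        while not items:
            cond.wait()
        received.append(items.pop(0))


def producer():
    with cond:
        items.append("payload")
        cond.notify()  # wake one waiting thread; it resumes once the lock is released


c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
print(received)
```

This prints ['payload'] whichever thread gets the lock first: if the producer runs first, the consumer finds the item already present and never waits.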

2.3 Thread pools

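In Python 2.x the threading module does not include a thread pool. One is available from the multiprocessing.dummy module, though, via from multiprocessing.dummy import Pool, and it is used in essentially the same way as the process pool in multiprocessing. Example: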

# encoding: utf8
from multiprocessing.dummy import Pool as ThreadPool


def run(n):
    return n**2


if __name__ == '__main__':
    pool = ThreadPool(5)
    futures = []
    for i in range(10):
        future = pool.apply_async(run, (i,))
        futures.append(future)

    for future in futures:
        future.wait()

    print([future.get() for future in futures])

In Python 3.x the concurrent.futures module provides a standard thread pool. Example:

# encoding: utf8
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(3)
l = []


def run(a, b):
    return a * b


for i in range(10):
    future = pool.submit(run, i, i + 1)
    l.append(future)


print([future.result() for future in l])
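
Two other ways of collecting results from a ThreadPoolExecutor are worth knowing: map(), which keeps input order, and as_completed(), which yields futures as they finish. A minimal Python 3 sketch, with the hypothetical names multiply, ordered, futures and by_input:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def multiply(a, b):
    return a * b


# The "with" block shuts the pool down once all submitted work is done
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in input order
    ordered = list(pool.map(multiply, range(5), range(1, 6)))

    # as_completed() yields futures as they finish, in no particular order
    futures = {pool.submit(multiply, i, i + 1): i for i in range(5)}
    by_input = {futures[f]: f.result() for f in as_completed(futures)}

print(ordered)
print(sorted(by_input.items()))
```

The first line printed is [0, 2, 6, 12, 20]. The second holds the same values keyed by input, sorted here because the completion order is not deterministic.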

3 Multiprocessing

3.1 Demo

Consider the following example:

import os, time
from multiprocessing import Pool


# Return the square of n
def work(n):
    print('%s run' % os.getpid())
    time.sleep(3)
    return n**2


if __name__ == '__main__':
    p = Pool(processes=10)
    res, data = [], []
    for i in range(20):
        result = p.apply_async(work, args=(i,))
        res.append(result)
    p.close()
    p.join()  # the main process waits for all child processes to finish
    for r in res:
        data.append(r.get())
    print(data)

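The output was shown as a screenshot in the original post and is not reproduced here.
The 20 calls to work run 10 at a time, so only two rounds are needed and the whole run takes a little over 6 seconds. With 20 processes it would take only a little over 3 seconds.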
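
The arithmetic behind those figures can be written down as a small Python 3 helper. This is only a rough estimate that assumes equal-length tasks and ignores process start-up overhead, and the name estimated_seconds is hypothetical.

```python
import math


def estimated_seconds(tasks, workers, seconds_per_task):
    # Tasks run in "rounds" of at most `workers` at a time
    rounds = math.ceil(tasks / workers)
    return rounds * seconds_per_task


print(estimated_seconds(20, 10, 3))  # 2 rounds of 3 seconds -> 6
print(estimated_seconds(20, 20, 3))  # 1 round of 3 seconds -> 3
```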

3.2 Process pools

Commonly used methods of the process pool class Pool:

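apply(): synchronous (serial) execution; the call blocks until the result is ready.
apply_async(): asynchronous (parallel) execution; the call returns a result object immediately.
terminate(): shuts the pool down immediately, without finishing outstanding work.
join(): makes the main process wait until all worker processes have exited. It must be called after the pool has been shut down with close or terminate.
close(): stops the pool from accepting new tasks; the worker processes exit once the tasks already submitted have finished.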
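
The methods listed above fit together as in the following Python 3 sketch. The names square and main are hypothetical. The if __name__ == "__main__" guard is needed on platforms where child processes are started by re-importing the main module, so the sketch is meant to be saved and run as a script.

```python
from multiprocessing import Pool


def square(n):
    return n * n


def main():
    with Pool(processes=2) as pool:
        blocking = pool.apply(square, (3,))  # waits for the result
        pending = [pool.apply_async(square, (i,)) for i in range(5)]
        pool.close()  # no more tasks may be submitted
        pool.join()   # wait for the workers to finish
        results = [r.get() for r in pending]
    print(blocking, results)


if __name__ == "__main__":
    main()
```

Run as a script, this prints 9 [0, 1, 4, 9, 16]. The results come back in submission order because each one is read from its own result object.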

In Python 3.x the process pool from concurrent.futures closely resembles the thread pool in Java, as shown below:

# encoding: utf8
import os
import time
from concurrent.futures import ProcessPoolExecutor


def task(n):
    print("%s is running" % os.getpid())
    time.sleep(2)
    return n**2


if __name__ == '__main__':
    p = ProcessPoolExecutor()  # defaults to the number of CPUs
    l = []
    start = time.time()
    for i in range(10):
        # submit() returns a Future instance
        future = p.submit(task, i)
        l.append(future)
    # similar in effect to calling close() and then join() on a Pool
    p.shutdown()
    print('='*30)
    print([future.result() for future in l])
