10. Multithreading and Parallelism

鹏哥哥啊Aaaa

Category: Python    Tags: python

Article link: https://blog.csdn.net/qq_40594696/article/details/108867775

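1. Process and thread concepts

Process: a program that is currently running.

Thread: an execution unit inside a process that actually runs the program's code.

In other words, a process has a wider scope than a thread,

and one process can contain multiple threads.

Global interpreter lock (GIL): a mechanism used to synchronize threads; it ensures that only one thread executes Python bytecode at any given moment.

CPython: the most widely used Python interpreter; the GIL is part of this implementation.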
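
The relationship between a process and its threads can be checked with a small sketch. The helper name collect_ids and the thread names are made up for illustration; the point is only that every thread reports the same process id.

```python
import os
import threading

def collect_ids(n_threads=3):
    """Start n_threads threads and record the (pid, thread name) each one sees."""
    seen = []
    seen_lock = threading.Lock()

    def worker():
        # Guard the shared list so appends from different threads do not interleave
        with seen_lock:
            seen.append((os.getpid(), threading.current_thread().name))

    threads = [threading.Thread(target=worker, name="worker-%d" % i)
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

seen = collect_ids()
pids = {pid for pid, _ in seen}            # distinct process ids
names = sorted(name for _, name in seen)   # distinct thread names
print(len(pids), names)
```

All three threads live inside one process, so the set of process ids has a single element while the thread names differ.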

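2. Thread modules

Threads in Python are mainly provided by the _thread module and the threading module.

2.1 The _thread module

_thread is a low-level module and is generally not recommended for direct use, although it is very simple to use.

The core of the _thread module is the start_new_thread function.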

import time
import datetime
import _thread

date_time_format = "%H:%M:%S"

def get_time_str():
    now = datetime.datetime.now()
    return datetime.datetime.strftime(now,date_time_format)

def thread_function(thread_id):
    print("Thread %d\t start at %s" % (thread_id,get_time_str()))
    print("Thread %d\t sleeping" % thread_id)
    time.sleep(4)
    print("Thread %d\t finish at %s" % (thread_id,get_time_str()))

def main():
    print("Main thread start at %s" % get_time_str())
    for i in range(5):
        # Start a new thread; its arguments are passed as a tuple
        _thread.start_new_thread(thread_function,(i,))
        time.sleep(1)

    # On most systems these threads are killed when the main thread exits,
    # so sleep long enough for all of them to finish
    time.sleep(6)
    print("Main thread finish at %s" % get_time_str())

if __name__=="__main__":
    main()

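The above is the most basic usage. In practice we usually use locks, so that the main thread exits only after the other threads have finished rather than sleeping for a fixed time.

The _thread.allocate_lock function returns a Lock object. A Lock object has three commonly used methods: acquire, release and locked.

acquire: acquires the lock; by default it blocks, waiting if necessary until another thread releases the lock (only one thread can hold the lock at a time).

release: releases the lock. The lock must have been acquired first, but it does not have to be released by the thread that acquired it.

locked: returns the state of the lock; True if it is currently held by some thread.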

import time
import datetime
import _thread

date_time_format = "%H:%M:%S"

def get_time_str():
    now = datetime.datetime.now()
    return datetime.datetime.strftime(now,date_time_format)

def thread_function(thread_id,lock):
    print("Thread %d\t start at %s" % (thread_id,get_time_str()))
    print("Thread %d\t sleeping" % thread_id)
    time.sleep(4)
    print("Thread %d\t finish at %s" % (thread_id,get_time_str()))
    lock.release()    # release the lock to signal that this thread is done

def main():
    print("Main thread start at %s" % get_time_str())
    locks = []
    for i in range(5):
        lock = _thread.allocate_lock()
        lock.acquire()    # acquire the lock before the thread starts
        locks.append(lock)

    for i in range(5):
        _thread.start_new_thread(thread_function,(i,locks[i]))
        time.sleep(1)

    for i in range(5):
        while locks[i].locked():    # still locked means thread i has not finished
            time.sleep(1)

    print("Main thread finish at %s" % get_time_str())

if __name__=="__main__":
    main()
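
The three Lock methods can also be tried in isolation, without starting any thread. This is a minimal sketch; the variable names states and second_try are only for illustration.

```python
import _thread

lock = _thread.allocate_lock()
states = [lock.locked()]            # the lock is not held yet

lock.acquire()                      # returns immediately because the lock is free
states.append(lock.locked())        # the lock is now held

# A non-blocking attempt returns False instead of waiting for the lock
second_try = lock.acquire(False)

lock.release()
states.append(lock.locked())        # the lock is free again

print(states, second_try)
```

Passing False to acquire is handy when a thread should do something else instead of waiting for a busy lock.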

2.2 threading.Thread

The threading module provides an object-oriented way to implement threads, along with many useful objects and methods for creating and controlling them.

import time
import datetime
import threading

date_time_format = "%H:%M:%S"

def get_time_str():
    now = datetime.datetime.now()
    return datetime.datetime.strftime(now,date_time_format)

def thread_function(thread_id):
    print("Thread %d\t start at %s" % (thread_id,get_time_str()))
    print("Thread %d\t sleeping" % thread_id)
    time.sleep(4)
    print("Thread %d\t finish at %s" % (thread_id,get_time_str()))

def main():
    print("Main thread start at %s" % get_time_str())
    threads = []

    # Create the threads
    for i in range(5):
        thread = threading.Thread(target=thread_function,args=(i,))
        threads.append(thread)

    # Start the threads
    for i in range(5):
        threads[i].start()
        time.sleep(1)

    # Wait for the threads to finish
    for i in range(5):
        threads[i].join()    # blocks until thread i has ended

    print("Main thread finish at %s" % get_time_str())

if __name__=="__main__":
    main()

Another common approach is to derive a subclass from threading.Thread: call the parent constructor in the subclass and implement the run method.

import time
import datetime
import threading

date_time_format = "%H:%M:%S"

def get_time_str():
    now = datetime.datetime.now()
    return datetime.datetime.strftime(now,date_time_format)

class MyThread(threading.Thread):
    def __init__(self,thread_id):
        super(MyThread,self).__init__()
        self.thread_id = thread_id

    def run(self):
        # run() is executed in the new thread once start() is called
        print("Thread %d\t start at %s" % (self.thread_id,get_time_str()))
        print("Thread %d\t sleeping" % self.thread_id)
        time.sleep(4)
        print("Thread %d\t finish at %s" % (self.thread_id,get_time_str()))


def main():
    print("Main thread start at %s" % get_time_str())
    threads = []

    # Create the threads
    for i in range(5):
        thread = MyThread(i)
        threads.append(thread)

    # Start the threads
    for i in range(5):
        threads[i].start()
        time.sleep(1)

    # Wait for the threads to finish
    for i in range(5):
        threads[i].join()    # blocks until thread i has ended

    print("Main thread finish at %s" % get_time_str())

if __name__=="__main__":
    main()

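2.3 Thread synchronization

Problems arise when several threads modify or operate on the same data at the same time. In the example below the shared resource is standard output: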

import time
import datetime
import threading

date_time_format = "%H:%M:%S"

def get_time_str():
    now = datetime.datetime.now()
    return datetime.datetime.strftime(now,date_time_format)

class MyThread(threading.Thread):
    def __init__(self,thread_id):
        super(MyThread,self).__init__()
        self.thread_id = thread_id

    def run(self):
        for i in range(10):
            print("Thread %d\t printing!times:%d" %(self.thread_id,i))

        time.sleep(1)

        for i in range(10):
            print("Thread %d\t printing!times:%d" %(self.thread_id,i))
    

def main():
    print("Main thread start " )
    threads = []

    # Create the threads
    for i in range(5):
        thread = MyThread(i)
        threads.append(thread)

    # Start the threads
    for i in range(5):
        threads[i].start()

    # Wait for the threads to finish
    for i in range(5):
        threads[i].join()    # blocks until thread i has ended

    print("Main thread finish")

if __name__=="__main__":
    main()

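With code written this way, the output differs on every run, and the output of the different threads is heavily interleaved. This is why thread synchronization matters.

A Lock object can be used to implement simple thread synchronization.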

import time
import datetime
import threading

thread_lock = None
date_time_format = "%H:%M:%S"

def get_time_str():
    now = datetime.datetime.now()
    return datetime.datetime.strftime(now,date_time_format)

class MyThread(threading.Thread):
    def __init__(self,thread_id):
        super(MyThread,self).__init__()
        self.thread_id = thread_id

    def run(self):
        # Acquire the lock
        thread_lock.acquire()

        for i in range(3):
            print("Thread %d\t printing!times:%d" %(self.thread_id,i))

        # Release the lock
        thread_lock.release()

        time.sleep(1)

        # Acquire the lock again
        thread_lock.acquire()
        for i in range(3):
            print("Thread %d\t printing!times:%d" %(self.thread_id,i))

        # Release the lock
        thread_lock.release()

def main():
    print("Main thread start " )
    threads = []

    # Create the threads
    for i in range(5):
        thread = MyThread(i)
        threads.append(thread)

    # Start the threads
    for i in range(5):
        threads[i].start()

    # Wait for the threads to finish
    for i in range(5):
        threads[i].join()    # blocks until thread i has ended

    print("Main thread finish")

if __name__=="__main__":
    # Create the lock shared by all threads
    thread_lock = threading.Lock()

    main()
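
The same idea applies when the shared resource is a variable rather than standard output. The sketch below is an illustration, not part of the original example: the function name count_with_lock and the counter dictionary are assumptions. It uses the lock as a context manager, which calls acquire on entry and release on exit even if an exception is raised.

```python
import threading

def count_with_lock(n_threads=4, n_increments=10000):
    """Increment one shared counter from several threads, guarded by a Lock."""
    counter = {"value": 0}
    counter_lock = threading.Lock()

    def add_many():
        for _ in range(n_increments):
            # Only one thread at a time may run the read-modify-write step
            with counter_lock:
                counter["value"] += 1

    threads = [threading.Thread(target=add_many) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

print(count_with_lock())
```

With the lock in place the result is always n_threads * n_increments; without it, increments from different threads can overwrite each other.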

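2.4 Queues

Passing and sharing data between threads is very common. We could do it with shared variables, but then we would have to manage the locking ourselves.

The queue module handles the locking for us and keeps the data consistent. It provides a first-in, first-out (FIFO) implementation suited to multithreaded programming, which can be used to pass data safely between producer and consumer threads.

The Queue class uses put() to add an item to the tail of the queue and get() to remove an item from the head of the queue.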

from queue import Queue

q = Queue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())

Using Queue with multiple threads:

import time
import threading
import queue

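# Create the work queue, limited to at most 10 items
work_queue = queue.Queue(maxsize=10)

# Create the result queue, limited to at most 10 items
result_queue = queue.Queue(maxsize=10)

class WorkerThread(threading.Thread):
    def __init__(self,thread_id):
        super(WorkerThread,self).__init__()
        self.thread_id = thread_id

    def run(self):
        while True:
            try:
                # Take one item from the work queue without blocking.
                # Checking empty() and then calling get() could block forever
                # if another thread took the last item in between.
                work = work_queue.get_nowait()
            except queue.Empty:
                break
            # Simulate time-consuming work
            time.sleep(3)
            out = "Thread %d\t received %s" %(self.thread_id,work)
            # Put the result into the result queue
            result_queue.put(out)

def main():
    # Put data into the work queue
    for i in range(10):
        work_queue.put("message id %d" %i)

    # Start two worker threads
    for i in range(2):
        thread = WorkerThread(i)
        thread.start()

    # Print the ten results
    for i in range(10):
        result = result_queue.get()
        print(result)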

if __name__ == "__main__":
    main()
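
When the producer keeps adding work while the consumers are already running, an empty queue no longer means that the work is finished. One common way to handle this, sketched here as an assumption rather than as the article's method, is to send each worker a sentinel value. The names SENTINEL, worker and square_all are hypothetical.

```python
import queue
import threading

SENTINEL = None  # hypothetical marker meaning "no more work"

def worker(work_queue, result_queue):
    while True:
        item = work_queue.get()          # blocks until an item is available
        if item is SENTINEL:
            break                        # the producer has finished
        result_queue.put(item * item)

def square_all(numbers, n_workers=2):
    """Square every number using worker threads that communicate through queues."""
    numbers = list(numbers)
    work_queue = queue.Queue()
    result_queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(work_queue, result_queue))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        work_queue.put(n)
    for _ in threads:
        work_queue.put(SENTINEL)         # one sentinel per worker
    for t in threads:
        t.join()
    # Results arrive in completion order, so sort them for a stable output
    return sorted(result_queue.get() for _ in numbers)

print(square_all(range(5)))
```

Each worker stops only when it receives its own sentinel, so no thread is left blocked in get().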

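2.5 Python process modules

2.5.1 The os module

Calling os.system is the simplest way to create a process. The function takes a single argument, the command string to run; a return value of 0 means the command succeeded.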

import os 

# Check whether we are on Windows
if os.name == "nt":
    return_code = os.system("dir")
else:
    return_code = os.system("ls")

# Check whether the return value is 0; 0 means success
if return_code == 0:
    print("success")
else:
    print("fail")

os.fork calls the system API to create a child process. fork does not exist on Windows, but it works on Linux and macOS.
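
A minimal sketch of os.fork, for Unix-like systems only. The function name fork_and_wait is made up for illustration. os.fork returns 0 in the child and the child's pid in the parent, which is how the two branches are told apart.

```python
import os

def fork_and_wait(exit_code):
    """Fork a child that exits with exit_code; return the code seen by the parent."""
    pid = os.fork()
    if pid == 0:
        # Child process: leave immediately without running cleanup handlers
        os._exit(exit_code)
    # Parent process: wait for the child and decode its exit status
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if hasattr(os, "fork"):
    print(fork_and_wait(7))
else:
    print("os.fork is not available on this platform")
```

The hasattr check keeps the script from failing on Windows, where os.fork is missing.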

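2.5.2 The subprocess module

The subprocess module starts external commands.

Parameters accepted by the subprocess functions; args is required and the rest are optional:

args: a string or a sequence of program arguments

bufsize: specifies the buffering policy

stdin, stdout, stderr: the standard input, output and error handles of the program

preexec_fn: only valid on Unix platforms; specifies a callable object that is called in the child process just before the child is executed

close_fds: if True, file descriptors other than 0, 1 and 2 are closed in the child on Unix, so the child does not inherit the parent's other open files

shell: defaults to False; states whether the program is executed through the shell

cwd: sets the working directory of the child process

env: specifies the environment variables of the child process

universal_newlines: if True, the streams are opened in text mode and the platform-specific line endings are converted to "\n"

subprocess.call is somewhat similar to os.system: it takes the arguments, runs the command and returns the command's exit code.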

import os 
import subprocess

# Check whether we are on Windows
if os.name == "nt":
    return_code = subprocess.call(["cmd","/C","dir"])
else:
    return_code = subprocess.call(["ls","-l"])

# Check whether the return code is 0; 0 means success
if return_code == 0:
    print("success")
else:
    print("fail")

subprocess.check_call is a thin wrapper around call: if the external program's return code is not 0, it raises a CalledProcessError exception.
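
The behaviour of check_call can be sketched as follows. To stay portable, the example runs the current Python interpreter (sys.executable) as the external program; the helper name run_checked is an assumption for illustration.

```python
import subprocess
import sys

def run_checked(exit_code):
    """Run a child Python process that exits with exit_code; return the code observed."""
    cmd = [sys.executable, "-c", "import sys; sys.exit(%d)" % exit_code]
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        # A non-zero exit code ends up here; the code is stored on the exception
        return err.returncode
    return 0

print(run_checked(0))
print(run_checked(3))
```

Compared with call, the caller cannot silently ignore a failure: a non-zero exit code always surfaces as an exception.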

subprocess.Popen: the stdin, stdout and stderr arguments specify the standard input, output and error handles of the external command.

import os 
import subprocess

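# Check whether we are on Windows
if os.name == "nt":
    ping = subprocess.Popen("ping -n 5 www.baidu.com",shell=True,stdout=subprocess.PIPE)
else:
    ping = subprocess.Popen("ping -c 5 www.baidu.com",shell=True,stdout=subprocess.PIPE)

# Wait for the command to finish and read its output.
# communicate() avoids the deadlock that wait() can cause when the pipe fills up.
output, _ = ping.communicate()

# Print the return code of the external command
print(ping.returncode)

# Print the output of the external command (a bytes object)
print(output)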
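
Popen can also feed data to the child's standard input. The following sketch assumes the current Python interpreter as the external program and a made-up helper name shout; it sends text through stdin and reads the transformed text back from stdout.

```python
import subprocess
import sys

def shout(text):
    """Send text to a child process over stdin and return (returncode, stdout)."""
    child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,   # exchange str instead of bytes
    )
    # communicate() writes the input, closes stdin, reads stdout and waits for exit
    out, _ = proc.communicate(input=text)
    return proc.returncode, out

print(shout("hello"))
```

Because stderr is not redirected, the second value returned by communicate is None and is ignored here.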

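2.5.3 multiprocessing.Process

The multiprocessing module provides an API similar to threading for working with multiple processes. Because it creates child processes rather than threads, it sidesteps the global interpreter lock.

A multiprocessing.Process object is used in much the same way as threading.Thread: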

from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:',__name__)
    print('parent process',os.getppid())
    print('process id',os.getpid())

def f(name):
    info('function f')
    print('hello',name)

if __name__=="__main__":
    info('main line')
    p = Process(target=f,args=('python',))
    p.start()
    p.join()

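The multiprocessing module has its own Queue object, which is used in the same way as the queue used with threads. The difference is that queue.Queue is only thread-safe and cannot be used to communicate between processes, whereas multiprocessing.Queue can.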

from multiprocessing import Process,Queue
import os

# Create the queue
result_queue = Queue()

class MyProcess(Process):
    def __init__(self,q):
        super(MyProcess,self).__init__()
        # Keep a reference to the queue
        self.q = q

    def run(self):
        output = 'module name %s \n' % __name__
        output += 'parent process:%d \n' % os.getppid()
        output += 'process id:%d \n' % os.getpid()
        self.q.put(output)

def main():
    process = []

    # Create the processes and pass the queue to each of them
    for i in range(5):
        process.append(MyProcess(result_queue))

    # Start the processes
    for i in range(5):
        process[i].start()

    # Read one result per process before joining: empty() is not reliable
    # across processes, and queued items should be consumed before join()
    for i in range(5):
        output = result_queue.get()
        print(output)

    # Wait for the processes to finish
    for i in range(5):
        process[i].join()

if __name__ == "__main__":
    main()

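Note that threads can share variables, but processes do not. So although multiprocessing.Queue has the same methods as queue.Queue, when working with multiple processes the Queue object must be passed to each process when it is created; only then can the main process receive data from the child processes.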

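3. Process pools

Process objects from multiprocessing can be used to spawn many processes dynamically, but managing a large number of them by hand is tedious, which is where a process pool helps.

The Pool class creates a process pool: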

import multiprocessing
import time

def process_func(process_id):
    print("process id %d start" % process_id)
    time.sleep(3)
    print("process id %d end" % process_id)

def main():
    pool = multiprocessing.Pool(processes=3)
    for i in range(10):
        # Submit a task to the process pool
        pool.apply_async(process_func,args=(i,))

    # Call close first so that no new tasks can be added to the pool
    pool.close()
    # join waits for all child processes to finish
    pool.join()

if __name__=="__main__":
    main()

If the same function is called every time, you can also use the Pool's map function:

import multiprocessing
import time

def process_func(process_id):
    print("process id %d start" % process_id)
    time.sleep(3)
    print("process id %d end" % process_id)

def main():
    pool = multiprocessing.Pool(processes=3)
    pool.map(process_func,range(10))

    # Call close first so that no new tasks can be added to the pool
    pool.close()
    # join waits for all child processes to finish
    pool.join()

if __name__=="__main__":
    main()
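
The examples above only print from the workers, but map also collects the return values in input order. A minimal sketch, assuming a hypothetical worker function named square:

```python
import multiprocessing

def square(x):
    # Runs in a worker process; the return value is sent back to the parent
    return x * x

def main():
    # The with block shuts the pool down once the results have been collected
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(square, range(10))
    print(results)

if __name__ == "__main__":
    main()
```

map blocks until every task has finished, so the result list is complete by the time the with block ends.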

4. Thread pools

The multiprocessing package contains a multiprocessing.dummy module that replicates the multiprocessing API, except that its methods operate on threads rather than processes.

import multiprocessing.dummy
import time

def process_func(process_id):
    print("process id %d start" % process_id)
    time.sleep(3)
    print("process id %d end" % process_id)

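def main():
    # Although the parameter is named processes, threads are created
    pool = multiprocessing.dummy.Pool(processes=3)
    for i in range(10):
        # Submit a task to the thread pool
        pool.apply_async(process_func,args=(i,))

    # Call close first so that no new tasks can be added to the pool
    pool.close()
    # join waits for all worker threads to finish
    pool.join()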

if __name__=="__main__":
    main()

The map function works the same way with the thread pool:

import multiprocessing.dummy
import time

def process_func(process_id):
    print("process id %d start" % process_id)
    time.sleep(3)
    print("process id %d end" % process_id)

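def main():
    # Although the parameter is named processes, threads are created
    pool = multiprocessing.dummy.Pool(processes=3)
    pool.map(process_func,range(10))

    # Call close first so that no new tasks can be added to the pool
    pool.close()
    # join waits for all worker threads to finish
    pool.join()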

if __name__=="__main__":
    main()
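
As with the process pool, map on the thread pool returns the results in input order. The sketch below uses made-up names (double, double_all) and needs no __main__ guard, because no child process is created.

```python
from multiprocessing.dummy import Pool  # thread-based pool with the Pool API

def double(n):
    # Runs in one of the pool's worker threads
    return n * 2

def double_all(numbers):
    """Apply double to every number using three worker threads."""
    with Pool(processes=3) as pool:
        return pool.map(double, numbers)

print(double_all(range(5)))
```

Since the workers are threads, the function and its arguments do not have to be picklable, but CPU-bound work is still limited by the global interpreter lock.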