[Python Study Notes] (8) Multithreading and Parallelism: the _thread, threading, and queue modules; the os, subprocess, and multiprocessing.Process modules

Article link: https://blog.csdn.net/weixin_43931465/article/details/106932333

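Process: an instance of a program that is running on the computer.
Thread: the smallest unit of execution that the operating system can schedule. A process contains one or more threads, and the threads are what actually carry out the process's work.
Multithreading: a technique, implemented in software or hardware, for executing multiple threads concurrently.

Global interpreter lock (GIL): a mechanism that a language interpreter uses to synchronize threads, ensuring that only one thread executes at any given moment.
This mainly applies to CPython; not every Python interpreter has a global interpreter lock.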

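Python thread modules

The _thread module

The standard library's _thread is a low-level module, and using it directly is generally discouraged. (Its name starts with "_".)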

import time
import datetime
import _thread

dt_format="%H:%M:%S"  # format string for output: hours:minutes:seconds

def get_time():  # get the current time and return it in the format above
    dt=datetime.datetime.now()
    return datetime.datetime.strftime(dt,dt_format)

def thread_fun(thread_id):  # print timestamps from inside the thread
    print("Thread %d\t start at %s"%(thread_id,get_time()))
    print("Thread %d\t sleeping"%thread_id)
    time.sleep(4)
    print("Thread %d\t finish at %s"%(thread_id,get_time()))

def main():  # main function
    print("Main thread start at %s"%get_time())
    for i in range(5):
        _thread.start_new_thread(thread_fun,(i,))  # the key call: start a new thread
        time.sleep(1)
    time.sleep(6)  # keep the main thread from finishing before the other threads
    print("Main thread finish at %s"%get_time())

if __name__=="__main__":  # true when this module is run directly
    main()

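start_new_thread: starts a new thread and returns its identifier. It provides a simple multithreading mechanism: while one thread is running, the other threads also make progress "at the same time".
_thread.start_new_thread(function, args[, kwargs]), where function is the function to run, args is a tuple of positional arguments, and the optional kwargs is a dictionary of keyword arguments.
start_new_thread takes two required arguments. The first is a function we have defined (thread_fun here), which is the body of the thread we want to create. The second is a tuple listing all the arguments for that function. This is flexible: however many parameters the thread body takes, a single tuple can carry them all. Here I pass the tuple (i,), meaning there is just one argument.
When the function returns, the thread exits silently. When the function terminates with an unhandled exception, a stack trace is printed and the thread exits, while the other threads keep running.
Once the main thread finishes, the other threads are (on most systems) forcibly terminated, whether or not they have finished.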
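
The optional third argument can be sketched as follows. This is a minimal illustration rather than code from the original notes: the names greet, done, and messages are made up here, and a lock stands in for sleep-based waiting so that the main thread knows when the worker has finished.

```python
import _thread

messages = []                     # hypothetical shared list for the result
done = _thread.allocate_lock()    # hypothetical lock used only to signal completion

def greet(name, punctuation="."):
    # Runs in the new thread: record a greeting, then signal the main thread.
    messages.append("hello " + name + punctuation)
    done.release()

done.acquire()                    # hold the lock until the worker releases it
# Positional arguments go in the tuple, keyword arguments in the dictionary.
_thread.start_new_thread(greet, ("world",), {"punctuation": "!"})
done.acquire()                    # blocks until greet() has called release()
print(messages)
```

Releasing the lock from the worker thread is allowed because a Lock does not have to be released by the thread that acquired it.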

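Using locks is an effective way to keep the main thread from exiting too early or too late, which would otherwise produce unpredictable results.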

import time
import datetime
import _thread

dt_format="%H:%M:%S"  # format string for output: hours:minutes:seconds

def get_time():  # get the current time and return it in the format above
    dt=datetime.datetime.now()
    return datetime.datetime.strftime(dt,dt_format)

def thread_fun(thread_id,lock):  # print timestamps, then release this thread's lock
    print("Thread %d\t start at %s"%(thread_id,get_time()))
    print("Thread %d\t sleeping"%thread_id)
    time.sleep(4)
    print("Thread %d\t finish at %s"%(thread_id,get_time()))
    lock.release()

def main():  # main function
    print("Main thread start at %s"%get_time())
    locks=[]
    for i in range(5):  # 0~4
        lock=_thread.allocate_lock()  # create a lock for each task
        lock.acquire()  # acquire the lock
        locks.append(lock)  # add the lock to the list of locks
    for i in range(5):
        _thread.start_new_thread(thread_fun,(i,locks[i]))
        time.sleep(1)
    for i in range(5):
        while locks[i].locked():  # while a child thread still holds its lock, keep the main thread waiting; it continues only after every lock has been released
            time.sleep(1)
    print("Main thread finish at %s"%get_time())

if __name__=="__main__":  # true when this module is run directly
    main()

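_thread.allocate_lock: returns a new Lock object. We can start many threads at once, but at any moment only one thread can run in the interpreter, so the global interpreter lock (GIL) controls which thread runs.

Lock objects have three commonly used methods:
acquire: acquires the lock unconditionally, waiting if necessary until another thread releases it. Only one thread can hold the lock at a time; any other thread that calls acquire on the same lock blocks until it is released, so the code between acquire and release runs in one thread at a time.
release: releases the lock. The lock must already be held; it does not have to be released by the thread that acquired it.
locked: returns the state of the lock, True if it is held and False otherwise.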
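
The three methods can be tried out in a single thread. The sketch below is only an illustration (the variable names states and second_try are made up here); it also uses the non-blocking form of acquire, which returns False instead of waiting when the lock is already held.

```python
import _thread

lock = _thread.allocate_lock()
states = [lock.locked()]          # a new lock starts out unlocked

lock.acquire()                    # take the lock
states.append(lock.locked())

# A non-blocking attempt returns False instead of waiting when the lock is held.
second_try = lock.acquire(False)

lock.release()                    # release it again
states.append(lock.locked())

print(states, second_try)
```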


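The threading module

The threading module provides an object-oriented way to implement threads, along with a variety of useful objects and methods for creating and controlling them. Most operations revolve around the threading.Thread class.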

import time
import datetime
import threading

dt_format="%H:%M:%S"  # format string for output: hours:minutes:seconds

def get_time():  # get the current time and return it in the format above
    dt=datetime.datetime.now()
    return datetime.datetime.strftime(dt,dt_format)

def thread_fun(thread_id):  # print timestamps from inside the thread
    print("Thread %d\t start at %s"%(thread_id,get_time()))
    print("Thread %d\t sleeping"%thread_id)
    time.sleep(4)
    print("Thread %d\t finish at %s"%(thread_id,get_time()))

def main():  # main function
    print("Main thread start at %s"%get_time())
    threads=[]

    for i in range(5):  # create the threads
        thread=threading.Thread(target=thread_fun,args=(i,))  # instantiate a threading.Thread object
        threads.append(thread)

    for i in range(5):  # start the threads
        threads[i].start()
        time.sleep(1)

    for i in range(5):  # wait for the threads to finish
        threads[i].join()

    print("Main thread finish at %s"%get_time())

if __name__=="__main__":  # true when this module is run directly
    main()

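mthread=threading.Thread(target=function_name,args=(arguments,))
Calling threading.Thread does not run the thread right away; it only creates an instance. The thread is then controlled through the object's methods.
start: starts the thread.
join: waits for the thread to finish.
With threading.Thread objects we no longer have to do the lock bookkeeping from the _thread example by hand (creating, allocating, acquiring, releasing, and checking locks); calling join is enough to wait for a thread.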
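
As a compact illustration of the create, start, join sequence, the sketch below also passes keyword arguments through kwargs and checks is_alive before and after. The function square and the dictionary results are hypothetical names used only for this example.

```python
import threading

results = {}   # hypothetical dict the worker threads write into

def square(n, label="n"):
    # Store n squared under a key built from label and n.
    results[label + str(n)] = n * n

threads = [
    threading.Thread(target=square, args=(i,), kwargs={"label": "sq"})
    for i in range(3)
]
alive_before = [t.is_alive() for t in threads]   # nothing runs until start()

for t in threads:
    t.start()
for t in threads:
    t.join()                                     # wait for each thread to finish

alive_after = [t.is_alive() for t in threads]
print(results)
```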

Overriding the run() method in a subclass

import time
import datetime
import threading

dt_format="%H:%M:%S"  # format string for output: hours:minutes:seconds

def get_time():  # get the current time and return it in the format above
    dt=datetime.datetime.now()
    return datetime.datetime.strftime(dt,dt_format)

class MyThread(threading.Thread):  # derive a subclass from threading.Thread
    def __init__(self,thread_id):
        super(MyThread, self).__init__()  # call the parent class constructor
        self.thread_id=thread_id

    def run(self):  # implement the run method
        print("Thread %d\t start at %s"%(self.thread_id,get_time()))
        print("Thread %d\t sleeping"%self.thread_id)
        time.sleep(4)
        print("Thread %d\t finish at %s"%(self.thread_id,get_time()))

def main():  # main function
    print("Main thread start at %s"%get_time())
    threads=[]

    for i in range(5):  # create the threads
        thread=MyThread(i)
        threads.append(thread)

    for i in range(5):  # start the threads
        threads[i].start()
        time.sleep(1)

    for i in range(5):  # wait for the threads to finish
        threads[i].join()

    print("Main thread finish at %s"%get_time())

if __name__=="__main__":  # true when this module is run directly
    main()

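The derived class overrides the run() method of its parent, threading.Thread; only __init__() and run() are overridden in the subclass. To use the thread, first create an instance of the subclass, then call its start() method to run it (start arranges for run to be called in the new thread).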
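
One way to see that start() runs run() in a separate thread is to record the thread identifier inside run() and compare it with the main thread's. This is a small sketch; SquareThread and its attributes are hypothetical names chosen for the example.

```python
import threading

class SquareThread(threading.Thread):   # hypothetical subclass for illustration
    def __init__(self, n):
        super().__init__()              # call the parent class constructor
        self.n = n
        self.result = None
        self.ran_in = None

    def run(self):
        # Invoked as a consequence of start(), inside the new thread.
        self.ran_in = threading.get_ident()
        self.result = self.n * self.n

t = SquareThread(7)
t.start()
t.join()
print(t.result, t.ran_in != threading.get_ident())
```

Keeping the result on the instance is a simple way to get a value back from a thread, since run() has no return value that the caller can read.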


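Thread synchronization

When multiple threads modify or operate on the same object or data at once, output from different threads can end up interleaved. This is the nondeterminism of multithreading.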

import time
import threading

thread_lock=None

class MyThread(threading.Thread):  # derive a subclass from threading.Thread
    def __init__(self,thread_id):
        super(MyThread, self).__init__()  # call the parent class constructor
        self.thread_id=thread_id

    def run(self):  # override the run method
        thread_lock.acquire()  # lock
        for i in range(3):
            print("Thread %d\t printing!times:%d"%(self.thread_id,i))
        thread_lock.release()  # unlock

        time.sleep(1)

        thread_lock.acquire()  # lock
        for i in range(3):
            print("Thread %d\t printing!times:%d" % (self.thread_id, i))
        thread_lock.release()  # unlock

def main():  # main function
    print("Main thread start!")
    threads=[]

    for i in range(5):  # create the threads
        thread=MyThread(i)
        threads.append(thread)

    for i in range(5):  # start the threads
        threads[i].start()

    for i in range(5):  # wait for the threads to finish
        threads[i].join()

    print("Main thread finish!")

if __name__=="__main__":  # true when this module is run directly
    thread_lock=threading.Lock()  # create the lock
    main()

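threading.Lock gives us the same kind of Lock object as _thread.allocate_lock, and it is enough for simple thread synchronization. The five threads run "at the same time", but each locked section runs to completion before another thread can enter its own.
The order in which the threads run can differ from one run to the next, because which waiting thread acquires the lock next depends on scheduling.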
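
Besides interleaved output, the same lock can protect a shared value. The sketch below is an illustration with made-up names (counter, counter_lock, add_many); it uses the lock as a context manager, which acquires it on entry to the with block and releases it on exit.

```python
import threading

counter = 0                        # hypothetical shared value
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:         # acquire on entry, release on exit, even on error
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

With the lock in place the final value is always 5 times 10000, whatever order the threads happen to run in.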

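The queue module (queues)

The Queue class implements a basic first in, first out (FIFO) container.
put adds an element to the tail of the queue.
get removes an element from the head of the queue and returns it.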

from queue import Queue

q=Queue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())

Using the queue module in multithreaded code lets us pass and share data between threads. Multiple threads can share the same Queue instance.

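import time
import threading
import queue

work_queue=queue.Queue(maxsize=10)  # work queue, limited to at most 10 items
result_queue=queue.Queue(maxsize=10)  # result queue

class WorkerThread(threading.Thread):
    def __init__(self,thread_id):
        super(WorkerThread,self).__init__()
        self.thread_id=thread_id
    def run(self):
        while True:
            # Checking empty() and then calling get() can block forever if
            # another thread takes the last item in between, so fetch the
            # item without blocking and stop once the queue is empty.
            try:
                work=work_queue.get_nowait()  # take an item from the work queue
            except queue.Empty:
                break
            time.sleep(3)  # simulate time-consuming work
            out="Thread %d\t received %s"%(self.thread_id,work)
            result_queue.put(out)  # put the result into the result queue

def main():
    for i in range(10):
        work_queue.put("message id %d"%i)  # put the data into the work queue
    for i in range(2):  # start two worker threads
        thread=WorkerThread(i)
        thread.start()
    for i in range(10):  # print the results
        result=result_queue.get()
        print(result)

if  __name__=="__main__":
    main()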
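
Another common way to coordinate workers through a Queue, shown here only as a hedged sketch and not as the approach used in these notes, is to pair each get with task_done and let the main thread wait on Queue.join. The STOP sentinel, the worker function, and the doubling "work" are all assumptions made for the illustration.

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()
STOP = None                       # hypothetical sentinel meaning "no more work"

def worker():
    while True:
        item = tasks.get()        # blocks until an item is available
        if item is STOP:
            tasks.task_done()
            break
        results.put(item * 2)     # stand-in for real work
        tasks.task_done()         # mark this item as fully processed

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for i in range(10):
    tasks.put(i)
for _ in workers:
    tasks.put(STOP)               # one sentinel per worker

tasks.join()                      # returns once every put() has a matching task_done()
for w in workers:
    w.join()

doubled = sorted(results.get() for _ in range(10))
print(doubled)
```

The results are sorted before printing because the two workers may finish items in any order.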

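Python process modules

Multithreading cannot take advantage of multiple cores (in CPython, because of the GIL), so it is best used for I/O-bound work such as file access or network access.
Multiprocessing can make full use of all CPU resources and suits compute-bound work, such as transcoding video.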

The os module

import os

if os.name=="nt":  # check whether we are on Windows
    return_code=os.system("dir")  # list the files and folders in the current directory
else:
    return_code=os.system("ls")  # the same listing on other systems

if return_code==0:  # a return value of 0 means the command succeeded
    print("Run Success!")
else:
    print("Something wrong!")

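system: the simplest way to create a process. It takes a single argument, the command to execute.
system takes a string and runs it as a command on the system. Each call to system creates a child process that runs the command line, and what happens in that child process cannot affect the main process.
So for several commands to work together under system, they have to run in the same child process.

Every process has a unique "process ID", or "pid", that identifies it.
os.getpid() returns the ID of the current process.
os.getppid() returns the ID of the parent process.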

import os

os.system('cd /usr/local && mkdir aaa.txt')  # && runs mkdir only if cd succeeded
# or
os.system('cd /usr/local ; mkdir aaa.txt')  # ; runs mkdir whether or not cd succeeded

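The exec family of functions is a bit more complex than os.system.
os.fork calls the system API (application programming interface) to create a child process. [Linux and Mac] A return value of 0 from os.fork means the code is running in the child process; in the parent process, os.fork returns the child's process ID.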
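
The two return values of os.fork can be seen in a short sketch. It assumes a POSIX system (os.fork does not exist on Windows, so the function returns None there), and the name fork_and_wait is made up for this example.

```python
import os

def fork_and_wait():
    # Assumption: a POSIX system; os.fork is not available on Windows.
    if not hasattr(os, "fork"):
        return None
    parent_pid = os.getpid()
    pid = os.fork()
    if pid == 0:
        # Child process: fork returned 0, and our parent is the original process.
        same_parent = os.getppid() == parent_pid
        os._exit(0 if same_parent else 1)   # leave at once, skipping parent cleanup
    # Parent process: fork returned the child's pid; wait for the child to exit.
    waited_pid, status = os.waitpid(pid, 0)
    return waited_pid == pid and os.WEXITSTATUS(status) == 0

print(fork_and_wait())
```

The child leaves through os._exit so that it does not continue running the rest of the parent's script.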


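The subprocess module

The subprocess module provides many functions for calling external commands. The system function can also call external commands. Calling an external command is another way to create a process.
Most of the module's functions for calling external commands take similar parameters.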

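Parameter	Meaning
args [required]	A string or a sequence. The name or path of the program to execute.
bufsize	0 means unbuffered; 1 means line buffered; any other positive integer is the buffer size; a negative value means the system default, which usually means fully buffered. The default is -1 in Python 3 (it was 0 in Python 2).
executable	Rarely needed; the first item of the args string or list normally names the program.
stdin; stdout; stderr	The program's standard input, output, and error handles. The default, None, means no redirection, so they are inherited from the parent process. Other values: PIPE (create a pipe), a file object, or a file descriptor (an integer); stderr can also be set to STDOUT.
preexec_fn	A callable that is called in the child process just before the program runs (Unix).
close_fds	True/False. On Unix, whether to close all file descriptors other than 0/1/2 before running the new process; on Windows, whether the child does not inherit (True) or inherits (False) the parent's file handles.
shell	Defaults to False; whether to run the program through a shell. If True, args is treated as a single string. On Unix this is like prefixing args with "/bin/sh" "-c"; on Windows, like prefixing "cmd.exe /c".
cwd	The working directory for the child process.
env	The environment variables for the child process. If None, they are inherited from the parent process.
universal_newlines	Opens the streams in text mode and normalizes all kinds of line endings to '\n'.
startupinfo	On Windows, a structure passed to CreateProcess.
creationflags	On Windows, pass CREATE_NEW_CONSOLE to give the child its own console window.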
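
A few of these parameters can be combined in one call. The sketch below is illustrative only: it launches the current Python interpreter (sys.executable) as the external program so that it works on any platform, and DEMO_GREETING is a hypothetical environment variable invented for the example.

```python
import os
import subprocess
import sys

child_env = dict(os.environ)          # start from the parent's environment
child_env["DEMO_GREETING"] = "hi"     # hypothetical variable, added for the child only

proc = subprocess.Popen(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_GREETING'])"],
    stdout=subprocess.PIPE,           # capture standard output through a pipe
    stderr=subprocess.STDOUT,         # merge standard error into the same pipe
    env=child_env,                    # environment for the child process
    universal_newlines=True,          # text mode: str output, newlines as '\n'
)
output, _ = proc.communicate()        # read everything and wait for the child to exit
print(repr(output), proc.returncode)
```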

import os
import subprocess

if os.name=="nt":  # check whether we are on Windows
    return_code=subprocess.call(["cmd","/C","dir"])
else:
    return_code=subprocess.call(["ls","-1"])

if return_code==0:
    print("Run success!")
else:
    print("Something wrong!")

subprocess.call is similar to os.system: it takes arguments, runs the command, and returns the command's exit code. An exit code of 0 means success.

import os
import subprocess

try:
    if os.name=="nt":  # check whether we are on Windows
        subprocess.check_call(["cmd","/C","test command"])
    else:
        subprocess.check_call(["ls","test command"])
except subprocess.CalledProcessError as e:
    print("Something wrong!",e)

subprocess.check_call is essentially the same as subprocess.call, except that if the external program's return code is not 0, it raises CalledProcessError. (It is one more layer wrapped around call.)
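
The difference between the two functions shows up most clearly with a command that is known to fail. In this sketch the "external program" is the current Python interpreter (sys.executable) told to exit with code 3; that choice, and the variable names, are assumptions made so the example runs on any platform.

```python
import subprocess
import sys

# A child process that does nothing except exit with code 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

code = subprocess.call(failing)        # call simply returns the exit code

try:
    subprocess.check_call(failing)     # check_call raises for a non-zero exit code
    raised_code = None
except subprocess.CalledProcessError as e:
    raised_code = e.returncode         # the exception carries the exit code

print(code, raised_code)
```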

import os
import subprocess

if os.name=="nt":
    ping=subprocess.Popen("ping -n 5 www.baidu.com",shell=True,stdout=subprocess.PIPE)
else:
    ping=subprocess.Popen("ping -c 5 www.baidu.com",shell=True,stdout=subprocess.PIPE)

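# communicate() reads the output and waits for the command to finish; calling
# wait() before reading a PIPE can deadlock if the output fills the pipe buffer
output,_=ping.communicate()
print(ping.returncode)  # print the external command's return code
print(output)  # print the external command's output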

subprocess.Popen objects offer a feature-rich way to call external commands.
subprocess.check_call and subprocess.call both wrap a Popen object.
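
Popen can also feed data to the command's standard input. The following sketch, again using the current Python interpreter as a stand-in for an external command, sends one line of input and reads back the reply with communicate.

```python
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-c", "print(input().upper())"],   # child: echo input in upper case
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,                             # exchange str instead of bytes
)
output, _ = proc.communicate("hello\n")   # send input, read output, wait for exit
print(output.strip(), proc.returncode)
```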


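The multiprocessing module (multiprocessing.Process)

The multiprocessing module creates child processes rather than threads, so it effectively sidesteps the global interpreter lock and makes good use of multi-core CPUs.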

import os
from multiprocessing import Process

def info(title):
    print(title)
    print("module name:",__name__)
    print("parent process:",os.getppid())  # ID of the parent process
    print("process id:",os.getpid())  # ID of the current process

def f(name):
    info("function f")
    print("hello",name)

if __name__=="__main__":
    info("main line")
    p=Process(target=f,args=("WYF",))
    p.start()
    p.join()

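multiprocessing.Process objects are used the same way as threading.Thread: the target parameter names the function to run, and args passes a tuple of arguments to that function.
Likewise, you can derive a subclass from multiprocessing.Process and implement its run method.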
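
The parallel with threading.Thread extends to start, join, and is_alive; a Process additionally exposes the child's exit code. A minimal sketch follows, in which exit_with and run_child are hypothetical helper names.

```python
import multiprocessing
import sys

def exit_with(code):
    sys.exit(code)               # the child's exit status becomes Process.exitcode

def run_child(code):
    p = multiprocessing.Process(target=exit_with, args=(code,))
    p.start()
    p.join()                     # wait for the child, as with Thread.join
    return p.exitcode, p.is_alive()

if __name__ == "__main__":       # needed so child processes can import this module safely
    print(run_child(0))
    print(run_child(5))
```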

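The multiprocessing module has its own Queue object, used the same way as the Queue in the multithreaded examples. The difference is that queue.Queue is thread-safe but cannot be used to communicate between processes, whereas multiprocessing.Queue can.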

import os
from multiprocessing import Process,Queue

result_queue=Queue()

class MyProcess(Process):
    def __init__(self,q):
        super(MyProcess,self).__init__()
        self.q=q  # keep a reference to the queue
    def run(self):
        output="module name %s\n"%__name__
        output+="parent process:%d\n"%os.getppid()
        output+="process id:%d"%os.getpid()
        self.q.put(output)

def main():
    processes=[]
    for i in range(5):  # create the processes and pass the queue to each one
        processes.append(MyProcess(result_queue))
    for i in range(5):  # start the processes
        processes[i].start()
    # Each process puts exactly one result, so read five of them. Reading
    # before join avoids relying on empty(), which is not reliable across
    # processes.
    for i in range(5):
        output=result_queue.get()
        print(output)
    for i in range(5):  # wait for the processes to finish
        processes[i].join()

if __name__=="__main__":
    main()

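Threads can share variables, but processes do not. So although multiprocessing.Queue offers the same methods as queue.Queue, in the multiprocess case the Queue object has to be passed to each process when it is created. Only then can the main process receive the child processes' data; otherwise the main process's Queue stays empty.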

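Process pools (Pool)

A Pool keeps a specified number of worker processes available. When a new task is submitted to the Pool, an idle worker process picks it up; if all the worker processes are busy, the task waits until one of them becomes free and can run it.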

import multiprocessing
import time

def process_fun(process_id):
    print("process id %d start"%process_id)
    time.sleep(3)
    print("process id %d end"%process_id)

def main():
    pool=multiprocessing.Pool(processes=3)
    for i in range(10):  # submit the tasks to the process pool
        pool.apply_async(process_fun,args=(i,))
    pool.close()  # close the pool first; no new tasks can be submitted after this
    pool.join()  # wait for all worker processes to finish

if __name__=="__main__":
    main()

pool.apply_async(func,args=(args)) submits a task to the Pool, to be run in one of its worker processes.
If every call uses the same function, you can also use pool.map(function, iterable).

for i in range(10):  # submit the tasks to the process pool
    pool.apply_async(process_fun,args=(i,))
# or
pool.map(process_fun,range(10))
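
The two calls also differ in how results come back: apply_async returns an AsyncResult whose get method waits for the value, while map blocks and returns a list in input order. A small sketch, with square as a hypothetical task function:

```python
import multiprocessing

def square(n):
    return n * n

def main():
    with multiprocessing.Pool(processes=3) as pool:
        # apply_async returns an AsyncResult right away; get() waits for the value.
        pending = [pool.apply_async(square, args=(i,)) for i in range(5)]
        from_apply = [r.get() for r in pending]
        # map blocks until all results are ready and keeps the input order.
        from_map = pool.map(square, range(5))
    return from_apply, from_map

if __name__ == "__main__":       # needed so worker processes can import this module safely
    print(main())
```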

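Thread pools

The multiprocessing package contains a multiprocessing.dummy module that replicates its API on top of threads. As a result, a thread pool is created in exactly the same way as a process pool, and map works the same way too.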

import multiprocessing.dummy
import time

def process_fun(process_id):
    print("process id %d start"%process_id)
    time.sleep(3)
    print("process id %d end"%process_id)

def main():
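    pool=multiprocessing.dummy.Pool(processes=3)  # the parameter is still called processes, but threads are what actually get created
    pool.map(process_fun,range(10))
    pool.close()  # close the pool first; no new tasks can be submitted after this
    pool.join()  # wait for all worker threads to finish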

if __name__=="__main__":
    main()
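
To confirm that the tasks in a multiprocessing.dummy pool really run in threads, each task can report the name of the thread it ran in. This is only a sketch; describe is a hypothetical task function.

```python
import multiprocessing.dummy
import threading

def describe(n):
    # Return the square together with the name of the thread that computed it.
    return n * n, threading.current_thread().name

pool = multiprocessing.dummy.Pool(processes=3)
pairs = pool.map(describe, range(6))   # results come back in input order
pool.close()
pool.join()

squares = [value for value, _ in pairs]
worker_names = {name for _, name in pairs}
print(squares, len(worker_names))
```

At most three distinct thread names appear, one per worker thread in the pool, and none of them is the main thread.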