Processes and Threads: A Collection with Examples

Latest recommended article published 2022-04-17 17:40:25

落子无悔!


Column: Python    Tags: python, multiprocessing, multithreading

Article link: https://blog.csdn.net/qq_32460819/article/details/109024510


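Table of Contents

Processes and Threads
- 0.1 Rough Introduction
- 0.2 Differences Between Threads and Processes
I. Processes
II. Threads
III. Applications
- 3.1 Process Application: Video Compression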

Processes and Threads

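Key points: The biggest advantage of the multi-process model is stability. If one child process crashes, the main process and the other child processes are not affected. However, creating a process is expensive, and the number of processes worth running depends on the number of CPU cores. With thousands of processes running at once, even the operating system's scheduling becomes a problem. Threads are somewhat faster than processes. Their biggest problem is that a failure in one thread can crash the whole process, because all threads of a process share its memory.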

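0.1 Rough Introduction

Example:
Running QQ requires several processes (tasks) at the same time:

(1) waiting for messages from the other party
(2) waiting for user input
(3) verifying the user's identity
(4) updating friends' online status

Parallelism on a computer:
On a single-core CPU, only one instruction stream can run at any instant. At a given moment the CPU is therefore executing just one of tasks (1) to (4). The CPU switches between the tasks after very short time slices (on the order of milliseconds), so users cannot notice the switching. This creates the illusion that the four tasks run at the same time, which is concurrency rather than true parallelism.
On a quad-core CPU, four instruction streams can run at the same instant, so a multi-core CPU can achieve true parallel processing.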

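Thread:
A thread is an execution task (unit of control) within a process, responsible for executing the program in that process. A process has at least one thread and can run several, and its threads can share data. Threads of the same process share the process's heap and method area (JVM terminology). Each thread, however, has its own program counter, virtual machine stack and native method stack. As a result, creating a thread or switching between threads costs the system much less than doing the same with processes. This is why threads are also called lightweight processes.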
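The following minimal Python sketch makes the shared-memory point concrete. The names `shared`, `append_item`, `run_in_thread` and `run_in_process` are hypothetical. A thread and a child process each append to the same module-level list. Only the thread's change is visible to the main program, because a child process works on its own copy of memory.

```python
import threading
from multiprocessing import Process

shared = []  # one module-level list


def append_item(item):
    shared.append(item)


def run_in_thread():
    shared.clear()
    t = threading.Thread(target=append_item, args=("from thread",))
    t.start()
    t.join()
    return list(shared)  # the thread appended to the very same list object


def run_in_process():
    shared.clear()
    p = Process(target=append_item, args=("from process",))
    p.start()
    p.join()
    return list(shared)  # the child appended to its own copy; the parent's list is unchanged


if __name__ == "__main__":
    print(run_in_thread())   # ['from thread']
    print(run_in_process())  # []
```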

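0.2 Differences Between Threads and Processes

A thread has many of the characteristics of a traditional process, so it is also called a lightweight process (Light-Weight Process) or process element. The traditional process is then called a heavyweight process (Heavy-Weight Process), which is equivalent to a task with only one thread. In an operating system that supports threads, a process usually has several threads and always contains at least one.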

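Fundamental difference:
A process is the operating system's basic unit of resource allocation, while a thread is the basic unit of CPU scheduling and execution.
Resource overhead:
Each process has its own code and data space (program context), so switching between processes is expensive. A thread can be seen as a lightweight process. Threads of the same process share code and data space, and each thread has its own run-time stack and program counter (PC), so switching between threads is cheap.
Containment:
If a process has several threads, its execution is not a single line but several lines (threads) working together. A thread is part of a process, which is why threads are also called lightweight processes.
Memory allocation:
Threads of the same process share that process's address space and resources, while different processes have independent address spaces and resources.
Impact of failures:
In protected mode, a crashed process does not affect other processes. A crashing thread, however, takes down its whole process (for example, a segmentation fault in native code). Multi-process programs are therefore more robust than multithreaded ones. In Python, an uncaught exception in a thread ends only that thread, not the process.
Execution:
Each independent process has its own program entry point, sequential execution flow and exit point. A thread cannot run on its own. It must live inside an application, which controls the execution of its threads. Both processes and threads can run concurrently.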
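A small sketch can check the "impact of failures" point. The helper names are hypothetical, and `os._exit(3)` stands in for a child that dies abruptly. `threading.excepthook` (available since Python 3.8) is replaced only so the sketch records the thread's exception instead of printing a traceback. The parent survives the child's death and can read its exit code. The main thread survives an exception raised in another thread.

```python
import os
import threading
from multiprocessing import Process


def crash_hard():
    os._exit(3)  # simulate a child process dying abruptly with exit status 3


def raise_error():
    raise RuntimeError("boom")  # an uncaught Python exception inside a thread


def process_crash_exitcode():
    p = Process(target=crash_hard)
    p.start()
    p.join()
    return p.exitcode  # the parent is still running and can see how the child ended


def thread_exception_survives():
    errors = []
    old_hook = threading.excepthook
    # Record the exception type instead of printing a traceback
    threading.excepthook = lambda args: errors.append(args.exc_type)
    try:
        t = threading.Thread(target=raise_error)
        t.start()
        t.join()
    finally:
        threading.excepthook = old_hook
    return errors  # reaching this line shows the process and main thread are still alive


if __name__ == "__main__":
    print(process_crash_exitcode())     # 3
    print(thread_exception_survives())  # [<class 'RuntimeError'>]
```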

I. Processes

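1.1 Creating a single child process with fork

os.fork() is only available on Unix-like systems (Linux, Unix, macOS). Each call creates exactly one child process. fork() is called once but returns twice: once in the parent and once in the child.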

# Only works on Unix/Linux/Mac:
import os
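# Multiprocessing: the parent process creates a child process
print('Process (%s) start...' % os.getpid())
pid = os.fork()  # Returns twice: the current (parent) process is copied into a child, and fork() returns in both
# In the child, fork() always returns 0; in the parent, it returns the child's PID.
# The child can call os.getppid() to get its parent's PID.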
if pid == 0:
    print(f'I am child process ({os.getpid()}) and my parent is {os.getppid()}.')
else:
    print(f'I ({os.getpid()}) just created a child process ({pid})')

'''
Process (55) start...
I (55) just created a child process (56)
I am child process (56) and my parent is 55.
'''
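The example above never waits for its child. The child may still be printing after the parent has exited. While the parent is alive, a finished child that has not been reaped stays a zombie. Below is a minimal sketch of the usual fix, assuming a Unix system. The parent calls `os.waitpid()` and converts the raw status with `os.waitstatus_to_exitcode()` (Python 3.9+). `fork_and_wait` is a hypothetical helper name.

```python
import os


def fork_and_wait(code: int) -> int:
    """Fork one child that exits with `code`; the parent reaps it and returns its exit code (Unix only)."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with the given status, skipping normal interpreter cleanup
        os._exit(code)
    # Parent: block until the child terminates, so it does not linger as a zombie
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(fork_and_wait(7))  # 7
```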

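1.2 multiprocessing.Process: creating child processes

The multiprocessing module is the cross-platform version of multi-process support.
The Process class represents a process object.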

from multiprocessing import Process
import os
import time

# Code executed by each child process
def run_proc(name):
    print(f'Run child process {name} ({os.getpid()})...')
    for i in range(10):
        time.sleep(1)
        print(f'Run child process {name} ({os.getpid()})...')

# Work done by the parent process in the meantime
def run2():
    for _ in range(10):
        print("I am the parent", _)

if __name__=='__main__':
    print(f'Parent process {os.getpid()}.')  # print the current (parent) process ID
    p1 = Process(target=run_proc, args=('process111',))  # create a child process
    print('Child process will start.')
    p1.start()  # start the child process
    # p1.start()  # error: cannot start a process twice
    # p1.join()  # wait for the child to finish before running the code below; used to synchronize processes, optional here
    p2 = Process(target=run_proc, args=('process222',))  # create a second child process
    print('Child process will start.')
    p2.start()  # start the second child process
    run2()  # the parent keeps working while both children run
    p1.join()  # wait for both children before the parent finishes
    p2.join()

'''
Parent process 11385.
Child process will start.
Child process will start.
Run child process process111 (11386)...
I am the parent 0
I am the parent 1
I am the parent 2
I am the parent 3
I am the parent 4
I am the parent 5
I am the parent 6
I am the parent 7
I am the parent 8
I am the parent 9
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
Run child process process111 (11386)...
Run child process process222 (11388)...
'''

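1.3 multiprocessing.Pool: creating a process pool

A process pool solves the problem of having to create a large number of child processes.
To get a task's return value, call .get() on the object returned by p.apply_async(), i.e. p.apply_async(...).get().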

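from multiprocessing import Pool
import os, time
from datetime import datetime

# Use a process pool to run many tasks in child processes
def long_time_task(name):
    for i in range(5):
        print(f'Run task {name} ({os.getpid()})...{str(datetime.now())}')
        time.sleep(0.5)


if __name__ == '__main__':
    print(f'Parent process {os.getpid()}.')
    p = Pool(4)  # create a pool with 4 worker processes
    # Submit 5 tasks. The pool has only 4 workers, so the first four tasks
    # run in parallel and the fifth runs when a worker becomes free
    # (in the output below, task 4 reuses the worker with PID 111).
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    # apply_async is asynchronous: while the child processes run, the main
    # process keeps going, so "Waiting for all subprocess done..." is printed
    # first. close() means no new tasks can be added to the pool.
    # Calling join() on the Pool blocks the main process until all workers
    # have finished, so "All subprocesses done." is printed last.
    print('Waiting for all subprocess done...')
    p.close()
    p.join()
    print('All subprocesses done.')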
'''
Parent process 108.
Waiting for all subprocess done...
Run task 2 (111)...2020-10-12 10:34:48.295740
Run task 0 (109)...2020-10-12 10:34:48.295471
Run task 3 (112)...2020-10-12 10:34:48.295872
Run task 1 (110)...2020-10-12 10:34:48.295616
Run task 2 (111)...2020-10-12 10:34:48.796786
Run task 0 (109)...2020-10-12 10:34:48.796786
Run task 3 (112)...2020-10-12 10:34:48.797633
Run task 1 (110)...2020-10-12 10:34:48.797590
Run task 2 (111)...2020-10-12 10:34:49.297942
Run task 0 (109)...2020-10-12 10:34:49.299944
Run task 3 (112)...2020-10-12 10:34:49.302034
Run task 1 (110)...2020-10-12 10:34:49.302785
Run task 2 (111)...2020-10-12 10:34:49.799091
Run task 0 (109)...2020-10-12 10:34:49.801151
Run task 3 (112)...2020-10-12 10:34:49.802830
Run task 1 (110)...2020-10-12 10:34:49.803421
Run task 2 (111)...2020-10-12 10:34:50.299861
Run task 0 (109)...2020-10-12 10:34:50.301776
Run task 1 (110)...2020-10-12 10:34:50.303768
Run task 3 (112)...2020-10-12 10:34:50.303847
Run task 4 (111)...2020-10-12 10:34:50.801188
Run task 4 (111)...2020-10-12 10:34:51.302434
Run task 4 (111)...2020-10-12 10:34:51.803305
Run task 4 (111)...2020-10-12 10:34:52.303814
Run task 4 (111)...2020-10-12 10:34:52.805040
All subprocesses done.
'''
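The section opened by noting that `.get()` retrieves a task's return value, but the example above discards the results. The minimal sketch below keeps the `AsyncResult` objects that `apply_async` returns and calls `.get()` on each. The `square` and `squares_with_pool` names are hypothetical. The `timeout` argument is optional and only guards against a task that never finishes.

```python
from multiprocessing import Pool


def square(x: int) -> int:
    return x * x


def squares_with_pool(values, workers=4):
    with Pool(workers) as pool:
        # apply_async returns an AsyncResult immediately; the task runs in a worker process
        async_results = [pool.apply_async(square, args=(v,)) for v in values]
        # .get() blocks until that task has finished and returns its value
        return [r.get(timeout=10) for r in async_results]


if __name__ == "__main__":
    print(squares_with_pool(range(5)))  # [0, 1, 4, 9, 16]
```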

1.4 Batch-processing files with a process pool

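import os
import subprocess
from multiprocessing import Pool

def zipFile(file):
    # Compress one file/folder into <file>.zip; passing a list of arguments avoids
    # shell-quoting problems with spaces or special characters in names
    cmd = ["zip", "-r", f"{file}.zip", file]
    print(" ".join(cmd))
    return subprocess.run(cmd).returncode  # 0 means zip succeeded


if __name__ == '__main__':
    zipFiles = []
    files = os.listdir()
    for file in files:
        # Pick entries whose names end with "毕业文档" ("graduation documents") and have no .zip yet
        if file.endswith("毕业文档") and file + ".zip" not in files:
            zipFiles.append(file)
    if zipFiles:  # Pool() needs at least one process, so skip when there is nothing to do
        # One process per file, capped at the number of CPU cores
        with Pool(min(len(zipFiles), os.cpu_count() or 1)) as p:
            print(p.map(zipFile, zipFiles))  # prints the list of zip return codes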

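1.5 Inter-process communication with a shared queue

Create a shared queue. The writer process puts data into it, and the reader process loops continuously, reading each item as soon as it appears.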


from multiprocessing import Process, Queue
import os, time, random

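# Code executed by the writer process:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the reader process:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)  # block until an item is available
        print('Get %s from queue.' % value)

if __name__=='__main__':
    # The parent process creates the Queue and passes it to each child process
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start child process pw, which writes to the queue
    pw.start()
    # Start child process pr, which reads from the queue
    pr.start()
    # Wait for pw to finish
    pw.join()
    # pr runs an infinite loop and never ends by itself, so it cannot be joined and must be
    # terminated forcibly (any item it has not read yet is lost)
    pr.terminate()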

'''
Process to write: 147
Put A to queue...
Process to read: 148
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
'''
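Terminating the reader with `terminate()` works, but it can kill the reader before it has consumed the last item. A common alternative is for the writer to put a sentinel value at the end. The reader then leaves its loop by itself and can be joined normally. This sketch assumes `None` is never a real data item. A second queue (hypothetical `results`) carries the processed items back to the parent so the sketch can be checked. All function names are hypothetical.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # assumed "end of data" marker; the writer never sends None as real data


def write(q, items):
    for value in items:
        q.put(value)
    q.put(SENTINEL)  # tell the reader there is nothing more to read


def read(q, results):
    while True:
        value = q.get()  # blocks until an item is available
        if value is SENTINEL:
            break  # leave the loop instead of waiting to be terminated
        results.put(value.lower())  # hand each processed item back to the parent


def run_pipeline(items):
    q, results = Queue(), Queue()
    pw = Process(target=write, args=(q, items))
    pr = Process(target=read, args=(q, results))
    pw.start()
    pr.start()
    # Drain the results before joining, so the reader is never blocked on a full pipe
    collected = [results.get() for _ in items]
    pw.join()
    pr.join()  # the reader exits on its own; no terminate() needed
    return collected


if __name__ == "__main__":
    print(run_pipeline(["A", "B", "C"]))  # ['a', 'b', 'c']
```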

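II. Threads

Advantages of multithreading:
Threads share the same memory space, so different threads can read the same variable in memory, whereas each process has its own separate address space. Threads also have lower overhead than processes.
Threads are execution units supported directly by the operating system, so many high-level languages have built-in support for multithreading. Python is no exception: its threads are real operating-system threads (POSIX threads on Unix), not simulated ones.
For multithreading, Python's standard library provides two modules:
_thread and threading. The former is a low-level module, and the latter is a high-level module that wraps it. In the vast majority of cases we only need the high-level threading module.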

2.1 Creating threads with threading.Thread

import time, threading

# Code executed by each new thread
def loop(num):
    print(f'thread {threading.current_thread().name} is running...')
    for i in range(num):
        print(f'thread {threading.current_thread().name} >>> {i}')
        time.sleep(0.5)
    print(f'thread {threading.current_thread().name} ended.')


print(f'main thread {threading.current_thread().name} is running...')
t1 = threading.Thread(target=loop, name='LoopThread_1', args=(5,))
t1.start()
print(f'thread {threading.current_thread().name} continues after starting LoopThread_1.')
t2 = threading.Thread(target=loop, name='LoopThread_2', args=(5,))
t2.start()
t1.join()  # wait for both threads to finish before the main thread continues
t2.join()
print("all threads are done...")

'''
main thread MainThread is running...
thread LoopThread_1 is running...
thread LoopThread_1 >>> 0
thread MainThread continues after starting LoopThread_1.
thread LoopThread_2 is running...
thread LoopThread_2 >>> 0
thread LoopThread_1 >>> 1
thread LoopThread_2 >>> 1
thread LoopThread_2 >>> 2
thread LoopThread_1 >>> 2
thread LoopThread_1 >>> 3
thread LoopThread_2 >>> 3
thread LoopThread_2 >>> 4
thread LoopThread_1 >>> 4
thread LoopThread_1 ended.
thread LoopThread_2 ended.
all threads are done...
'''

2.2 Sharing variables between threads: pros and cons of locking with threading.Lock()

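# Multithreading carries a risk: in a multithreaded program every variable is shared by all threads,
# so several threads may modify the same variable at the same time and corrupt its value.
# To avoid this, we should protect it with a lock.
import time, threading
# balance is the deposit (bank balance)


def change_it(n):  # deposit n and then withdraw n, so the result should be 0
    global balance  # shared variable
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for _ in range(100000):
        change_it(n)

lock = threading.Lock()
def run_thread_lock(n):
    for _ in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            change_it(n)
        finally:
            # Always release the lock when done (a "with lock:" block does the same):
            lock.release()

balance = 0
t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print("Result without lock:", balance)

balance = 0
t1 = threading.Thread(target=run_thread_lock, args=(5,))
t2 = threading.Thread(target=run_thread_lock, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print("Result with lock:", balance)
# We start two threads that each deposit and then withdraw, so in theory the result is 0.
# However, thread scheduling is decided by the operating system. When t1 and t2
# interleave and the loop runs enough times, the result is not necessarily 0,
# because one statement in a high-level language is executed by the CPU as several
# instructions (balance = balance + n reads the value, adds, then writes it back).
# When several threads call lock.acquire() at the same time,
# only one of them gets the lock and continues with the code below;
# the others wait until they get the lock. A thread that holds the lock must therefore
# release it when done, otherwise the waiting threads wait forever. That is why we use
# try...finally: it guarantees that the lock is always released.

# Tips: the downside of a lock is that it prevents the locked code from running concurrently,
# which can greatly reduce efficiency. When different threads hold different locks
# and each tries to acquire the other's lock, a deadlock can occur.
#
# Multithreaded programming has a complex model and is prone to conflicts. Shared data must be
# isolated with locks, and deadlocks must be carefully avoided. In addition, the CPython
# interpreter has a Global Interpreter Lock (GIL) that lets only one thread execute Python
# bytecode at a time. As a result, threads cannot use multiple cores for CPU-bound Python code:
# they are real OS threads, but they run concurrently (interleaved), not in parallel.
# The unlocked result varies from run to run. On recent CPython versions the race can be hard
# to reproduce, so it may also print 0.

'''
zjq@DESKTOP-2RLT53L:进程$ python process.py
Result without lock: 55094
Result with lock: 0
'''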
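The GIL does not make threads useless, because a thread that blocks on I/O releases the GIL so other threads can run. This minimal sketch uses `time.sleep()` as a stand-in for a blocking network or disk call, and the function names are hypothetical. Five 0.2-second waits take about 1 second when run one after another, but only about 0.2 seconds in five threads. For CPU-bound Python code the threaded version would not be faster.

```python
import threading
import time


def fake_io(delay: float) -> None:
    # time.sleep releases the GIL, like a blocking socket or file read would
    time.sleep(delay)


def run_sequential(n: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n: int, delay: float) -> float:
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # all the waits overlap, so this takes roughly one delay
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(5, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(5, 0.2):.2f}s")
```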

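2.3 Thread-local variables with threading.local()

Each thread should be able to have its own local variables. Using local variables is better than using globals, because a local variable is visible only to its own thread and cannot affect other threads. A global variable, by contrast, must be protected with a lock.
Create one global threading.local() object. Attributes set on that object, such as local_school.student, are local to each thread.
local_school is most commonly used to bind per-thread resources, such as a database connection or the user identity of an HTTP request. Every handler function called in that thread can then access these resources conveniently.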

import threading
import time
from datetime import datetime
# Create a global ThreadLocal object
local_school = threading.local()

def process_student():
    # Get the student bound to the current thread:
    std = local_school.student
    time.sleep(1)
    print(f'Hello, {std} (in {threading.current_thread().name}) {str(datetime.now())}')

def process_thread(name):
    # Bind a student to the current thread:
    local_school.student = name
    process_student()

t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob',),   name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

'''
Hello, Alice (in Thread-A) 2020-10-12 15:12:38.232229
Hello, Bob (in Thread-B) 2020-10-12 15:12:38.232767
'''
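This minimal sketch checks that `threading.local()` really isolates threads. The names `ctx`, `worker` and `demo` are hypothetical. The main thread sets an attribute first. The sketch then checks that a worker thread does not see that value, and that the worker's own assignment does not overwrite the main thread's value.

```python
import threading

ctx = threading.local()  # one shared object, but each thread sees only its own attributes


def worker(out: dict, name: str) -> None:
    # Before this thread assigns anything, the main thread's value is invisible here
    out[name + "_before"] = getattr(ctx, "user", None)
    ctx.user = name
    out[name + "_after"] = ctx.user


def demo() -> dict:
    ctx.user = "main"
    out = {}
    t = threading.Thread(target=worker, args=(out, "alice"))
    t.start()
    t.join()
    out["main_after"] = ctx.user  # still "main": the worker's assignment stayed in its own thread
    return out


if __name__ == "__main__":
    print(demo())  # {'alice_before': None, 'alice_after': 'alice', 'main_after': 'main'}
```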

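2.4 Repetitive tasks with multiple threads / a thread pool

For example, suppose we need to process 10,000+ data items, each handled by calling the same function. A thread pool can then speed up the processing with multiple threads. This pays off mainly for I/O-bound functions, because the GIL prevents Python threads from running CPU-bound code in parallel.

Install it with: pip install threadpool

a. Creating a thread pool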

import threadpool
import time

def handle(name):
    print("Hello ", name)
    time.sleep(2)

name_list = ['A', 'B', 'C', 'D']

# Run the function sequentially, without threads
start_time = time.time()
for name in name_list:
    handle(name)
print('%d second' % (time.time() - start_time))

start_time = time.time()
pool = threadpool.ThreadPool(10)  # pool with at most 10 worker threads
# One work request per element of name_list; each element is passed to handle as its argument
requests = threadpool.makeRequests(handle, name_list)
for req in requests:  # submit the requests to the pool
    pool.putRequest(req)
pool.wait()  # wait until all requests have been processed
print('Time taken with threads: %d second' % (time.time() - start_time))
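The standard library provides the same speed-up without a third-party install: `concurrent.futures.ThreadPoolExecutor` is a built-in thread pool. This is a swapped-in alternative, not the `threadpool` API used above. In this minimal sketch, `handle` returns its greeting instead of printing it, and `run_all` is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle(name):
    time.sleep(0.5)  # stands in for slow, I/O-bound work
    return f"Hello {name}"


def run_all(names, max_workers=10):
    # Leaving the with-block waits for all submitted tasks and shuts the pool down
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs handle on each name in the pool and yields results in input order
        return list(pool.map(handle, names))


if __name__ == "__main__":
    start = time.time()
    print(run_all(["A", "B", "C", "D"]))  # ['Hello A', 'Hello B', 'Hello C', 'Hello D']
    print('%.1f second' % (time.time() - start))
```

With four names and ten workers, the four sleeps overlap, so the run takes about 0.5 seconds instead of 2.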

b. Passing arguments when the thread function takes several parameters

import threadpool
import time
pool = threadpool.ThreadPool(10)  # pool with at most 10 worker threads

def handle(str1, str2, str3):
    print("Hello ", str1, str2, str3)
    time.sleep(2)

# Method 1: each item is an (args_list, None) tuple; the list is passed as positional arguments
lst_vars_1 = ['1', '2', '3']
lst_vars_2 = ['4', '5', '6']
func_var_1 = [(lst_vars_1, None), (lst_vars_2, None)]
# Method 2: each item is a (None, kwargs_dict) tuple; the dict keys must match handle's parameter names
dict_vars_1 = {'str1': '1', 'str2': '2', 'str3': '3'}
dict_vars_2 = {'str1': '4', 'str2': '5', 'str3': '6'}
func_var_2 = [(None, dict_vars_1), (None, dict_vars_2)]

requests = threadpool.makeRequests(handle, func_var_1 + func_var_2)
for req in requests:
    pool.putRequest(req)
pool.wait()
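`concurrent.futures.ThreadPoolExecutor` is again a standard-library alternative, not the `threadpool` API. With it, multiple arguments are passed directly: `submit(fn, *args, **kwargs)` forwards positional and keyword arguments, so there is no `(args, kwargs)` tuple to build. In this minimal sketch the data is hypothetical, `handle` returns its string instead of printing it, and `run_with_args` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(str1, str2, str3):
    return f"Hello {str1} {str2} {str3}"


positional = [("1", "2", "3"), ("4", "5", "6")]  # like method 1: positional arguments
keyword = [{"str1": "7", "str2": "8", "str3": "9"}]  # like method 2: keys match parameter names


def run_with_args():
    with ThreadPoolExecutor(max_workers=10) as pool:
        futures = [pool.submit(handle, *args) for args in positional]
        futures += [pool.submit(handle, **kwargs) for kwargs in keyword]
        # result() waits for each task; the list keeps submission order
        return [f.result() for f in futures]


if __name__ == "__main__":
    for line in run_with_args():
        print(line)
```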

c. A concrete thread-pool application

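import os, sys
import threadpool
pool = threadpool.ThreadPool(10)

# Output directory: assumed here to be passed as the second command-line argument (hypothetical)
save_path = sys.argv[2]

def handle_single(i, file):
    # Only prints the command that would be run; get_all_data is not defined in this article
    print("Command to run:", f'get_all_data({file}, "{save_path}/{i}.csv")')

all_files = os.listdir(sys.argv[1])
# Keyword arguments for each call: (None, kwargs_dict)
func_var = [(None, {"file": file, "i": i}) for i, file in enumerate(all_files)]

requests = threadpool.makeRequests(handle_single, func_var)
for req in requests:
    pool.putRequest(req)
pool.wait()
print("Done")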

III. Applications

3.1 Process application: video compression

Run it with: python cope.py <folder containing the videos to compress>

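# cope.py
import os
import subprocess
import sys
import time
from multiprocessing import Pool


def handle_single(file, src_dir, save_dir):
    # Output name: same base name with an .mp4 extension
    save_name = f"{os.path.splitext(file)[0]}.mp4"
    if save_name in os.listdir(save_dir):  # skip files that have already been converted
        return
    src = os.path.join(src_dir, file)
    dst = os.path.join(save_dir, save_name)
    print(f"processing {src} ..................................")
    # -r 10: output frame rate of 10 fps; -b:a 32k: audio bitrate of 32 kbit/s
    subprocess.run(["ffmpeg", "-i", src, "-r", "10", "-b:a", "32k", dst])


if __name__ == "__main__":
    time_now = time.time()
    wait_handle_dir = os.path.normpath(sys.argv[1])  # folder with the videos to compress
    save_dir = wait_handle_dir + "_handle"           # results go to a sibling folder "<name>_handle"
    print(f"Results will be saved to {save_dir}")
    os.makedirs(save_dir, exist_ok=True)

    # The function is defined before the pool is created, so worker processes can find it
    with Pool(4) as p:  # process pool using at most 4 CPU cores
        for file in os.listdir(wait_handle_dir):
            p.apply_async(handle_single, args=(file, wait_handle_dir, save_dir))  # one task per file
        p.close()  # no more tasks will be submitted
        p.join()   # wait for all conversions to finish; without this the script exits early

    print("Elapsed:", time.time() - time_now)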