Threads and Processes Explained (Python Programming)

Latest recommended article published on 2022-07-26 17:14:10

Hogan180

Categories: Python, Operating Systems and Computer Organization. Tags: Python, web crawler, process, thread

Article link: https://blog.csdn.net/weixin_40586929/article/details/97613813

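In project development, whether the goal is to make a program run faster or to build a distributed system, threads and processes are always involved. Using multiple processes or multiple threads can improve a program's processing efficiency.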

Multiprocessing

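In Python, multiple processes are usually created with the multiprocessing module. It provides a Process class that represents a process object, and creating a process comes down to instantiating this class.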

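import os
from multiprocessing import Process


def run(task_id):
    # Runs in the child process, so os.getpid() is the child's pid
    print('Child process %s / %s is running...' % (task_id, os.getpid()))


if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = Process(target=run, args=(str(i),))
        print('now is starting...')
        p.start()
        processes.append(p)
    # Wait for every child, not only the last one started
    for p in processes:
        p.join()
    print('Process end...')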

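Note that with both multiple processes and multiple threads, tasks do not necessarily run in the order in which they appear in the program, because the operating system decides the scheduling. This is covered later.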

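If we need a fixed number of processes, we can use the Pool class. A Pool creates the number of worker processes the user specifies. It acts as a pool holding that many processes, and each task submitted to the pool is handed to a worker that is free.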

import os
import random
import time
from multiprocessing import Pool


def run(task_id):
    print('Task %s (pid = %s) is running' % (task_id, os.getpid()))
    time.sleep(random.random() * 3)
    print('Task %s end' % task_id)


if __name__ == '__main__':
    print('Current process %s' % os.getpid())
    p = Pool(processes=5)
    for i in range(10):
        # Submit without blocking; at most 5 tasks run at the same time
        p.apply_async(run, args=(i,))
    print('waiting done...')
    p.close()  # no more tasks may be submitted
    p.join()   # wait for all workers to finish
    print('all done...')
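
The tasks above only print. When return values are needed, apply_async hands back an AsyncResult whose get() blocks until the value is ready, and map collects the results in input order. A minimal sketch of both, where square and run_pool are hypothetical helpers introduced only for illustration:

```python
from multiprocessing import Pool


def square(n):
    """Illustrative work item executed in a worker process."""
    return n * n


def run_pool(values, workers=3):
    """Run square() over values in a pool and return both result lists."""
    values = list(values)
    with Pool(processes=workers) as pool:
        # apply_async returns immediately with an AsyncResult handle
        handles = [pool.apply_async(square, args=(v,)) for v in values]
        # get() blocks until that particular task has finished
        async_results = [h.get() for h in handles]
        # map blocks until every task is done and keeps the input order
        map_results = pool.map(square, values)
    return async_results, map_results


if __name__ == '__main__':
    print(run_pool(range(5)))
```

All results are collected inside the with block, because leaving the block shuts the pool down.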

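Once several processes exist, they need a way to communicate. We start with Queue, a process-safe queue with first-in, first-out behaviour. It is operated through its put and get methods, which both accept two optional parameters: block, the blocking flag (True by default), and timeout, the maximum time to wait. When block is True and timeout is a positive number, put blocks for at most timeout seconds waiting for a free slot and raises queue.Full if none becomes available. get behaves the same way on an empty queue and raises queue.Empty. When block is False, the exception is raised immediately.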
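
The blocking and timeout behaviour can be checked inside a single process. This is only a sketch: try_put and try_get are hypothetical wrappers, and the timeouts are arbitrary values chosen to keep the demo short.

```python
import queue
from multiprocessing import Queue


def try_put(q, item, timeout):
    """Return True if item was queued within timeout seconds, else False."""
    try:
        q.put(item, block=True, timeout=timeout)
        return True
    except queue.Full:
        return False


def try_get(q, timeout):
    """Return the next item, or None if nothing arrives within timeout seconds."""
    try:
        return q.get(block=True, timeout=timeout)
    except queue.Empty:
        return None


if __name__ == '__main__':
    q = Queue(maxsize=1)          # room for a single item
    print(try_put(q, 'a', 0.1))   # succeeds, the queue was empty
    print(try_put(q, 'b', 0.1))   # times out, the queue is full
    print(try_get(q, 1.0))        # returns the queued item
    print(try_get(q, 0.1))        # times out, the queue is empty again
```

Note that the exceptions live in the standard queue module, not in multiprocessing.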

import os
import random
import time
from multiprocessing import Process, Queue


def writeProcess(q, tasks):
    print('Process %s is writing...' % os.getpid())
    for t in tasks:
        q.put(t)
        print('Put %s to queue...' % t)  # report the item, not the pid
        time.sleep(random.randint(1, 5))


def readProcess(q):
    print('Process %s is reading...' % os.getpid())
    while True:
        t = q.get(True)  # block until an item is available
        print('Get %s from queue' % t)


if __name__ == '__main__':
    q = Queue()
    proc_write1 = Process(target=writeProcess, args=(q, ['t1', 't2', 't3']))
    proc_write2 = Process(target=writeProcess, args=(q, ['t4', 't5', 't6']))
    proc_read = Process(target=readProcess, args=(q,))
    proc_write1.start()
    proc_write2.start()
    proc_read.start()
    proc_write1.join()
    proc_write2.join()
    proc_read.terminate()  # force the reader to stop

Why does the program end with terminate()? The reading function is an infinite loop that never stops on its own, so it can only be stopped by force with this method.
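
One common alternative to terminate() is to let each writer put an end-of-stream marker on the queue so that the reader can leave its loop by itself. The sketch below is not part of the original program: SENTINEL, write_tasks and read_tasks are hypothetical names, and it assumes that None is never a real task.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical end-of-stream marker; real tasks must not be None


def write_tasks(q, tasks):
    """Put every task on the queue, then signal that this writer is done."""
    for t in tasks:
        q.put(t)
    q.put(SENTINEL)


def read_tasks(q, n_writers):
    """Consume items until one sentinel per writer has been seen."""
    items = []
    finished = 0
    while finished < n_writers:
        item = q.get()
        if item is SENTINEL:
            finished += 1
            continue
        print('Get %s from queue' % item)
        items.append(item)
    return items


if __name__ == '__main__':
    q = Queue()
    writers = [Process(target=write_tasks, args=(q, ['t1', 't2', 't3'])),
               Process(target=write_tasks, args=(q, ['t4', 't5', 't6']))]
    reader = Process(target=read_tasks, args=(q, len(writers)))
    for p in writers + [reader]:
        p.start()
    for p in writers + [reader]:
        p.join()  # the reader exits on its own, no terminate() needed
```

The reader counts sentinels instead of stopping at the first one, since one writer may finish while the other is still producing.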

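The programs above also use the random module, so a few of its functions are summarized here:

random.random()
Returns a random floating-point number n with 0 <= n < 1.0.

random.uniform(a, b)
Returns a random floating-point number within the given range; one argument is the lower bound and the other the upper bound. If a <= b, then a <= n <= b. If b < a, then b <= n <= a.

random.randint(a, b)
Returns a random integer within the given range, where a is the lower bound and b is the upper bound: a <= n <= b.

random.choice(sequence)
Returns a random element from a sequence. The sequence argument is a non-empty ordered type such as a list, tuple, or string.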
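
A short sketch of these four calls follows. The seed value is an arbitrary assumption, used only so that repeated runs produce the same numbers.

```python
import random

random.seed(42)  # arbitrary fixed seed for repeatable output

x = random.random()                     # float, 0.0 <= x < 1.0
u = random.uniform(1, 5)                # float between 1 and 5
v = random.uniform(5, 1)                # the bounds may be given in either order
n = random.randint(1, 5)                # int, both endpoints included
c = random.choice(['t1', 't2', 't3'])   # one element of a non-empty sequence

print(x, u, v, n, c)
```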

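Having covered queues, we now turn to pipes. Pipe() returns a pair of connection objects, one for each end of the pipe. By default the pipe is bidirectional, meaning both ends can send and receive. The duplex parameter of Pipe controls this: when duplex is True (the default), both ends can read and write; when duplex is False, the first connection can only receive messages and the second can only send them.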

import multiprocessing
import os
import random
import time


def proc_send(pipe, names):
    for name in names:
        print('Process %s send: %s' % (os.getpid(), name))
        pipe.send(name)
        time.sleep(random.uniform(1, 5))


def proc_recv(pipe):
    while True:
        # recv() blocks until a message arrives
        print('Process %s recv: %s' % (os.getpid(), pipe.recv()))
        time.sleep(random.uniform(1, 5))


if __name__ == '__main__':
    pipe = multiprocessing.Pipe()  # a pair of connections, duplex by default
    p1 = multiprocessing.Process(target=proc_send, args=(pipe[0], ['one', 'two', 'three']))
    p2 = multiprocessing.Process(target=proc_recv, args=(pipe[1],))
    p1.start()
    p2.start()
    p1.join()
    p2.terminate()  # the receiver loops forever, so stop it by force
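
The effect of the duplex parameter can be seen without starting any child process, since both ends of a pipe can be held by the same process. A minimal sketch, with variable names chosen only for illustration:

```python
from multiprocessing import Pipe

# Default duplex=True: both ends can send and receive.
left, right = Pipe()
left.send('ping')
reply_in = right.recv()
right.send('pong')
reply_back = left.recv()

# duplex=False: the first connection only receives, the second only sends.
recv_end, send_end = Pipe(duplex=False)
send_end.send('one-way')
one_way = recv_end.recv()

# Sending on the receive-only end is rejected with an OSError.
try:
    recv_end.send('wrong direction')
    send_rejected = False
except OSError:
    send_rejected = True

print(reply_in, reply_back, one_way, send_rejected)
```

The messages here are tiny, so send() returns at once even though nothing is reading yet; large payloads would need a reader running in another process or thread.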

Multithreading

As with processes, in Python we usually create the thread class we need by subclassing threading.Thread.

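Multithreading is similar to running several programs at the same time. With threads, long-running tasks can be moved to the background. In addition, for tasks that spend most of their time waiting, such as I/O operations or sending and receiving data over the network, other threads can keep running while one thread waits, so scarce resources are not held idle.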
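
The benefit for waiting-heavy tasks can be sketched as follows. This is an illustration under stated assumptions, not a benchmark: time.sleep stands in for real I/O, and fake_download and run_concurrently are hypothetical names.

```python
import threading
import time


def fake_download(name, delay, results):
    """Stand-in for an I/O-bound task; the sleep simulates waiting on the network."""
    time.sleep(delay)
    results[name] = 'done'


def run_concurrently(names, delay=0.2):
    """Start one thread per name and return the results and the elapsed time."""
    results = {}
    threads = [threading.Thread(target=fake_download, args=(n, delay, results))
               for n in names]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


results, elapsed = run_concurrently(['a', 'b', 'c', 'd'])
print(sorted(results), round(elapsed, 2))
```

Run one after another, the four waits would take at least 0.8 seconds; with threads they overlap, so the total stays close to a single delay.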

import threading

mylock = threading.RLock()
num = 0


class myThread(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self, name=name)

    def run(self):
        global num
        while True:
            mylock.acquire()
            print('%s locked, number: %d' % (threading.current_thread().name, num))
            if num > 4:
                mylock.release()
                print('%s release, number: %d' % (threading.current_thread().name, num))
                break
            num += 1
            mylock.release()
            print('%s release, number: %d' % (threading.current_thread().name, num))


if __name__ == '__main__':
    t1 = myThread('Thread1')
    t2 = myThread('Thread2')
    t1.start()
    t2.start()
    # Wait for both threads to finish
    t1.join()
    t2.join()

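Here a lock is used to synchronize the threads. If several threads modify the same data without synchronization, the result can be unpredictable. A Lock has two methods, acquire and release, and every acquire must be paired with a release. If a thread that already holds a Lock calls acquire again, the second call blocks, because the first acquire has not been released yet. An RLock, by contrast, lets the same thread acquire it several times: it keeps an internal counter of how many times the owning thread has acquired it, so nothing has to be specified when the RLock is created. Each acquire must still be matched by a release, and only after the last release can another thread obtain the RLock.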
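
The difference between Lock and RLock can be observed with non-blocking acquire calls, which return False instead of waiting. A minimal sketch, where acquired_by_other_thread is a hypothetical helper written for this demo:

```python
import threading

lock = threading.Lock()
rlock = threading.RLock()

# A plain Lock is not reentrant: a second acquire by the same thread fails.
lock.acquire()
second_lock_try = lock.acquire(blocking=False)
lock.release()

# An RLock may be acquired again by the thread that already holds it.
rlock.acquire()
second_rlock_try = rlock.acquire(blocking=False)
rlock.release()
rlock.release()  # one release per acquire


def acquired_by_other_thread(some_lock):
    """Try a non-blocking acquire from a new thread and report whether it worked."""
    outcome = []

    def attempt():
        got = some_lock.acquire(blocking=False)
        outcome.append(got)
        if got:
            some_lock.release()

    t = threading.Thread(target=attempt)
    t.start()
    t.join()
    return outcome[0]


rlock.acquire()
rlock.acquire()
rlock.release()
still_held = not acquired_by_other_thread(rlock)  # one release is still missing
rlock.release()
now_free = acquired_by_other_thread(rlock)

print(second_lock_try, second_rlock_try, still_held, now_free)
```

In everyday code the with statement (with mylock: ...) pairs each acquire with its release automatically.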

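Finally, a look at process deadlock.

Characteristics of deadlock:

When a deadlock occurs, the processes involved can never finish their current tasks, the system resources they hold stay tied up, and other jobs are prevented from starting.

Necessary conditions for deadlock:

Deadlock can arise only if the following four conditions all hold simultaneously in a system: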

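Mutual exclusion: at least one resource must be held in a non-sharable mode, meaning only one process at a time can use it. If another process requests that resource, the requesting process must wait until the resource has been released.
Hold and wait: a process must be holding at least one resource while waiting for another resource that is currently held by another process.
No preemption: resources cannot be preempted, meaning a resource can only be released voluntarily by the process holding it, after that process has completed its task.
Circular wait: there is a set of waiting processes {P0, P1, ..., Pn} such that P0 waits for a resource held by P1, P1 waits for a resource held by P2, ..., Pn-1 waits for a resource held by Pn, and Pn waits for a resource held by P0.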
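
Since all four conditions must hold together, breaking any one of them prevents deadlock. The sketch below breaks circular wait by always taking locks in one fixed order. It is an illustration under assumptions rather than a general recipe: it uses threads instead of processes for brevity, and use_both, lock_a and lock_b are hypothetical names.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
completed = []


def use_both(first, second, label):
    """Acquire two locks in a fixed global order, whatever order the caller passed."""
    ordered = sorted((first, second), key=id)  # same order in every thread
    with ordered[0]:
        with ordered[1]:
            completed.append(label)


# The two threads name the locks in opposite order, the classic setup for
# circular wait, but use_both() always takes them in id() order.
t1 = threading.Thread(target=use_both, args=(lock_a, lock_b, 'worker-1'))
t2 = threading.Thread(target=use_both, args=(lock_b, lock_a, 'worker-2'))
t1.start()
t2.start()
t1.join()
t2.join()

print(sorted(completed))
```

If each thread instead took the locks in the order it was given, one could hold lock_a while waiting for lock_b and the other the reverse, which satisfies all four conditions at once.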