Thread-Based Parallelism

wAIxiSeu

Category: python

Article link: https://blog.csdn.net/helianxiaoye/article/details/109820031

Links:

Thread-based parallelism
threading: Thread-based parallelism

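Threads and Processes

A process is an instance of a program that is being executed. A program by itself is only a description of instructions, data, and how they are organized; the process is the actual running instance of that program.
A thread is the smallest unit of execution that the operating system can schedule. It lives inside a process and is the unit that actually does the work within that process.
A process can contain multiple concurrent threads, each carrying out a different task.
Because the thread is the smallest unit scheduled onto a CPU, a single multithreaded process can in principle use multiple cores. In the default CPython build, however, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads mainly benefit I/O-bound work.
Threads exchange data and resources through the memory space they share.
Each thread has its own program counter, registers, and stack; what it shares with the other threads of the same process is essentially the data and the system resources.
Each thread has its own run state and can synchronize with other threads. Broadly, the states are ready, running, and blocked.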

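threading in the Python Standard Library

This module builds higher-level threading interfaces on top of the lower-level _thread module.

Main functions

threading.active_count()
- Returns the number of Thread objects currently alive. The returned count equals the length of the list returned by enumerate().
threading.current_thread()
- Returns the Thread object corresponding to the caller's thread of control. If the caller's thread of control was not created through the threading module, a dummy thread object with limited functionality is returned.
threading.get_ident()
- Returns the "thread identifier" of the current thread, a nonzero integer. Its value has no direct meaning; it is intended as a magic cookie, for example to index a dictionary of thread-specific data. Thread identifiers may be recycled when a thread exits and another thread is created.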
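
The three functions above can be tried out together. The following is a minimal sketch, not taken from the original post; the names report, results, and worker-1 are made up for illustration.

```python
import threading


def report(results):
    # Record the name and identifier of the thread that runs this function.
    results.append((threading.current_thread().name, threading.get_ident()))


results = []
worker = threading.Thread(target=report, args=(results,), name="worker-1")
worker.start()
worker.join()

main_ident = threading.get_ident()      # identifier of the calling (main) thread
alive = threading.active_count()        # number of Thread objects currently alive
listed = len(threading.enumerate())     # length of the list of live threads

print(results, main_ident, alive, listed)
```

The worker's identifier differs from the main thread's because both threads were alive at the same time; identifiers are only reused after a thread has exited.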

Thread Objects

The Thread class represents an activity that is run in a separate thread of control. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass. No other methods (except for the constructor) should be overridden in a subclass. In other words, only override the __init__() and run() methods of this class.

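There are two ways to create a thread:
- Pass a callable object to the constructor (function-based creation)
  - Thread(name="a thread name", target=callable_object, args=())
- Subclass Thread and override the run method
  - Only __init__ and run should be overridden in the subclass
Calling start() launches the thread, and the new thread then invokes run()
- This is similar to Java
A thread can be flagged as a "daemon thread". The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property or the daemon constructor argument.
Common methods and attributes
- start
- run
- join
  - Waits until the thread terminates. This blocks the calling thread until the thread whose join() is called terminates, either normally or through an unhandled exception, or until the optional timeout occurs.
  - A thread can be joined many times.
  - join() raises RuntimeError if an attempt is made to join the current thread, since that would cause a deadlock. Joining a thread before it has been started raises the same exception.
- getName (deprecated since Python 3.10; use the name attribute instead)
- ident
- is_alive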
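
The subclassing approach and join() described above can be sketched as follows. SquareWorker and its result attribute are hypothetical names chosen for this illustration; storing the result on the instance is one common workaround for the fact that run() cannot return a value to the caller.

```python
import threading


class SquareWorker(threading.Thread):
    """Hypothetical worker that squares a number in its own thread."""

    def __init__(self, value):
        super().__init__(name=f"square-{value}")
        self.value = value
        self.result = None  # run() cannot return a value, so store it here

    def run(self):
        self.result = self.value ** 2


workers = [SquareWorker(n) for n in range(4)]
for w in workers:
    w.start()   # start() arranges for run() to execute in a new thread
for w in workers:
    w.join()    # block until that thread has terminated

squares = [w.result for w in workers]
print(squares)  # [0, 1, 4, 9]
```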

Implementing a Thread

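import threading


def work(param):
    # Runs inside the worker thread; get_ident() identifies that thread
    print("I am a worker with param: {}, thread id: {}".format(param, threading.get_ident()))
    return "success."  # note: Thread discards the return value of its target

def simple_thread():
    # Start five threads that all run work() with the same argument
    for i in range(5):
        t = threading.Thread(target=work, args=("hello world",))
        t.start()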

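Thread Synchronization in Python

Python manages threads through the standard library's threading module
Its main components are:
- Thread objects
- Lock objects
- RLock objects
- Semaphore objects
- Condition objects
  - A higher-level primitive than Lock and RLock: it wraps one of them and adds wait and notify
- Event objects

Synchronizing Threads with Locks

A primitive lock is a synchronization primitive that is not owned by a particular thread when locked. In Python, it is currently the lowest-level synchronization primitive available, implemented directly by the _thread extension module.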

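Lock:
- A primitive lock with two states, locked and unlocked; acquire() blocks while the lock is held
- It has no owner, so any thread may release it
RLock:
- A reentrant (or recursive) lock: the same thread can acquire it multiple times, that is, call acquire() repeatedly
- Every acquire() must be matched by a release(); only the last release() actually frees the lock
- Only the thread that acquired the lock may release it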
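
Reentrancy matters most when one locked method calls another that takes the same lock. The sketch below uses a hypothetical Account class, not part of the original post, to illustrate this; with a plain Lock in place of the RLock, deposit_twice would block forever.

```python
import threading


class Account:
    """Hypothetical account whose methods share one reentrant lock."""

    def __init__(self):
        self._lock = threading.RLock()
        self.balance = 0

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def deposit_twice(self, amount):
        with self._lock:          # first acquire by this thread
            self.deposit(amount)  # re-acquires the same RLock without blocking
            self.deposit(amount)


account = Account()
account.deposit_twice(5)
print(account.balance)  # 10
```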

Lock Usage Example

import threading
# 2. Use a lock to synchronize threads
lock = threading.Lock()
counter = 0
epoch = 100


def increment_with_lock():
    global counter
    for i in range(epoch):
        with lock:  # acquired on entry, released on exit even if an error occurs
            counter += 1
            print("current thread is: {}, counter is: {}".format(threading.current_thread().name, counter))


def decrement_with_lock():
    global counter
    for i in range(epoch):
        with lock:  # acquired on entry, released on exit even if an error occurs
            counter -= 1
            print("current thread is: {}, counter is: {}".format(threading.current_thread().name, counter))


def lock_demo():
    t1 = threading.Thread(name="increment", target=increment_with_lock)
    t2 = threading.Thread(name="decrement", target=decrement_with_lock)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print("finish with counter is: {}".format(counter))

A Quick Comparison of Lock and RLock

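import threading

lock = threading.Lock()  # Lock object
lock.acquire()
# A second blocking lock.acquire() here would deadlock: a Lock is not reentrant.
# A non-blocking attempt shows this without hanging the program.
print(lock.acquire(blocking=False))  # False: the lock is already held
lock.release()

rLock = threading.RLock()  # RLock object
rLock.acquire()
rLock.acquire()  # Within the same thread, this does not block
rLock.release()
rLock.release()  # One release() per acquire()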

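Synchronizing with Semaphores

A semaphore manages an internal counter and is commonly used to control or order access to shared resources. Its acquire() and release() methods correspond to the classic P and V primitives.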

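import random
import threading
import time

semaphore = threading.Semaphore(0)
item = 0


# 3. Synchronize with a semaphore
def consumer():
    global item
    # semaphore.acquire()  # uncomment to wait until the producer has produced
    print("consume item is: {}".format(item))
    time.sleep(0.1) # simulate a time-consuming operation


def producer():
    global item
    time.sleep(0.1) # simulate a time-consuming operation
    item = random.randint(0, 10)
    print("produce item is: {}".format(item))
    # semaphore.release()  # uncomment to signal the consumer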


def semaphore_demo():
    for i in range(10):
        t1 = threading.Thread(target=producer)
        t2 = threading.Thread(target=consumer)
        t1.start()
        t2.start()
        t1.join()
        t2.join()
        print("---"*10)

Output without the semaphore

consume item is: 0
produce item is: 0
------------------------------
consume item is: 0
produce item is: 6
------------------------------
consume item is: 6
produce item is: 9
------------------------------
consume item is: 9
produce item is: 8
------------------------------
consume item is: 8
produce item is: 5
------------------------------
consume item is: 5
produce item is: 7
------------------------------
consume item is: 7
produce item is: 6
------------------------------
consume item is: 6
produce item is: 7
------------------------------
consume item is: 7
produce item is: 4
------------------------------
consume item is: 4
produce item is: 2
------------------------------

Output with the semaphore enabled

produce item is: 7
consume item is: 7
------------------------------
produce item is: 7
consume item is: 7
------------------------------
produce item is: 1
consume item is: 1
------------------------------
produce item is: 4
consume item is: 4
------------------------------
produce item is: 5
consume item is: 5
------------------------------
produce item is: 2
consume item is: 2
------------------------------
produce item is: 1
consume item is: 1
------------------------------
produce item is: 8
consume item is: 8
------------------------------
produce item is: 3
consume item is: 3
------------------------------
produce item is: 0
consume item is: 0
------------------------------

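Analysis

Synchronization means requiring threads to run in a particular order
In the producer-consumer example above, the producer must run before the consumer
Without the semaphore, the consumer can run before the producer
With the semaphore, the producer is guaranteed to run before the consumer in every produce-consume round (one iteration of the for loop)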
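
The ordering guarantee can be checked with a compact variant of the example that has the semaphore calls enabled. This is a sketch under a few assumptions: the sleeps and prints are dropped, the consumer thread is deliberately started first, and the consumed and produced lists are hypothetical additions used to compare the values afterwards.

```python
import random
import threading

semaphore = threading.Semaphore(0)  # counter starts at 0, so acquire() blocks
item = None
consumed = []
produced = []


def producer():
    global item
    item = random.randint(0, 10)
    semaphore.release()   # V: signal that an item is ready


def consumer():
    semaphore.acquire()   # P: wait until the producer has signalled
    consumed.append(item)


for _ in range(5):
    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()             # the consumer starts first but still has to wait
    p.start()
    p.join()
    c.join()
    produced.append(item)

print(consumed == produced)  # True
```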

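Synchronizing Threads with Conditions

A condition represents a change in the application's state. It is another synchronization mechanism in which some threads wait for a particular condition to occur and other threads send a notification when it does. Once the condition occurs, the waiting thread gains exclusive access to the shared resource.
Key point: when the condition occurs, a notification is sent; until then, the thread keeps waiting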

import random
import threading
import time

condition = threading.Condition()
item_list = []


class Consumer(threading.Thread):
    def do(self):
        condition.acquire()
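        while len(item_list) == 0:
            # Buffer is empty: wait() releases the lock and blocks until notified;
            # the loop re-checks the state after every wake-up
            condition.wait()
            print("buffer is blank.")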
        t = item_list.pop()
        print("consume item: {}".format(t))
        condition.notify()
        condition.release()

    def run(self) -> None:
        for i in range(10):
            time.sleep(0.2)
            self.do()


class Producer(threading.Thread):
    def do(self):
        condition.acquire()
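        while len(item_list) == 3:
            # Buffer is full: wait() releases the lock and blocks until notified;
            # the loop re-checks the state after every wake-up
            condition.wait()
            print("buffer is full.")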
        t = random.randint(0, 10)
        item_list.append(t)
        print("produce item: {}".format(t))
        condition.notify()
        condition.release()

    def run(self) -> None:
        for i in range(10):
            # time.sleep(0.2)
            self.do()


def condition_demo():
    p = Producer()
    c = Consumer()
    p.start()
    c.start()
    p.join()
    c.join()

Output

produce item: 8
produce item: 10
produce item: 4
consume item: 4
buffer is full.
produce item: 9
consume item: 9
buffer is full.
produce item: 7
consume item: 7
buffer is full.
produce item: 1
consume item: 1
buffer is full.
produce item: 2
consume item: 2
buffer is full.
produce item: 10
consume item: 10
buffer is full.
produce item: 6
consume item: 6
buffer is full.
produce item: 9
consume item: 9
consume item: 10
consume item: 8

Analysis

notify() must be called before release(), while the lock is still held. From the documentation of notify():

If the calling thread has not acquired the lock when this method is
called, a RuntimeError is raised.

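In this example the condition is the state of the buffer: a notification is sent when the buffer reaches a given state, and a thread waits while it does not.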
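
The same bounded-buffer idea can be written more compactly with the with statement and a while loop around wait(). The following is a sketch rather than the author's code; buffer, CAPACITY, N_ITEMS, and consumed are hypothetical names, and plain integers stand in for the random items.

```python
import threading

condition = threading.Condition()
buffer = []
CAPACITY = 3   # assumed buffer size, matching the example above
N_ITEMS = 10
consumed = []


def producer():
    for value in range(N_ITEMS):
        with condition:                     # acquires the underlying lock
            while len(buffer) == CAPACITY:  # re-check the state after each wake-up
                condition.wait()            # releases the lock while waiting
            buffer.append(value)
            condition.notify()              # called while the lock is still held


def consumer():
    for _ in range(N_ITEMS):
        with condition:
            while not buffer:
                condition.wait()
            consumed.append(buffer.pop(0))
            condition.notify()


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Leaving the with block releases the lock, so notify() always runs before the release.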

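Synchronizing with Events

An event is an object used for communication between threads: some threads wait for a signal and others send it. An event object maintains an internal flag that set() sets to true and clear() resets to false. wait() blocks the calling thread until the flag is true.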

import time
from threading import Thread, Event
import random

N_ITEMS = 100  # number of items the producer creates and the consumer expects
items = []
event = Event()

class consumer(Thread):
    def __init__(self, items, event):
        Thread.__init__(self)
        self.items = items
        self.event = event

    def run(self):
        consumed = 0
        while consumed < N_ITEMS:
            self.event.wait()   # block until the producer signals
            self.event.clear()  # reset before draining so that no signal is lost
            while self.items:
                item = self.items.pop()
                consumed += 1
                print('Consumer notify : %d popped from list by %s' % (item, self.name))

class producer(Thread):
    def __init__(self, items, event):
        Thread.__init__(self)
        self.items = items
        self.event = event

    def run(self):
        for i in range(N_ITEMS):
            time.sleep(2)
            item = random.randint(0, 256)
            self.items.append(item)
            print('Producer notify : item N° %d appended to list by %s' % (item, self.name))
            print('Producer notify : event set by %s' % self.name)
            self.event.set()  # wake the consumer, which clears the event itself

if __name__ == '__main__':
    t1 = producer(items, event)
    t2 = consumer(items, event)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

Differences Between Condition and Event

Reference: threading.Condition vs threading.Event

Simply put, you use a Condition when threads are interested in waiting for something to become true, and once its true, to have exclusive access to some shared resource.

Whereas you use an Event when threads are just interested in waiting for something to become true.

In essence, Condition is an abstracted Event + Lock, but it gets more interesting when you consider that you can have several different Conditions over the same underlying lock. Thus you could have different Conditions describing the state of the underlying resource meaning you can wake workers that are only interested in particular states of the shared resource.

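Condition: for threads that wait for something to become true and, once it is true, need exclusive access to a shared resource
Event: for threads that only care about waiting for something to become true
Condition ≈ Event + Lock
Several Conditions can be built over the same lock to describe different states of the shared resource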
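
The last point, several Conditions over one lock, can be sketched as a small bounded buffer with separate "not empty" and "not full" conditions. All names here (not_empty, not_full, put, get, CAPACITY) are hypothetical and chosen for illustration only.

```python
import threading

lock = threading.Lock()
not_empty = threading.Condition(lock)  # both conditions share the same lock
not_full = threading.Condition(lock)
buffer = []
CAPACITY = 2
received = []


def put(value):
    with not_full:                      # acquires the shared lock
        while len(buffer) >= CAPACITY:
            not_full.wait()
        buffer.append(value)
        not_empty.notify()              # wake only a thread waiting for data


def get():
    with not_empty:                     # acquires the same shared lock
        while not buffer:
            not_empty.wait()
        value = buffer.pop(0)
        not_full.notify()               # wake only a thread waiting for space
        return value


def producer():
    for value in range(6):
        put(value)


def consumer():
    for _ in range(6):
        received.append(get())


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)  # [0, 1, 2, 3, 4, 5]
```

Each notify() reaches only the threads interested in that particular state, which is the advantage over a single shared Condition.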

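Inter-Thread Communication in Python

Shared Variables

Threads can all access global variables, but beware of race conditions

Using queue

Use the queue module from the standard library; its queue classes are thread-safe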
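
A minimal producer-consumer sketch using queue.Queue might look like this. The SENTINEL value used to stop the consumer is a convention assumed for this example, not something the queue module provides.

```python
import queue
import threading

q = queue.Queue(maxsize=3)  # put() blocks while the queue is full
SENTINEL = None             # hypothetical marker telling the consumer to stop
received = []


def producer():
    for value in range(5):
        q.put(value)
    q.put(SENTINEL)


def consumer():
    while True:
        value = q.get()     # blocks while the queue is empty
        if value is SENTINEL:
            break
        received.append(value)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)  # [0, 1, 2, 3, 4]
```

No explicit lock is needed because the queue does its own locking internally.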

Miscellaneous

Using with

In the threading module, every object that has acquire() and release() methods can be used as a context manager in a with statement.
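
As a short sketch of this, the counter below is protected with a with block instead of explicit acquire() and release() calls. The function name add_many and the iteration counts are arbitrary choices for the illustration.

```python
import threading

lock = threading.Lock()
counter = 0


def add_many(n):
    global counter
    for _ in range(n):
        with lock:  # acquire() on entry, release() on exit, even on an exception
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```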

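Timer Objects

This class represents an action that should run only after a certain amount of time has passed, in other words a timer. Timer is a subclass of Thread, so it also works like a custom thread.

# 5. Timer: call a function after the given delay (work is defined in the thread example above)
def timer_demo():
    timer = threading.Timer(1.0, work, args=("hello",))
    timer.start()
    # timer.cancel()  # would stop the timer if it has not fired yet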
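
Because Timer is a Thread subclass, it can be joined, and cancel() prevents the call as long as the timer is still waiting. The sketch below assumes short, arbitrary delays and a hypothetical fired list that records which timers actually ran.

```python
import threading

fired = []


def work(param):
    fired.append(param)


timer = threading.Timer(0.1, work, args=("hello",))
timer.start()
timer.join()         # Timer is a Thread, so join() waits until it has run

cancelled = threading.Timer(5.0, work, args=("never",))
cancelled.start()
cancelled.cancel()   # only effective while the timer is still waiting
cancelled.join()     # returns promptly because the wait was interrupted

print(fired)  # ['hello']
```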

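Barrier Objects

The Barrier class provides a simple synchronization primitive for a fixed number of threads that need to wait for each other. A thread that calls wait() blocks until all of the threads have called wait(), at which point they are all released simultaneously.
A barrier can be reused any number of times, but the number of threads cannot change.

Main methods

wait
- Pass the barrier. When all the threads party to the barrier have called this function, they are all released simultaneously. If a timeout is given here, it takes precedence over the timeout supplied when the barrier was created.
- If the call times out, the barrier enters the broken state.
- If the barrier is broken or reset while a thread is waiting, BrokenBarrierError is raised.
abort
- Puts the barrier into the broken state. This causes all current and future calls to wait() to fail with BrokenBarrierError. One use case is aborting the program to avoid a deadlock.
reset
- Returns the barrier to its default initial state. Any threads still waiting on it receive BrokenBarrierError.
broken
- A boolean that is True if the barrier is in the broken state.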
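
A minimal sketch of the normal, non-broken case is shown here, with the number of parties equal to the number of threads. The names N_THREADS, arrivals, releases, and on_release are hypothetical and exist only for this illustration.

```python
import threading

N_THREADS = 3
arrivals = []
releases = []


def on_release():
    # Runs in exactly one of the threads each time the barrier is passed.
    releases.append(len(arrivals))


barrier = threading.Barrier(N_THREADS, action=on_release)


def worker(name):
    arrivals.append(name)
    barrier.wait()   # blocks until N_THREADS threads are waiting


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(releases)        # [3]
print(barrier.broken)  # False
```

The action runs once, after the last thread has arrived and before any thread is released, so it sees all three arrivals.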


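import threading
import time


# 6. Barrier object
def callback():
    print("barrier passed")


# Two parties: every second wait() call releases two threads and runs callback once
barrier = threading.Barrier(2, callback)


class BarrierDemo(threading.Thread):
    def run(self) -> None:
        for _ in range(3):
            time.sleep(1)
            print(f"I am {self.name}")
            try:
                barrier.wait()
            except threading.BrokenBarrierError:
                print("broken barrier")
        # The first thread to finish breaks the barrier, so the remaining
        # wait() calls raise BrokenBarrierError instead of blocking forever
        barrier.abort()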


def barrier_demo():
    b1 = BarrierDemo(name="b1")
    b2 = BarrierDemo(name="b2")
    b3 = BarrierDemo(name="b3")
    b1.start()
    b2.start()
    b3.start()
    b1.join()
    b2.join()
    b3.join()
