Threads and Processes (3): Thread Synchronization, Part 2

Article link: https://blog.csdn.net/luofeng_/article/details/123605318

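IV. The threading.Condition(lock=None) class

The constructor accepts a Lock or RLock object; if none is passed, a new RLock is created. Methods of Condition:

acquire(): acquire the underlying lock
release(): release the underlying lock
wait(timeout=None): release the lock and block until notified or until the optional timeout expires, then re-acquire the lock
notify(n=1): wake up at most n waiting threads; does nothing if no thread is waiting
notify_all(): wake up all waiting threads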

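Use case:
The producer-consumer model, where it keeps producers and consumers that run at different speeds in step.
Note:
A Condition always wraps a lock (an RLock by default), so wait(), notify() and notify_all() may only be called while that lock is held; otherwise a RuntimeError is raised. Acquire the lock first and release it when done, preferably with a with block.

How it works: consumers call wait(); once the producer has a message ready, it notifies the consumers with notify() or notify_all().

With one producer and several consumers, a single message reaches many receivers, which is in effect a broadcast.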
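
The broadcast idea can be sketched in a few lines. This is a minimal illustration, not the article's own example: the names message and received are made up here, and wait_for() is used so that a consumer that starts waiting after the notification still sees the message.

```python
import threading

cond = threading.Condition()
message = None      # shared state, guarded by cond
received = []       # what each consumer saw

def consumer():
    with cond:
        # wait_for() re-checks the predicate after every wake-up, so a
        # notification sent before this thread started waiting is not lost.
        cond.wait_for(lambda: message is not None)
        received.append(message)

def producer():
    global message
    with cond:
        message = "hello"
        cond.notify_all()   # wake every waiting consumer: a broadcast

consumers = [threading.Thread(target=consumer) for _ in range(3)]
for t in consumers:
    t.start()
threading.Thread(target=producer).start()
for t in consumers:
    t.join()
print(received)
```

Each of the three consumers receives the same message, which is the one-to-many behaviour described above.
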
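Example:
This example is for demonstration only and is not thread-safe: while a consumer is reading the number, the producer may already be producing a new one.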

"""
@author: feng.luo
@time: 2022/3/18
@File: 2_thread_sync.py
"""

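import logging
import random
import threading
import time

# basicConfig() returns None, so the logger has to be obtained separately.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)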


class Dispatcher:
    """
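    Demonstration only, not thread-safe: while a consumer is reading the number,
    the producer may already be producing a new one.
    """
    def __init__(self):
        # One producer thread and several consumer threads.
        # The producer wakes the consumers after producing a number;
        # the consumers queue up in the wait pool until a product is ready.
        # Access to the number is guarded by the condition's lock.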
        self.con = threading.Condition()
        self.ev = threading.Event()
        self.num = None

    def produce(self):
        # while not self.ev.is_set():
        for _ in range(10):
            with self.con:
                self.num = random.randrange(0, 100)
                logger.info('{} make the num:{}'.format(threading.current_thread().name, self.num))
                self.con.notify(2)  # production done: wake up to two threads in the wait pool
            self.ev.wait(1)  # simulate production speed
        logger.info('{} production over'.format(threading.current_thread().name))

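    def consumer(self):
        while not self.ev.is_set():
            with self.con:
                # Hazard in the naive version: if ev.set() runs just before wait() and the
                # producer has already stopped, nobody notifies and the waiter blocks forever.
                # Checking the flag under the lock (set_ev() sets it under the same lock)
                # closes that gap.
                if not self.ev.is_set():
                    self.con.wait()  # queue up in the wait pool until a product is ready
                if self.ev.is_set():
                    break
                logger.info("{} get the num:{}".format(threading.current_thread().name, self.num))
                self.num = None
            self.ev.wait(0.5)  # simulate consumption speed
        logger.info('{} consumer completed'.format(threading.current_thread().name))

    def set_ev(self):
        # Set the stop flag and wake every waiting consumer while holding the lock.
        with self.con:
            self.ev.set()
            self.con.notify_all()
        logger.info('finish')


if __name__ == '__main__':
    disp = Dispatcher()
    threading.Thread(target=disp.produce, name='produce').start()
    threading.Thread(target=disp.consumer, name='consumer1').start()
    threading.Thread(target=disp.consumer, name='consumer2').start()
    time.sleep(3)
    disp.set_ev()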
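
For comparison, here is one way the demo could be made thread-safe. It is a sketch under assumptions, not the article's code: the class name SafeDispatcher, the deque buffer and the done flag are all introduced here for illustration. Every item goes through a buffer that is only touched while the condition's lock is held, so a consumer can never read a value the producer is about to overwrite.

```python
import collections
import threading

class SafeDispatcher:
    """Hypothetical thread-safe variant: items pass through a guarded buffer."""

    def __init__(self):
        self.con = threading.Condition()
        self.items = collections.deque()   # guarded by self.con
        self.done = False                  # guarded by self.con

    def produce(self, count):
        for i in range(count):
            with self.con:
                self.items.append(i)
                self.con.notify()          # one new item: wake one consumer
        with self.con:
            self.done = True
            self.con.notify_all()          # let every consumer see the stop flag

    def consume(self, out):
        while True:
            with self.con:
                # Sleep until there is something to take or production has ended.
                self.con.wait_for(lambda: self.items or self.done)
                if not self.items:         # finished and fully drained
                    return
                out.append(self.items.popleft())

disp = SafeDispatcher()
results = []
consumers = [threading.Thread(target=disp.consume, args=(results,)) for _ in range(2)]
for t in consumers:
    t.start()
disp.produce(10)
for t in consumers:
    t.join()
print(sorted(results))
```

Every produced number is consumed exactly once, regardless of how the two consumers interleave.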

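V. The threading.Barrier(parties, action=None, timeout=None) class

A barrier groups threads into batches of parties threads: once parties threads are waiting, all of them are released, and the barrier is reused for the next batch.

n_waiting attribute: the number of threads currently waiting at the barrier
parties attribute: the number of threads required to pass the barrier
wait(timeout=None) method: wait at the barrier; returns an integer from 0 to parties - 1, different for each thread in the batch. A timeout may be given; if it expires, the barrier is put into the broken state (as if abort() had been called), and both the threads already waiting and any thread that calls wait() afterwards get a BrokenBarrierError, until reset() restores the barrier
broken attribute: True if the barrier is in the broken state
abort() method: break the barrier
reset() method: restore the barrier to its initial state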

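Execution logic:
Threads arrive at the barrier and wait until parties threads are waiting; the barrier then opens and all of them continue. Threads that call wait() afterwards start the next round.
Example: in a horse race, the gate opens once every horse is in position and then closes again. It opens again after the next batch of horses has lined up.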
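
The round-by-round behaviour can be sketched as follows. This is a minimal illustration with made-up names (runner, on_open, released, rounds); the action callback is the optional second argument of Barrier and runs in one of the threads each time the barrier opens.

```python
import threading

released = []    # indices returned by wait()
rounds = []      # one entry per time the barrier opens
lock = threading.Lock()

def on_open():
    # Called by exactly one thread, just before the batch is released.
    rounds.append("open")

barrier = threading.Barrier(3, action=on_open)

def runner():
    index = barrier.wait()    # blocks until 3 threads are waiting
    with lock:
        released.append(index)

# Six threads against a barrier of three: two batches pass in turn.
threads = [threading.Thread(target=runner) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(released), len(rounds))
```

Within each batch the three threads receive the indices 0, 1 and 2, and the callback fires once per batch.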

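Use case:
Concurrent initialization: every thread must finish initializing before any of them continues.
Example: before a program runs, data must be loaded and checked; starting before this work is done would make the program misbehave. Likewise, a program may need to load files from disk, warm up caches and initialize a connection pool at startup. These tasks run side by side, but the program may only proceed once all of them are ready. If, say, the database connection fails, initialization has failed: call abort() so that the barrier is broken and every thread receives an exception and exits.
Example: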

"""
@author: feng.luo
@time: 2022/3/18
@File: 2_thread_sync.py
"""

import logging
import threading

# basicConfig() returns None, so the logger has to be obtained separately.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BarrierDemo:
    @staticmethod
    def worker(barrier: threading.Barrier, x: int):
        logger.info('threads already waiting: {}'.format(barrier.n_waiting))
        try:
            logger.info('Broken status:{}'.format(barrier.broken))
            if x < 3:
                # Workers 0-2 wait at most 1 second. Workers start 2 seconds apart,
                # so worker 0 times out and leaves the barrier broken.
                barrier_id = barrier.wait(1)
            else:
                if x == 6:
                    # Worker 6 restores the barrier, so workers 6-8 can pass together.
                    barrier.reset()
                barrier_id = barrier.wait()
            logger.info('after barrier:{}'.format(barrier_id))
        except threading.BrokenBarrierError as e:
            logger.error('broken barrier, {}'.format(e))

    def run_barrier(self):
        barrier = threading.Barrier(3)
        for i in range(9):
            # if i == 2:
            #     barrier.abort()
            # elif i == 6:
            #     barrier.reset()
            threading.Event().wait(2)
            threading.Thread(target=self.worker, args=(barrier, i), name='worker_{}'.format(i)).start()


if __name__ == '__main__':
    BarrierDemo().run_barrier()
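
The concurrent-initialization scenario with a failing step can be sketched like this. The task names (load_files, warm_cache, connect_db) and the ok flag are hypothetical stand-ins for real startup work; the point is only that one abort() makes every other participant fail fast with BrokenBarrierError.

```python
import threading

barrier = threading.Barrier(3)
outcome = {}                  # task name -> "ready" or "aborted"
lock = threading.Lock()

def init_task(name, ok):
    if not ok:
        barrier.abort()       # this step failed: break the barrier for everyone
        state = "aborted"
    else:
        try:
            barrier.wait(timeout=5)   # wait for the other initialization steps
            state = "ready"
        except threading.BrokenBarrierError:
            state = "aborted"         # another step failed or timed out
    with lock:
        outcome[name] = state

tasks = [("load_files", True), ("warm_cache", True), ("connect_db", False)]
threads = [threading.Thread(target=init_task, args=task) for task in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(outcome)
```

A thread that is already waiting when abort() is called gets the exception at once, and a thread that reaches wait() later gets it immediately, so no task is left hanging.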

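VI. The threading.Semaphore(value=1) class

Semaphore
Similar to a lock. A semaphore object maintains an internal counter that counts down: each acquire() decrements it by 1. When acquire() finds the counter at 0, the calling thread blocks until another thread calls release(), which makes the counter positive again and wakes a blocked thread.

acquire() method: acquire the semaphore and decrement the counter; returns True on success
release() method: release the semaphore and increment the counter

The counter never drops below 0; any acquire() that finds it at 0 blocks.

Use case:
Use a semaphore when a resource is limited in quantity. Typical application: a connection pool.

threading.BoundedSemaphore(value=1): bounded semaphore
Use case:
With a plain Semaphore, calling release() without a matching acquire() pushes the counter above its initial value. BoundedSemaphore does not allow the counter to exceed the initial value and raises ValueError instead.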
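
The difference between the two classes can be seen in a short sketch (the variable names are illustrative only). Since the counter has no public accessor, the plain semaphore's state is observed through non-blocking acquire() calls.

```python
import threading

plain = threading.Semaphore(1)
plain.release()    # no matching acquire(): the counter silently becomes 2
grants = [plain.acquire(blocking=False) for _ in range(3)]
print(grants)      # two acquisitions succeed, the third finds the counter at 0

bounded = threading.BoundedSemaphore(1)
try:
    bounded.release()              # would exceed the initial value
    over_release_detected = False
except ValueError:
    over_release_detected = True
print(over_release_detected)
```
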
Example: building a connection pool

"""
@author: feng.luo
@time: 2022/3/18
@File: 2_thread_sync.py
"""

import logging
import random
import threading

# basicConfig() returns None, so the logger has to be obtained separately.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class SemaphoreDemo:
    @staticmethod
    def worker(s: threading.Semaphore):
        logger.info('in sub')
        s.acquire()
        logger.info('end sub')

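    def run_s(self):
        s = threading.Semaphore(3)
        logger.info(s.acquire())  # True, counter 3 -> 2
        logger.info(s.acquire())  # True, counter 2 -> 1
        logger.info(s.acquire())  # True, counter 1 -> 0
        logger.info('----------------')
        logger.info(s.acquire(False))  # non-blocking: returns False at once, counter is 0
        logger.info(s.acquire(timeout=3))  # blocks for up to 3 seconds, then returns False
        # The worker's acquire() can only succeed after the release below.
        threading.Thread(target=self.worker, args=(s,)).start()
        s.release()  # counter 0 -> 1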


class Conn:
    def __init__(self, name):
        self.name = name


class ConnPoolLockDemo:
    """
    A connection pool built with a lock.
    Requirements:
        the pool has a maximum number of connections
        connections can be taken from the pool
        connections are returned to the pool
        thread-safe
    """
    def __init__(self, count):
        self.count = count
        self.lock = threading.Lock()
        self.pool = [Conn('pool_{}'.format(i)) for i in range(count)]

    def get_conn(self):
        # The with block releases the lock on every path, including the empty-pool case.
        with self.lock:
            if self.pool:
                return self.pool.pop()
            return None

    def return_conn(self, conn):
        with self.lock:
            if len(self.pool) < self.count:
                self.pool.append(conn)
            else:
                logger.warning('exceed the threshold value: {}'.format(self.count))


class ConnPoolSemaphoreDemo:
    """
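    Semaphore:
        threading.Semaphore(value=1): similar to a lock. The semaphore maintains an internal
                                      counter that counts down: each acquire() decrements it by 1.
                                      When acquire() finds the counter at 0, the calling thread
                                      blocks until another thread calls release().
            acquire(): acquire the semaphore and decrement the counter; returns True on success
            release(): release the semaphore and increment the counter
            The counter never drops below 0; any acquire() that finds it at 0 blocks.
        Use case: use a semaphore when a resource is limited in quantity.
            Example: a connection pool
            the pool has a maximum number of connections
            connections can be taken from the pool
            connections are returned to the pool
            thread-safe

        threading.BoundedSemaphore(value=1): bounded semaphore
            Use: with a plain Semaphore, release() without a matching acquire() pushes the
            counter above its initial value. BoundedSemaphore raises ValueError instead.

    Semaphore vs. lock:
        A lock lets only one thread own a resource at a time; it is a special case of a
        semaphore whose counter starts at 1.
        A semaphore lets several threads access a shared resource of limited quantity.

    """
    def __init__(self, count):
        self.count = count
        self.sema = threading.Semaphore(count)  # one permit per pooled connection
        self.pool = [Conn('pool_{}'.format(i)) for i in range(count)]

    def get_conn(self):
        self.sema.acquire()  # blocks while every connection is in use
        res_conn = self.pool.pop()
        return res_conn

    def return_conn(self, conn):
        self.pool.append(conn)  # put the connection back before releasing the permit
        self.sema.release()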

class CanUseConnPoolSemaphoreDemo:
    def __init__(self, count: int):
        self.count = count
        self.sem = threading.BoundedSemaphore(count)
        self.conn_pool = [Conn('conn_{}'.format(i)) for i in range(count)]

    def get_conn(self):
        self.sem.acquire()
        return self.conn_pool.pop()

    def return_conn(self, conn):
        """
        Failure case: threads A, B and C all reach the first statement (append) and return
        more connections than were taken. The BoundedSemaphore raises ValueError when its
        bound is exceeded, so try...except is used to undo the append.
        """
        try:
            self.conn_pool.append(conn)
            self.sem.release()
        except ValueError:
            # list.pop() expects an index; remove() deletes the given item.
            self.conn_pool.remove(conn)


class RunPoolSemaphore:
    def __init__(self):
        self.pool = ConnPoolSemaphoreDemo(3)

    @staticmethod
    def semaphore_worker(pool: ConnPoolSemaphoreDemo):
        conn = pool.get_conn()
        logger.info(conn)
        # simulate using the connection for a while
        threading.Event().wait(random.randint(1, 4))
        pool.return_conn(conn)

    def run_worker(self):
        for i in range(6):
            threading.Thread(target=self.semaphore_worker, args=(self.pool,)).start()


if __name__ == '__main__':
    SemaphoreDemo().run_s()
    RunPoolSemaphore().run_worker()

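Semaphore vs. lock:

A lock lets only one thread own a resource at a time. It can be seen as a special case of a semaphore, one whose counter starts at 1.
A semaphore lets several threads access a shared resource at the same time, as long as the resource is limited in quantity.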
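
The contrast can be sketched by using both side by side. In this illustration (names such as use_resource, active and peak are made up), a Semaphore(2) admits up to two threads into the guarded section, while a Lock protects the bookkeeping that only one thread may touch at a time.

```python
import threading
import time

sema = threading.Semaphore(2)    # at most 2 threads inside the guarded section
state_lock = threading.Lock()    # exactly one thread at a time
active = 0
peak = 0

def use_resource():
    global active, peak
    with sema:                   # acquire on entry, release on exit
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)         # pretend to use the shared resource
        with state_lock:
            active -= 1

threads = [threading.Thread(target=use_resource) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('peak concurrency:', peak)
```

However the six threads are scheduled, the recorded peak never exceeds the semaphore's initial value of 2.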

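Data structures and the GIL

The queue module:
The standard-library queue module provides a FIFO queue (Queue), a LIFO queue (LifoQueue) and a priority queue (PriorityQueue); each can be bounded with maxsize.
The queue classes are thread-safe and are meant for exchanging data safely between threads. Internally they use Lock and Condition.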
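
A minimal producer-consumer sketch with queue.Queue shows how little synchronization code is left to write. The STOP sentinel and the consumed list are assumptions of this example, not part of the queue API.

```python
import queue
import threading

q = queue.Queue(maxsize=5)   # bounded FIFO queue: put() blocks while it is full
STOP = object()              # hypothetical sentinel telling the consumer to stop
consumed = []

def producer():
    for i in range(10):
        q.put(i)
    q.put(STOP)

def consumer():
    while True:
        item = q.get()       # blocks until an item is available
        if item is STOP:
            break
        consumed.append(item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(consumed)
```

With a single producer and a single consumer the items arrive in the order they were put in.
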
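The GIL (global interpreter lock):
CPython has a lock at the interpreter-process level called the GIL, the global interpreter lock.
The GIL guarantees that within a CPython process only one thread executes bytecode at any moment, even on a multi-core CPU. Strictly speaking, therefore, CPython threads do not run Python code in parallel: at any instant only one thread is running.

In CPython, multithreading suits IO-bound work: a thread that blocks on IO releases the GIL, so other threads get scheduled.
For CPU-bound work, the threads compete for the GIL and only one runs at a time, so additional cores go unused. The running thread may also win the GIL again repeatedly, since a waiting thread needs more time to wake up than the running one needs to re-acquire the lock, which can leave the other threads with almost no CPU time.

Hence: use multithreading for IO-bound work, and multiple processes for CPU-bound work to get around the GIL.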
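
The IO-bound half of this rule can be checked with a small sketch. Here time.sleep() stands in for a blocking read or network call (an assumption of this example); like real blocking IO, it releases the GIL while waiting.

```python
import threading
import time

def fake_io(delay):
    time.sleep(delay)    # releases the GIL while waiting, like blocking IO

def run_serial(n, delay):
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start

def run_threaded(n, delay):
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

serial = run_serial(5, 0.2)
threaded = run_threaded(5, 0.2)
print('serial: {:.2f}s, threaded: {:.2f}s'.format(serial, threaded))
```

The serial run takes roughly the sum of the waits, while the threaded run takes roughly the longest single wait, because the waits overlap.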

CPU-bound case: an example verifying that threads in CPython bring no parallel speed-up:

import datetime
import logging
import threading

# basicConfig() returns None, so the logger has to be obtained separately.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class MultiThreadEfficiency:
    def __init__(self):
        self.num = 1000000000

    @staticmethod
    def calc(num):
        res = 0
        for i in range(num):
            res += i

    def calc_thread_run(self):
        st = datetime.datetime.now()
        # Run the same calculation in five threads, then wait for all of them.
        threads = [threading.Thread(target=self.calc, args=(self.num,)) for _ in range(5)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        logger.info('thread time:{}'.format((datetime.datetime.now() - st).total_seconds()))

    def not_thread_calc(self):
        start = datetime.datetime.now()
        # Run the same calculation five times in the main thread, one after another.
        for _ in range(5):
            self.calc(self.num)
        logger.info('not thread, time:{}'.format((datetime.datetime.now() - start).total_seconds()))


if __name__ == '__main__':
    MultiThreadEfficiency().not_thread_calc()
    # not_thread_calc-INFO: not thread, time:385.343603
    MultiThreadEfficiency().calc_thread_run()
    # calc_thread_run-INFO: thread time:354.982785

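What is IO-bound, and what is CPU-bound?
IO-bound: the program spends most of its time on network or file access.
CPU-bound: the program spends most of its time computing.

Because of the GIL, in CPython most single operations on built-in data structures (such as list.append and set.add) are atomic and therefore safe across threads. Strictly speaking, though, these types are not thread-safe: compound operations such as check-then-modify or += consist of several steps and still need a lock.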
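
A compound update protected by a lock can be sketched as follows. The Counter class is a made-up example; the point is that value += 1 is a read, an add and a write, and the lock turns those three steps into one unit.

```python
import threading

class Counter:
    """Hypothetical shared counter whose increment is guarded by a lock."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the two increments would be lost.
        with self.lock:
            self.value += 1

counter = Counter()

def work():
    for _ in range(10000):
        counter.increment()

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)   # 40000
```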

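Summary of Python thread synchronization

Because of the GIL, reads and writes on Python's built-in data structures appear to be atomic. To see how real thread safety is implemented with locks, read the source code of the queue module.

1. How to use Event:
For a simple wait on a single state change, use Event. Model: the boss and workers with the cups.
2. When to use Lock:
When several threads access and modify the same shared resource, that is, read and write the same data. Lock blocks by default; RLock is the re-entrant variant. Model: queuing at a cafeteria serving window.
3. How to use Barrier:
Wait until everyone has arrived. For parallel-initialization problems, use Barrier.
4. How to use Condition:
For one-to-many notification and producer-consumer scenarios, where producer and consumer speeds are out of step.
5. How to use Semaphore:
A semaphore is a countdown counter; use it for resource pools. Use BoundedSemaphore to enforce the upper bound.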
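
As a recap of item 1, the simplest of the five tools fits in a few lines. This sketch uses made-up names (ready, log, worker): one thread waits for a state change and another thread announces it.

```python
import threading

ready = threading.Event()
log = []

def worker():
    ready.wait()                 # block until the flag is set
    log.append('worker saw the flag')

t = threading.Thread(target=worker)
t.start()
log.append('boss sets the flag')
ready.set()                      # state change: every waiter is released
t.join()
print(log)
```

The worker cannot proceed until set() is called, so the boss's entry always comes first in the log.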