Python: Understanding Threads Thoroughly

不再游移

Last modified 2023-06-12 17:51:09

Tags: python

First published 2023-06-12 17:12:30

Article link: https://blog.csdn.net/m0_63174581/article/details/131172871

I. Threads

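Both multithreading and multiprocessing can run several tasks concurrently, and a thread is part of a process. Threads share memory and variables and consume few resources; the drawback is that synchronization and locking between threads is cumbersome.

In a multithreaded application, keeping code thread-safe, synchronizing threads and accessing shared variables are thorny problems, and handling them badly has serious consequences. Python's threading module provides Lock, RLock, Semaphore, Event and Condition to synchronize threads and to guarantee mutually exclusive access to shared variables.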

join(): for example, main thread A creates child thread B and calls B.join() in A.

Main thread A then blocks at the call and continues only after child thread B has finished.
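The blocking behaviour of join() can be sketched as follows. This is a minimal illustration; the names `worker` and `results` are made up for the example.

```python
import threading
import time

results = []

def worker():
    # Simulate some work in child thread B.
    time.sleep(0.2)
    results.append("B finished")

b = threading.Thread(target=worker)
b.start()
b.join()                      # main thread A blocks here until B is done
results.append("A continues")
print(results)
```

Without the join() call, the main thread would most likely append its entry first, because B is still sleeping.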

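setDaemon():

Example: main thread A creates child thread B and calls B.setDaemon(True) in A.

This marks child thread B (not main thread A) as a daemon thread. When main thread A finishes, the program exits without waiting for B, whether or not B has completed.

Note: the flag must be set before start() is called, otherwise RuntimeError is raised. If B is not a daemon thread, the program waits for B at exit, so it hangs forever when B never finishes. Since Python 3.10 setDaemon() is deprecated in favour of the daemon attribute (B.daemon = True) or the daemon=True constructor argument.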
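A small sketch of the daemon flag, using the `daemon` attribute rather than setDaemon(). The function name `background` is hypothetical; the loop never ends, which is only acceptable because the thread is a daemon.

```python
import threading
import time

def background():
    # Runs forever; the interpreter does not wait for daemon threads at exit.
    while True:
        time.sleep(0.1)

b = threading.Thread(target=background)
b.daemon = True          # must be set before start()
b.start()

try:
    b.daemon = False     # changing the flag on a running thread is rejected
except RuntimeError as exc:
    error = str(exc)
else:
    error = None

print(b.daemon, error)
```

The script still terminates when the main thread reaches its end, even though `background` is still running.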

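Handling deadlock:

When threads share several resources, a deadlock occurs if A waits for B to release a resource while B waits for A to release another.

(1) Avoid deadlock: the banker's algorithm

A request is granted only if a safe completion order still exists afterwards, for example one in which A releasing its locks lets B finish, B releasing lets C finish, and C releasing lets D finish.

(2) Add a timeout, so that a thread which cannot get a lock in time gives up instead of waiting forever.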
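The timeout approach can be sketched with the `timeout` argument of `Lock.acquire()`. The helper `transfer` and the two locks are assumptions made for illustration, not part of any library.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer(first, second, timeout=0.5):
    """Take two locks; give up instead of waiting forever for the second."""
    with first:
        if not second.acquire(timeout=timeout):
            return False      # could not get the second lock: back off
        try:
            return True       # both locks held: the real work would go here
        finally:
            second.release()

# Simulate contention: someone else already holds lock_b.
lock_b.acquire()
blocked = transfer(lock_a, lock_b, timeout=0.1)
lock_b.release()

free = transfer(lock_a, lock_b, timeout=0.1)
print(blocked, free)
```

A caller that receives False can retry later, which breaks the "A waits for B, B waits for A" cycle.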

II. Lock & RLock

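1. Lock (primitive lock)

Request lock -> wait in the lock pool -> acquire the lock -> locked -> release the lock

A primitive lock is a synchronization primitive that is not owned by any particular thread while it is locked. Lock has a lock pool: a thread that requests the lock is placed in the pool and leaves it once the lock has been acquired. Threads in the pool are in the synchronized blocked state. Locks support the context management protocol, that is, the with statement.

With a Lock object, a thread that calls acquire() twice in a row without releasing deadlocks itself, and releasing a Lock that is not locked raises RuntimeError. When the same thread may need to take the lock again while already holding it, use RLock instead.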
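A small sketch for checking how a Lock reacts to a repeated acquire() and to an extra release(). The second acquire uses `blocking=False` so the example does not hang.

```python
import threading

lock = threading.Lock()

lock.acquire()
# A second blocking acquire() here would wait forever, so probe without blocking.
second_try = lock.acquire(blocking=False)
lock.release()

try:
    lock.release()            # the lock is already unlocked
except RuntimeError:
    double_release_error = True
else:
    double_release_error = False

# The with statement acquires on entry and releases on exit.
with lock:
    held_inside = lock.locked()
held_after = lock.locked()

print(second_try, double_release_error, held_inside, held_after)
```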

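2. Reentrant lock (RLock)

RLock is a synchronization primitive that the same thread can acquire multiple times. It uses the concepts of "owning thread" and "recursion level": while locked, an RLock is owned by one thread. The owning thread may call acquire() again, and must call release() the same number of times to free the lock. You can think of an RLock as a lock pool plus a counter that starts at 0; every successful acquire()/release() adds or subtracts 1, and the lock is unlocked when the counter is 0.

A Lock is not owned by a particular thread while locked, so it can be locked in one thread and unlocked in another. An RLock can only be released by the thread that currently holds it.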

import threading
import time

lock1 = threading.RLock()

def inner():
    with lock1:
        print("inner1 function:%s" % threading.current_thread())

def outer():
    print("outer function:%s" % threading.current_thread())
    with lock1:
        inner()

if __name__ == "__main__":
    t1 = threading.Thread(target=outer)
    t2 = threading.Thread(target=outer)
    t1.start()
    t2.start()
    
    ----------------------
outer function:<Thread(Thread-1, started 139715906574080)>
inner1 function:<Thread(Thread-1, started 139715906574080)>
outer function:<Thread(Thread-2, started 139715914966784)>
inner1 function:<Thread(Thread-2, started 139715914966784)>
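The ownership difference between Lock and RLock described earlier can be sketched like this. The helper `release_from_other_thread` is a made-up name for the example.

```python
import threading

def release_from_other_thread(lock):
    """Acquire in the main thread, then try to release in a helper thread."""
    outcome = {}

    def helper():
        try:
            lock.release()
            outcome["released"] = True
        except RuntimeError:
            outcome["released"] = False

    lock.acquire()
    t = threading.Thread(target=helper)
    t.start()
    t.join()
    if not outcome["released"]:
        lock.release()        # still owned by the main thread, clean up here
    return outcome["released"]

lock_result = release_from_other_thread(threading.Lock())
rlock_result = release_from_other_thread(threading.RLock())
print(lock_result, rlock_result)
```

The Lock is released by the helper thread without complaint, while the RLock rejects the release with RuntimeError because the helper is not its owner.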

III. Thread Synchronization with Semaphores

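Semaphores:

Introduction: a semaphore is an abstract data type managed by the operating system and used to coordinate access to shared resources by multiple threads. In essence, a semaphore is an internal counter that indicates how many concurrent accesses the shared resource currently allows.

In the threading module a semaphore has two operations, acquire() and release():

Whenever a thread wants to access a shared resource guarded by a semaphore, it must call acquire(). If the internal counter is greater than zero, the call decrements it and access is granted. If the counter is zero, the thread is suspended until another thread releases the resource.
When a thread no longer needs the shared resource, it must call release(). This increments the internal counter and wakes one of the waiting threads, which then gets access. The order in which waiting threads are woken should not be relied on.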

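On the surface the semaphore mechanism has no obvious problem, and if the wait and signal operations are atomic there really is none. If they are not atomic, or if one of the two operations is interrupted, things can go badly wrong.

For example, suppose two concurrent threads are both waiting on a semaphore whose internal value is 1, and the operations are not atomic. Thread A decrements the value from 1 to 0, then control switches to thread B, which decrements it from 0 to -1 and is suspended. When control returns to thread A the value is already negative, so thread A waits as well.

The value parameter of Semaphore is the initial value of the internal counter and defaults to 1. The counter equals the number of release() calls minus the number of acquire() calls plus the initial value, and acquire() blocks whenever returning would make the counter negative, so the counter never drops below zero. For example, with an initial value of 0, one release() call and five acquire() calls, only the first acquire() returns (leaving the counter at 0); the other four block until release() is called again.

In the non-atomic scenario above, the semaphore could have let a thread through, yet every thread ends up waiting. threading.Semaphore updates its counter while holding an internal lock, so this interleaving does not occur with it.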
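The counting rule (initial value 0, one release(), five acquire() calls) can be checked with non-blocking acquires. A minimal sketch:

```python
import threading

sem = threading.Semaphore(0)   # internal counter starts at 0
sem.release()                  # counter becomes 1

# blocking=False makes acquire() return False instead of waiting.
attempts = [sem.acquire(blocking=False) for _ in range(5)]
print(attempts)
```

Only the first attempt succeeds; with the default blocking behaviour the remaining four calls would wait for further release() calls.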

import threading
import time
import random

# The optional argument gives the initial value for the internal counter;
# it defaults to 1.
# If the value given is less than 0, ValueError is raised.
semaphore = threading.Semaphore(0)

def consumer():
    print("consumer is waiting.")
    # Acquire the semaphore; blocks until the producer releases it
    semaphore.acquire()
    # The consumer now has access to the shared resource
    print("Consumer notify : consumed item number %s " % item)

def producer():
    global item
    time.sleep(10)
    # Create a random item
    item = random.randint(0, 1000)
    print("producer notify : produced item number %s" % item)
    # Release the semaphore, incrementing the internal counter by one.
    # When it is zero on entry and another thread is waiting for it
    # to become larger than zero again, wake up that thread.
    semaphore.release()

if __name__ == '__main__':
    for i in range(0, 5):
        t1 = threading.Thread(target=producer)
        t2 = threading.Thread(target=consumer)
        t1.start()
        t2.start()
        t1.join()
        t2.join()
    print("program terminated")

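A special use of a semaphore is as a mutex. A mutex is a semaphore with an initial value of 1, and it provides mutually exclusive access to data and resources.

Semaphores are still widely used in languages that support multithreading, but they can lead to deadlock. For example, thread t1 waits for semaphore s1 and then for s2, while thread t2 waits for s2 first and then for s1. A deadlock can then occur in which t1 is waiting for s2 while t2 is waiting for s1.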
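One common way to rule out this kind of deadlock is to make every thread take the semaphores in the same order. The sketch below assumes two semaphores used as mutexes; `worker` and `log` are names invented for the example.

```python
import threading

s1 = threading.Semaphore(1)   # used as a mutex
s2 = threading.Semaphore(1)   # used as a mutex
log = []

def worker(name):
    # Both threads take s1 first and s2 second, so no wait cycle can form.
    with s1:
        with s2:
            log.append(name)

t1 = threading.Thread(target=worker, args=("t1",))
t2 = threading.Thread(target=worker, args=("t2",))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(log))
```

If one thread took s2 before s1 instead, the program could hang with each thread holding the semaphore the other one needs.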

import threading


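# Three threads call zero(), even() and odd() concurrently; the semaphores
# force the output order 0 1 0 2 0 3 ... up to n.
class ZeroEvenOdd: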
    def __init__(self, n):
        self.n = n
        self.s1 = threading.Semaphore(1)
        self.s2 = threading.Semaphore(0)
        self.s3 = threading.Semaphore(0)

    def zero(self, printNumber: 'Callable[[int], None]') -> None:
        for i in range(1, self.n + 1):
            self.s1.acquire()
            printNumber(0)
            if i % 2 == 0:
                self.s2.release()
            else:
                self.s3.release()

    def even(self, printNumber: 'Callable[[int], None]') -> None:
        for i in range(1, self.n + 1):
            if i % 2 == 0:
                self.s2.acquire()
                printNumber(i)
                self.s1.release()

    def odd(self, printNumber: 'Callable[[int], None]') -> None:
        for i in range(1, self.n + 1):
            if i % 2 == 1:
                self.s3.acquire()
                printNumber(i)
                self.s1.release()

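IV. Condition

A Condition is typically used as follows:

First acquire the condition variable, then check some condition.
If the condition does not hold, wait;
if the condition holds, do some work that changes the condition and then call notify to inform other threads. Threads in the wait state re-check the condition after being notified.
Repeating this process solves complex synchronization problems.

The basic principle of Condition is as follows:

A Condition object maintains a lock (Lock/RLock) and a waiting pool. A thread obtains the Condition through acquire. When it calls wait, the thread releases the Condition's internal lock and enters the blocked state, and the thread is recorded in the waiting pool. When notify is called, the Condition picks one thread from the waiting pool and tells it to call acquire to try to get the lock.

The Condition constructor accepts a Lock/RLock object as an argument. If none is given, the Condition creates an RLock internally.

Besides notify, Condition provides notify_all (notifyAll is a deprecated alias since Python 3.10), which tells every thread in the waiting pool to try to acquire the internal lock. Because a waiting thread is woken only by a notification or by its own wait() timeout, notify_all helps ensure that no thread stays silent forever.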
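The acquire / check / wait / notify cycle above can be sketched in a few lines. This is a minimal one-shot example, not a full producer-consumer queue; `items` and `consumed` are illustrative names.

```python
import threading

cond = threading.Condition()
items = []
consumed = []

def consumer():
    with cond:                      # acquire the underlying lock
        while not items:            # re-check the condition after every wake-up
            cond.wait()             # releases the lock while waiting
        consumed.append(items.pop())

def producer():
    with cond:
        items.append("item")        # change the condition
        cond.notify()               # wake one waiting thread

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
print(consumed)
```

Checking the condition in a `while` loop rather than a single `if` keeps the consumer correct even if it is woken while the condition is still false.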

import threading
import time

class Producer(threading.Thread):
    # Producer thread
    def run(self):
        global count
        while True:
            if con.acquire():
                # Produce only while count is at most 1000
                if count > 1000:
                    con.wait()
                else:
                    count = count+100
                    msg = self.name+' produce 100, count=' + str(count)
                    print(msg)
                    # After producing, wake a thread in the waiting state:
                    # pick one thread from the waiting pool and tell it to try to acquire the lock
                    con.notify()
                con.release()
                time.sleep(1)

class Consumer(threading.Thread):
    # Consumer thread
    def run(self):
        global count
        while True:
            # Consume only while count is at least 100
            if con.acquire():
                if count < 100:
                    con.wait()
                else:
                    count = count-5
                    msg = self.name+' consume 5, count='+str(count)
                    print(msg)
                    # After consuming, wake a thread in the waiting state:
                    # pick one thread from the waiting pool and tell it to try to acquire the lock
                    con.notify()
                con.release()
                time.sleep(1)

count = 500
con = threading.Condition()

def test():
    for i in range(2):
        p = Producer()
        p.start()
    for i in range(5):
        c = Consumer()
        c.start()
if __name__ == '__main__':
    test()

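V. The Event Class

Threads often need to communicate with each other. A common case is a secondary thread that carries out a specific task for the main thread and has to keep reporting its progress while it runs.

threading.Event lets one thread wait for a notification from other threads. It holds an internal flag whose initial value is False. A thread enters the waiting state through wait() and stays there until another thread calls set(), which sets the flag to True and lets all waiting threads resume. clear() resets the flag to False. The current value of the flag can be queried with is_set() (isSet() is a deprecated alias since Python 3.10).

An Event is essentially a simplified Condition. It exposes no lock to the caller, so it cannot put threads into the synchronized blocked state.

is_set(): returns True when the internal flag is True.
set(): sets the flag to True and lets all threads in the waiting state resume.
clear(): sets the flag to False.

wait([timeout]): returns immediately if the flag is True; otherwise blocks the thread in the waiting state until another thread calls set() or the optional timeout expires.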
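The return value of wait() and the effect of set() and clear() can be sketched as follows. The names `waiter` and `seen` are hypothetical.

```python
import threading

event = threading.Event()
seen = []

def waiter():
    # wait() returns True once the flag is set, False if the timeout expires.
    seen.append(event.wait(timeout=5))

timed_out = event.wait(timeout=0.05)   # nobody has called set() yet

t = threading.Thread(target=waiter)
t.start()
event.set()                            # wakes the waiting thread
t.join()

event.clear()                          # flag back to False
print(timed_out, seen, event.is_set())
```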

import threading
import time
event = threading.Event()
def func():
    # Wait for the event; the thread enters the waiting (blocked) state
    print('%s wait for event...' % threading.current_thread().name)
    event.wait()
    # Once the event arrives, the thread runs again
    print('%s recv event.' % threading.current_thread().name)
t1 = threading.Thread(target=func)
t2 = threading.Thread(target=func)
t1.start()
t2.start()

time.sleep(2)

# Send the event notification
print('MainThread set event.')
event.set()
----------------------------------------------------
Thread-6 wait for event...Thread-7 wait for event...

MainThread set event.
Thread-6 recv event.
Thread-7 recv event.

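VI. The Timer Class

Timer is a subclass of Thread that calls a function after a specified interval. Apart from what it inherits from Thread, it adds a single instance method, cancel().
Constructor: Timer(interval, function, args=None, kwargs=None)
interval: the delay in seconds before the function runs
function: the function to call
args/kwargs: the arguments passed to the function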
import threading
import time


def func(num):
    print('hello {} timer!'.format(num))

# If the function started by the timer takes arguments, pass them as a tuple after it
timer = threading.Timer(5, func, (1,))
time0 = time.time()
timer.start()
print(time.time()-time0)
------------------------------------------------
0.0
hello 1 timer!
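Because Timer is a Thread, join() can be used to wait until the timed function has run. The sketch below also passes a keyword argument; the `greeting` parameter and the `calls` list are made up for the example.

```python
import threading

calls = []

def func(num, greeting="hello"):
    calls.append("{} {} timer!".format(greeting, num))

timer = threading.Timer(0.1, func, args=(1,), kwargs={"greeting": "hi"})
timer.start()
timer.join()      # Timer is a Thread, so join() waits until func has run
print(calls)
```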



from threading import Timer

def fun():
    print("hello, world")

if __name__=='__main__':
    t = Timer(5.0, fun)
    t.start() # start the timer thread; "hello, world" will not be printed
    t.cancel() # cancel() stops the timer before it fires, so fun() is never called

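VII. Thread-Local Storage (TLS)

Thread Local Storage

The implementation of thread-local storage resembles a global dictionary whose keys are thread ids and whose values are per-thread copies of the shared global variable. Every time you access the global variable you actually access the copy. Python hides the details of an access such as userName.val, which really reads the object copy owned by the current thread in that dictionary.

ThreadLocal truly isolates data between threads, and you do not need to look up your own thread ID by hand.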

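In a multithreaded environment every thread has its own data. It is better for a thread to use its own local variables than global ones, because only the thread itself can see a local variable and other threads are unaffected, whereas changes to a global variable must be protected by a lock.
Local variables have a problem of their own, though: passing them along function calls is tedious:
def process_student(name):
  std = Student(name)
  # std is a local variable, but every function needs it, so it must be passed in: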
  do_task_1(std)
  do_task_2(std)
 
def do_task_1(std):
  do_subtask_1(std)
  do_subtask_2(std)
 
def do_task_2(std):
  do_subtask_2(std)
  do_subtask_2(std)

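Passing the argument down through every layer of function calls quickly gets out of hand. A global variable does not work either, because each thread handles a different Student object and they cannot be shared.
What about storing all Student objects in a global dict and using the thread itself as the key to look up the Student object that belongs to it?
global_dict = {}
 
def std_thread(name):
  std = Student(name)
  # Put std into the global variable global_dict:
  global_dict[threading.current_thread()] = std
  do_task_1()
  do_task_2()
 
def do_task_1():
  # Do not pass std in; look it up by the current thread instead:
  std = global_dict[threading.current_thread()]
  ...
 
def do_task_2():
  # Any function can look up the std variable of the current thread:
  std = global_dict[threading.current_thread()]
  ...

This approach works in principle. Its biggest advantage is that std no longer has to be passed through every layer of function calls, but the code each function uses to fetch std is a bit ugly.
Is there a simpler way?
This is where ThreadLocal comes in: there is no dict to search, because ThreadLocal does the lookup for you: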
import threading
 
# Create a global ThreadLocal object:
local_school = threading.local()
 
def process_student():
  print('Hello, %s (in %s)' % (local_school.student, threading.current_thread().name))
 
def process_thread(name):
  # Bind the student attribute of the ThreadLocal object:
  local_school.student = name
  process_student()
 
t1 = threading.Thread(target= process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target= process_thread, args=('Bob',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

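The most common use of ThreadLocal is to bind a database connection, an HTTP request, user identity information and the like to each thread, so that every handler function called in that thread can reach those resources easily.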
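Binding one connection per thread might look roughly like this. `FakeConnection` is a hypothetical stand-in for a real database connection, and `get_connection` and `handle_request` are illustrative helpers rather than any library API.

```python
import threading

class FakeConnection:
    """Hypothetical stand-in for a real database connection."""
    def __init__(self, owner):
        self.owner = owner

local_data = threading.local()

def get_connection():
    # Create the connection lazily, once per thread.
    if not hasattr(local_data, "conn"):
        local_data.conn = FakeConnection(threading.current_thread().name)
    return local_data.conn

owners = {}

def handle_request():
    # Every call made in this thread sees the same per-thread connection.
    first = get_connection()
    second = get_connection()
    owners[threading.current_thread().name] = (first is second, first.owner)

threads = [threading.Thread(target=handle_request, name=n)
           for n in ("Thread-A", "Thread-B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_has_conn = hasattr(local_data, "conn")   # the main thread never set one
print(owners, main_has_conn)
```

Each thread gets its own connection object without passing it through any function signature, and the main thread sees none of them.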
# Annotated copy of the pure-Python implementation of threading.local (CPython's Lib/_threading_local.py)
from weakref import ref  # ref is used as the first element of each tuple in the big dict, i.e. (ref(Thread), thread dict)
from contextlib import contextmanager  # context manager, used to make sure the __dict__ attribute is in place
from threading import current_thread, RLock
__all__ = ["local"]

class _localimpl:  # local()._local__impl = _localimpl(); the _local__impl attribute of a local() instance is an instance of this class
    """A class managing thread-local dicts"""
    __slots__ = 'key', 'dicts', 'localargs', 'locallock', '__weakref__'  # the attributes a _localimpl instance has

    def __init__(self):
        # self.key is the key used in the Thread object's own __dict__.
        # It is a string so that lookups stay fast, and building it as
        # '_threading_local._localimpl.' + str(id(self)) keeps it from clashing with other attributes.

        self.key = '_threading_local._localimpl.' + str(id(self))
        #
        self.dicts = {}  # the big dict
        # Format: { id(thread 1): (ref(Thread), thread 1's own dict), id(thread 2): (ref(Thread), thread 2's own dict), ... }

    def get_dict(self):  # fetch (ref(Thread), thread dict) from the big dict and return the thread dict
        thread = current_thread()
        return self.dicts[id(thread)][1]

    def create_dict(self):  # create a thread dict for the current thread, i.e. (ref(Thread), thread dict)[1], the second element of the tuple
        localdict = {}
        key = self.key  # key is '_threading_local._localimpl.' + str(id(self))
        thread = current_thread()  # the current thread
        idt = id(thread)  # id of the current thread object
        def local_deleted(_, key=key):  # can be skipped on a first read
            # When the localimpl is deleted, remove the thread attribute.
            thread = wrthread()
            if thread is not None:
                del thread.__dict__[key]
        def thread_deleted(_, idt=idt):  # can be skipped on a first read
            # When the thread is deleted, remove the local dict.
            # Note that this is suboptimal if the thread object gets
            # caught in a reference loop. We would like to be called
            # as soon as the OS-level thread ends instead.
            local = wrlocal()
            if local is not None:
                dct = local.dicts.pop(idt)
        wrlocal = ref(self, local_deleted)
        wrthread = ref(thread, thread_deleted)  # first element of each thread's entry in the big dict: (ref(Thread), small dict)
        thread.__dict__[key] = wrlocal
        self.dicts[idt] = wrthread, localdict  # build the entry in the big dict: id(thread) : (ref(Thread), small dict)
        return localdict


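@contextmanager
def _patch(self):
    impl = object.__getattribute__(self, '_local__impl')  # self is a local() instance here; fetch local()._local__impl
    try:
        dct = impl.get_dict()   # call get_dict() on the dict-managing object local()._local__impl
                                # as the definition of get_dict() above shows, a missing entry raises KeyError

    except KeyError:  # no dict for this thread yet: catch the KeyError
        dct = impl.create_dict()  # and let the dict-managing class create one on the spot
        args, kw = impl.localargs  # take the constructor arguments stored earlier in __new__
        self.__init__(*args, **kw)
    with impl.locallock:  # take the lock through a context manager
        object.__setattr__(self, '__dict__', dct)  # give the local() instance a __dict__ that points at the second element of the value tuple in the big dict, i.e. the thread's small dict
        yield  # at this point both attributes of the local() instance are in place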


class local:  # the local class
    __slots__ = '_local__impl', '__dict__'  # a local instance has two accessible attributes

    def __new__(cls, *args, **kw):
        if (args or kw) and (cls.__init__ is object.__init__):  # can be skipped on a first read
            raise TypeError("Initialization arguments are not supported")
        self = object.__new__(cls)  # can be skipped on a first read
        impl = _localimpl()  # the _local__impl attribute holds an instance of the _localimpl class
        impl.localargs = (args, kw)  # the localargs attribute of that _localimpl instance is a tuple
        impl.locallock = RLock()  # can be skipped on a first read
        object.__setattr__(self, '_local__impl', impl)
        # attach _local__impl to local(), so that local()._local__impl is impl, a _localimpl() instance

        # __slots__ gives local() two attributes, and _local__impl has now been set;
        # the second attribute, __dict__, is attached temporarily through the context manager on each access, e.g. in __getattribute__ below

        impl.create_dict()  # i.e. local._local__impl.create_dict()
        return self  # return the local() instance with its _local__impl attribute configured

    def __getattribute__(self, name):  # called when an attribute of local() is read
        with _patch(self):  # the context manager prepares the data first
            return object.__getattribute__(self, name)  # then the attribute name is read from the prepared data

    def __setattr__(self, name, value):
        if name == '__dict__':  # this check makes the __dict__ attribute of a local() instance read-only, so it cannot be replaced
            raise AttributeError(
                "%r object attribute '__dict__' is read-only"
                % self.__class__.__name__)
        with _patch(self):  # likewise, set up __dict__ through the context manager first
            return object.__setattr__(self, name, value)  # then set the attribute through the base class method

    def __delattr__(self, name):  # delete an attribute, using the same approach as __setattr__
        if name == '__dict__':   # this check makes the __dict__ attribute of a local() instance read-only, so it cannot be replaced
            raise AttributeError(
                "%r object attribute '__dict__' is read-only"
                % self.__class__.__name__)
        with _patch(self):  # likewise, set up __dict__ through the context manager first
            return object.__delattr__(self, name)
