Python 多线程基础

最新推荐文章于 2024-03-28 14:05:33 发布

feiyy404

最新推荐文章于 2024-03-28 14:05:33 发布

阅读量229

点赞数

分类专栏：面试复习知识点

本文链接：https://blog.csdn.net/Enjolras_fuu/article/details/105160584

版权

面试复习知识点专栏收录该内容

3 篇文章 0 订阅

订阅专栏

官方参考文档

https://docs.python.org/zh-cn/3.7/library/threading.html#module-threading

Thread 直接创建子线程

import threading
import time


def work(internal):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(internal)
    print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = threading.Thread(target=work, args=(i, ))
    thread_instance.start()

print("Main: end")

这里一共产生了三个线程，分别是主线程MainThread和两个子线程Thread-1、Thread-2。另外我们观察到，主线程首先运行结束，
Thread-1、Thread-2 才接连运行结束，分别间隔了 1 秒和 4 秒。这说明主线程并没有等待子线程运行完毕才结束运行，而是直接退出了，有点不符合常理。

规定主线程在子线程后退出

import threading
import time


def work(internal):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(internal)
    print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = threading.Thread(target=work, args=(i, ))
    thread_instance.start()
    # 规定主线程在子线程后退出 
    thread_instance.join()

print("Main: end")

有关于 join

如果我们测试上一步的运行时间，可以发现不管是单独运行，还是多线程运行，join 的运行时间均是 10s 左右。
（10 = 1+2+3+4）似乎失去了多线程运行的意义，其实则是没有正确使用 join 的结果。

那么, join 真正的含义是什么呢？
join 会卡住主线程，并让当前已经 start 的子线程继续运行，直到调用.join的这个线程运行完毕。
所以，我们只需要 join 时间最长的一个线程即可。

import threading
import time


now = lambda :time.time()


def work(internal):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(internal)
    print(f"{name} end")


t1 = now()
print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = threading.Thread(target=work, args=(i, ))
    thread_instance.start()
    # 可规定主线程在子线程后退出
    if i == 4:
        thread_instance.join()

print(f"Main: end, Time: {now() - t1}")

当然，这是在我们知道哪个线程先运行完，哪个线程后面运行完的情况下。
在我们不知道哪个线程先运行完成的情况下，在以后之后，需要对每一个进行 join。

我们设想这样一个场景。你的爬虫使用10个线程爬取100个 URL，主线程需要等到所有URL 都已经爬取完成以后，再来分析数据。此时就可以通过 join 先把主线程卡住，
等到10个子线程全部运行结束了，再用主线程进行后面的操作。
如果我不知道哪个线程先运行完，那个线程后运行完怎么办？这个时候就要每个线程都执行 join 操作了。
这种情况下，每个线程使用 join是合理的：

thread_list = []
for _ in range(10):
    thread = threading.Thread(target=xxx, args=(xxx, xxx)) 换行thread.start()
    thread_list.append(thread)

for thread in thread_list:
    thread.join()

通过继承的方式创建多线程

import threading
import time


class MyThread(threading.Thread):
    def __init__(self, interval):
        super(MyThread, self).__init__()
        self.interval = interval

    def run(self):
        name = threading.current_thread().name
        print(f"{name} start")
        time.sleep(self.interval)
        print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = MyThread(i)
    thread_instance.start()
    # 可规定主线程在子线程后退出
    # 可规定主线程在子线程后退出
    if i == 4:
        thread_instance.join() 
print("Main: end")

两种实现方式的效果是相同的。

守护线程

在线程中有一个叫作守护线程的概念，如果一个线程被设置为守护线程，那么意味着这个线程是“不重要”的，这意味着，如果主线程结束了而该守护线程还没有运行完，
那么它将会被强制结束。在 Python 中我们可以通过 setDaemon 方法来将某个线程设置为守护线程。

import threading
import time


now = lambda:time.time()


def work(internal):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(internal)
    print(f"{name} end")


thread_1 = threading.Thread(target=work, args=(1, ))
thread_2 = threading.Thread(target=work, args=(5, ))
thread_2.setDaemon(True)
thread_1.start()
thread_2.start()

print("Main End.")

互斥锁

在一个进程中的多个线程是共享资源的，比如在一个进程中，有一个全局变量 count 用来计数，现在我们声明多个线程，每个线程运行时都给 count 加 1，
让我们来看看效果如何，代码实现如下:

import threading
import time

count = 0


class MyThread(threading.Thread):
    def __init__(self):
        super(MyThread, self).__init__()

    def run(self):
        global count
        temp = count + 1
        time.sleep(0.001)
        count = temp


def main():
    threads = []
    for _ in range(1000):
        thread_ = MyThread()
        thread_.start()
        threads.append(thread_)

    for t in threads:
        t.join()

    print("Final count: ", count)


main()

那这样，按照常理来说，最终的 count 值应该为 1000。但其实不然，我们来运行一下看看。
运行结果如下：
Final count: 69

这是为什么呢？因为count这个值是共享的，每个线程都可以在执行temp=count这行代码时拿到当前count的值，但是这些线程中的一些线程可能是并发或者并行执行的，
这就导致不同的线程拿到的可能是同一个 count 值，最后导致有些线程的 count 的加 1 操作并没有生效，导致最后的结果偏小。

所以，如果多个线程同时对某个数据进行读取或修改，就会出现不可预料的结果。为了避免这种情况，我们需要对多个线程进行同步，要实现同步，
我们可以对需要操作的数据进行加锁保护，这里就需要用到threading.Lock 了。

加锁保护是什么意思呢？就是说，某个线程在对数据进行操作前，需要先加锁，这样其他的线程发现被加锁了之后，就无法继续向下执行，会一直等待锁被释放，
只有加锁的线程把锁释放了，其他的线程才能继续加锁并对数据做修改，修改完了再释放锁。这样可以确保同一时间只有一个线程操作数据，多个线程不会再同时读取和修改同一个数据，
这样最后的运行结果就是对的了。

import threading
import time

count = 0
lock = threading.Lock()


class MyThread(threading.Thread):
    def __init__(self):
        super(MyThread, self).__init__()

    def run(self):
        global count
        # 获取锁
        lock.acquire()
        temp = count + 1
        time.sleep(0.001)
        count = temp
        # 释放锁
        lock.release()


def main():
    threads = []
    for _ in range(1000):
        thread_ = MyThread()
        thread_.start()
        threads.append(thread_)

    for t in threads:
        t.join()

    print("Final count: ", count)


main()

关于 Python 中的多线程

由于Python中GIL的限制，导致不论是在单核还是多核条件下，在同一时刻只能运行一个线程，导致Python多线程无法发挥多核并行的优势。
GIL全称为GlobalInterpreterLock，中文翻译为全局解释器锁，其最初设计是出于数据安全而考虑的。在Python多线程下，每个线程的执行方式如下：

获取 GIL
执行对应线程的代码
释放 GIL
可见，某个线程想要执行，必须先拿到GIL，我们可以把GIL看作是通行证，并且在一个Python进程中，GIL只有一个。拿不到通行证的线程，就不允许执行。
这样就会导致，即使是多核条件下，一个 Python 进程下的多个线程，同一时刻也只能执行一个线程。

不过对于爬虫这种 IO 密集型任务来说，这个问题影响并不大。而对于计算密集型任务来说，由于 GIL 的存在，多线程总体的运行效率相比可能反而比单线程更低。

feiyy404

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Python 多线程基础

官方参考文档https://docs.python.org/zh-cn/3.7/library/threading.html#module-threadingThread 直接创建子线程import threadingimport timedef work(internal): name = threading.current_thread().name print(...
复制链接

扫一扫