Notes on "Advanced Python Programming" (《Python高级编程》), Chapter 11: Multithreading and Multiprocessing
Python GIL: The Global Interpreter Lock
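Strictly speaking, the GIL (Global Interpreter Lock) is not a feature of the Python language but a constraint of CPython, the Python interpreter written in C. Other implementations, such as the Java-based Jython, have no GIL (PyPy, by contrast, does have one). However, CPython is the mainstream interpreter (the one you download from the official Python website is CPython), and most third-party libraries are developed against it, so in practice the GIL's problems are very nearly problems of the Python language itself.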
The Python wiki describes the GIL as follows:
In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe.
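In other words, the GIL is a mutex that prevents multiple threads from executing Python bytecode at the same time. It keeps the interpreter's internal state (such as memory management) consistent, although it does not by itself make application code thread-safe. The cost is that CPython threads cannot use multiple cores to run Python bytecode in parallel: even on a multi-core machine, only one thread per process executes bytecode at any given moment. This is why Python multithreading is so often criticized as inefficient.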
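Does that make multithreading in Python pointless? Not at all. As any operating systems course explains, processes and threads fall into two kinds. CPU-bound work is dominated by computation: video decoding, scientific computing, and deep learning training and inference (which is essentially a pile of matrix multiplications and so also counts as scientific computing). IO-bound work is dominated by input and output: network-based applications and database applications, for example. In an IO-bound thread, CPU time is only a small share of the total; most of the time is spent waiting on IO, which does not involve the CPU, so the CPU is free to serve other threads in the meantime. Therefore, even though the GIL prevents threads in one process from executing bytecode in parallel, a program with several IO-bound threads still gains efficiency because one thread can compute while the others wait on IO. (Separate processes each have their own interpreter and their own GIL, so they are not limited in this way.) In the server-side and web-crawler development that Python is commonly used for, most of the work is IO-bound (database access and request handling on the server, issuing requests in a crawler) and can be sped up with multithreading, so it is still well worth mastering.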
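The point about IO-bound threads can be illustrated with a minimal sketch. It assumes that `time.sleep` is an acceptable stand-in for a blocking network or database call (like real blocking IO, it releases the GIL while waiting); the function names `fake_io`, `run_sequential` and `run_threaded` are made up for this illustration.

```python
import threading
import time


def fake_io(delay):
    """Hypothetical stand-in for a blocking IO call; sleeping releases the GIL."""
    time.sleep(delay)


def run_sequential(n, delay):
    """Run n simulated IO calls one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    """Run n simulated IO calls in n threads and return the elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(4, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(4, 0.2):.2f}s")
```

The sequential run has to wait for each call in turn, while the threaded run overlaps the waits, so it should finish in roughly the time of a single call. Replacing `fake_io` with a pure-Python computation would be expected to remove most of that advantage under the GIL.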
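Python Source Code for Multithreading
This section does not walk through every detail of multithreaded and multiprocess programming in Python. Instead, it reads three classes, Condition, Semaphore and Queue, from two modules (threading.py and Queue.py, the latter named queue.py in Python 3) as a way to understand Python multithreading.
threading.Condition and threading.Semaphore
With some background in operating-system process scheduling, the Condition and Semaphore classes are easy to understand, so they are not explained in detail here; the docstrings in the source already document the classes and their key methods thoroughly.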
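For readers who want a quick reminder of what a semaphore does in practice, here is a minimal sketch, not taken from the book. It uses `threading.Semaphore` to cap how many threads are inside a guarded section at once; the helper `run_with_limit` and the 0.05-second sleep are assumptions made for the illustration.

```python
import threading
import time


def run_with_limit(num_workers, limit):
    """Start num_workers threads, allow at most `limit` of them in the
    guarded section at once, and return the peak concurrency observed."""
    sem = threading.Semaphore(limit)
    state_lock = threading.Lock()  # protects the two counters below
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with sem:  # blocks while `limit` threads already hold the semaphore
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.05)  # simulated work inside the guarded section
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print("peak concurrency:", run_with_limit(6, 2))
```

Because every worker must acquire the semaphore before incrementing `active`, the returned peak can never exceed `limit`.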
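It is worth noting that Condition implements the __enter__ and __exit__ methods. It therefore supports the context management protocol and can be used as a context manager in a with statement.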
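A minimal sketch of using a Condition in a with statement follows. It is an illustration rather than code from the book: the function `handoff` and the list `box` are hypothetical names, and the example only shows one thread passing a single value to another.

```python
import threading


def handoff(value):
    """Pass `value` from a producer thread to a consumer thread via a Condition."""
    cond = threading.Condition()
    box = []       # shared state, only touched while holding cond's lock
    received = []

    def consumer():
        with cond:              # __enter__ acquires the underlying lock
            while not box:      # re-check the predicate after every wake-up
                cond.wait()     # releases the lock while blocked, re-acquires on return
            received.append(box.pop())
        # __exit__ has released the lock here

    def producer():
        with cond:
            box.append(value)
            cond.notify()       # wake one waiting thread, if any

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()
    p.start()
    c.join()
    p.join()
    return received[0]


if __name__ == "__main__":
    print(handoff(42))
```

The `while not box` loop makes the example independent of thread start order: if the producer happens to run first, the consumer finds the value already present and never waits.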
class Condition:
    """Class that implements a condition variable.

    A condition variable allows one or more threads to wait until they are
    notified by another thread.

    If the lock argument is given and not None, it must be a Lock or RLock
    object, and it is used as the underlying lock. Otherwise, a new RLock object
    is created and used as the underlying lock.
    """

    def __init__(self, lock=None):
        if lock is None:
            lock = RLock()
        self._lock = lock
        # Export the lock's acquire() and release() methods
        self.acquire = lock.acquire
        self.release = lock.release
        # If the lock defines _release_save() and/or _acquire_restore(),
        # these override the default implementations (which just call
        # release() and acquire() on the lock). Ditto for _is_owned().
        try:
            self._release_save = lock._release_save
        except AttributeError:
            pass
        try:
            self._acquire_restore = lock._acquire_restore
        except AttributeError:
            pass
        try:
            self._is_owned = lock._is_owned
        except AttributeError:
            pass
        self._waiters = _deque()

    def __enter__(self):
        return self._lock.__enter__()

    def __exit__(self, *args):
        return self._lock.__exit__(*args)

    def __repr__(self):
        return "<Condition(%s, %d)>" % (self._lock, len(self._waiters))

    def _release_save(self):
        self._lock.release() # No state to save

    def _acquire_restore(self, x):
        self._lock.acquire() # Ignore saved state

    def _is_owned(self):
        # Return True if lock is owned by current_thread.
        # This method is called only if _lock doesn't have _is_owned().
        if self._lock.acquire(0):
            self._lock.release()
            return False
        else:
            return True

    def wait(self, timeout=None):
        """Wait until notified or until a timeout occurs.

        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.

        This method releases the underlying lock, and then blocks until it is
        awakened by a notify() or notify_all() call for the same condition
        variable in another thread, or until the optional timeout occurs. Once
        awakened or timed out, it re-acquires the lock and returns.

        When the timeout argument is present and not None, it should be a
        floating point number specifying a timeout f