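Network programming has a few spots that can be awkward to handle, especially when the system has demanding performance requirements:
(1) Calls such as gethostbyname and getaddrinfo must not block.
(2) connect must not block either.
Any mature, well-built server system has to deal with these problems properly.
gevent arguably qualifies; at the very least it handles these problems fairly well.
By default gevent uses a thread pool for gethostbyname and related operations. People new to gevent often find this puzzling: if there is already a reactor, why is a thread pool needed as well?
The main purpose is to handle blocking operations that epoll itself cannot help with, such as gethostbyname.
Let's look at how this Resolver is defined: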
class Resolver(object):

    expected_errors = Exception

    def __init__(self, hub=None):
        if hub is None:
            hub = get_hub()
        self.pool = hub.threadpool

    def __repr__(self):
        return '<gevent.resolver_thread.Resolver at 0x%x pool=%r>' % (id(self), self.pool)

    def close(self):
        pass

    # from briefly reading socketmodule.c, it seems that all of the functions
    # below are thread-safe in Python, even if they are not thread-safe in C.

    def gethostbyname(self, *args):
        return self.pool.apply_e(self.expected_errors, _socket.gethostbyname, args)

    def gethostbyname_ex(self, *args):
        return self.pool.apply_e(self.expected_errors, _socket.gethostbyname_ex, args)

    def getaddrinfo(self, *args, **kwargs):
        return self.pool.apply_e(self.expected_errors, _socket.getaddrinfo, args, kwargs)

    def gethostbyaddr(self, *args, **kwargs):
        return self.pool.apply_e(self.expected_errors, _socket.gethostbyaddr, args, kwargs)

    def getnameinfo(self, *args, **kwargs):
        return self.pool.apply_e(self.expected_errors, _socket.getnameinfo, args, kwargs)
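There is very little code here: each operation is simply dispatched to the thread pool. As a result only the coroutine that made the call is blocked, not the whole system.
The next question is how gevent makes coroutines and Python threads cooperate. The core problem is:
How does a business coroutine submit a task to the thread pool, suspend itself, and then resume once the task has finished running in the pool?
The thread pool code lives in threadpool.py. Its design matches a conventional thread pool: one task queue plus several worker threads. Let's look at how a task is normally scheduled onto the pool: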
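The same offloading pattern can be sketched with the standard library alone. The example below uses asyncio and concurrent.futures rather than gevent, so it illustrates the idea and is not gevent's code; the names resolve and main are made up for the illustration. A numeric address is used so that no real DNS lookup is needed.

```python
import asyncio
import socket
from concurrent.futures import ThreadPoolExecutor


async def resolve(host, port, executor):
    loop = asyncio.get_running_loop()
    # The blocking call runs in a worker thread. Only this coroutine waits
    # for it; the event loop stays free to run other coroutines.
    return await loop.run_in_executor(executor, socket.getaddrinfo, host, port)


async def main():
    ticks = []

    async def ticker():
        # A second coroutine that keeps making progress during the lookup.
        for i in range(3):
            ticks.append(i)
            await asyncio.sleep(0)

    with ThreadPoolExecutor(max_workers=2) as executor:
        infos, _ = await asyncio.gather(
            resolve("127.0.0.1", 80, executor), ticker()
        )
    return infos, ticks


if __name__ == "__main__":
    infos, ticks = asyncio.run(main())
    print(infos[0][4], ticks)
```

Here resolve plays the role of Resolver.getaddrinfo: the caller waits for the result while the rest of the program keeps running.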
    # XXX apply() should re-raise error by default
    # XXX because that's what builtin apply does
    # XXX check gevent.pool.Pool.apply and multiprocessing.Pool.apply
    def apply_e(self, expected_errors, function, args=None, kwargs=None):
        """
        Running the task suspends the current coroutine; the coroutine is
        rescheduled only after the task has finished.
        """
        if args is None:
            args = ()
        if kwargs is None:
            kwargs = {}
        # put the task on the task queue, then wait for and return the result
        success, result = self.spawn(wrap_errors, expected_errors, function, args, kwargs).get()
        if success:
            return result
        raise result

    def apply(self, func, args=None, kwds=None):
        """Equivalent of the apply() builtin function. It blocks till the result is ready."""
        if args is None:
            args = ()
        if kwds is None:
            kwds = {}
        return self.spawn(func, *args, **kwds).get()
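apply_e depends on a helper called wrap_errors that is not shown in this excerpt. Judging only from how its return value is unpacked (success, result), a helper of that shape could look roughly like the sketch below. This is an assumption for illustration, not a quote from gevent; unwrap is a made-up name for the last three lines of apply_e.

```python
def wrap_errors(errors, function, args, kwargs):
    """Run function and report the outcome as a (success, value) pair.

    Exceptions matching `errors` are returned rather than raised, so they
    can be carried back from the worker thread to the caller as plain data.
    """
    try:
        return True, function(*args, **kwargs)
    except errors as exc:
        return False, exc


def unwrap(pair):
    """Mirror of the tail of apply_e: return the value or re-raise the error."""
    success, result = pair
    if success:
        return result
    raise result
```

With this shape, an expected error travels back as an ordinary return value and is re-raised in the calling coroutine, while an exception outside `errors` propagates inside the worker.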
The important part is the implementation of the spawn method:
    def spawn(self, func, *args, **kwargs):
        """
        Wrap this call as a task, put it on the task queue, and return the
        AsyncResult the caller will wait on.
        """
        while True:
            semaphore = self._semaphore
            semaphore.acquire()  # acquire the semaphore
            if semaphore is self._semaphore:
                break
        try:
            task_queue = self.task_queue  # get the task queue
            result = AsyncResult()  # create the async result
            thread_result = ThreadResult(result, hub=self.hub)  # wrap the async result in a ThreadResult
            task_queue.put((func, args, kwargs, thread_result))  # put the task on the task queue
            self.adjust()  # adjust the thread pool size
            # rawlink() must be the last call
            result.rawlink(lambda *args: self._semaphore.release())  # release the semaphore once the result is ready
            # XXX this _semaphore.release() is competing for order with get()
            # XXX this is not good, just make ThreadResult release the semaphore before doing anything else
        except:
            semaphore.release()  # release the semaphore
            raise
        return result
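The semaphore handling in spawn (acquire before queuing, release on completion or on failure to queue) can be sketched on its own with the standard library. BoundedSubmitter is a hypothetical name, and the sketch uses concurrent.futures instead of gevent's own queue and workers. It also leaves out the loop that re-checks self._semaphore after acquiring it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedSubmitter:
    """Allow at most max_pending tasks to be queued or running at once."""

    def __init__(self, max_pending):
        self._slots = threading.BoundedSemaphore(max_pending)
        self._executor = ThreadPoolExecutor(max_workers=max_pending)

    def submit(self, func, *args, **kwargs):
        self._slots.acquire()  # blocks while max_pending tasks are in flight
        try:
            future = self._executor.submit(func, *args, **kwargs)
        except BaseException:
            self._slots.release()  # the task was never queued, free the slot
            raise
        # Registered last, like rawlink() in spawn: the slot is freed as soon
        # as the task completes, whether it succeeded or failed.
        future.add_done_callback(lambda _future: self._slots.release())
        return future

    def close(self):
        # Waits for the workers, so every completion callback has run.
        self._executor.shutdown(wait=True)
```

A caller would use it as submitter.submit(func, arg).result(); each submit call blocks only while the pool is already full.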
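As shown here, a ThreadResult is built to coordinate between the current coroutine and the worker thread, while the AsyncResult is what the current coroutine is scheduled through. The key to the whole mechanism is therefore ThreadResult and how it achieves cross-thread cooperation: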
# Note: `async` has been a reserved word since Python 3.7; this excerpt comes
# from an older gevent release and only parses on earlier Python versions.
class ThreadResult(object):

    def __init__(self, receiver, hub=None):
        if hub is None:
            hub = get_hub()
        self.receiver = receiver  # the AsyncResult
        self.hub = hub
        self.value = None
        self.context = None
        self.exc_info = None
        self.async = hub.loop.async()  # create an async watcher on the loop
        self.async.start(self._on_async)  # start the watcher

    def _on_async(self):
        """
        Once the task has finished in the other thread, that thread calls
        send() on the async watcher, which triggers this callback.
        """
        self.async.stop()
        try:
            if self.exc_info is not None:
                try:
                    self.hub.handle_error(self.context, *self.exc_info)
                finally:
                    self.exc_info = None
            self.context = None
            self.async = None
            self.hub = None
            if self.receiver is not None:
                # XXX exception!!!?
                self.receiver(self)  # effectively notifies the AsyncResult
        finally:
            self.receiver = None
            self.value = None

    def set(self, value):
        """
        Called from the other thread when the task has finished, to signal
        the async watcher.
        """
        self.value = value
        self.async.send()

    def handle_error(self, context, exc_info):
        self.context = context
        self.exc_info = exc_info
        self.async.send()

    # link protocol:
    def successful(self):
        return True
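At this point things are mostly clear. The key is the async object, which is libev's ev_async watcher: this type of watcher is what lets Python threads and coroutines cooperate.
That covers the implementation logic of this part of gevent's thread pool.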
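To make the wake-up mechanism concrete, here is a rough standard-library analogue of an async watcher: a socket pair registered with a selector, where a worker thread writes one byte to wake the loop. This is the classic self-pipe technique, used here as a stand-in and not libev's actual implementation; AsyncWatcher and run_in_thread_and_wait are hypothetical names.

```python
import selectors
import socket
import threading


class AsyncWatcher:
    """Wake a selector-based loop from another thread (ev_async stand-in)."""

    def __init__(self, selector, callback):
        self._selector = selector
        self._callback = callback
        self._recv, self._send = socket.socketpair()
        self._recv.setblocking(False)
        selector.register(self._recv, selectors.EVENT_READ, self._on_readable)

    def send(self):
        # May be called from any thread; one byte makes select() return.
        self._send.send(b"\0")

    def _on_readable(self):
        self._recv.recv(4096)  # drain the wake-up bytes
        self._callback()       # runs in the loop thread, like _on_async

    def close(self):
        self._selector.unregister(self._recv)
        self._recv.close()
        self._send.close()


def run_in_thread_and_wait(func, *args):
    """Run func in a worker thread while the calling thread runs a tiny loop."""
    selector = selectors.DefaultSelector()
    box, done = {}, []
    watcher = AsyncWatcher(selector, lambda: done.append(True))

    def worker():
        try:
            box["value"] = func(*args)   # like ThreadResult.set
        except Exception as exc:
            box["error"] = exc           # like ThreadResult.handle_error
        finally:
            watcher.send()

    thread = threading.Thread(target=worker)
    thread.start()
    while not done:                      # minimal event loop
        for key, _events in selector.select():
            key.data()                   # call the registered callback
    thread.join()
    watcher.close()
    selector.close()
    if "error" in box:
        raise box["error"]
    return box["value"]
```

The worker thread never touches loop state directly. It only stores its outcome and calls send(); everything else happens in the loop thread, which is the same division of labour as between ThreadResult.set and _on_async.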