Tornado's IOStream wraps non-blocking socket I/O. What I find most interesting is the read_until() interface: you set a delimiter string and a callback, and everything else can be left to the stream; when IOStream reads the delimiter, it automatically invokes the callback. The whole interface is friendly, simple, and convenient.
Attributes:
self.socket: the wrapped socket, in non-blocking mode;
self._read_buffer: the read buffer, a collections.deque; self._write_buffer is similar
self.io_loop: the event loop, needed for adding/modifying the read/write events watched on the fd
self._state: the events (read/write/error) the event loop is watching on the socket
self._read_callback: the callback to run once the requested number of bytes, or the delimiter string, has been read
self._write_callback: the callback to run once all data in _write_buffer has been sent
self._connect_callback: when self.socket (non-blocking) is the client side and is connecting to a server, the callback to run once the connection is established
self._connecting: self.socket (non-blocking) is the client side and the connection is still being established
Public interface:
1. Constructor: initializes the iostream instance's attributes and registers the socket's ERROR event with the ioloop (epoll); the corresponding handler is self._handle_events.
def __init__(self, socket, io_loop=None, max_buffer_size=104857600,
             read_chunk_size=4096):
    ....
    self.io_loop.add_handler(
        self.socket.fileno(), self._handle_events, self._state)
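To make add_handler concrete, here is a minimal sketch of what the registration amounts to against a raw epoll object (assuming Linux, where select.epoll is available; the registry dict and function name are my own illustration, not Tornado's actual bookkeeping):

```python
import select
import socket

epoll = select.epoll()
handlers = {}

def add_handler(fd, handler, state):
    # Remember which callback owns this fd, then ask epoll to watch it.
    handlers[fd] = handler
    epoll.register(fd, state)

a, b = socket.socketpair()
add_handler(a.fileno(), lambda fd, events: None, select.EPOLLERR)
print(a.fileno() in handlers)   # True
print(epoll.poll(0))            # [] while the socket is healthy
```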
2. connect: here iostream is on the client side. Note that connect returning does not necessarily mean the connection is established, because the socket is non-blocking.
def connect(self, address, callback=None):
    self._connecting = True
    try:
        self.socket.connect(address)
    except socket.error, e:
        if e.args[0] not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise
    self._connect_callback = stack_context.wrap(callback)
    self._add_io_state(self.io_loop.WRITE)
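The same non-blocking connect pattern can be reproduced with just the standard library; the throwaway listener and the select call below are my own scaffolding, with select standing in for the ioloop's WRITE event:

```python
import errno
import select
import socket

# A throwaway listener so the connect has somewhere to go.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.socket()
client.setblocking(False)
try:
    client.connect(listener.getsockname())
except socket.error as e:
    # On a non-blocking socket, connect() normally "fails" with EINPROGRESS;
    # the handshake finishes in the background.
    if e.errno not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
        raise

# Becoming writable is how we learn the connect has completed.
_, writable, _ = select.select([], [client], [], 5)
err = client.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
print(err)  # 0 means the connection was established
```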
3. read_until: its main job is to record the delimiter string and the corresponding callback. Along the way it checks whether the delimiter is already in the current buffer, and tries to read new data from the socket. This is analyzed in detail under _handle_events.
def read_until(self, delimiter, callback):
    assert not self._read_callback, "Already reading"
    self._read_delimiter = delimiter
    self._read_callback = stack_context.wrap(callback)
    while True:
        # See if we've already got the data from a previous read
        if self._read_from_buffer():
            return
        self._check_closed()
        # hp: keep reading data into _read_buffer; == 0 means EAGAIN or closed
        if self._read_to_buffer() == 0:
            break  # hp: so when does the callback run? in _handle_events
    # hp: keep watching the socket's READ event via ioloop.update_handler()
    # hp: self.socket's read handler == self._handle_events
    self._add_io_state(self.io_loop.READ)
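The core of read_until, scanning the buffered data for the delimiter and falling back to the event loop only when it is not there yet, can be sketched without any Tornado machinery (all names below are made up for illustration):

```python
from collections import deque

def read_until(buffer, delimiter, callback):
    """Sketch: fire the callback as soon as delimiter is in the buffer."""
    joined = b"".join(buffer)            # like _merge_prefix(buffer, sys.maxint)
    loc = joined.find(delimiter)
    if loc == -1:
        return False                     # not there yet: keep READ registered
    end = loc + len(delimiter)
    consumed, rest = joined[:end], joined[end:]
    buffer.clear()
    if rest:
        buffer.append(rest)              # leftover bytes stay buffered
    callback(consumed)
    return True

buf = deque([b"GET / HT", b"TP/1.0\r\n", b"\r\nbody"])
results = []
read_until(buf, b"\r\n\r\n", results.append)
print(results)   # [b'GET / HTTP/1.0\r\n\r\n']
print(buf)       # deque([b'body'])
```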
4. write is similar to read_until: it records the data to send, and the callback to run once that data has all been sent.
def write(self, data, callback=None):
    self._check_closed()
    self._write_buffer.append(data)
    self._add_io_state(self.io_loop.WRITE)  # hp: start watching the socket's write event
    self._write_callback = stack_context.wrap(callback)
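The flush side that _handle_write later performs, sending queued chunks until the buffer drains or send() would block, can be sketched like this (the flush function is my own name; Tornado's version also handles partial sends the same way):

```python
import socket
from collections import deque

def flush(sock, write_buffer, callback=None):
    """Send queued chunks until the buffer drains or send() would block."""
    while write_buffer:
        try:
            n = sock.send(write_buffer[0])
        except BlockingIOError:
            return False                 # kernel buffer full: wait for WRITE event
        if n < len(write_buffer[0]):
            write_buffer[0] = write_buffer[0][n:]   # keep the unsent tail
        else:
            write_buffer.popleft()
    if callback:
        callback()                       # everything sent: run _write_callback
    return True

a, b = socket.socketpair()
a.setblocking(False)
buf = deque([b"hello ", b"world"])
done = []
flushed = flush(a, buf, lambda: done.append(True))
rcv = b.recv(64)
print(flushed, done, rcv)   # a small payload drains in one pass
```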
5. The others (reading, writing, closed) are all simple.
Now the heart of our iostream analysis: self.socket's read/write handler, self._handle_events.
def _handle_events(self, fd, events):
    ...
    try:
        if events & self.io_loop.READ:
            self._handle_read()
        if not self.socket:
            return
        if events & self.io_loop.WRITE:
            if self._connecting:  # hp: the connection to the server just completed
                self._handle_connect()  # hp: runs the hook set by connect, self._connect_callback
            self._handle_write()  # hp: writes _write_buffer into the socket; if fully drained, runs the callback
        if not self.socket:
            return
        if events & self.io_loop.ERROR:  # hp: epoll reported an error, just close the connection
            self.close()
            return
        # hp: update what epoll is watching
        state = self.io_loop.ERROR
        if self.reading():  # self._read_callback is not None
            state |= self.io_loop.READ
        if self.writing():
            state |= self.io_loop.WRITE
        if state != self._state:
            self._state = state
            self.io_loop.update_handler(self.socket.fileno(), self._state)
    except:
        logging.error("Uncaught exception, closing connection.",
                      exc_info=True)
        self.close()
        raise
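The mask recomputation at the bottom of _handle_events is worth isolating: after every round, the stream listens only for what it still has work to do. A sketch, where the constant values mirror Tornado's epoll-based IOLoop (EPOLLIN, EPOLLOUT, EPOLLERR|EPOLLHUP) and the function name is made up:

```python
READ, WRITE, ERROR = 0x001, 0x004, 0x008 | 0x010

def next_state(reading, writing):
    state = ERROR            # errors are always watched
    if reading:              # a read callback is still pending
        state |= READ
    if writing:              # _write_buffer still holds data
        state |= WRITE
    return state

print(next_state(True, False) == (ERROR | READ))    # True
print(next_state(False, False) == ERROR)            # True
```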
When the socket becomes readable, self._handle_read() handles the read event: first _read_to_buffer pulls the data the protocol stack has ready into the _read_buffer cache, then _read_from_buffer inspects the cached data to see whether the condition set by read_until/read_bytes is met.
def _handle_read(self):
    while True:
        try:
            # Read from the socket until we get EWOULDBLOCK or equivalent.
            # SSL sockets do some internal buffering, and if the data is
            # sitting in the SSL object's buffer select() and friends
            # can't see it; the only way to find out if it's there is to
            # try to read it.
            result = self._read_to_buffer()
        except Exception:  # hp: a real error (EWOULDBLOCK/EAGAIN doesn't count)
            self.close()
            return
        if result == 0:  # hp: closed or EAGAIN
            break
        else:
            # hp: checks whether self._read_delimiter/self._read_callback is set
            if self._read_from_buffer():
                return
Since the socket is ready for reading, _read_from_socket simply calls recv(); the loop keeps reading data until it hits EAGAIN/EWOULDBLOCK or the connection is closed.
def _read_to_buffer(self):
    try:
        chunk = self._read_from_socket()  # hp: just calls recv() inside
    except socket.error, e:
        # ssl.SSLError is a subclass of socket.error
        logging.warning("Read error on %d: %s",
                        self.socket.fileno(), e)
        self.close()
        raise
    if chunk is None:  # hp: EAGAIN, or the peer has closed
        return 0
    self._read_buffer.append(chunk)
    if self._read_buffer_size() >= self.max_buffer_size:
        logging.error("Reached maximum read buffer size")
        self.close()
        raise IOError("Reached maximum read buffer size")
    return len(chunk)
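The None-means-EAGAIN-or-closed convention is easy to reproduce with a non-blocking socketpair (the function below is my own mimic of _read_from_socket, not Tornado's code):

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)

def read_from_socket(sock):
    # None stands for "nothing to read right now (EAGAIN) or peer closed",
    # which _read_to_buffer then turns into a return value of 0.
    try:
        chunk = sock.recv(4096)
    except BlockingIOError:      # EWOULDBLOCK / EAGAIN
        return None
    if not chunk:                # b"" means the peer closed the connection
        return None
    return chunk

r1 = read_from_socket(a)         # nothing sent yet
b.sendall(b"ping")
r2 = read_from_socket(a)
print(r1, r2)   # None b'ping'
```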
_read_from_buffer analyzes the cached data to see whether the condition set by read_until/read_bytes is met; if so, it runs the corresponding callback.
def _read_from_buffer(self):
    if self._read_bytes:
        ...
    elif self._read_delimiter:  # hp: set by read_until
        # hp: merging up to sys.maxint bytes; under high concurrency,
        # isn't that asking to blow up memory!!!
        # hp: effectively all the data ends up in _read_buffer[0]
        _merge_prefix(self._read_buffer, sys.maxint)
        loc = self._read_buffer[0].find(self._read_delimiter)
        if loc != -1:
            callback = self._read_callback
            delimiter_len = len(self._read_delimiter)
            self._read_callback = None
            self._read_delimiter = None
            self._run_callback(callback,
                # hp: _consume returns the data up to and including the delimiter
                self._consume(loc + delimiter_len))
            return True
    return False
The _merge_prefix(deque, size) function is rather interesting: it moves the first size bytes of the deque into the deque's first slot. Personally I suspect this allocates and frees memory frequently, which can hurt performance; the upside is that it is extremely convenient, and saves you writing a buffer-management structure by hand.
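A simplified _merge_prefix (ignoring the special cases the real one handles) makes both the convenience and the copying cost visible:

```python
from collections import deque

def merge_prefix(dq, size):
    """Collapse the first `size` bytes of the deque into dq[0] (simplified)."""
    prefix = []
    remaining = size
    while dq and remaining > 0:
        chunk = dq.popleft()
        if len(chunk) > remaining:       # split the chunk crossing the boundary
            dq.appendleft(chunk[remaining:])
            chunk = chunk[:remaining]
        prefix.append(chunk)
        remaining -= len(chunk)
    if prefix:
        dq.appendleft(b"".join(prefix))  # one big copy: the cost noted above

d = deque([b"ab", b"cd", b"ef"])
merge_prefix(d, 3)
print(d)    # deque([b'abc', b'd', b'ef'])
```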
In summary, IOStream uses a deque plus _merge_prefix(deque, size) to buffer and reassemble the data read from non-blocking I/O; combined with read_until, it neatly encapsulates the handling of fragmented non-blocking reads inside the IOStream class, so the layers above can work with socket reads and writes very conveniently.