Reposted from: http://www.rainsts.net/article.asp?id=1008
On Linux, almost anything built on epoll tends to be good, and Tornado is one of the rising stars.
Tornado is an open source version of the scalable, non-blocking web server and tools that power FriendFeed. The FriendFeed application is written using a web framework that looks a bit like web.py or Google's webapp, but with additional tools and optimizations to take advantage of the underlying non-blocking infrastructure. The framework is distinct from most mainstream web server frameworks (and certainly most Python frameworks) because it is non-blocking and reasonably fast. Because it is non-blocking and uses epoll, it can handle thousands of simultaneous standing connections, which means it is ideal for real-time web services. We built the web server specifically to handle FriendFeed's real-time features — every active user of FriendFeed maintains an open connection to the FriendFeed servers. (For more information on scaling servers to support thousands of clients, see The C10K problem.)
We will use a simple example to analyze its basic execution flow.
#!/usr/bin/env python
# -*- coding:utf-8 -*-

import os
from tornado.httpserver import HTTPServer
from tornado.web import Application, RequestHandler
from tornado.ioloop import IOLoop

class TestHandler(RequestHandler):
    def get(self):
        self.write("Hello, World!\n")

settings = {
    "static_path" : os.path.join(os.path.dirname(__file__), "static"),
}

application = Application([
    (r"/", TestHandler),
], **settings)

if __name__ == "__main__":
    server = HTTPServer(application)
    server.listen(8000)
    IOLoop.instance().start()
Once the code runs correctly, add pdb.set_trace() to start capturing the call stack.
class TestHandler(RequestHandler):
    def get(self):
        import pdb
        pdb.set_trace()
        self.write("Hello, World!\n")
Run it, then trigger the breakpoint with a curl or wget request, for example "curl http://localhost:8000".
$ ./test.py
> /home/yuhen/projects/python/tornado-test/test.py(13)get()
-> self.write("Hello, World!\n")
(Pdb) w
/home/yuhen/projects/python/tornado-test/test.py(26)<module>()
-> IOLoop.instance().start()
/usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/ioloop.py(245)start()
-> self._handlers[fd](fd, events)
/usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/iostream.py(143)_handle_events()
-> self._handle_read()
/usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/iostream.py(195)_handle_read()
-> callback(self._consume(loc + delimiter_len))
/usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/httpserver.py(294)_on_headers()
-> self.request_callback(self._request)
/usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(1054)__call__()
-> handler._execute(transforms, *args, **kwargs)
/usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(740)_execute()
-> getattr(self, self.request.method.lower())(*args, **kwargs)
> /home/yuhen/projects/python/tornado-test/test.py(13)get()
-> self.write("Hello, World!\n")
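Good. With this information in hand, the rest is straightforward. The u and d commands move up and down between stack frames, so we can inspect the variables in each frame's context.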
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(740)_execute()
-> getattr(self, self.request.method.lower())(*args, **kwargs)
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(1054)__call__()
-> handler._execute(transforms, *args, **kwargs)
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/httpserver.py(294)_on_headers()
-> self.request_callback(self._request)
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/iostream.py(195)_handle_read()
-> callback(self._consume(loc + delimiter_len))
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/iostream.py(143)_handle_events()
-> self._handle_read()
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/ioloop.py(245)start()
-> self._handlers[fd](fd, events)
(Pdb) u
> /home/yuhen/projects/python/tornado-test/test.py(26)<module>()
-> IOLoop.instance().start()
(Pdb) l
21 ], **settings)
22
23 if __name__ == "__main__":
24 server = HTTPServer(application)
25 server.listen(8000)
26 -> IOLoop.instance().start()
27
[EOF]
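The stack that pdb's w command prints can also be captured programmatically with the standard traceback module. The following is only a minimal sketch: the functions outer, middle and inner are hypothetical stand-ins for the Tornado frames above, not part of Tornado.

```python
import traceback


def inner():
    # extract_stack() returns frame summaries, oldest frame first,
    # which is the same order pdb's "w" command uses.
    return [frame.name for frame in traceback.extract_stack()]


def middle():
    return inner()


def outer():
    return middle()


names = outer()
# The last three entries are the frames created by this sketch.
print(names[-3:])
```

Reading the list from top to bottom gives the same "who called whom" picture as walking the frames with u and d.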
1. Ready?
Obviously, before looking at IOLoop we cannot skip the basic initialization done by Application and HTTPServer.
class Application(object):
    def __init__(self, handlers=None, default_host="", transforms=None, wsgi=False, **settings):
        ...
        self.handlers = []
        self.named_handlers = {}
        self.default_host = default_host
        self.settings = settings
        ...
        if self.settings.get("static_path"):
            path = self.settings["static_path"]
            handlers = list(handlers or [])
            static_url_prefix = settings.get("static_url_prefix", "/static/")
            handlers = [
                (re.escape(static_url_prefix) + r"(.*)", StaticFileHandler, dict(path=path)),
                (r"/(favicon\.ico)", StaticFileHandler, dict(path=path)),
                (r"/(robots\.txt)", StaticFileHandler, dict(path=path)),
            ] + handlers
        if handlers: self.add_handlers(".*$", handlers)
        ...
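There is no need to dwell on the details here (a separate article will cover them). The most important job of Application.__init__ is setting up URL routing: it calls add_handlers to register both our "(r"/", TestHandler)" entry and the static file route "/static/" in handlers.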
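To make the ordering concrete, here is a small stdlib-only sketch of how the static routes end up in front of the user's routes. The helpers build_routes and resolve, and the use of plain strings in place of handler classes, are assumptions made for illustration. Appending "$" to each pattern is also an assumption of this sketch, not something the excerpt above shows.

```python
import re


def build_routes(handlers, settings):
    """Return (compiled_pattern, target) pairs, static routes first."""
    handlers = list(handlers or [])
    if settings.get("static_path"):
        prefix = settings.get("static_url_prefix", "/static/")
        static_routes = [
            (re.escape(prefix) + r"(.*)", "static"),
            (r"/(favicon\.ico)", "static"),
            (r"/(robots\.txt)", "static"),
        ]
        handlers = static_routes + handlers
    # Anchor each pattern at the end so "/" does not match every path.
    return [(re.compile(pattern + "$"), target) for pattern, target in handlers]


def resolve(routes, path):
    """Return the target of the first route whose pattern matches."""
    for regex, target in routes:
        if regex.match(path):
            return target
    return None


routes = build_routes([(r"/", "TestHandler")], {"static_path": "static"})
print(resolve(routes, "/"))
print(resolve(routes, "/static/app.css"))
print(resolve(routes, "/missing"))
```

Because the first match wins, putting the static routes first means a user pattern cannot accidentally shadow "/static/".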
Let's go on to see what HTTPServer does.
class HTTPServer(object):
    def __init__(self, request_callback, no_keep_alive=False, io_loop=None, ...):
        self.request_callback = request_callback
        self.io_loop = io_loop
        self._socket = None

    def listen(self, port, address=""):
        self.bind(port, address)
        self.start(1)

    def start(self, num_processes=None):
        self._started = True
        ...
        if num_processes > 1:
            for i in range(num_processes):
                if os.fork() == 0:
                    self.io_loop = ioloop.IOLoop.instance()
                    self.io_loop.add_handler(self._socket.fileno(), self._handle_events, ioloop.IOLoop.READ)
                    return
            os.waitpid(-1, 0)
        else:
            if not self.io_loop:
                self.io_loop = ioloop.IOLoop.instance()
            self.io_loop.add_handler(self._socket.fileno(), self._handle_events, ioloop.IOLoop.READ)
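This is not complicated. HTTPServer.listen() calls HTTPServer.bind() to create the listening socket, whose fileno() is then registered with the loop. HTTPServer.start() can fork multiple child processes, based on the number of CPU cores, to improve performance. Note, however, that listen() calls start(1), so our example runs in a single process. IOLoop then calls HTTPServer._handle_events() to handle incoming client connections.
IOLoop.instance() implements the singleton pattern, and IOLoop.start() starts the I/O loop. Anyone familiar with epoll already knows how this works, so no more needs to be said. Note that HTTPServer.request_callback is actually the Application object, although the name is a little odd.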
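For readers less familiar with this pattern, the sketch below shows a shared loop instance that maps file descriptors to handlers. It is a toy built on the standard selectors module (which picks epoll on Linux), not Tornado's implementation. MiniLoop, run_once and on_readable are hypothetical names.

```python
import selectors
import socket


class MiniLoop(object):
    """Toy event loop: one shared instance, handlers keyed by file descriptor."""

    _instance = None

    @classmethod
    def instance(cls):
        # Create the shared loop on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._handlers = {}

    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        self._selector.register(fd, events)

    def remove_handler(self, fd):
        self._handlers.pop(fd, None)
        self._selector.unregister(fd)

    def run_once(self, timeout=1.0):
        # A real loop repeats this forever; one pass is enough for a demo.
        for key, events in self._selector.select(timeout):
            self._handlers[key.fd](key.fd, events)


received = []
left, right = socket.socketpair()


def on_readable(fd, events):
    received.append(right.recv(1024))


loop = MiniLoop.instance()
loop.add_handler(right.fileno(), on_readable, selectors.EVENT_READ)
left.sendall(b"ping")
loop.run_once()
loop.remove_handler(right.fileno())
left.close()
right.close()
print(received, MiniLoop.instance() is loop)
```

The handler signature (fd, events) mirrors the self._handlers[fd](fd, events) call seen in the stack trace.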
When we access this web server, epoll fires the corresponding event and calls the previously registered HTTPServer._handle_events().
class HTTPServer(object):
    def _handle_events(self, fd, events):
        # Error handling is omitted from this excerpt.
        while True:
            connection, address = self._socket.accept()
            stream = iostream.IOStream(connection, io_loop=self.io_loop)
            HTTPConnection(stream, address, self.request_callback, self.no_keep_alive, self.xheaders)
IOStream takes the client socket, that is connection, as an argument, and internally registers the event handler that processes this client's data.
class IOStream(object):
    def __init__(self, socket, io_loop=None, max_buffer_size=104857600, read_chunk_size=4096):
        self.socket = socket
        self.socket.setblocking(False)
        self.io_loop = io_loop or ioloop.IOLoop.instance()
        ...
        self._read_callback = None
        ...
        self.io_loop.add_handler(self.socket.fileno(), self._handle_events, self._state)

    def _handle_events(self, fd, events):
        if events & self.io_loop.READ:
            self._handle_read()
        ...

    def _handle_read(self):
        chunk = self.socket.recv(self.read_chunk_size)
        ...
        if self._read_bytes:
            if len(self._read_buffer) >= self._read_bytes:
                num_bytes = self._read_bytes
                callback = self._read_callback
                self._read_callback = None
                self._read_bytes = None
                callback(self._consume(num_bytes))
        ...
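As soon as data from the client arrives, IOStream._handle_read() is called and hands further processing to self._read_callback, except that so far this callback appears to be None. After creating the IOStream in HTTPServer._handle_events(), the code goes on to instantiate an HTTPConnection object.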
class HTTPConnection(object):
    def __init__(self, stream, address, request_callback, no_keep_alive=False, xheaders=False):
        self.stream = stream
        self.request_callback = request_callback
        ...
        self.stream.read_until("\r\n\r\n", self._on_headers)

class IOStream(object):
    def read_until(self, delimiter, callback):
        ...
        self._read_delimiter = delimiter
        self._read_callback = callback
        self._add_io_state(self.io_loop.READ)
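HTTPConnection passes _on_headers() to IOStream as the callback, so the _read_callback from earlier finally has a value. At this point all the preparation is complete: the server is listening, the client has been accepted, and we are ready to receive the data it sends. Our trace pauses at HTTPConnection._on_headers(), which matches the call stack we captured earlier.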
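The read_until idea, buffering until a delimiter shows up and then firing a one-shot callback, can be sketched without any sockets. DelimitedReader and its feed method are hypothetical names used only for this illustration. In the real IOStream the chunks come from socket.recv().

```python
class DelimitedReader(object):
    """Buffers incoming chunks and fires a one-shot callback at a delimiter."""

    def __init__(self):
        self._buffer = b""
        self._delimiter = None
        self._callback = None

    def read_until(self, delimiter, callback):
        self._delimiter = delimiter
        self._callback = callback
        self._try_deliver()

    def feed(self, chunk):
        # Stands in for the data that socket.recv() would return.
        self._buffer += chunk
        self._try_deliver()

    def _try_deliver(self):
        if self._callback is None:
            return
        loc = self._buffer.find(self._delimiter)
        if loc == -1:
            return  # delimiter not seen yet, keep buffering
        end = loc + len(self._delimiter)
        data, self._buffer = self._buffer[:end], self._buffer[end:]
        # Clear the callback before calling it so it can register a new one.
        callback, self._callback, self._delimiter = self._callback, None, None
        callback(data)


seen = []
reader = DelimitedReader()
reader.read_until(b"\r\n\r\n", seen.append)
reader.feed(b"GET / HTTP/1.1\r\nHost: localhost\r\n")
reader.feed(b"\r\nleftover")
print(seen)
```

The callback receives the data up to and including the delimiter, which corresponds to the _consume(loc + delimiter_len) call in the stack trace. Bytes after the delimiter stay in the buffer for the next read.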
2. Go!
Now we move on to the next step: how the HTTP request is processed.
class HTTPConnection(object):
    def _on_headers(self, data):
        ...
        headers = HTTPHeaders.parse(data[eol:])
        self._request = HTTPRequest(
            connection=self, method=method, uri=uri, version=version,
            headers=headers, remote_ip=self.address[0])
        ...
        self.request_callback(self._request)
The key lies in this request_callback.
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/httpserver.py(294)_on_headers()
-> self.request_callback(self._request)
(Pdb) l
289 if headers.get("Expect") == "100-continue":
290 self.stream.write("HTTP/1.1 100 (Continue)\r\n\r\n")
291 self.stream.read_bytes(content_length, self._on_request_body)
292 return
293
294 -> self.request_callback(self._request)
295
296 def _on_request_body(self, data):
297 self._request.body = data
298 content_type = self._request.headers.get("Content-Type", "")
299 if self._request.method == "POST":
(Pdb) self.request_callback
<tornado.web.Application object at 0x7f3d1d08fa50>
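It is simply the Application object we have been passing along: it is first stored as HTTPServer.request_callback, then passed as an argument when HTTPServer._handle_events() creates the HTTPConnection object.
Application is an object, so what happens when the instance is called with an HTTPRequest, as in application(request)? Obviously, Application.__call__ gets invoked.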
class Application(object):
    def __call__(self, request):
        ...
        handlers = self._get_host_handlers(request)
        if not handlers:
            handler = RedirectHandler(
                request, "http://" + self.default_host + "/")
        else:
            for spec in handlers:
                match = spec.regex.match(request.path)
                if match:
                    handler = spec.handler_class(self, request, **spec.kwargs)
                    kwargs = match.groupdict()
                    if kwargs:
                        args = []
                    else:
                        args = match.groups()
                    break
        ...
        handler._execute(transforms, *args, **kwargs)
        return handler
This is quite simple: it matches on request.path, finds the handler we registered at the start, TestHandler, and creates an instance by calling handler_class().
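The two ideas in __call__, an instance used as a callback and regex groups turned into handler arguments, can be tried in isolation. The sketch below is a simplified stand-in: MiniApplication, EchoHandler and its execute method are hypothetical, and a bare path string takes the place of the HTTPRequest object.

```python
import re


class EchoHandler(object):
    """Hypothetical handler that records how it was invoked."""

    def __init__(self, application, request_path):
        self.application = application
        self.request_path = request_path
        self.call = None

    def execute(self, *args, **kwargs):
        self.call = (args, kwargs)


class MiniApplication(object):
    def __init__(self, routes):
        self.routes = [(re.compile(pattern + "$"), cls) for pattern, cls in routes]

    def __call__(self, request_path):
        # Defining __call__ lets the instance be used as a plain callback.
        for regex, handler_class in self.routes:
            match = regex.match(request_path)
            if match:
                handler = handler_class(self, request_path)
                kwargs = match.groupdict()
                # Named groups become keyword arguments; otherwise use positions.
                args = () if kwargs else match.groups()
                handler.execute(*args, **kwargs)
                return handler
        return None


app = MiniApplication([
    (r"/user/(?P<name>\w+)", EchoHandler),
    (r"/post/(\d+)", EchoHandler),
])
print(app("/user/alice").call)
print(app("/post/42").call)
print(app("/nope"))
```

As in the excerpt, positional groups are only used when the pattern has no named groups, so a handler never receives the same capture twice.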
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(1054)__call__()
-> handler._execute(transforms, *args, **kwargs)
(Pdb) l
1049 if getattr(RequestHandler, "_templates", None):
1050 map(lambda loader: loader.reset(),
1051 RequestHandler._templates.values())
1052 RequestHandler._static_hashes = {}
1053
1054 -> handler._execute(transforms, *args, **kwargs)
1055 return handler
1056
1057 def reverse_url(self, name, *args):
1058 """Returns a URL path for handler named `name`
1059
(Pdb) handler
<__main__.TestHandler object at 0x7fd59598af50>
At last things become clear: here is TestHandler.
class RequestHandler(object):
    def _execute(self, transforms, *args, **kwargs):
        ...
        if not self._finished:
            getattr(self, self.request.method.lower())(*args, **kwargs)
            if self._auto_finish and not self._finished:
                self.finish()
        ...
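This one should be easy to follow: request.method, i.e. "POST", "GET", "DELETE" and so on, is lowercased and used to look up the method of the same name on the RequestHandler, which is then called. In this debugging session TestHandler.get() is executed. With that, we have traced one direction of the flow.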
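The getattr-based dispatch is easy to reproduce on its own. This is a minimal sketch with hypothetical names (MiniHandler, HelloHandler, dispatch) and made-up return tuples. It does not model Tornado's actual error handling.

```python
class MiniHandler(object):
    SUPPORTED_METHODS = ("GET", "POST", "DELETE")

    def dispatch(self, method):
        if method not in self.SUPPORTED_METHODS:
            return (405, "method not allowed")
        # "GET" resolves to self.get, "POST" to self.post, and so on.
        return getattr(self, method.lower())()

    # Defaults for verbs a subclass does not override.
    def get(self):
        return (405, "method not allowed")

    post = delete = get


class HelloHandler(MiniHandler):
    def get(self):
        return (200, "Hello, World!\n")


handler = HelloHandler()
print(handler.dispatch("GET"))
print(handler.dispatch("POST"))
print(handler.dispatch("TRACE"))
```

A subclass only needs to define the verbs it supports; everything else falls through to the base-class default.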
3. Back!
In TestHandler.get() we prepared the response data. How does that data get back to the client? Go back to RequestHandler._execute(), which called get().
class RequestHandler(object):
    def _execute(self, transforms, *args, **kwargs):
        ...
        if not self._finished:
            getattr(self, self.request.method.lower())(*args, **kwargs)
            if self._auto_finish and not self._finished:
                self.finish()
        ...
Clearly RequestHandler.finish() is an important clue.
class RequestHandler(object):
    def finish(self, chunk=None):
        ...
        if not self.application._wsgi:
            self.flush(include_footers=True)
            self.request.finish()
            self._log()
        ...

    def flush(self, include_footers=False):
        ...
        chunk = "".join(self._write_buffer)
        ...
        if headers or chunk:
            self.request.write(headers + chunk)

class HTTPRequest(object):
    def write(self, chunk):
        ...
        if not self.stream.closed():
            self.stream.write(chunk, self._on_write_complete)
RequestHandler.flush() joins the data we wrote via self.write() in TestHandler.get() and writes it to the IOStream.
class IOStream(object):
    def write(self, data, callback=None):
        self._check_closed()
        self._write_buffer += data
        self._add_io_state(self.io_loop.WRITE)
        self._write_callback = callback

    def _handle_events(self, fd, events):
        ...
        if events & self.io_loop.WRITE:
            self._handle_write()

    def _handle_write(self):
        while self._write_buffer:
            ...
            num_bytes = self.socket.send(self._write_buffer)
            self._write_buffer = self._write_buffer[num_bytes:]
            ...
IOStream.write appends the data to be returned to self._write_buffer and adds WRITE event monitoring on the IOLoop. This eventually triggers IOStream._handle_write(), which sends the result to the client socket.
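The reason _handle_write slices the buffer by num_bytes is that send() may accept only part of the data. The sketch below makes that visible with a fake socket. ChunkySocket and drain are hypothetical names, and the sketch ignores the would-block case that a real non-blocking socket can raise.

```python
class ChunkySocket(object):
    """Hypothetical socket stand-in that accepts at most `limit` bytes per send."""

    def __init__(self, limit):
        self.limit = limit
        self.sent = []

    def send(self, data):
        accepted = data[:self.limit]
        self.sent.append(accepted)
        return len(accepted)


def drain(sock, write_buffer):
    """Send until the buffer is empty, dropping only what was accepted."""
    while write_buffer:
        num_bytes = sock.send(write_buffer)
        write_buffer = write_buffer[num_bytes:]
    return write_buffer


sock = ChunkySocket(limit=5)
remaining = drain(sock, b"Hello, World!\n")
print(sock.sent)
```

Each pass keeps only the unsent tail, so no byte is dropped or sent twice, however small the accepted pieces are.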
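Once this is done, RequestHandler.finish() calls HTTPRequest.finish(), which in turn calls HTTPConnection.finish(). Finally, if the connection is to be dropped, IOStream.close() removes the event handler and closes the client socket.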
class HTTPRequest(object):
    def finish(self):
        self.connection.finish()
        self._finish_time = time.time()

class HTTPConnection(object):
    def finish(self):
        ...
        if not self.stream.writing():
            self._finish_request()

    def _finish_request(self):
        ...
        if disconnect:
            self.stream.close()
            return

class IOStream(object):
    def close(self):
        if self.socket is not None:
            self.io_loop.remove_handler(self.socket.fileno())
            self.socket.close()
            self.socket = None
            if self._close_callback: self._close_callback()
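That is basically the end of the story. Reverse-analyzing Python code is not much different from doing so in C or C#. With sensible use of pdb to inspect the variables and context of different stack frames, the tracing work is easy to complete.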