Tornado: 1. Flow Analysis

Reposted from: http://www.rainsts.net/article.asp?id=1008


On Linux, almost anything built on epoll turns out to be good stuff, and Tornado is a rising star among them.

Tornado is an open source version of the scalable, non-blocking web server and tools that power FriendFeed. The FriendFeed application is written using a web framework that looks a bit like web.py or Google's webapp, but with additional tools and optimizations to take advantage of the underlying non-blocking infrastructure.

The framework is distinct from most mainstream web server frameworks (and certainly most Python frameworks) because it is non-blocking and reasonably fast. Because it is non-blocking and uses epoll, it can handle thousands of simultaneous standing connections, which means it is ideal for real-time web services. We built the web server specifically to handle FriendFeed's real-time features — every active user of FriendFeed maintains an open connection to the FriendFeed servers. (For more information on scaling servers to support thousands of clients, see The C10K problem.)

We'll use a simple example to walk through its basic execution flow.

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import os
from tornado.httpserver import HTTPServer
from tornado.web import Application, RequestHandler
from tornado.ioloop import IOLoop

class TestHandler(RequestHandler):
    def get(self):
        self.write("Hello, World!\n")

settings = {
    "static_path" : os.path.join(os.path.dirname(__file__), "static"),
}

application = Application([
    (r"/", TestHandler),
], **settings)

if __name__ == "__main__":
    server = HTTPServer(application)
    server.listen(8000)
    IOLoop.instance().start()

Once the code is verified to work, add pdb.set_trace() to capture the call stack.

class TestHandler(RequestHandler):
    def get(self):
        import pdb
        pdb.set_trace()
        self.write("Hello, World!\n")

Run the script, then send a request with curl or wget (for example "curl http://localhost:8000") to trigger the breakpoint.

$ ./test.py

> /home/yuhen/projects/python/tornado-test/test.py(13)get()
-> self.write("Hello, World!\n")
(Pdb) w
  /home/yuhen/projects/python/tornado-test/test.py(26)<module>()
-> IOLoop.instance().start()
  /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/ioloop.py(245)start()
-> self._handlers[fd](fd, events)
  /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/iostream.py(143)_handle_events()
-> self._handle_read()
  /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/iostream.py(195)_handle_read()
-> callback(self._consume(loc + delimiter_len))
  /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/httpserver.py(294)_on_headers()
-> self.request_callback(self._request)
  /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(1054)__call__()
-> handler._execute(transforms, *args, **kwargs)
  /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(740)_execute()
-> getattr(self, self.request.method.lower())(*args, **kwargs)
> /home/yuhen/projects/python/tornado-test/test.py(13)get()
-> self.write("Hello, World!\n")

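Good, with this information in hand the rest is easy. Using the u and d commands we can move up and down between stack frames and inspect the variables in each frame's context.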

(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(740)_execute()
-> getattr(self, self.request.method.lower())(*args, **kwargs)
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(1054)__call__()
-> handler._execute(transforms, *args, **kwargs)
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/httpserver.py(294)_on_headers()
-> self.request_callback(self._request)
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/iostream.py(195)_handle_read()
-> callback(self._consume(loc + delimiter_len))
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/iostream.py(143)_handle_events()
-> self._handle_read()
(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/ioloop.py(245)start()
-> self._handlers[fd](fd, events)
(Pdb) u
> /home/yuhen/projects/python/tornado-test/test.py(26)<module>()
-> IOLoop.instance().start()
(Pdb) l
 21     ], **settings)
 22
 23     if __name__ == "__main__":
 24         server = HTTPServer(application)
 25         server.listen(8000)
 26  ->     IOLoop.instance().start()
 27
[EOF]
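
The chain of frames that the w command prints can also be captured from code. Below is a minimal sketch using the standard traceback module; the functions outer, middle and inner are hypothetical stand-ins for the Tornado frames above, not part of the original example.

```python
import traceback


def inner():
    # extract_stack() lists frames from the outermost caller down to this
    # one, which is the same order pdb's "w" command uses.
    return [frame.name for frame in traceback.extract_stack()]


def middle():
    return inner()


def outer():
    return middle()


names = outer()
# The last three entries are the frames we created ourselves.
print(names[-3:])
```

This only gives the function names; pdb remains the better tool when you also want to poke at each frame's local variables.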

1. Ready?

Clearly, before looking at IOLoop we can't skip the basic initialization of Application and HTTPServer.

class Application(object):
    def __init__(self, handlers=None, default_host="", transforms=None, wsgi=False, **settings):

        ...
        self.handlers = []
        self.named_handlers = {}
        self.default_host = default_host
        self.settings = settings
        ...

        if self.settings.get("static_path"):
            path = self.settings["static_path"]
            handlers = list(handlers or [])
            static_url_prefix = settings.get("static_url_prefix", "/static/")
            handlers = [
                (re.escape(static_url_prefix) + r"(.*)", StaticFileHandler, dict(path=path)),
                (r"/(favicon\.ico)", StaticFileHandler, dict(path=path)),
                (r"/(robots\.txt)", StaticFileHandler, dict(path=path)),
            ] + handlers

        if handlers: self.add_handlers(".*$", handlers)

        ...

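Never mind the details (they will be analyzed in a separate article). The most important job of Application.__init__ is setting up URL routing: by calling add_handlers, it adds both the (r"/", TestHandler) entry we configured and the static file route "/static/" to handlers.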
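
To make the routing setup concrete, here is a minimal standalone sketch of how such a handler table could be assembled. It is not Tornado's code: build_routes and the two empty handler classes are hypothetical, and appending "$" to each pattern is an assumption of this sketch so that a pattern must match the whole path.

```python
import re


class StaticFileHandler(object):
    """Hypothetical stand-in for a handler that serves files from disk."""


class TestHandler(object):
    """Hypothetical stand-in for the handler in the example script."""


def build_routes(handlers=None, static_path=None, static_url_prefix="/static/"):
    handlers = list(handlers or [])
    if static_path:
        static_kwargs = dict(path=static_path)
        # Static routes are placed in front of the user-supplied ones.
        handlers = [
            (re.escape(static_url_prefix) + r"(.*)", StaticFileHandler, static_kwargs),
            (r"/(favicon\.ico)", StaticFileHandler, static_kwargs),
            (r"/(robots\.txt)", StaticFileHandler, static_kwargs),
        ] + handlers
    routes = []
    for spec in handlers:
        pattern, handler_class = spec[0], spec[1]
        kwargs = spec[2] if len(spec) > 2 else {}
        if not pattern.endswith("$"):
            pattern += "$"  # assumption of this sketch: match the whole path
        routes.append((re.compile(pattern), handler_class, kwargs))
    return routes


routes = build_routes([(r"/", TestHandler)], static_path="static")
for regex, handler_class, kwargs in routes:
    print(regex.pattern, handler_class.__name__, kwargs)
```

With static_path set, the table ends up with three static routes followed by our own "/" route, which mirrors the order built in the excerpt above.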

Let's move on and see what HTTPServer does.

class HTTPServer(object):
    def __init__(self, request_callback, no_keep_alive=False, io_loop=None, ...):
        self.request_callback = request_callback
        self.io_loop = io_loop
        self._socket = None

    def listen(self, port, address=""):
        self.bind(port, address)
        self.start(1)

    def start(self, num_processes=None):
        self._started = True
        
        ...

        if num_processes > 1:
            for i in range(num_processes):
                if os.fork() == 0:
                    self.io_loop = ioloop.IOLoop.instance()
                    self.io_loop.add_handler(self._socket.fileno(), self._handle_events, ioloop.IOLoop.READ)
                    return
            os.waitpid(-1, 0)
        else:
            if not self.io_loop:
                self.io_loop = ioloop.IOLoop.instance()
            self.io_loop.add_handler(self._socket.fileno(), self._handle_events, ioloop.IOLoop.READ)
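
Nothing complicated here. HTTPServer.listen() calls HTTPServer.bind() to create the listening socket, whose fileno() is then registered with the IOLoop. HTTPServer.start() can fork multiple child processes, based on the number of CPU cores, to improve performance; listen() calls start(1), so our example stays in a single process. The IOLoop will call HTTPServer._handle_events() to handle incoming client connections.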

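IOLoop.instance() implements the singleton pattern, and IOLoop.start() starts the I/O loop; anyone familiar with epoll already knows how this works, so there is no need to elaborate. Note that HTTPServer.request_callback is actually the Application object; the naming is just a little odd.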
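
For readers less familiar with epoll, the idea of a singleton loop that maps file descriptors to handlers can be sketched with the standard selectors module (which uses epoll on Linux when available). MiniLoop and run_once are hypothetical names for this illustration; the real IOLoop does considerably more, such as timeouts and error handling.

```python
import selectors
import socket


class MiniLoop(object):
    """Hypothetical, heavily simplified fd -> handler event loop."""

    READ = selectors.EVENT_READ

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._handlers = {}

    @classmethod
    def instance(cls):
        # Singleton: create the loop on first use, then keep returning it.
        if not hasattr(cls, "_instance"):
            cls._instance = cls()
        return cls._instance

    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        self._selector.register(fd, events)

    def remove_handler(self, fd):
        self._handlers.pop(fd, None)
        self._selector.unregister(fd)

    def run_once(self, timeout=1.0):
        # One iteration of the loop: wait for events, dispatch by fd.
        for key, events in self._selector.select(timeout):
            self._handlers[key.fd](key.fd, events)


received = []
a, b = socket.socketpair()
loop = MiniLoop.instance()
loop.add_handler(b.fileno(), lambda fd, events: received.append(b.recv(1024)), MiniLoop.READ)

a.sendall(b"ping")   # makes b readable
loop.run_once()      # dispatches to the handler registered for b

loop.remove_handler(b.fileno())
a.close()
b.close()
print(received)
```

A real start() would simply repeat run_once() in a while loop; the dispatch line corresponds to self._handlers[fd](fd, events) in the stack trace above.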

When we access this web server, epoll fires the corresponding event and calls the previously registered HTTPServer._handle_events().

class HTTPServer(object):
    def _handle_events(self, fd, events):
        while True:
            connection, address = self._socket.accept()

            stream = iostream.IOStream(connection, io_loop=self.io_loop)
            HTTPConnection(stream, address, self.request_callback, self.no_keep_alive, self.xheaders)
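
The excerpt above elides how the while True loop ends. A common pattern, sketched here under the assumption that the listening socket is non-blocking, is to keep accepting until the socket reports that nothing is pending. accept_all is a hypothetical helper, not a Tornado function.

```python
import select
import socket


def accept_all(listener):
    """Accept every pending connection on a non-blocking listening socket."""
    connections = []
    while True:
        try:
            connection, address = listener.accept()
        except BlockingIOError:
            return connections  # nothing left in the accept queue
        connection.setblocking(False)
        connections.append((connection, address))


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(5)
listener.setblocking(False)

client = socket.create_connection(listener.getsockname())
select.select([listener], [], [], 1.0)  # wait until the connection is pending
accepted = accept_all(listener)
print(len(accepted))

for connection, _ in accepted:
    connection.close()
client.close()
listener.close()
```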

IOStream takes the client socket, i.e. connection, as a parameter; internally it registers the handler for that client's data events.

class IOStream(object):
    def __init__(self, socket, io_loop=None, max_buffer_size=104857600, read_chunk_size=4096):
        self.socket = socket
        self.socket.setblocking(False)
        self.io_loop = io_loop or ioloop.IOLoop.instance()
        ...
        self._read_callback = None
        ...
        self.io_loop.add_handler(self.socket.fileno(), self._handle_events, self._state)

    def _handle_events(self, fd, events):
        if events & self.io_loop.READ:
            self._handle_read()
            ...

    def _handle_read(self):
        chunk = self.socket.recv(self.read_chunk_size)
        ...
        if self._read_bytes:
            if len(self._read_buffer) >= self._read_bytes:
                num_bytes = self._read_bytes
                callback = self._read_callback
                self._read_callback = None
                self._read_bytes = None
                callback(self._consume(num_bytes))
        ...

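Once data from the client arrives, IOStream._handle_read() is called and hands further processing to self._read_callback, except that so far this callback appears to be None. After creating the IOStream in HTTPServer._handle_events(), the server goes on to instantiate an HTTPConnection object.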

class HTTPConnection(object):
    def __init__(self, stream, address, request_callback, no_keep_alive=False, xheaders=False):
        self.stream = stream
        self.request_callback = request_callback
        ...
        self.stream.read_until("\r\n\r\n", self._on_headers)

class IOStream(object):
    def read_until(self, delimiter, callback):
        ...
        self._read_delimiter = delimiter
        self._read_callback = callback
        self._add_io_state(self.io_loop.READ)


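HTTPConnection passes _on_headers() to IOStream as the callback, so the _read_callback from earlier finally gets a value. At this point all the preparation is done: the server is listening, the client has been accepted, and we are ready to receive the data it sends. Our trace pauses at HTTPConnection._on_headers(), which matches the call stack we captured earlier.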
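
The read_until mechanism can be sketched without any sockets: buffer incoming bytes and fire a one-shot callback once the delimiter shows up. MiniStream and its feed() method are hypothetical; feed() stands in for the point where a real stream has just received a chunk from the socket.

```python
class MiniStream(object):
    """Hypothetical stand-in for a stream with a read_until() callback."""

    def __init__(self):
        self._read_buffer = b""
        self._read_delimiter = None
        self._read_callback = None

    def read_until(self, delimiter, callback):
        self._read_delimiter = delimiter
        self._read_callback = callback
        self._try_delimiter()  # the data may already be buffered

    def feed(self, chunk):
        # Stand-in for "a chunk was just read from the socket".
        self._read_buffer += chunk
        self._try_delimiter()

    def _try_delimiter(self):
        if self._read_delimiter is None:
            return
        loc = self._read_buffer.find(self._read_delimiter)
        if loc == -1:
            return  # delimiter not seen yet, keep buffering
        end = loc + len(self._read_delimiter)
        callback = self._read_callback
        # One-shot: clear the pending read before running the callback.
        self._read_delimiter = None
        self._read_callback = None
        data, self._read_buffer = self._read_buffer[:end], self._read_buffer[end:]
        callback(data)


headers = []
stream = MiniStream()
stream.read_until(b"\r\n\r\n", headers.append)
stream.feed(b"GET / HTTP/1.1\r\nHost: local")   # no blank line yet
stream.feed(b"host\r\n\r\nleftover")            # delimiter arrives here
print(headers)
```

The callback fires only after the second feed(), and anything past the delimiter stays in the buffer for the next read.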

2. Go!

Now on to the next step: how the HTTP request gets processed.

class HTTPConnection(object):
    def _on_headers(self, data):
        ...
        headers = HTTPHeaders.parse(data[eol:])
        self._request = HTTPRequest(
            connection=self, method=method, uri=uri, version=version,
            headers=headers, remote_ip=self.address[0])

        ...
        self.request_callback(self._request)

The key lies in this request_callback.

(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/httpserver.py(294)_on_headers()
-> self.request_callback(self._request)

(Pdb) l
289                 if headers.get("Expect") == "100-continue":
290                     self.stream.write("HTTP/1.1 100 (Continue)\r\n\r\n")
291                 self.stream.read_bytes(content_length, self._on_request_body)
292                 return
293
294  ->         self.request_callback(self._request)
295
296         def _on_request_body(self, data):
297             self._request.body = data
298             content_type = self._request.headers.get("Content-Type", "")
299             if self._request.method == "POST":

(Pdb) self.request_callback
<tornado.web.Application object at 0x7f3d1d08fa50>

It is the Application object we have been passing along all the way: first stored as HTTPServer.request_callback, then passed as an argument when HTTPServer._handle_events() creates the HTTPConnection object.

Application is an object, so what happens when we execute Application(HTTPRequest)? Obviously, Application.__call__ gets invoked.

class Application(object):
    def __call__(self, request):
        ...
        handlers = self._get_host_handlers(request)
        if not handlers:
            handler = RedirectHandler(
                request, "http://" + self.default_host + "/")
        else:
            for spec in handlers:
                match = spec.regex.match(request.path)
                if match:
                    handler = spec.handler_class(self, request, **spec.kwargs)
                    kwargs = match.groupdict()
                    if kwargs:
                        args = []
                    else:
                        args = match.groups()
                    break

        ...

        handler._execute(transforms, *args, **kwargs)
        return handler

Simple enough: it matches against request.path to find the handler we registered at the very beginning, TestHandler, and creates an instance by calling handler_class().
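
The matching step, including the rule that named groups become keyword arguments and unnamed groups become positional ones, can be tried in isolation. This is a sketch only: match_route and the three handler classes are hypothetical, and the routes are illustrative rather than taken from the example application.

```python
import re


class HomeHandler(object):
    """Hypothetical handler for "/"."""


class UserHandler(object):
    """Hypothetical handler taking a positional id."""


class PostHandler(object):
    """Hypothetical handler taking a named slug."""


def match_route(routes, path):
    for pattern, handler_class in routes:
        match = re.match(pattern, path)
        if match:
            # groups() contains named and unnamed groups alike, so use
            # either groupdict() or groups(), never both.
            kwargs = match.groupdict()
            args = () if kwargs else match.groups()
            return handler_class, args, kwargs
    return None, (), {}


routes = [
    (r"/$", HomeHandler),
    (r"/user/(\d+)$", UserHandler),
    (r"/post/(?P<slug>[a-z-]+)$", PostHandler),
]

for path in ("/", "/user/42", "/post/hello-world", "/missing"):
    print(path, match_route(routes, path))
```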

(Pdb) u
> /usr/local/lib/python2.6/dist-packages/tornado-0.2-py2.6.egg/tornado/web.py(1054)__call__()
-> handler._execute(transforms, *args, **kwargs)

(Pdb) l
1049                if getattr(RequestHandler, "_templates", None):
1050                  map(lambda loader: loader.reset(),
1051                      RequestHandler._templates.values())
1052                RequestHandler._static_hashes = {}
1053
1054 ->         handler._execute(transforms, *args, **kwargs)
1055            return handler
1056
1057        def reverse_url(self, name, *args):
1058            """Returns a URL path for handler named `name`
1059

(Pdb) handler
<__main__.TestHandler object at 0x7fd59598af50>

At long last, TestHandler comes into view.

class RequestHandler(object):
    def _execute(self, transforms, *args, **kwargs):
        ...
        if not self._finished:
            getattr(self, self.request.method.lower())(*args, **kwargs)
            if self._auto_finish and not self._finished:
                self.finish()
        ...

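You really ought to be able to read this one: it takes request.method (POST, GET, DELETE and so on), looks up the method of the same name on the RequestHandler, and calls it. In this debugging session, TestHandler.get() is executed. With that, we have traced the flow in one direction.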
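
The getattr-based dispatch is easy to reproduce on its own. In this sketch MiniHandler, its execute() method and the SUPPORTED_METHODS check are hypothetical simplifications, written only to show the lookup-by-name technique and the automatic finish().

```python
class MiniHandler(object):
    """Hypothetical base class that dispatches on the HTTP method name."""

    SUPPORTED_METHODS = ("GET", "POST")

    def __init__(self):
        self.output = []
        self._finished = False

    def get(self, *args, **kwargs):
        raise NotImplementedError("GET not implemented")

    def post(self, *args, **kwargs):
        raise NotImplementedError("POST not implemented")

    def write(self, chunk):
        self.output.append(chunk)

    def finish(self):
        self._finished = True

    def execute(self, method, *args, **kwargs):
        if method not in self.SUPPORTED_METHODS:
            raise ValueError("method not allowed: %s" % method)
        # "GET" -> self.get, "POST" -> self.post, and so on.
        getattr(self, method.lower())(*args, **kwargs)
        if not self._finished:
            self.finish()  # finish automatically if the handler did not


class HelloHandler(MiniHandler):
    def get(self):
        self.write("Hello, World!\n")


handler = HelloHandler()
handler.execute("GET")
print(handler.output, handler._finished)
```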

class RequestHandler(object):
    def finish(self, chunk=None):
        ...
        if not self.application._wsgi:
            self.flush(include_footers=True)
            self.request.finish()
            self._log()

3. Back!

In TestHandler.get() we prepared the response data. How does that data get back to the client? Let's return to RequestHandler._execute(), which called get().

class RequestHandler(object):
    def _execute(self, transforms, *args, **kwargs):
        ...
        if not self._finished:
            getattr(self, self.request.method.lower())(*args, **kwargs)
            if self._auto_finish and not self._finished:
                self.finish()
        ...

Clearly RequestHandler.finish() is an important clue.

class RequestHandler(object):
    def finish(self, chunk=None):
        ...
        if not self.application._wsgi:
            self.flush(include_footers=True)
            self.request.finish()
            self._log()
        ...

    def flush(self, include_footers=False):
        ...
        chunk = "".join(self._write_buffer)
        ...
        if headers or chunk:
            self.request.write(headers + chunk)

class HTTPRequest(object):
    def write(self, chunk):
        ...
        if not self.stream.closed():
            self.stream.write(chunk, self._on_write_complete)

RequestHandler.flush() joins the data we wrote via self.write() in TestHandler.get() and writes it to the IOStream.

class IOStream(object):
    def write(self, data, callback=None):
        self._check_closed()
        self._write_buffer += data
        self._add_io_state(self.io_loop.WRITE)
        self._write_callback = callback

    def _handle_events(self, fd, events):
        ...
        if events & self.io_loop.WRITE:
            self._handle_write()

    def _handle_write(self):
        while self._write_buffer:
            ...
            num_bytes = self.socket.send(self._write_buffer)
            self._write_buffer = self._write_buffer[num_bytes:]

        ...

IOStream.write appends the data to be returned to self._write_buffer and adds WRITE event monitoring on the IOLoop. That eventually triggers IOStream._handle_write(), which sends the result to the client socket.
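
The buffered write path can be exercised with a socket pair. This is a sketch under stated assumptions: MiniWriter is a hypothetical class, and handle_write() is called by hand at the point where an event loop would report the socket as writable.

```python
import socket


class MiniWriter(object):
    """Hypothetical stand-in for the write half of a non-blocking stream."""

    def __init__(self, sock):
        self.socket = sock
        self.socket.setblocking(False)
        self._write_buffer = b""
        self._write_callback = None

    def write(self, data, callback=None):
        # Only buffer here; the actual send happens on a WRITE event.
        self._write_buffer += data
        self._write_callback = callback

    def writing(self):
        return len(self._write_buffer) > 0

    def handle_write(self):
        while self._write_buffer:
            try:
                num_bytes = self.socket.send(self._write_buffer)
            except BlockingIOError:
                return  # kernel buffer full, wait for the next WRITE event
            # send() may be partial, so drop only what was actually sent.
            self._write_buffer = self._write_buffer[num_bytes:]
        if self._write_callback:
            callback, self._write_callback = self._write_callback, None
            callback()


done = []
server_side, client_side = socket.socketpair()
writer = MiniWriter(server_side)
writer.write(b"Hello, World!\n", lambda: done.append(True))
writer.handle_write()  # an event loop would call this when writable
reply = client_side.recv(1024)

server_side.close()
client_side.close()
print(reply, done)
```

Slicing the buffer by the return value of send() is what keeps the stream correct when the kernel accepts only part of the data.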

Once this is done, HTTPRequest.finish() calls HTTPConnection.finish(), which eventually removes the event handler and closes the client socket connection via IOStream.close().

class HTTPRequest(object):
    def finish(self):
        self.connection.finish()
        self._finish_time = time.time()

class HTTPConnection(object):
    def finish(self):
        ...
        if not self.stream.writing():
            self._finish_request()

    def _finish_request(self):
        ...
        if disconnect:
            self.stream.close()
        return

class IOStream(object):
    def close(self):
        if self.socket is not None:
            self.io_loop.remove_handler(self.socket.fileno())
            self.socket.close()
            self.socket = None
            if self._close_callback: self._close_callback()

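That's basically the end of the story. Reverse-analyzing Python code is no different from doing it for C or C#: with sensible use of pdb to inspect the variables and context of different stack frames, tracing the flow is easy.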
