A basic comparison of the web.py / flup and Tornado web process handling models (TBC)

Tornado is known for its ability to handle many concurrent connections with the help of OS event notification mechanisms such as epoll and kqueue.

Web.py is a web framework for Python. It relies on other server packages to form a complete web server.

In a typical setup, Tornado can serve requests on its own, but the common arrangement is to put it behind an nginx server (via proxy_pass): nginx handles static resources and other concerns, and reverse-proxies dynamic requests to Tornado.

In contrast, web.py usually relies on flup to run as a FastCGI service, which is then connected to nginx via fastcgi_pass directives.

To a new user they appear similar to some extent. I wrote a few very simple scripts [1] / [2] and tested them on the same server behind the same nginx configuration, each running two processes (web.py via spawn-fcgi, Tornado via tornado.process.fork_processes) and returning a simple string from the GET handler. On average nginx + tornado served each request in 75 - 125 ms while nginx + web.py took roughly 3 sec per request, both at 50 concurrent clients (ab -c50). With fewer concurrent clients the difference could be as much as tenfold.
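
For reference, the two test apps were roughly of the following shape (a minimal sketch, not the exact gists in [1] / [2]; the port, handler names and process count are placeholders):

    # web.py side (sketch): launched under spawn-fcgi in the test,
    # so flup's FastCGI server does the actual serving
    import web

    urls = ("/", "hello")

    class hello:
        def GET(self):
            return "hello world"

    app = web.application(urls, globals())

    if __name__ == "__main__":
        app.run()

    # Tornado side (sketch): pre-fork two worker processes sharing one socket
    import tornado.httpserver
    import tornado.ioloop
    import tornado.netutil
    import tornado.process
    import tornado.web

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            self.write("hello world")

    app = tornado.web.Application([(r"/", MainHandler)])

    if __name__ == "__main__":
        sockets = tornado.netutil.bind_sockets(8888)
        tornado.process.fork_processes(2)
        server = tornado.httpserver.HTTPServer(app)
        server.add_sockets(sockets)
        tornado.ioloop.IOLoop.instance().start()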

Then I added a small delay to the GET handler in both scripts (time.sleep(0.1)) to simulate some processing time. Before looking at either solution I had been dealing with relatively time-consuming filesystem requests in my web service, so this simulation is quite close to the kind of problem I care about. Surprisingly, the nginx + tornado script slowed to 5 sec+ per request and became much, much slower than web.py.
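
The change amounts to a one-line sleep in each GET handler, e.g. on the Tornado side (sketch; the web.py handler gets the same time.sleep call):

    import time

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            time.sleep(0.1)            # simulated blocking work (filesystem, DB, ...)
            self.write("hello world")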

I understood how Tornado works from my understanding of epoll / I/O multiplexing. Web.py, however, was something of a mystery, so I had to look into the source code. I saw that the web.py snippet calls into flup to create a "runwsgi" function, which in turn creates a ThreadedServer within flup. ThreadedServer has a familiar-looking addJob method, and within a minute I could see that for each client socket returned from the select call (ThreadedServer.run), a new "job", and hence a new thread from the pool, is created. The legendary one-thread-per-client model. Even without looking at how web.py (and my code) is called back from flup, I knew that:

  1. blocking calls (whether blocking I/O operations or other things like the time.sleep call here) are handled by threads and the OS scheduler;
  2. with a large number of simple, non-blocking, one-off requests, this must be slower than the epoll approach.
However, when blocking operations appear (such as my sleep call, filesystem access, DB calls, etc.), epoll will NOT help: the OS waits for the operation to finish before returning control to the script. Since there are only two Tornado processes running, no more than two clients can be served at the same time, even if both are merely sleeping. With flup, threads are created and scheduled by the OS, so they keep making progress as long as the CPU isn't completely hogged.
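
To illustrate: time.sleep(0.1) blocks the single thread running the IOLoop, so that process cannot accept or answer anything else for those 100 ms. The non-blocking way to express the same delay in Tornado of that era is to hand the wait back to the IOLoop, e.g. (a sketch using IOLoop.add_timeout; not the benchmarked code):

    import time
    import tornado.ioloop
    import tornado.web

    class DelayHandler(tornado.web.RequestHandler):
        @tornado.web.asynchronous
        def get(self):
            # Ask the IOLoop to call us back 0.1 s later instead of sleeping;
            # the process stays free to serve other clients in the meantime.
            tornado.ioloop.IOLoop.instance().add_timeout(
                time.time() + 0.1, self._finish_request)

        def _finish_request(self):
            self.write("hello world")
            self.finish()

Of course this only helps when the real work (filesystem, DB) can itself be made non-blocking or pushed to another process; a plain blocking call stays blocking no matter how it is wrapped.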

If we look at the packages available around Tornado, apart from the server itself there are HTTP client packages, async MongoDB packages and some authentication packages built around the HTTP client. Clearly, to make better use of Tornado, an application needs to build around the epoll-driven IOLoop as its core: the Tornado framework handles all network waiting (using epoll), and a carefully crafted app then responds to all events in a timely manner. It's very different from the traditional CGI style of request handling, but it's definitely a step in the right direction.
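
The bundled async HTTP client is a good example of that style: instead of waiting for a response, the caller registers a callback and returns to the IOLoop. A standalone sketch of the standard tornado.httpclient usage (the URL is a placeholder):

    from tornado import httpclient, ioloop

    def handle_response(response):
        # Called by the IOLoop once the response has arrived
        if response.error:
            print("Error: %s" % response.error)
        else:
            print("Fetched %d bytes" % len(response.body))
        ioloop.IOLoop.instance().stop()

    client = httpclient.AsyncHTTPClient()
    client.fetch("http://example.com/", handle_response)
    ioloop.IOLoop.instance().start()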

Issues left over:

1.

Tornado didn't have an async MySQL package available, and FriendFeed (the original author) mentioned [3] that:
  We experimented with different async DB approaches, but settled on 
  synchronous at FriendFeed because generally if our DB queries were 
  backlogging our requests, our backends couldn't scale to the load 
  anyway. Things that were slow enough were abstracted to separate 
  backend services which we fetched asynchronously via the async HTTP 
  module. 
Question: how to better arrange resources to run separate services for handling blocking work? On what principles should such design decisions be made?


2.

When testing the response speed of raw Tornado (without nginx) using the ab shipped with OS X Mountain Lion, requests failed from time to time. I saw mentions that these failures are caused by bugs in the version of ab shipped with OS X. Should re-test with palb (a Python implementation of ab) or other implementations.

Bug: http://simon.heimlicher.com/articles/2012/07/08/fix-apache-bench-ab-on-os-x-lion

Tested with palb: with or without set_header('Connection', 'Keep-Alive'), such connection reset errors do not appear.
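
For reference, the header is set from within the handler using Tornado's RequestHandler.set_header (sketch):

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            self.set_header("Connection", "Keep-Alive")
            self.write("hello world")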

3.

Nginx speaks HTTP/1.0 when used as a (reverse) proxy server, which closes the connection after each request. How does this affect the performance of the Tornado server? I suppose epoll is designed for comet-style usage (a large number of long-lived, mostly idle connections)?

Answer: nginx actually supports HTTP/1.1 and keepalive for upstream proxy settings. See http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive
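
A minimal upstream configuration along those lines (a sketch based on that documentation page; upstream name and port are placeholders):

    upstream tornado_backend {
        server 127.0.0.1:8888;
        keepalive 32;                        # keep up to 32 idle connections to the backend
    }

    server {
        location / {
            proxy_pass http://tornado_backend;
            proxy_http_version 1.1;          # speak HTTP/1.1 to the upstream
            proxy_set_header Connection "";  # clear the Connection header so keepalive works
        }
    }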

[4] mentions using HAProxy instead of nginx. Might be worth looking into.

4. 

With the Tornado code as written, even though there are two Python processes running after starting the server, only one of them accepts requests. Possible workaround: run on two ports and load-balance with nginx, but that's not ideal; the fork_processes model should already solve this problem.


Solution:

fork_processes(0) / start(0) creates one worker process per CPU in the system. Observing two python processes means only one worker was created, so only one process was running request handlers; the testing VM was a single-core system. Specifying start(2) results in 3 python processes, with two of them sharing the load.
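
The simpler bind()/start(n) form makes the worker count explicit (sketch; port and handler are placeholders):

    import tornado.httpserver
    import tornado.ioloop
    import tornado.web

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            self.write("hello world")

    app = tornado.web.Application([(r"/", MainHandler)])

    if __name__ == "__main__":
        server = tornado.httpserver.HTTPServer(app)
        server.bind(8888)
        server.start(2)   # start(0) forks one worker per CPU core;
                          # on a single-core VM that is just one worker
        tornado.ioloop.IOLoop.instance().start()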


Links:

[1] Web.py test script: https://gist.github.com/4371628

[2] Tornado test script: https://gist.github.com/4363542

[3] http://news.ycombinator.com/item?id=3025475

[4] "Need help on putting tornado apps on production", great info packed - https://groups.google.com/forum/?fromgroups=#!topic/python-tornado/62TLw_gmp94
