A million requests per second with Python

by Paweł Piotr Przeradowski

Is it possible to hit a million requests per second with Python? Probably not until recently.

A lot of companies are migrating away from Python to other programming languages so that they can boost their operational performance and save on server costs, but there’s really no need. Python can be the right tool for the job.

The Python community is doing a lot of work around performance lately. CPython 3.6 boosted overall interpreter performance with its new dictionary implementation. CPython 3.7 is going to be even faster, thanks to the introduction of a faster calling convention and dictionary lookup caches.

For number crunching tasks you can use PyPy with its just-in-time code compilation. You can also run NumPy’s test suite, which now has improved overall compatibility with C extensions. Later this year PyPy is expected to reach Python 3.5 conformance.

All this great work inspired me to innovate in one of the areas where Python is used extensively: web and micro-service development.

Enter Japronto!

Japronto is a brand new micro-framework tailored for your micro-services needs. Its main goals include being fast, scalable, and lightweight. It lets you do both synchronous and asynchronous programming thanks to asyncio. And it’s shamelessly fast. Even faster than NodeJS and Go.
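
For a taste of the API, here is a minimal application in the spirit of the project’s README, with both a synchronous and an asynchronous view (the route paths and sleep duration are just for illustration):

```python
import asyncio
from japronto import Application

# Views take a request and return a response. Plain functions and
# coroutines can be mixed freely in the same application.
def synchronous(request):
    return request.Response(text='I am synchronous!')

async def asynchronous(request):
    await asyncio.sleep(1)  # simulate non-blocking work
    return request.Response(text='I am asynchronous!')

app = Application()
app.router.add_route('/sync', synchronous)
app.router.add_route('/async', asynchronous)
app.run()
```
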

Errata: As user @heppu points out, Go’s stdlib HTTP server can be 12% faster than this graph shows when written more carefully. Also there’s an awesome fasthttp server for Go that apparently is only 18% slower than Japronto in this particular benchmark. Awesome! For details see https://github.com/squeaky-pl/japronto/pull/12 and https://github.com/squeaky-pl/japronto/pull/14.

We can also see that the Meinheld WSGI server is almost on par with NodeJS and Go. Despite its inherently blocking design, it is a great performer compared to the preceding four, which are asynchronous Python solutions. So never trust anyone who says that asynchronous systems are always speedier. They are almost always more concurrent, but there’s much more to it than just that.

I performed this micro benchmark using a “Hello world!” application, but it clearly demonstrates server-framework overhead for a number of solutions.

These results were obtained on an AWS c4.2xlarge instance with 8 vCPUs, launched in the São Paulo region with default shared tenancy, HVM virtualization, and magnetic storage. The machine was running Ubuntu 16.04.1 LTS (Xenial Xerus) with the Linux 4.4.0-53-generic x86_64 kernel. The OS reported a Xeon® CPU E5-2666 v3 @ 2.90GHz. I used Python 3.6, freshly compiled from source.

To be fair, all the contestants (including Go) were running a single worker process. Servers were load tested using wrk with 1 thread, 100 connections, and 24 simultaneous (pipelined) requests per connection (a cumulative parallelism of 2,400 requests).

HTTP pipelining is crucial here since it’s one of the optimizations that Japronto takes into account when executing requests.

Most of the servers execute requests from pipelining clients in the same fashion they would from non-pipelining clients. They don’t try to optimize it. (In fact Sanic and Meinheld will also silently drop requests from pipelining clients, which is a violation of the HTTP 1.1 protocol.)

In simple words, pipelining is a technique in which the client doesn’t need to wait for the response before sending subsequent requests over the same TCP connection. To ensure integrity of the communication, the server sends back several responses in the same order requests are received.
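
A minimal sketch of the idea with a raw socket (example.com stands in for any HTTP/1.1 server):

```python
import socket

# Two GET requests sent back to back over one TCP connection, without
# waiting for the first response: that is HTTP/1.1 pipelining.
request = b'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n'
sock = socket.create_connection(('example.com', 80))
sock.sendall(request * 2)  # both requests, one connection

# The server answers in request order, so the responses arrive
# concatenated on the same connection (one recv may return both).
print(sock.recv(65536).decode('latin-1'))
sock.close()
```
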

The gory details of optimizations

When many small GET requests are pipelined together by the client, there’s a high probability that they’ll arrive in one TCP packet (thanks to Nagle’s algorithm) on the server side, then be read back by one system call.

Doing a system call and moving data from kernel-space to user-space is a very expensive operation compared to, say, moving memory inside process space. That’s why it’s important to perform as few system calls as necessary (but no fewer).

When Japronto receives data and successfully parses several requests out of it, it tries to execute all the requests as fast as possible, glue responses back in correct order, then write back in one system call. In fact the kernel can aid in the gluing part, thanks to scatter/gather IO system calls, which Japronto doesn’t use yet.
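
The difference can be sketched with a socket pair and hypothetical pre-rendered responses; os.writev is the scatter/gather call in question (Unix-only):

```python
import os
import socket

# Three already-rendered responses for three pipelined requests.
responses = [b'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok'] * 3

server, client = socket.socketpair()

# Option 1: glue the buffers together in user space, then issue
# a single write() system call.
server.sendall(b''.join(responses))

# Option 2: scatter/gather IO: still a single writev() system call,
# but the kernel gathers the buffers, so no user-space copy is needed.
os.writev(server.fileno(), responses)

print(client.recv(4096))  # the responses arrive back to back
```
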

Note that this isn’t always possible, since some of the requests could take too long, and waiting for them would needlessly increase latency.

Take care when you tune heuristics, and consider the cost of system calls and the expected request completion time.

Besides delaying writes for pipelined clients, there are several other techniques that the code employs.

Japronto is written almost entirely in C. The parser, protocol, connection reaper, router, request, and response objects are written as C extensions.

Japronto tries hard to delay creation of the Python counterparts of its internal structures until asked for explicitly. For example, a headers dictionary won’t be created until it’s requested in a view. All the token boundaries are marked beforehand, but normalization of header keys, and creation of the various str objects, is done when they’re accessed for the first time.
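
In pure Python the pattern looks roughly like this (a simplified sketch, not Japronto’s actual C code; the span bookkeeping is hypothetical):

```python
class Request:
    def __init__(self, raw, header_spans):
        self._raw = raw             # raw bytes straight from the socket
        self._spans = header_spans  # (key_start, key_end, val_start, val_end)
        self._headers = None        # the Python dict is built lazily

    @property
    def headers(self):
        # The dict and its str keys/values only materialize on first access.
        if self._headers is None:
            self._headers = {
                self._raw[ks:ke].decode('ascii').title():  # normalize the key
                self._raw[vs:ve].decode('ascii')
                for ks, ke, vs, ve in self._spans
            }
        return self._headers
```
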

Japronto relies on the excellent picohttpparser C library for parsing the status line, headers, and a chunked HTTP message body. Picohttpparser directly employs text-processing instructions found in modern CPUs with SSE4.2 extensions (almost any 10-year-old x86_64 CPU has it) to quickly match boundaries of HTTP tokens. The I/O is handled by the super awesome uvloop, which itself is a wrapper around libuv. At the lowest level, this is a bridge to the epoll system call, providing asynchronous notifications on read-write readiness.
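
Using uvloop in your own asyncio code is a two-line switch:

```python
import asyncio
import uvloop

# Make asyncio drive its event loop with libuv (and, on Linux,
# ultimately epoll) instead of the default selector event loop.
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

async def main():
    await asyncio.sleep(0)
    print('loop class:', type(asyncio.get_event_loop()).__name__)

asyncio.get_event_loop().run_until_complete(main())
```
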

Python is a garbage collected language, so care needs to be taken when designing high performance systems so as not to needlessly increase pressure on the garbage collector. The internal design of Japronto tries to avoid reference cycles and do as few allocations/deallocations as necessary. It does this by preallocating some objects into so-called arenas. It also tries to reuse Python objects for future requests if they’re no longer referenced instead of throwing them away.
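
A free-list makes the idea concrete; this sketch is illustrative and far simpler than Japronto’s C arenas:

```python
class Response:
    def __init__(self):
        self.reset()

    def reset(self):
        self.status = 200
        self.body = b''

class ResponsePool:
    """Preallocate a batch of objects and recycle them, rather than
    churning the allocator and garbage collector on every request."""

    def __init__(self, size=1024):
        self._free = [Response() for _ in range(size)]  # the "arena"

    def acquire(self):
        # Hand out a recycled object; fall back to allocating a new one.
        return self._free.pop() if self._free else Response()

    def release(self, resp):
        resp.reset()             # scrub per-request state
        self._free.append(resp)  # keep it around for a future request
```
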

All the allocations are done as multiples of 4KB. Internal structures are carefully laid out so that data used frequently together is close enough in memory, minimizing the possibility of cache misses.

Japronto tries to not copy between buffers unnecessarily, and does many operations in-place. For example, it percent-decodes the path before matching in the router process.
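
In pure Python, the equivalent step is decoding the raw path once before the router tries to match it (Japronto does this decoding in-place, in C):

```python
from urllib.parse import unquote

raw_path = '/greet/hello%20world'  # as received on the wire
route_path = unquote(raw_path)     # percent-decoded before matching
print(route_path)                  # -> /greet/hello world
```
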

Open source contributors, I could use your help.

I’ve been working on Japronto continuously for the past 3 months, often during weekends as well as on normal work days. This was only possible because I took a break from my regular programmer job and put all my effort into this project.

I think it’s time to share the fruit of my labor with the community.

Currently Japronto implements a pretty solid feature set:

  • HTTP 1.x implementation with support for chunked uploads
  • Full support for HTTP pipelining
  • Keep-alive connections with configurable reaper
  • Support for synchronous and asynchronous views
  • Master-multiworker model based on forking
  • Support for code reloading on changes
  • Simple routing

I would like to look into Websockets and streaming HTTP responses asynchronously next.

There’s a lot of work to be done in terms of documenting and testing. If you’re interested in helping, please contact me directly on Twitter. Here’s Japronto’s GitHub project repository.

Also, if your company is looking for a Python developer who’s a performance freak and also does DevOps, I’m open to hearing about that. I am going to consider positions worldwide.

Final words

All the techniques that I’ve mentioned here are not really specific to Python. They could probably be employed in other languages like Ruby, JavaScript, or even PHP. I’d be interested in doing such work too, but sadly this will not happen unless somebody can fund it.

I’d like to thank the Python community for their continuous investment in performance engineering: namely Victor Stinner @VictorStinner, INADA Naoki @methane, and Yury Selivanov @1st1, as well as the entire PyPy team.

For the love of Python.

Source: https://www.freecodecamp.org/news/million-requests-per-second-with-python-95c137af319/
