11 Best Practices for Low Latency Systems

Reposted from: http://www.oschina.net/translate/11-best-practices-for-low-latency-systems?print


It's been 8 years since Google noticed that an extra 500ms of latency dropped traffic by 20% and Amazon realized that 100ms of extra latency dropped sales by 1%. Ever since then, developers have been racing to the bottom of the latency curve, culminating in front-end developers squeezing every last millisecond out of their JavaScript, CSS, and even HTML. What follows is a random walk through a variety of best practices to keep in mind when designing low latency systems. Most of these suggestions are taken to the logical extreme, but of course tradeoffs can be made. (Thanks to an anonymous user for asking this question on Quora and getting me to put my thoughts down in writing.)

Choose the right language

Scripting languages need not apply. Though they keep getting faster and faster, when you are looking to shave those last few milliseconds off your processing time you cannot have the overhead of an interpreted language. Additionally, you will want a strong memory model to enable lock-free programming, so you should be looking at Java, Scala, C++11, or Go.

Keep it all in memory

I/O will kill your latency, so make sure all of your data is in memory. This generally means managing your own in-memory data structures and maintaining a persistent log, so you can rebuild the state after a machine or process restart. Some options for a persistent log include Bitcask, Krati, LevelDB, and BDB-JE. Alternatively, you might be able to get away with running a local, persisted in-memory database like Redis or MongoDB (with memory >> data). Note that you can lose some data on a crash due to their background syncing to disk.
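A minimal sketch of that pattern in Java, using only the JDK; the tab-separated record format and the flush-instead-of-fsync durability are simplifications for illustration, not anything the post prescribes:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Persistent-log sketch: append every mutation before applying it to the
// in-memory map, and replay the log on startup to rebuild state.
public class JournaledMap {
    private final Map<String, String> state = new HashMap<>();
    private final Path logPath;
    private final BufferedWriter log;

    public JournaledMap(Path logPath) throws IOException {
        this.logPath = logPath;
        replay();                                  // rebuild state from any existing log
        this.log = Files.newBufferedWriter(logPath,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public void put(String key, String value) throws IOException {
        log.write(key + "\t" + value + "\n");      // durable record first...
        log.flush();                               // (a real system would fsync here)
        state.put(key, value);                     // ...then the in-memory update
    }

    public String get(String key) { return state.get(key); }

    private void replay() throws IOException {
        if (!Files.exists(logPath)) return;
        for (String line : Files.readAllLines(logPath)) {
            String[] kv = line.split("\t", 2);
            if (kv.length == 2) state.put(kv[0], kv[1]);
        }
    }
}
```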

Keep data and processing colocated

Network hops are faster than disk seeks, but even so they will add a lot of overhead. Ideally, your data should fit entirely in memory on one host. With AWS providing almost 1/4 TB of RAM in the cloud and physical servers offering multiple TBs, this is generally possible. If you need to run on more than one host, you should ensure that your data and requests are properly partitioned so that all the data necessary to service a given request is available locally.
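A toy illustration of key-based routing, using modulo hashing as the simplest possible scheme; a real deployment would more likely use consistent hashing so that adding a host does not reshuffle every partition:

```java
import java.util.List;

// Partitioning sketch: route each request to the host that owns the data
// for its key, so processing always happens next to the data.
public class PartitionRouter {
    private final List<String> hosts;              // e.g. ["10.0.0.1", "10.0.0.2"]

    public PartitionRouter(List<String> hosts) { this.hosts = hosts; }

    public String hostFor(String key) {
        // Stable hash -> partition; all data for a key lives on one host,
        // so a request never needs a cross-host fetch on the hot path.
        int partition = Math.floorMod(key.hashCode(), hosts.size());
        return hosts.get(partition);
    }
}
```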

Keep the system underutilized

Low latency requires always having resources to process the request. Don't try to run at the limit of what your hardware/software can provide. Always have lots of headroom for bursts, and then some.

Keep context switches to a minimum

Context switches are a sign that you are doing more compute work than you have resources for. You will want to limit your number of threads to the number of cores on your system and to pin each thread to its own core.
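Java itself has no portable way to pin a thread to a core, so a JDK-only sketch can enforce only the first half of this advice; actual pinning needs OS tooling (e.g. taskset or isolcpus on Linux) or a native-affinity library. A minimal example bounding the busy-thread count to the core count:

```java
import java.util.concurrent.*;

// Never run more busy threads than cores: size the pool to the hardware
// so runnable threads are not forced to time-slice against each other.
public class CoreBoundExecutor {
    public static ExecutorService create() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(cores);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = create();
        Future<Long> f = pool.submit(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            return sum;
        });
        System.out.println(f.get());
        pool.shutdown();
    }
}
```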

Keep your reads sequential

All forms of storage, whether rotational, flash based, or memory, perform significantly better when used sequentially. When issuing sequential reads to memory you trigger the use of prefetching at the RAM level as well as at the CPU cache level. If done properly, the next piece of data you need will always be in L1 cache right before you need it. The easiest way to help this process along is to make heavy use of arrays of primitive data types or structs. Following pointers, either through use of linked lists or through arrays of objects, should be avoided at all costs.
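A rough microbenchmark of the difference (no JIT warmup or other benchmarking hygiene, so treat the numbers as directional only): summing a contiguous primitive array versus chasing pointers through a LinkedList of boxed Integers:

```java
import java.util.LinkedList;
import java.util.List;

// Contrast sequential access over a primitive array (contiguous, prefetch-
// friendly) with pointer chasing through a linked list (one heap-scattered
// node per element, likely a cache miss per step).
public class SequentialVsPointerChasing {
    public static void main(String[] args) {
        int n = 2_000_000;

        int[] packed = new int[n];                 // contiguous primitives
        List<Integer> chased = new LinkedList<>(); // node-per-element, boxed
        for (int i = 0; i < n; i++) { packed[i] = i; chased.add(i); }

        long t0 = System.nanoTime();
        long sum1 = 0;
        for (int v : packed) sum1 += v;            // hardware prefetcher shines here
        long t1 = System.nanoTime();
        long sum2 = 0;
        for (int v : chased) sum2 += v;            // one dependent load per node
        long t2 = System.nanoTime();

        System.out.printf("array: %dms  linked list: %dms (sums %d/%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sum1, sum2);
    }
}
```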

Batch your writes

This sounds counterintuitive, but you can gain significant improvements in performance by batching writes. However, there is a misconception that this means the system should wait an arbitrary amount of time before doing a write. Instead, one thread should spin in a tight loop doing I/O. Each write will batch all the data that arrived since the last write was issued. This makes for a very fast and adaptive system.
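A sketch of that single-writer pattern, using a JDK queue for brevity (LMAX's Disruptor does the same job with a pre-allocated ring buffer): producers enqueue, and one spinning thread writes whatever has accumulated since its last write:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Adaptive write batching: under light load batches are tiny (low latency),
// under heavy load they grow (high throughput) -- no artificial timer needed.
public class BatchingWriter implements Runnable {
    private final ConcurrentLinkedQueue<byte[]> pending = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    public void submit(byte[] record) { pending.offer(record); }
    public void stop() { running = false; }

    @Override public void run() {
        List<byte[]> batch = new ArrayList<>();
        while (running) {
            byte[] r;
            while ((r = pending.poll()) != null) batch.add(r);  // drain all queued data
            if (!batch.isEmpty()) {
                writeBatch(batch);                              // one I/O for the whole batch
                batch.clear();
            }
        }
    }

    private void writeBatch(List<byte[]> batch) {
        // Stand-in for a single syscall/flush covering the entire batch.
    }
}
```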

Respect your cache

With all of these optimizations in place, memory access quickly becomes a bottleneck. Pinning threads to their own cores helps reduce CPU cache pollution, and sequential I/O also helps preload the cache. Beyond that, you should keep memory sizes down using primitive data types so more data fits in cache. Additionally, you can look into cache-oblivious algorithms, which work by recursively breaking down the data until it fits in cache and then doing any necessary processing.
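One way to apply the primitive-types advice is a "struct of arrays" layout. This toy example (the Trade fields are invented for illustration) contrasts it with the usual array-of-objects layout:

```java
// Two layouts for the same data. The object version scatters each Trade
// across the heap behind a pointer; the parallel-primitive-array version
// packs each field contiguously, so far more entries fit per cache line.
public class TradeStore {
    // Object layout: Trade[] is an array of pointers to heap-scattered objects.
    static final class Trade { long id; double price; int qty; }

    // "Struct of arrays" layout: dense primitives, no per-entry object header.
    static final class Trades {
        final long[] id;
        final double[] price;
        final int[] qty;
        Trades(int capacity) {
            id = new long[capacity];
            price = new double[capacity];
            qty = new int[capacity];
        }
        double notional(int i) { return price[i] * qty[i]; }
    }
}
```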

Non-blocking as much as possible

Make friends with non-blocking and wait-free data structures and algorithms. Every time you use a lock you have to go down the stack to the OS to mediate the lock, which is a huge overhead. Often, if you know what you are doing, you can get around locks by understanding the memory model of the JVM, C++11, or Go.
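A minimal lock-free example using a compare-and-swap retry loop from java.util.concurrent; a thread that loses a race simply retries instead of parking in the kernel (the clamping transform is arbitrary, just to show a non-trivial update):

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free counter: contended updates are resolved with a CPU
// compare-and-swap instead of blocking threads on an OS-mediated lock.
public class LockFreeCounter {
    private final AtomicLong value = new AtomicLong();

    public long incrementAndGetClamped(long max) {
        while (true) {
            long current = value.get();
            long next = Math.min(current + 1, max);   // example transform
            if (value.compareAndSet(current, next)) { // retry on lost race, never block
                return next;
            }
        }
    }
}
```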

Async as much as possible

Any processing, and particularly any I/O, that is not absolutely necessary for building the response should be done outside the critical path.
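A sketch of keeping the critical path lean: the response is computed synchronously, while auxiliary work such as audit logging (an invented example) is handed to a background executor so the caller never waits on it:

```java
import java.util.concurrent.*;

// Compute the response on the critical path; defer everything else
// (audit logging, metrics, notifications) to a background thread.
public class RequestHandler {
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    public String handle(String request) {
        String response = compute(request);                 // critical path only
        background.execute(() -> audit(request, response)); // off the critical path
        return response;
    }

    private String compute(String request) { return "ok:" + request; }
    private void audit(String request, String response) { /* slow I/O lives here */ }
}
```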

Parallelize as much as possible

Any processing, and particularly any I/O, that can happen in parallel should be done in parallel. For instance, if your high availability strategy includes logging transactions to disk and sending transactions to a secondary server, those actions can happen in parallel.
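A sketch of exactly that example with CompletableFuture: both I/O actions are issued at once, so the commit pays the slower of the two latencies rather than their sum (the method names are placeholders):

```java
import java.util.concurrent.CompletableFuture;

// Journaling to disk and shipping to a secondary are independent,
// so issue both at once and wait only for the slower of the two.
public class ParallelHa {
    public static void commit(byte[] txn) {
        CompletableFuture<Void> journal =
                CompletableFuture.runAsync(() -> writeJournal(txn));
        CompletableFuture<Void> replicate =
                CompletableFuture.runAsync(() -> sendToSecondary(txn));
        CompletableFuture.allOf(journal, replicate).join(); // latency = max, not sum
    }

    private static void writeJournal(byte[] txn) { /* append to local log */ }
    private static void sendToSecondary(byte[] txn) { /* replicate over network */ }
}
```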

Resources

Almost all of this comes from following what LMAX is doing with their Disruptor project. Read up on that and follow anything that Martin Thompson does.



