[Erlang in Anger](3.0) Handling Overload


Original article. If you repost it, please credit the source: 服务器非业余研究 http://blog.csdn.net/erlib, author Sunface





Chapter 3 Planning for Overload



By far, the most common cause of failure I’ve encountered in real-world scenarios is the node running out of memory. Furthermore, it is usually related to message queues growing out of bounds. [1]
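To make the link between an unbounded message queue and memory exhaustion concrete, here is a minimal sketch. It is written in Python rather than Erlang purely to keep it self-contained, and the function name and the rates are hypothetical, chosen only for illustration. It shows that whenever messages arrive faster than the process can handle them, the queue length grows without limit.

```python
def mailbox_length_over_time(arrival_rate, service_rate, seconds):
    """Return the queue length at the end of each simulated second.

    Both rates are in messages per second and are illustrative
    assumptions, not measurements from any real system.
    """
    length = 0
    history = []
    for _ in range(seconds):
        # The queue gains what arrived and loses what was processed;
        # it can never go below empty.
        length = max(0, length + arrival_rate - service_rate)
        history.append(length)
    return history


# Producer outpaces consumer: the backlog grows by 200 messages every second.
overloaded = mailbox_length_over_time(arrival_rate=1000, service_rate=800, seconds=5)

# Consumer keeps up: the queue stays empty.
healthy = mailbox_length_over_time(arrival_rate=800, service_rate=1000, seconds=5)
```

In the overloaded case nothing ever brings the backlog back down, so memory use keeps rising for as long as the imbalance lasts.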


 

There are plenty of ways to deal with this, but knowing which one to use requires a decent understanding of the system you’re working on.
To oversimplify things, most of the projects I end up working on can be visualized as a very large bathroom sink. User and data input flow from the faucet.


 

The Erlang system itself is the sink and the pipes, and wherever the output goes (whether it’s a database, an external API or service, and so on) is the sewer system.

When an Erlang node dies because of a queue overflowing, figuring out who to blame is crucial.
Did someone put too much water in the sink? Are the sewer systems backing up? Did you just design too small a pipe?


 

Determining which queue blew up is not necessarily hard; this information can be found in a crash dump.
Finding out why it blew up is trickier. Based on the role of the process or on run-time inspection, it’s possible to figure out whether the causes include fast flooding, blocked processes that won’t process messages fast enough, and so on.


 

The most difficult part is deciding how to fix it. When the sink gets clogged up by too much waste, we will usually start by trying to make the bathroom sink itself larger (the part of our program that crashed, at the edge).
Then we figure out the sink’s drain is too small, and optimize that. Then we find out the pipes themselves are too narrow, and optimize that.


 

The overload gets pushed further down the system, until the sewers can’t take it anymore. At that point, we may try to add sinks or add bathrooms to help with the global input level.
Then there’s a point where things can’t be improved anymore at the bathroom’s level. There are too many logs sent around, there’s a bottleneck on databases that need the consistency, or there’s simply not enough knowledge or manpower in your organization to improve things there.


 

By finding that point, we have identified the true bottleneck of the system. All the prior optimization was nice (and likely expensive), but it was more or less in vain.


 

We need to be more clever, and so things are moved back up a level. We try to massage the information going into the system to make it lighter (whether through compression, better algorithms and data representation, caching, and so on).


 

Even then, there are times when the overload will be too much, and we have to make a hard decision between restricting the input to the system, discarding it, or accepting that the system will reduce its quality of service up to the point where it crashes.


 

These mechanisms fall into two broad strategies: back-pressure and load-shedding.
We’ll explore them in this chapter, along with common events that end up causing Erlang systems to blow up.
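As a rough illustration of the difference between the two strategies, here is a minimal sketch using a bounded queue. It is written in Python rather than Erlang only to keep it self-contained; `CAPACITY`, `shed_load`, and `apply_back_pressure` are hypothetical names introduced for this example, and none of this is the mechanism the chapter itself goes on to describe.

```python
import queue
import threading

CAPACITY = 3  # hypothetical bound on the mailbox, chosen for illustration


def shed_load(mailbox, messages):
    """Load-shedding: when the mailbox is full, drop the message."""
    accepted, dropped = [], []
    for msg in messages:
        try:
            mailbox.put_nowait(msg)
            accepted.append(msg)
        except queue.Full:
            dropped.append(msg)  # input is discarded, the producer never waits
    return accepted, dropped


def apply_back_pressure(mailbox, messages):
    """Back-pressure: when the mailbox is full, the producer blocks."""
    received = []

    def consumer():
        for _ in range(len(messages)):
            received.append(mailbox.get())

    worker = threading.Thread(target=consumer)
    worker.start()
    for msg in messages:
        mailbox.put(msg)  # blocks until the consumer frees a slot
    worker.join()
    return received


# Nobody drains the queue here, so only CAPACITY messages fit.
shed_accepted, shed_dropped = shed_load(queue.Queue(maxsize=CAPACITY), list(range(5)))

# The producer is slowed to the consumer's pace, so nothing is lost.
delivered = apply_back_pressure(queue.Queue(maxsize=CAPACITY), list(range(5)))
```

With load-shedding the producer keeps its speed and some input is lost; with back-pressure no input is lost, but the producer is forced to slow down to the pace of the consumer.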


 

[1] Figuring out that a message queue is the problem is explained in Chapter 6, specifically in Section 6.2.
