Storm Fault tolerance

The notes below describe how Storm handles fault tolerance. The statements are theoretical, but each of them can be verified by testing on a real cluster.


1) What happens when a worker dies?

When a worker dies, the supervisor will restart it. If it continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign the worker to another machine.


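In other words, when a worker dies the supervisor is responsible for restarting it. If the worker keeps failing on startup, so that Nimbus receives no heartbeat from it for a long time, Nimbus restarts that worker on another machine.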
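The two-level recovery described above can be sketched as a toy model. This is only an illustration: `Worker`, `supervisor_restart`, `nimbus_reassign_stale`, and the 30-second timeout used in the usage note are hypothetical names and values, not Storm's actual API or defaults.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    host: str
    last_heartbeat: float  # seconds, on the same clock as `now`


def supervisor_restart(worker, now, start_ok):
    """Local restart attempt: the heartbeat resumes only if startup succeeds."""
    if start_ok:
        worker.last_heartbeat = now
    return start_ok


def nimbus_reassign_stale(workers, now, timeout_secs, spare_hosts):
    """Move every worker whose heartbeat is older than timeout_secs to a spare host.

    Returns the ids of the workers that were moved.
    """
    moved = []
    for w in workers:
        if now - w.last_heartbeat > timeout_secs and spare_hosts:
            w.host = spare_hosts.pop(0)
            w.last_heartbeat = now
            moved.append(w.worker_id)
    return moved
```

For example, a worker on `host-a` whose restarts keep failing stays where it is while its last heartbeat is still within the timeout, and is moved to `host-b` once the heartbeat is older than the timeout.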


2) What happens when a node dies?

The tasks assigned to that machine will time-out and Nimbus will reassign those tasks to other machines.

That is, when a node dies, the tasks on that machine time out and Nimbus reassigns them to other machines.
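A minimal sketch of timeout-based reassignment, under the assumption that tasks from a dead node are spread round-robin over the surviving nodes. Storm's real scheduler is more involved; the function name and data shapes here are hypothetical.

```python
def reassign_tasks(assignment, node_heartbeats, now, timeout_secs):
    """Return a new task -> node mapping with tasks on timed-out nodes moved.

    assignment:      dict of task_id -> node name
    node_heartbeats: dict of node name -> time of last heartbeat
    """
    # A node is alive if its last heartbeat is within the timeout window.
    alive = sorted(n for n, t in node_heartbeats.items() if now - t <= timeout_secs)
    if not alive:
        raise RuntimeError("no live nodes to reassign tasks to")

    new_assignment = {}
    next_slot = 0
    for task in sorted(assignment):
        node = assignment[task]
        if node in alive:
            new_assignment[task] = node  # untouched: its node is healthy
        else:
            # Round-robin placement is an assumption of this sketch.
            new_assignment[task] = alive[next_slot % len(alive)]
            next_slot += 1
    return new_assignment
```

Tasks on healthy nodes keep their placement; only the tasks whose node stopped heartbeating are moved.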


3) What happens when Nimbus or Supervisor daemons die?

The Nimbus and Supervisor daemons are designed to be fail-fast (process self-destructs whenever any unexpected situation is encountered) and stateless (all state is kept in Zookeeper or on disk). As described in Setting up a Storm cluster, the Nimbus and Supervisor daemons must be run under supervision using a tool like daemontools or monit. So if the Nimbus or Supervisor daemons die, they restart like nothing happened.


Most notably, no worker processes are affected by the death of Nimbus or the Supervisors. This is in contrast to Hadoop, where if the JobTracker dies, all the running jobs are lost.


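Storm designs Nimbus and the Supervisors to be stateless (their state is kept in ZooKeeper or on disk) and fail-fast (the process exits as soon as something unexpected happens), on the expectation that they are restarted quickly after dying. When running Storm it is therefore best to have a monitoring program that restarts Nimbus or a Supervisor whenever it dies.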

In Storm, a brief outage of Nimbus or a Supervisor has essentially no effect on the workers, which is very different from what happens when the JobTracker dies in Hadoop.
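The combination of fail-fast, stateless, and supervised can be illustrated with a small in-memory model. Everything here is hypothetical: `FailFastDaemon` stands in for a daemon such as Nimbus, the `store` dict stands in for ZooKeeper or local disk, and `run_supervised` plays the role of a tool like daemontools or monit.

```python
class FailFastDaemon:
    """Holds no state of its own; everything durable lives in `store`."""

    def __init__(self, store):
        self.store = store

    def handle(self, event):
        if event is None:
            # Fail fast: do not try to recover, just die.
            raise RuntimeError("unexpected event")
        self.store["processed"] = self.store.get("processed", 0) + 1


def run_supervised(store, events):
    """Feed events to the daemon, replacing it whenever it crashes.

    Returns the number of restarts performed.
    """
    restarts = 0
    daemon = FailFastDaemon(store)
    for event in events:
        try:
            daemon.handle(event)
        except RuntimeError:
            restarts += 1
            # A fresh instance picks up where the old one left off,
            # because the progress is in the external store.
            daemon = FailFastDaemon(store)
    return restarts
```

Because the daemon keeps nothing in memory that matters, a restart loses no progress: the counter in `store` keeps growing across crashes.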


4) Is Nimbus a single point of failure?


If you lose the Nimbus node, the workers will still continue to function. Additionally, supervisors will continue to restart workers if they die. However, without Nimbus, workers won't be reassigned to other machines when necessary (like if you lose a worker machine).


So the answer is that Nimbus is "sort of" a SPOF. In practice, it's not a big deal since nothing catastrophic happens when the Nimbus daemon dies. There are plans to make Nimbus highly available in the future.

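This question is about whether Nimbus is a SPOF. When Nimbus dies, worker processes keep running, and the supervisors can restart workers on their own without Nimbus being involved. However, if a worker cannot be restarted on its own machine, it cannot be reassigned to another machine while Nimbus is down.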

So Nimbus can be regarded as a SPOF, but its failure does not have the severe consequences that a JobTracker failure has in Hadoop.


5) How does Storm guarantee data processing?

Storm provides mechanisms to guarantee data processing even if nodes die or messages are lost. See Guaranteeing message processing for the details.

This is about how Storm guarantees data reliability: when a node dies or a message is lost, the message is replayed (retried). For details, see: https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing
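The replay idea can be sketched with a source that remembers every message it has emitted until that message is acknowledged. This is a simplified illustration, loosely modeled on the `nextTuple` / `ack` / `fail` callbacks of a Storm spout; the class and method names below are hypothetical and it does not reproduce Storm's actual tracking mechanism.

```python
from collections import deque


class ReplayingSource:
    """Emits messages and re-queues any message that is reported as failed."""

    def __init__(self, messages):
        # Each message gets an id so that acks and failures can refer to it.
        self.queue = deque(enumerate(messages))
        self.pending = {}  # msg_id -> payload, emitted but not yet acked

    def next_message(self):
        """Return (msg_id, payload), or None when nothing is queued."""
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        """The message was fully processed: forget it."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed or timed out: put the message back for replay."""
        if msg_id in self.pending:
            self.queue.append((msg_id, self.pending.pop(msg_id)))
```

A failed message is emitted again later, so each message is processed at least once; downstream logic has to tolerate the resulting duplicates.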

Reference: https://github.com/nathanmarz/storm/wiki/Fault-tolerance
