1. Background
A bolt needs to perform some operation before its topology is shut down. However, according to the official documentation:
The cleanup method is called when a Bolt is being shutdown and should cleanup any resources that were opened. There's no guarantee that this method will be called on the cluster: for example, if the machine the task is running on blows up, there's no way to invoke the method. The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in process), and you want to be able to run and kill many topologies without suffering any resource leaks.
So the cleanup method is not reliable: it is only dependable in local mode, and on a cluster it may never be called.
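2. Solution
Before killing a topology, first deactivate it and then process any unfinished messages. The spout's deactivate() method can be used to send a special tuple to the bolt; the bolt checks for that special tuple and, as soon as it receives one, performs the required operation.
The special tuple can be told apart from ordinary tuples by the stream it arrives on.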
3. Code & result
In the code below, ordinary messages all come from kafkaSpout; the shutdown signal is handled by a separate spout named MySpout.
Topology: add the separate spout
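MySpout: the key parts are the deactivate method and the declareOutputFields method. In the latter, the message.declareStream call names the stream of the tuple sent before the topology is deactivated, so that the bolt can distinguish it from ordinary tuples.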
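The spout side of this idea can be sketched as follows. This is a minimal plain-Java model written without the Storm dependency so that it runs on its own; it is not the original MySpout listing. The class names, the RecordingCollector stand-in, and the stream name "shutdown" are all assumptions made for illustration. In a real spout, the stream would be declared in declareOutputFields and the tuple emitted through the spout's output collector.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the signalling spout; none of these types are Storm classes.
final class ShutdownSignalSpoutSketch {
    // Hypothetical stream name; any id works as long as the bolt checks the same one.
    static final String SHUTDOWN_STREAM = "shutdown";

    /** One emitted record: the stream it was sent on plus its payload. */
    static final class Emitted {
        final String streamId;
        final String value;

        Emitted(String streamId, String value) {
            this.streamId = streamId;
            this.value = value;
        }
    }

    /** Stand-in for an output collector: it only records what was emitted. */
    static final class RecordingCollector {
        final List<Emitted> emitted = new ArrayList<>();

        void emit(String streamId, String value) {
            emitted.add(new Emitted(streamId, value));
        }
    }

    private final RecordingCollector collector;

    ShutdownSignalSpoutSketch(RecordingCollector collector) {
        this.collector = collector;
    }

    /** This spout carries no regular data, so nextTuple emits nothing. */
    void nextTuple() {
    }

    /** On deactivation, send one marker tuple on the dedicated stream. */
    void deactivate() {
        collector.emit(SHUTDOWN_STREAM, "topology-deactivated");
    }

    public static void main(String[] args) {
        RecordingCollector collector = new RecordingCollector();
        ShutdownSignalSpoutSketch spout = new ShutdownSignalSpoutSketch(collector);
        spout.nextTuple();   // emits nothing
        spout.deactivate();  // emits the marker
        for (Emitted e : collector.emitted) {
            System.out.println(e.streamId + " -> " + e.value);
        }
    }
}
```

Keeping the signal on its own stream means the ordinary data path from kafkaSpout does not need to change.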
ParseBolt:
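The bolt-side check can be sketched as below. This is a plain-Java model without the Storm dependency, not the original ParseBolt listing. SimpleTuple, the "shutdown" stream name, and the buffer-then-flush action are assumptions chosen for illustration. In a real bolt, the stream id would come from the incoming tuple's getSourceStreamId().

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the bolt-side check; these are not Storm classes.
final class ParseBoltSketch {
    // Hypothetical stream name; it must match the one the signalling spout declares.
    static final String SHUTDOWN_STREAM = "shutdown";

    /** Stand-in for a tuple: only the source stream id and one value. */
    static final class SimpleTuple {
        final String sourceStreamId;
        final String value;

        SimpleTuple(String sourceStreamId, String value) {
            this.sourceStreamId = sourceStreamId;
            this.value = value;
        }
    }

    final List<String> buffered = new ArrayList<>(); // work not yet finalized
    final List<String> flushed = new ArrayList<>();  // result of the shutdown action

    void execute(SimpleTuple tuple) {
        if (SHUTDOWN_STREAM.equals(tuple.sourceStreamId)) {
            // Shutdown marker: run the action that cleanup cannot be relied on to run.
            flushed.addAll(buffered);
            buffered.clear();
        } else {
            // Ordinary tuple: normal processing (here, just buffering the value).
            buffered.add(tuple.value);
        }
    }

    public static void main(String[] args) {
        ParseBoltSketch bolt = new ParseBoltSketch();
        bolt.execute(new SimpleTuple("default", "a"));
        bolt.execute(new SimpleTuple("default", "b"));
        bolt.execute(new SimpleTuple(SHUTDOWN_STREAM, "topology-deactivated"));
        System.out.println("flushed=" + bolt.flushed + " buffered=" + bolt.buffered);
    }
}
```

The branch on the stream id is the whole mechanism: ordinary tuples follow the normal path, and only the marker triggers the pre-shutdown work.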
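4. Notes
(1) When killing the topology, specify as long a wait time as practical; if the wait is too short, the message may not be delivered in time and this approach will fail.
(2) There is usually more than one bolt connected to the spout, so use the allGrouping strategy to make sure every one of them receives the message.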
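To illustrate note (2), the following plain-Java sketch contrasts delivery to a single task with delivery to every task. It is a model only and uses no Storm classes; the method names and the task count of 4 are assumptions. In a Storm topology, the second behaviour corresponds to subscribing the bolt to the signal stream with allGrouping.

```java
import java.util.Arrays;
import java.util.Random;

// Plain-Java model of two delivery strategies; these are not Storm classes.
final class GroupingSketch {
    /** Shuffle-style delivery: the tuple goes to exactly one randomly chosen task. */
    static boolean[] deliverToOne(int taskCount, Random random) {
        boolean[] received = new boolean[taskCount];
        received[random.nextInt(taskCount)] = true;
        return received;
    }

    /** All-grouping-style delivery: every task receives its own copy of the tuple. */
    static boolean[] deliverToAll(int taskCount) {
        boolean[] received = new boolean[taskCount];
        Arrays.fill(received, true);
        return received;
    }

    /** Counts how many tasks saw the tuple. */
    static int count(boolean[] received) {
        int n = 0;
        for (boolean r : received) {
            if (r) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int tasks = 4; // hypothetical number of bolt tasks
        System.out.println("one: " + count(deliverToOne(tasks, new Random())) + " of " + tasks);
        System.out.println("all: " + count(deliverToAll(tasks)) + " of " + tasks);
    }
}
```

With single-task delivery only one task would run its pre-shutdown action, which is why the signal stream needs to be broadcast.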