topology.max.spout.pending Explained

Latest recommended article published on 2019-03-05 16:29:32

weixin_34088598


Original link: https://my.oschina.net/cjun/blog/372730


Two explanations of the topology.max.spout.pending property in Storm are commonly given:

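1. It is the number of simultaneously active batches. You must set how many batches are processed at the same time, and you do so with "topology.max.spout.pending". If you do not set it, the default is 1.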

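2. topology.max.spout.pending limits the tuples that a spout has emitted but that are still in flight. Once downstream bolts have topology.max.spout.pending tuples that are not yet fully processed, the spout stops and waits for the bolts to consume them. When the number of such tuples drops below topology.max.spout.pending, the spout resumes reading from the message source. (This property only takes effect for reliable message processing.)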
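The throttling described in the second explanation can be sketched as a small, Storm-free simulation. This is not Storm's implementation: the class `PendingThrottleSketch` and its methods `tryEmit`, `ack`, and `pendingCount` are hypothetical names introduced only to illustrate the idea that a spout task stops emitting while its un-acked tuple count has reached the limit.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of one spout task; not part of the Storm API.
class PendingThrottleSketch {
    private final int maxPending;                  // plays the role of topology.max.spout.pending
    private final Set<Long> pending = new HashSet<>(); // message ids emitted but not yet acked
    private long nextId = 0;

    PendingThrottleSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    // Emits one tuple and returns its message id, or -1 if the task is throttled.
    long tryEmit() {
        if (pending.size() >= maxPending) {
            return -1; // limit reached: wait for downstream bolts to finish
        }
        long id = nextId++;
        pending.add(id);
        return id;
    }

    // Called when a tuple has been fully processed downstream.
    void ack(long id) {
        pending.remove(id);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingThrottleSketch spout = new PendingThrottleSketch(2);
        long first = spout.tryEmit();
        spout.tryEmit();
        System.out.println("third emit throttled: " + (spout.tryEmit() == -1));
        spout.ack(first); // one tuple completes, freeing a slot
        System.out.println("emit after ack succeeds: " + (spout.tryEmit() != -1));
    }
}
```

A tuple emitted without a message id would never enter the `pending` set in this model, which is one way to picture why the setting has no effect on unreliable spouts.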

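A colleague has verified the first explanation in practice: when transactions are used, topology.max.spout.pending does indicate the number of batches processed at the same time. The second explanation closely matches the official English API documentation, which reads: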

/**
 * The maximum number of tuples that can be pending on a spout task at any given time.
 * This config applies to individual tasks, not to spouts or topologies as a whole.
 * A pending tuple is one that has been emitted from a spout but has not been acked or failed yet.
 * Note that this config parameter has no effect for unreliable spouts that don't tag
 * their tuples with a message id.
 */
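Because the documentation says the limit applies to each spout task rather than to the spout or topology as a whole, the total number of un-acked tuples can be as high as the number of tasks times the limit. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and the figures in `main` are made up for illustration.

```java
// Hypothetical helper; not part of the Storm API.
class PerTaskPendingSketch {
    // Upper bound on un-acked tuples across all tasks of one spout component.
    static long maxInFlight(int spoutTasks, int maxPendingPerTask) {
        return (long) spoutTasks * maxPendingPerTask;
    }

    public static void main(String[] args) {
        // Example figures only: 4 spout tasks, each allowed 1000 pending tuples.
        System.out.println(maxInFlight(4, 1000));
    }
}
```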

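So when transactions are used, the property means the number of batches processed at the same time. Without transactions, it should be understood in the second sense.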
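Either way, the value is set under the same key in the topology configuration. The sketch below uses a plain `HashMap` as a stand-in so that it runs without the Storm dependency; this is reasonable because Storm's `Config` class is itself a `HashMap<String, Object>`. In a real topology one would normally call `Config.setMaxSpoutPending(int)` instead. The class name and the `buildConf` helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.storm.Config, which extends HashMap<String, Object>.
class MaxSpoutPendingConfigSketch {
    // Same string that Storm exposes as Config.TOPOLOGY_MAX_SPOUT_PENDING.
    static final String TOPOLOGY_MAX_SPOUT_PENDING = "topology.max.spout.pending";

    // Builds a configuration map carrying the given limit.
    static Map<String, Object> buildConf(int maxPending) {
        if (maxPending < 1) {
            throw new IllegalArgumentException("maxPending must be at least 1");
        }
        Map<String, Object> conf = new HashMap<>();
        conf.put(TOPOLOGY_MAX_SPOUT_PENDING, maxPending);
        return conf;
    }

    public static void main(String[] args) {
        // Transactional topology: up to 4 batches active at once.
        // Non-transactional reliable spout: up to 4 un-acked tuples per spout task.
        System.out.println(buildConf(4));
    }
}
```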

JStorm, in turn, gives the topology.max.spout.pending property a different meaning, as follows:

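When topology.max.spout.pending is set to anything other than 1 (including when it is set to null), the spout starts an extra internal thread dedicated to ack and fail operations. nextTuple therefore runs in a thread of its own, which makes it safe to block inside nextTuple.

In native Storm, nextTuple, ack, and fail all run in the same thread. When the data volume is low, nextTuple returns immediately, and ack and fail often have no data to handle either, so the CPU spins idly and is wasted. In JStorm, nextTuple can fetch data in a blocking way, for example from a disruptor or a BlockingQueue, and simply blocks when no data is available, which saves a large amount of CPU.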
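The blocking style of nextTuple described above can be sketched with the standard library alone. This is not JStorm code: `BlockingNextTupleSketch`, `offer`, and `run` are hypothetical names, the `source` queue stands in for the disruptor or BlockingQueue, and the calling thread loosely plays the part of everything that runs outside the nextTuple thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a spout whose nextTuple blocks; not the JStorm API.
class BlockingNextTupleSketch {
    private final BlockingQueue<String> source = new LinkedBlockingQueue<>();  // incoming messages
    private final BlockingQueue<String> emitted = new LinkedBlockingQueue<>(); // tuples sent downstream

    void offer(String message) {
        source.add(message);
    }

    // Parks the thread until a message arrives instead of returning empty-handed and spinning.
    void nextTuple() throws InterruptedException {
        emitted.add(source.take());
    }

    // Runs nextTuple in its own thread and collects what it emits for the given messages.
    static List<String> run(List<String> messages) {
        BlockingNextTupleSketch spout = new BlockingNextTupleSketch();
        Thread emitter = new Thread(() -> {
            try {
                while (true) {
                    spout.nextTuple();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            }
        });
        emitter.setDaemon(true);
        emitter.start();
        List<String> out = new ArrayList<>();
        try {
            for (String m : messages) {
                spout.offer(m);
            }
            for (int i = 0; i < messages.size(); i++) {
                String tuple = spout.emitted.poll(2, TimeUnit.SECONDS);
                if (tuple == null) {
                    break; // give up rather than wait forever
                }
                out.add(tuple);
            }
            emitter.interrupt();
            emitter.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(java.util.Arrays.asList("a", "b", "c")));
    }
}
```

While `source` is empty the emitter thread is parked inside `take()` and uses no CPU, which is the saving the passage attributes to JStorm. A loop that polled and returned immediately would instead keep a core busy.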

References:

Twitter Storm: An Introduction to Transactional Topology

Differences from the Storm Programming Model

