storm_nickta's Blog - CSDN Blog

storm


Articles: 13 · Article views: 4664 · Article bookmarks: 4

Author: nickta

Be an ordinary person, but not a dull one.


[Translation] Understanding the Parallelism of a Storm Topology

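Original article: http://storm.apache.org/releases/1.2.1/Understanding-the-parallelism-of-a-Storm-topology.html What makes a running topology: worker processes, executors and tasks. Storm distinguishes between the following three main entities that are used to actually run a topology in a Storm cluster: 1. Worker processes 2. Executors (threads) 3. Tasks. Here is a simple illustration of their relationships. [Translator's understanding: 1...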

Translation 2018-03-09 18:19:29 · 329 views · 0 comments
Storm Trident Example: groupBy

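groupBy does not perform any repartitioning by itself; it converts the input stream into a grouped stream. Once groupBy has been applied, any subsequent aggregation (aggregate) runs per group. 1. groupBy can be placed before partitionAggregate. In that case partitionAggregate performs a grouped aggregation over the data inside each partition. 2. groupBy can be placed before aggregate. In that case all tuples in the same batch are assigned to a single partition, ...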

Original 2018-03-24 21:50:07 · 550 views · 0 comments
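To make case 1 of the groupBy excerpt concrete, here is a minimal plain-Java sketch of a grouped aggregation inside one partition. It is not Trident code and uses no Storm classes: `GroupBySketch`, `sumByUser`, and the sample user names are hypothetical, and each tuple is assumed to be a `{user, score}` pair like the fields used by the posts' spout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupBySketch {
    // Hypothetical helper: sums the score per user for the tuples of ONE partition.
    // Each tuple is assumed to be {user (String), score (Integer)}.
    static Map<String, Integer> sumByUser(List<Object[]> partition) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Object[] tuple : partition) {
            // merge() adds the score to the running total of this user's group.
            totals.merge((String) tuple[0], (Integer) tuple[1], Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Object[]> partition = Arrays.asList(
            new Object[] {"nickk", 4},
            new Object[] {"tom", 10},
            new Object[] {"nickk", 7});
        System.out.println(sumByUser(partition)); // {nickk=11, tom=10}
    }
}
```

Because nothing is repartitioned here, the same user may still appear in several partitions, each with its own partial total.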
Storm Trident Example: CombinerAggregator

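CombinerAggregator first runs partitionAggregate on each partition, aggregating within the partition, and then runs a global repartition (global) to merge all partitions of the same batch into a single partition. In other words, the partial results from each partition are aggregated again in one partition. This involves less network transfer than the other two aggregators. As a result, the overall performance of CombinerAggregator is better than Agg...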

Original 2018-03-24 21:17:50 · 410 views · 0 comments
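The two-phase behaviour described in the CombinerAggregator excerpt can be sketched in plain Java. This is only an illustration under stated assumptions: the `Combiner` interface below is a local stand-in with an init/combine/zero shape, not the Storm interface itself, and `CombinerSketch`, `aggregatePartition`, `aggregateBatch`, and the sample scores are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {
    // Local stand-in for a combiner (assumption: not Storm's own interface).
    interface Combiner<I, T> {
        T init(I tuple);       // turn one input tuple into a partial result
        T combine(T a, T b);   // merge two partial results
        T zero();              // result for an empty input
    }

    // A sum combiner over integer scores.
    static final Combiner<Integer, Integer> SUM = new Combiner<Integer, Integer>() {
        public Integer init(Integer score) { return score; }
        public Integer combine(Integer a, Integer b) { return a + b; }
        public Integer zero() { return 0; }
    };

    // Phase 1: aggregate locally inside one partition (no network transfer).
    static <I, T> T aggregatePartition(Combiner<I, T> c, List<I> partition) {
        T acc = c.zero();
        for (I tuple : partition) {
            acc = c.combine(acc, c.init(tuple));
        }
        return acc;
    }

    // Phase 2: only one partial result per partition is moved and combined.
    static <I, T> T aggregateBatch(Combiner<I, T> c, List<List<I>> partitions) {
        T acc = c.zero();
        for (List<I> partition : partitions) {
            acc = c.combine(acc, aggregatePartition(c, partition));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<List<Integer>> batch = Arrays.asList(
            Arrays.asList(4, 7), Arrays.asList(10), Arrays.asList(5, 1));
        System.out.println(aggregateBatch(SUM, batch)); // 27
    }
}
```

In this sketch, three partial sums cross the partition boundary instead of five tuples, which is the reason the excerpt gives for the lower network cost.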
Storm Trident Example: Aggregator

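Aggregator first runs a global repartition (global) on the input stream to merge all partitions of the same batch into one partition, and then runs the aggregation function on each batch; it operates per batch. It is very similar to ReducerAggregator. Some code is omitted; for the omitted parts see: https://blog.csdn.net/nickta/article/details/79666918 static class State { int c...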

Original 2018-03-24 20:52:18 · 425 views · 0 comments
Storm Trident Example: ReducerAggregator

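ReducerAggregator first runs a global repartition (global) on the input stream to merge all partitions of the same batch into one partition, and then runs the aggregation function on each batch; it operates per batch. Some code is omitted; for the omitted parts see: https://blog.csdn.net/nickta/article/details/79666918 FixedBatchSpout spout = new FixedBatchSpout(n...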

Original 2018-03-24 14:17:34 · 267 views · 0 comments
Storm Trident Example: function, filter, projection

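The following code demonstrates the use of function, filter, and projection; read it together with the comments. Some code is omitted; for the omitted parts see: https://blog.csdn.net/nickta/article/details/79666918 FixedBatchSpout spout = new FixedBatchSpout(new Fields("user", "score"), 3, ...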

Original 2018-03-24 13:49:02 · 228 views · 1 comment
Storm Trident Example: partitionAggregate

partitionAggregate operates on each partition rather than on each batch: it aggregates the tuples within each partition. Some code is omitted; for the omitted parts see: https://blog.csdn.net/nickta/article/details/79666918 FixedBatchSpout spout = new FixedBatchSpout(new Fields("user", "...

Original 2018-03-23 18:27:33 · 392 views · 0 comments
Storm Trident Example: batchGlobal

batchGlobal assigns all tuples that belong to the same batch to the same partition. Some code is omitted; for the omitted parts see: https://blog.csdn.net/nickta/article/details/79666918 FixedBatchSpout spout = new FixedBatchSpout(new Fields("user", "score"), 3, ...

Original 2018-03-23 17:43:44 · 186 views · 0 comments
Storm Trident Example: broadcast

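The following code uses broadcast to repartition. Broadcast sends the tuples to every partition: with 5 partitions, the original tuples are copied 5 times and one copy goes to each of the 5 partitions. Some code is omitted; for the omitted parts see: https://blog.csdn.net/nickta/article/details/79666918 FixedBatchSpout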

Original 2018-03-23 16:07:56 · 180 views · 0 comments
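The replication described in the broadcast excerpt can be pictured with a small plain-Java sketch. It does not use Storm; `BroadcastSketch`, `broadcast`, and the sample tuples are hypothetical names chosen for illustration, and partitions are modelled as in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BroadcastSketch {
    // Hypothetical helper: gives every partition its own full copy of the tuples.
    static <T> List<List<T>> broadcast(List<T> tuples, int numPartitions) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>(tuples)); // one copy per partition
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> tuples = Arrays.asList("nickk", "tom", "jake");
        List<List<String>> partitions = broadcast(tuples, 5);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + " -> " + partitions.get(i));
        }
    }
}
```

With 5 partitions and 3 tuples, the sketch holds 15 tuples in total, matching the "copied 5 times" description.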
Storm Trident Example: global

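The following code uses global to repartition: all tuples in the stream are assigned to the same partition (the one with the smallest partition id). Some code is omitted; for the omitted parts see: https://blog.csdn.net/nickta/article/details/79666918 FixedBatchSpout spout = new FixedBatchSpout(new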

Original 2018-03-23 15:46:43 · 141 views · 0 comments
Storm Trident Example: partitionBy

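The following code uses partitionBy to repartition. partitionBy assigns each tuple to a target partition based on the values of the given fields (Target Partition = hash(fields) % (number of target partitions)), so equal values always go to the same partition. Because different values can produce the same hash, by the formula above, different values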

Original 2018-03-23 15:33:54 · 306 views · 0 comments
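The formula in the partitionBy excerpt, Target Partition = hash(fields) % (number of target partitions), can be tried out with a short plain-Java sketch. It is an illustration only: `PartitionBySketch` and `targetPartition` are hypothetical, Java's `List.hashCode()` stands in for the hash (the hash Storm actually uses may differ), and `Math.floorMod` is an added assumption to keep the index non-negative.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionBySketch {
    // Hypothetical helper: maps the grouping field values to a partition index.
    static int targetPartition(List<Object> fieldValues, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(fieldValues.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        for (String user : Arrays.asList("nickk", "tom", "jake", "nickk")) {
            List<Object> key = Arrays.<Object>asList(user);
            System.out.println(user + " -> partition " + targetPartition(key, numPartitions));
        }
    }
}
```

Both occurrences of the same user land in the same partition, and since there are only `numPartitions` possible results, distinct users can also share a partition.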
Storm Trident Example: shuffle & parallelismHint

This example covers the use of shuffle and parallelismHint in Storm Trident. The code contains comments. Maven: <dependency> <groupId>org.apache.storm</groupId> <artifactId>storm-core</artifactId> <...

Original 2018-03-23 14:21:35 · 785 views · 0 comments
Storm Components (Architecture Level)

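A Storm cluster follows the master-slave model, with the master and the slaves coordinating through ZooKeeper. At the architecture level it consists of three components: 1) Nimbus node 2) Supervisor nodes 3) ZooKeeper. The Nimbus node is the master of the Storm cluster; it distributes tasks, monitors cluster state, and restarts applications. The Supervisor nodes execute the tasks that Nimbus assigns to them. Nimbus and the Supervisors, through...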

Original 2018-03-09 18:22:43 · 465 views · 0 comments
