Storm Trident API

blackproof

Article link: https://blog.csdn.net/blackproof/article/details/84725536

Trident API

Partition-local operations: these run inside each partition and require no network I/O.

Functions, equivalent to Pig's GENERATE:

mystream.each(new Fields("b"), new MyFunction(), new Fields("d"))

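// Pre-1.0 package names, matching the backtype.storm package used later in this post.
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

public class MyFunction extends BaseFunction {
    // Emits one tuple per integer in [0, b); each emitted value becomes the new field "d".
    public void execute(TridentTuple tuple, TridentCollector collector) {
        for (int i = 0; i < tuple.getInteger(0); i++) {
            collector.emit(new Values(i));
        }
    }
}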
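
To make the effect of this function concrete, here is a minimal plain-Java sketch that imitates it without Storm. It assumes input tuples with fields ["a", "b", "c"] modelled as `List<Integer>`; the class `FunctionSketch` and the method `applyMyFunction` are hypothetical names used only for illustration. The values a function emits are appended to the input tuple, and a tuple for which nothing is emitted disappears from the output.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for each(new Fields("b"), new MyFunction(), new Fields("d")).
// No Storm classes are used; a tuple is a List<Integer> with fields [a, b, c].
final class FunctionSketch {
    static List<List<Integer>> applyMyFunction(List<List<Integer>> tuples) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> tuple : tuples) {
            int b = tuple.get(1); // the selected input field "b"
            for (int i = 0; i < b; i++) {
                List<Integer> extended = new ArrayList<>(tuple);
                extended.add(i); // the emitted value is appended as field "d"
                out.add(extended);
            }
        }
        return out;
    }
}
```

With the input [1, 2, 3], [4, 1, 6], [3, 0, 8], this sketch returns [1, 2, 3, 0], [1, 2, 3, 1], [4, 1, 6, 0]; the third tuple has b = 0, so it produces nothing.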

Filters, equivalent to Pig's FILTER:

mystream.each(new Fields("b", "a"), new MyFilter())

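import storm.trident.operation.BaseFilter;
import storm.trident.tuple.TridentTuple;

public class MyFilter extends BaseFilter {
    // Index 0 is "b" and index 1 is "a", following the order in new Fields("b", "a").
    public boolean isKeep(TridentTuple tuple) {
        return tuple.getInteger(0) == 1 && tuple.getInteger(1) == 2;
    }
}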
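
The filter can be imitated in the same way. The following is a rough sketch in plain Java, again assuming tuples with fields ["a", "b", "c"] modelled as `List<Integer>`; `FilterSketch` and `applyMyFilter` are hypothetical names, not part of Trident.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for each(new Fields("b", "a"), new MyFilter()).
// No Storm classes are used; a tuple is a List<Integer> with fields [a, b, c].
final class FilterSketch {
    static List<List<Integer>> applyMyFilter(List<List<Integer>> tuples) {
        List<List<Integer>> kept = new ArrayList<>();
        for (List<Integer> tuple : tuples) {
            int a = tuple.get(0);
            int b = tuple.get(1);
            // Same condition as isKeep: b == 1 and a == 2.
            if (b == 1 && a == 2) {
                kept.add(tuple);
            }
        }
        return kept;
    }
}
```

For the input [1, 2, 3], [2, 1, 1], [2, 3, 4], only [2, 1, 1] survives, because it is the only tuple with a = 2 and b = 1.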

partitionAggregate

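Equivalent to Pig's combiner step. Aggregators can be defined through three interfaces: CombinerAggregator, ReducerAggregator, and Aggregator.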

mystream.partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))

mystream.chainedAgg()
        .partitionAggregate(new Count(), new Fields("count"))
        .partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
        .chainEnd()

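CombinerAggregator:

import storm.trident.operation.CombinerAggregator;
import storm.trident.tuple.TridentTuple;

public class Count implements CombinerAggregator<Long> {
    // Called once per input tuple: every tuple contributes 1.
    public Long init(TridentTuple tuple) {
        return 1L;
    }

    // Merges two partial counts.
    public Long combine(Long val1, Long val2) {
        return val1 + val2;
    }

    // Value used when there are no tuples.
    public Long zero() {
        return 0L;
    }
}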
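
How the three methods fit together can be sketched in plain Java. This is only an approximation of the documented behaviour (init per tuple, combine until one value remains, zero for an empty partition), not Storm's actual implementation. The interface `Combiner` is a hypothetical stand-in for CombinerAggregator, and tuples are modelled as `List<Integer>`.

```java
import java.util.List;

final class CombinerSketch {
    // Hypothetical stand-in that mirrors the three methods of CombinerAggregator<T>.
    interface Combiner<T> {
        T init(List<Integer> tuple);
        T combine(T val1, T val2);
        T zero();
    }

    // Same logic as the Count aggregator in the post.
    static final Combiner<Long> COUNT = new Combiner<Long>() {
        public Long init(List<Integer> tuple) { return 1L; }
        public Long combine(Long val1, Long val2) { return val1 + val2; }
        public Long zero() { return 0L; }
    };

    // Aggregates one partition: zero() if it is empty, otherwise
    // init() on every tuple and combine() to fold the values into one.
    static <T> T aggregatePartition(Combiner<T> agg, List<List<Integer>> partition) {
        if (partition.isEmpty()) {
            return agg.zero();
        }
        T result = null;
        for (List<Integer> tuple : partition) {
            T value = agg.init(tuple);
            result = (result == null) ? value : agg.combine(result, value);
        }
        return result;
    }
}
```

Because `combine` only needs two partial values, partial counts from different partitions can be merged later with the same method, which is what makes this interface behave like a combiner.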

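ReducerAggregator:

import storm.trident.operation.ReducerAggregator;
import storm.trident.tuple.TridentTuple;

public class Count implements ReducerAggregator<Long> {
    // Initial value of the running count.
    public Long init() {
        return 0L;
    }

    // Folds one tuple into the running count.
    public Long reduce(Long curr, TridentTuple tuple) {
        return curr + 1;
    }
}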
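
A ReducerAggregator is essentially a left fold: start from `init()` and apply `reduce` to each tuple in turn. The sketch below shows that in plain Java. The interface `Reducer` is a hypothetical stand-in for ReducerAggregator, tuples are modelled as `List<Integer>` with fields [a, b, c], and `SUM_B` is an illustrative summing reducer written for this sketch rather than Trident's built-in Sum.

```java
import java.util.List;

final class ReducerSketch {
    // Hypothetical stand-in that mirrors the two methods of ReducerAggregator<T>.
    interface Reducer<T> {
        T init();
        T reduce(T curr, List<Integer> tuple);
    }

    // Same logic as the Count reducer in the post.
    static final Reducer<Long> COUNT = new Reducer<Long>() {
        public Long init() { return 0L; }
        public Long reduce(Long curr, List<Integer> tuple) { return curr + 1; }
    };

    // Illustrative reducer that sums field "b" (index 1) of every tuple.
    static final Reducer<Long> SUM_B = new Reducer<Long>() {
        public Long init() { return 0L; }
        public Long reduce(Long curr, List<Integer> tuple) { return curr + tuple.get(1); }
    };

    // Left fold over the tuples of one partition.
    static <T> T aggregatePartition(Reducer<T> agg, List<List<Integer>> partition) {
        T curr = agg.init();
        for (List<Integer> tuple : partition) {
            curr = agg.reduce(curr, tuple);
        }
        return curr;
    }
}
```

For the partition [1, 2, 3], [4, 1, 6], [3, 0, 8], `COUNT` yields 3 and `SUM_B` yields 3 (2 + 1 + 0).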

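// The lowest-level aggregator interface: every method receives a collector.
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseAggregator;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

public class CountAgg extends BaseAggregator<CountState> {
    static class CountState {
        long count = 0;
    }

    // Called before a batch is processed; returns the state for that batch.
    public CountState init(Object batchId, TridentCollector collector) {
        return new CountState();
    }

    // Called for every tuple in the batch partition.
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        state.count += 1;
    }

    // Called after all tuples of the batch partition have been aggregated.
    public void complete(CountState state, TridentCollector collector) {
        collector.emit(new Values(state.count));
    }
}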
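
The init / aggregate / complete lifecycle can be traced with a small plain-Java sketch. It assumes one call per batch partition and uses a `List<Long>` as a stand-in for the TridentCollector; `AggregatorSketch` and `runBatch` are hypothetical names for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class AggregatorSketch {
    static final class CountState {
        long count = 0;
    }

    static CountState init() {
        return new CountState();
    }

    static void aggregate(CountState state, List<Integer> tuple, List<Long> collector) {
        state.count += 1; // nothing is emitted here, but it could be
    }

    static void complete(CountState state, List<Long> collector) {
        collector.add(state.count); // emit the final count
    }

    // Drives the lifecycle for one batch partition and returns everything emitted.
    static List<Long> runBatch(List<List<Integer>> batch) {
        List<Long> collector = new ArrayList<>(); // stand-in for TridentCollector
        CountState state = init();
        for (List<Integer> tuple : batch) {
            aggregate(state, tuple, collector);
        }
        complete(state, collector);
        return collector;
    }
}
```

Since every step has access to the collector, an aggregator of this kind may emit any number of tuples at any point, which is what distinguishes it from the two simpler interfaces.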

---------------------

stateQuery and partitionPersist

--------------------------

projection

mystream.project(new Fields("b", "d"))
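
As a small illustration, projection simply keeps the named fields and drops the rest. The plain-Java sketch below assumes a stream with fields ["a", "b", "c", "d"]; `ProjectSketch` and `project` are hypothetical names and no Storm classes are involved.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectSketch {
    // Keeps only the values whose field names appear in `keep`, in that order.
    static List<Object> project(List<String> fieldNames, List<Object> tuple, List<String> keep) {
        List<Object> out = new ArrayList<>();
        for (String name : keep) {
            int index = fieldNames.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            out.add(tuple.get(index));
        }
        return out;
    }
}
```

Projecting the tuple [1, 2, 3, 4] onto ["b", "d"] gives [2, 4].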

---------------------------

Repartitioning operations

shuffle: Uses a random round-robin algorithm to evenly redistribute tuples across all target partitions.

broadcast: Every tuple is replicated to all target partitions. This can be useful during DRPC, for example if you need to do a stateQuery on every partition of data.

partitionBy: partitionBy takes in a set of fields and does semantic partitioning based on that set of fields. The fields are hashed and modded by the number of target partitions to select the target partition. partitionBy guarantees that the same set of fields always goes to the same target partition.

global: All tuples are sent to the same partition. The same partition is chosen for all batches in the stream.

batchGlobal: All tuples in the batch are sent to the same partition. Different batches in the stream may go to different partitions.

partition: This method takes in a custom partitioning function that implements backtype.storm.grouping.CustomStreamGrouping
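
The "hashed and modded" rule described for partitionBy can be sketched as follows. This is an assumption-laden illustration in plain Java, not Storm's actual code: it uses `List.hashCode()` as the hash and `Math.floorMod` so that negative hash codes still map to a valid index. `PartitionBySketch` and `targetPartition` are hypothetical names.

```java
import java.util.List;

final class PartitionBySketch {
    // Maps the values of the grouping fields to a partition index in [0, numPartitions).
    static int targetPartition(List<?> groupingValues, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(groupingValues.hashCode(), numPartitions);
    }
}
```

Because the index depends only on the field values and the partition count, equal field values always land on the same target partition, which is the guarantee partitionBy gives.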

----------------------------

Aggregation operations

mystream.aggregate(new Count(), new Fields("count"))

----------------------------

Equivalent to Pig's GROUP BY:

Operations on grouped streams

groupBy(new Fields("word"))
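
What grouping followed by a count produces can be shown with a plain-Java sketch. It assumes a stream with a single field "word", modelled as a `List<String>`, and returns one count per distinct word; `GroupBySketch` and `countByWord` are hypothetical names and no Storm classes are used.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupBySketch {
    // Groups equal words together and counts the tuples in each group.
    static Map<String, Long> countByWord(List<String> words) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys for stable output
        for (String word : words) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }
}
```

For the words "storm", "pig", "storm", the result maps "pig" to 1 and "storm" to 2.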

--------------------------------

Unlike SQL joins, a Trident join is applied within a single batch.

Merges and joins

Here's an example join between a stream containing fields ["key", "val1", "val2"] and another stream containing ["x", "val1"]:

topology.join(stream1, new Fields("key"), stream2, new Fields("x"), new Fields("key", "a", "b", "c"));
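
To show how the output fields ["key", "a", "b", "c"] are filled, here is a plain-Java sketch of a join over one batch from each stream. It assumes an inner join: the join field comes first, followed by the non-join fields of the first stream ("val1", "val2" become "a", "b") and then those of the second stream ("val1" becomes "c"). `BatchJoinSketch` is a hypothetical name and tuples are modelled as `List<Object>`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchJoinSketch {
    // batch1 tuples: [key, val1, val2]; batch2 tuples: [x, val1].
    // Output tuples: [key, a, b, c] for every pair with key equal to x.
    static List<List<Object>> join(List<List<Object>> batch1, List<List<Object>> batch2) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> left : batch1) {
            for (List<Object> right : batch2) {
                if (left.get(0).equals(right.get(0))) {
                    out.add(Arrays.asList(
                            left.get(0),    // key
                            left.get(1),    // a = val1 of stream 1
                            left.get(2),    // b = val2 of stream 1
                            right.get(1))); // c = val1 of stream 2
                }
            }
        }
        return out;
    }
}
```

Only tuples that arrive in the same batch are matched here, which mirrors the batch-level scope of the join; tuples with no partner in the batch are dropped.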
