storm trident api

Trident API

 

 

Partition-local operations (no network I/O required)

each() with a Function: equivalent to Pig's generate

mystream.each(new Fields("b"), new MyFunction(), new Fields("d"))

 

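// Storm 0.9.x package names (Storm 1.x and later use org.apache.storm.*)
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// Emits one tuple per value 0..n-1, where n is the selected input field "b";
// each emitted value becomes the new field "d".
public class MyFunction extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        for (int i = 0; i < tuple.getInteger(0); i++) {
            collector.emit(new Values(i));
        }
    }
}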
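To make the semantics concrete, here is a plain-Java simulation (no Storm dependency) of what each(new Fields("b"), new MyFunction(), new Fields("d")) does to one batch. The class and method names and the sample batch are hypothetical; tuples are modelled as lists of integers with fields a, b, c. Each value the function emits is appended to the input tuple as field "d", and an input tuple for which the function emits nothing is dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of each() with MyFunction; NOT Storm code.
// Tuples are modelled as List<Integer> with fields ["a", "b", "c"].
public class EachFunctionSketch {

    // Same logic as MyFunction.execute: for input n, emit 0, 1, ..., n-1.
    static List<Integer> myFunction(int b) {
        List<Integer> emitted = new ArrayList<>();
        for (int i = 0; i < b; i++) {
            emitted.add(i);
        }
        return emitted;
    }

    // Every emitted value is appended to a copy of the input tuple (new field "d").
    // An input tuple for which nothing is emitted produces no output at all.
    static List<List<Integer>> each(List<List<Integer>> batch, int inputFieldIndex) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> tuple : batch) {
            for (int d : myFunction(tuple.get(inputFieldIndex))) {
                List<Integer> extended = new ArrayList<>(tuple);
                extended.add(d);
                out.add(extended);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Field "b" is at index 1.
        List<List<Integer>> batch = List.of(
                List.of(1, 2, 3),
                List.of(4, 1, 6),
                List.of(3, 0, 8));
        System.out.println(each(batch, 1)); // prints [[1, 2, 3, 0], [1, 2, 3, 1], [4, 1, 6, 0]]
    }
}
```

The tuple [3, 0, 8] disappears because b = 0 makes the loop emit nothing.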

 

 

each() with a Filter: equivalent to Pig's filter

mystream.each(new Fields("b", "a"), new MyFilter())

 

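import storm.trident.operation.BaseFilter;
import storm.trident.tuple.TridentTuple;

// Keeps a tuple only if b == 1 and a == 2; index 0 is "b" and index 1 is "a",
// following the order of new Fields("b", "a").
public class MyFilter extends BaseFilter {
    public boolean isKeep(TridentTuple tuple) {
        return tuple.getInteger(0) == 1 && tuple.getInteger(1) == 2;
    }
}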
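A plain-Java simulation (no Storm dependency) of mystream.each(new Fields("b", "a"), new MyFilter()) shows that a filter only keeps or drops whole tuples and never changes their fields. The class name, method names and sample batch are hypothetical; tuples are lists of integers with fields a, b, c.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of each() with MyFilter; NOT Storm code.
// Tuples are modelled as List<Integer> with fields ["a", "b", "c"].
public class EachFilterSketch {

    // Same test as MyFilter.isKeep: the first selected field is "b", the second is "a".
    static boolean isKeep(int b, int a) {
        return b == 1 && a == 2;
    }

    // Keeps or drops whole tuples; kept tuples pass through unchanged.
    static List<List<Integer>> filter(List<List<Integer>> batch) {
        List<List<Integer>> kept = new ArrayList<>();
        for (List<Integer> tuple : batch) {
            int a = tuple.get(0);
            int b = tuple.get(1);
            if (isKeep(b, a)) {
                kept.add(tuple);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<Integer>> batch = List.of(
                List.of(1, 2, 3),
                List.of(2, 1, 1),
                List.of(2, 3, 4));
        System.out.println(filter(batch)); // prints [[2, 1, 1]]
    }
}
```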

 

 

partitionAggregate

 

Equivalent to Pig's combine step. Three aggregator interfaces can be used: CombinerAggregator, ReducerAggregator and Aggregator.

mystream.partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))

mystream.chainedAgg()
        .partitionAggregate(new Count(), new Fields("count"))
        .partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
        .chainEnd()
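The following plain-Java simulation (no Storm dependency) sketches what these two calls compute. Each partition is aggregated independently, and the aggregator's output replaces that partition's input tuples. With chainedAgg(), both aggregators run over the same partition and their results are concatenated into one ["count", "sum"] tuple. The partition contents, class and method names are hypothetical; each tuple is an (a, b) pair modelled as a Map.Entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of partitionAggregate; NOT Storm code.
// Each partition is a list of (a, b) tuples; aggregation never crosses partitions.
public class PartitionAggregateSketch {

    // partitionAggregate(new Fields("b"), new Sum(), new Fields("sum")):
    // one ["sum"] tuple per partition.
    static List<Integer> sumPerPartition(List<List<Map.Entry<String, Integer>>> partitions) {
        List<Integer> out = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> partition : partitions) {
            int sum = 0;
            for (Map.Entry<String, Integer> tuple : partition) {
                sum += tuple.getValue(); // field "b"
            }
            out.add(sum);
        }
        return out;
    }

    // chainedAgg() with Count then Sum: one ["count", "sum"] tuple per partition.
    static List<List<Integer>> countAndSumPerPartition(List<List<Map.Entry<String, Integer>>> partitions) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> partition : partitions) {
            int count = 0;
            int sum = 0;
            for (Map.Entry<String, Integer> tuple : partition) {
                count += 1;
                sum += tuple.getValue();
            }
            out.add(List.of(count, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> partitions = List.of(
                List.of(Map.entry("a", 3), Map.entry("c", 8)),
                List.of(Map.entry("e", 1), Map.entry("d", 9), Map.entry("d", 10)),
                List.of(Map.entry("a", 7), Map.entry("e", 1)));
        System.out.println(sumPerPartition(partitions));         // prints [11, 20, 8]
        System.out.println(countAndSumPerPartition(partitions)); // prints [[2, 11], [3, 20], [2, 8]]
    }
}
```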

 

 

CombinerAggregator example:

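import storm.trident.operation.CombinerAggregator;
import storm.trident.tuple.TridentTuple;

// Counts tuples: each tuple contributes 1, partial counts are added together,
// and an empty input yields 0.
public class Count implements CombinerAggregator<Long> {
    public Long init(TridentTuple tuple) {
        return 1L;
    }

    public Long combine(Long val1, Long val2) {
        return val1 + val2;
    }

    public Long zero() {
        return 0L;
    }
}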
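The CombinerAggregator contract can be exercised without Storm using a hypothetical stand-in interface (Combiner below is NOT the Trident type; its init takes a plain int instead of a TridentTuple). This sketch converts every value with init(), merges the partial results with combine(), and starts from zero() so that an empty input returns zero().

```java
import java.util.List;

// Hypothetical stand-in for Trident's CombinerAggregator<T>, runnable without Storm.
public class CombinerSketch {

    interface Combiner<T> {
        T init(int value);          // turn one tuple into a partial result
        T combine(T val1, T val2);  // merge two partial results
        T zero();                   // result for an empty input
    }

    // Same logic as the Count CombinerAggregator above.
    static final Combiner<Long> COUNT = new Combiner<Long>() {
        public Long init(int value) { return 1L; }
        public Long combine(Long val1, Long val2) { return val1 + val2; }
        public Long zero() { return 0L; }
    };

    // Fold init() results with combine(), starting from zero().
    static <T> T run(Combiner<T> agg, List<Integer> values) {
        T acc = agg.zero();
        for (int v : values) {
            acc = agg.combine(acc, agg.init(v));
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(run(COUNT, List.of(5, 6, 7, 8, 9))); // prints 5
        System.out.println(run(COUNT, List.of()));              // prints 0
    }
}
```

Because combine() only ever merges two partial results, partial counts from different partitions can be computed first and merged afterwards; this is what makes the combiner form cheaper to run across the network.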

 

ReducerAggregator example:

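import storm.trident.operation.ReducerAggregator;
import storm.trident.tuple.TridentTuple;

// Counts tuples by starting at 0 and adding 1 for every tuple in turn.
public class Count implements ReducerAggregator<Long> {
    public Long init() {
        return 0L;
    }

    public Long reduce(Long curr, TridentTuple tuple) {
        return curr + 1;
    }
}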
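For comparison, a ReducerAggregator is a sequential fold: init() supplies the starting value and reduce() is applied to each tuple in turn. The sketch below uses a hypothetical stand-in interface (Reducer is NOT the Trident type; reduce takes a plain int instead of a TridentTuple) so it runs without Storm.

```java
import java.util.List;

// Hypothetical stand-in for Trident's ReducerAggregator<T>, runnable without Storm.
public class ReducerSketch {

    interface Reducer<T> {
        T init();                    // starting value
        T reduce(T curr, int value); // fold one tuple into the running value
    }

    // Same logic as the Count ReducerAggregator above.
    static final Reducer<Long> COUNT = new Reducer<Long>() {
        public Long init() { return 0L; }
        public Long reduce(Long curr, int value) { return curr + 1; }
    };

    // A left fold: every value passes through reduce() in order.
    static <T> T run(Reducer<T> agg, List<Integer> values) {
        T acc = agg.init();
        for (int v : values) {
            acc = agg.reduce(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(run(COUNT, List.of(4, 4, 4))); // prints 3
    }
}
```

Unlike the combiner version there is no operation for merging two partial values, so the reducer has to see all of its input tuples in one place.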

 

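// Aggregator: the lowest-level (most general) aggregator interface; every method receives a collector.
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseAggregator;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// The nested state class must be written as CountAgg.CountState in the extends clause,
// because the simple name is not in scope there.
public class CountAgg extends BaseAggregator<CountAgg.CountState> {
    static class CountState {
        long count = 0;
    }

    // Called once at the start of each batch to create fresh state.
    public CountState init(Object batchId, TridentCollector collector) {
        return new CountState();
    }

    // Called once for every tuple in the batch.
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        state.count += 1;
    }

    // Called at the end of the batch; emits the final count.
    public void complete(CountState state, TridentCollector collector) {
        collector.emit(new Values(state.count));
    }
}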
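The Aggregator lifecycle (init once per batch, aggregate once per tuple, complete once at the end) can be simulated in plain Java. In this sketch the Agg interface is a hypothetical mirror of Trident's Aggregator, tuples are plain ints, and the "collector" is modelled as a List that receives emitted values.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the Aggregator lifecycle for one batch; NOT Storm code.
public class AggregatorLifecycleSketch {

    // Hypothetical mirror of Trident's Aggregator<S>; the List plays the collector's role.
    interface Agg<S> {
        S init(Object batchId, List<Object> collector);
        void aggregate(S state, int tuple, List<Object> collector);
        void complete(S state, List<Object> collector);
    }

    static class CountState {
        long count = 0;
    }

    // Same logic as CountAgg above.
    static final Agg<CountState> COUNT_AGG = new Agg<CountState>() {
        public CountState init(Object batchId, List<Object> collector) { return new CountState(); }
        public void aggregate(CountState state, int tuple, List<Object> collector) { state.count += 1; }
        public void complete(CountState state, List<Object> collector) { collector.add(state.count); }
    };

    // init once per batch, aggregate once per tuple, complete once at the end.
    static <S> List<Object> runBatch(Agg<S> agg, Object batchId, List<Integer> batch) {
        List<Object> emitted = new ArrayList<>();
        S state = agg.init(batchId, emitted);
        for (int tuple : batch) {
            agg.aggregate(state, tuple, emitted);
        }
        agg.complete(state, emitted);
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runBatch(COUNT_AGG, 1L, List.of(10, 20, 30))); // prints [3]
    }
}
```

Since every phase has access to the collector, an Aggregator may emit any number of tuples at any point, not just a single final value.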

 

---------------------

 

stateQuery and partitionPersist

 

--------------------------

 

projection

mystream.project(new Fields("b", "d"))
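A plain-Java sketch (no Storm dependency) of what project(new Fields("b", "d")) does to a single tuple with fields a, b, c, d: only the named fields are kept. The class and method names are hypothetical, and the sketch keeps the fields in the order they are requested.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of project(); NOT Storm code.
public class ProjectSketch {

    // Keep only the named fields, in the order they are requested.
    static List<Integer> project(List<String> fields, List<Integer> tuple, List<String> keep) {
        List<Integer> out = new ArrayList<>();
        for (String name : keep) {
            int index = fields.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            out.add(tuple.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("a", "b", "c", "d");
        System.out.println(project(fields, List.of(1, 2, 3, 4), List.of("b", "d"))); // prints [2, 4]
    }
}
```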

 

---------------------------

 

Repartitioning operations

 

shuffle: Uses a random round-robin algorithm to evenly redistribute tuples across all target partitions.

broadcast: Every tuple is replicated to all target partitions. This can be useful during DRPC, for example if you need to do a stateQuery on every partition of data.

partitionBy: partitionBy takes in a set of fields and does semantic partitioning based on that set of fields. The fields are hashed and modded by the number of target partitions to select the target partition. partitionBy guarantees that the same set of fields always goes to the same target partition.

global: All tuples are sent to the same partition. The same partition is chosen for all batches in the stream.

batchGlobal: All tuples in the batch are sent to the same partition. Different batches in the stream may go to different partitions.

partition: This method takes in a custom partitioning function that implements backtype.storm.grouping.CustomStreamGrouping.
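The partitionBy rule described above (hash the selected fields, mod by the number of target partitions) can be sketched in plain Java. This is only an illustration of the idea: the hash used here is List.hashCode(), and Storm's actual hash function is not reproduced. The class and method names are hypothetical.

```java
import java.util.List;

// Sketch of the idea behind partitionBy; NOT Storm's actual implementation.
public class PartitionBySketch {

    // floorMod keeps the result in [0, numPartitions) even for negative hash codes,
    // and equal key fields always produce the same target partition.
    static int targetPartition(List<Object> keyFields, int numPartitions) {
        return Math.floorMod(keyFields.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        for (String word : List.of("the", "cow", "the", "moon")) {
            List<Object> key = List.of(word);
            System.out.println(word + " -> partition " + targetPartition(key, numPartitions));
        }
    }
}
```

Both occurrences of "the" land on the same partition, which is the guarantee partitionBy gives; global would correspond to always returning the same partition.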

 

----------------------------

 

Aggregation operations

 

mystream.aggregate(new Count(), new Fields("count"))
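Unlike partitionAggregate, aggregate() produces one result for the whole batch. With a CombinerAggregator, Trident first does partial aggregations on each partition, then repartitions those partial results to a single partition and finishes the aggregation there; with a ReducerAggregator or Aggregator, the stream is repartitioned to one partition before aggregating. The plain-Java sketch below (no Storm dependency; names and data hypothetical) shows the two-phase combiner path using a sum of field "b".

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of a two-phase (partial, then global) combiner aggregation; NOT Storm code.
public class GlobalAggregateSketch {

    // Phase 1: runs locally on each partition, before any network transfer.
    static List<Integer> partialSums(List<List<Integer>> partitions) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            int sum = 0;
            for (int b : partition) {
                sum += b;
            }
            partials.add(sum);
        }
        return partials;
    }

    // Phase 2: only the partial results are brought to one place and combined.
    static int combinePartials(List<Integer> partials) {
        int total = 0;
        for (int p : partials) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        // Values of field "b" in three partitions of one batch.
        List<List<Integer>> partitions = List.of(List.of(3, 8), List.of(1, 9, 10), List.of(7, 1));
        List<Integer> partials = partialSums(partitions);
        System.out.println(partials + " -> " + combinePartials(partials)); // prints [11, 20, 8] -> 39
    }
}
```

Only three numbers cross the network in phase 2 instead of all seven tuples, which is why the combiner form is preferred when it fits.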

 

----------------------------

 

Equivalent to Pig's group by

Operations on grouped streams

 

mystream.groupBy(new Fields("word"))
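groupBy repartitions the stream with partitionBy on the given fields and then groups tuples with equal field values within each partition; an aggregator applied afterwards runs once per group instead of once per batch. The plain-Java sketch below (no Storm dependency; names and data hypothetical) shows the result of grouping by "word" and counting.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java sketch of groupBy(new Fields("word")) followed by a count; NOT Storm code.
public class GroupBySketch {

    static Map<String, Long> countPerWord(List<String> batch) {
        // TreeMap is used only so that the printed order is deterministic.
        return batch.stream()
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countPerWord(List.of("the", "cow", "the", "moon"))); // prints {cow=1, moon=1, the=2}
    }
}
```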

 

--------------------------------

 

Unlike SQL joins, a Trident join only matches tuples within a single batch.

Merges and joins

 

Here's an example join between a stream containing fields ["key", "val1", "val2"] and another stream containing ["x", "val1"]:

 

topology.join(stream1, new Fields("key"), stream2, new Fields("x"), new Fields("key", "a", "b", "c"));
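In the output, the join field comes first ("key", matching "key" from stream1 and "x" from stream2), followed by the non-join fields of each stream in the order the streams were passed: "a" and "b" are val1 and val2 from stream1, and "c" is val1 from stream2. The plain-Java sketch below (no Storm dependency) performs this inner join on one batch of each stream. It assumes the join field is the first field of each tuple; the sample rows and the row/join helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of a per-batch inner join; NOT Storm code.
// Assumes the join field is at index 0 in both streams.
public class JoinSketch {

    static List<Object> row(Object... values) {
        return Arrays.asList(values);
    }

    // Output = join field, then stream1's other fields (a, b), then stream2's other fields (c).
    static List<List<Object>> join(List<List<Object>> stream1, List<List<Object>> stream2) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> left : stream1) {
            for (List<Object> right : stream2) {
                if (left.get(0).equals(right.get(0))) {
                    List<Object> joined = new ArrayList<>(left);
                    joined.addAll(right.subList(1, right.size()));
                    out.add(joined);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> stream1 = List.of(row(1, "p", "q"), row(2, "r", "s")); // ["key", "val1", "val2"]
        List<List<Object>> stream2 = List.of(row(1, "z"), row(3, "w"));           // ["x", "val1"]
        System.out.println(join(stream1, stream2)); // prints [[1, p, q, z]]
    }
}
```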
