Getting Started with Storm: TridentState and Aggregation Function Details

(I) The Aggregator interfaces aggregate per batch and are best used together with groupBy: tuples are grouped by the given fields before aggregation, so in a word-splitting example identical words end up in the same group. For the grouped aggregation to actually run in parallel you must call parallelismHint after aggregate; otherwise everything is always aggregated globally.

How to wire it up:

.partitionBy(new Fields("word"))                             // partition by word
.each(new Fields("word"), new Filter1()).parallelismHint(2)  // number of partitions
.groupBy(new Fields("word"))
.aggregate(new Fields("word"), new Agg1(), new Fields("aggr1"))
.parallelismHint(2); // run the grouped aggregation with parallelism 2

Output of the three aggregator interfaces when used with aggregate():

===============Aggregator==============
[0]the
[0]cow
[0]the
[1]man
[partitionId1]119 man
man119 end
[partitionId0]123 the
[partitionId0]123 cow
[partitionId0]123 the
thecowthe123 end
=========CombinerAggregator============
[0]the
[1]man
[0]cow
[0]the
combine:man
combine:the
combine:cow
combine:thethe // local combine within each partition first

combine:man
combine:thethe
combine:cow  // then the global merge
=========ReducerAggregator=============
[1]man
[0]the
[0]cow
[0]the
str:man
str:the
str:cow
str:thethe
=======================================
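The Agg1 class referenced above is not shown, so its exact logic is an assumption; judging by the output, the combine step concatenates words. Under that assumption, the groupBy + CombinerAggregator behaviour (local init per tuple, combine within each group) can be simulated without Storm as a plain-Java sketch. GroupedCombineSketch and its method names are illustrative, not Storm API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Storm-free simulation of groupBy + CombinerAggregator on the batch above.
// Assumes the combine step is string concatenation, matching the "thethe" output.
public class GroupedCombineSketch {

    // group tuples by field value, then fold each group with combine()
    static Map<String, String> groupAndCombine(List<String> words) {
        Map<String, String> groups = new LinkedHashMap<>();
        for (String word : words) {
            // init(tuple) turns a tuple into a partial value;
            // combine(a, b) merges two partials for the same group
            groups.merge(word, word, (a, b) -> a + b);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> batch = List.of("the", "cow", "the", "man");
        System.out.println(groupAndCombine(batch)); // {the=thethe, cow=cow, man=man}
    }
}
```

This reproduces the grouped result seen in the log: the two occurrences of "the" land in one group and combine to "thethe", while "cow" and "man" stay as-is.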

(II) partitionAggregate is a partition-level aggregation function. It needs no groupBy and no parallelismHint after it; as long as the stream is partitioned upstream, each partition is aggregated independently.

How to wire it up:

.partitionBy(new Fields("word"))                             // partition by word
.each(new Fields("word"), new Filter1()).parallelismHint(2)  // number of partitions
.partitionAggregate(new Fields("word"), new Agg1(), new Fields("aggr1")); // aggregate each partition

Output of the three aggregator interfaces with partitionAggregate:

===============Aggregator==============
[1]man
[partitionId1]127 man
man127 end

[0]cow
[0]the
[0]the
[partitionId0]131 cow
[partitionId0]131 the
[partitionId0]131 the
thecowthe131 end
=========CombinerAggregator============
[1]man
combine:man

[0]the
[0]cow
[0]the
combine:the
combine:thecow
combine:thecowthe
=========ReducerAggregator=============
[1]man
str:man
[0]the
[0]cow
[0]the
str:the
str:thecow
str:thecowthe
=======================================
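The partitionAggregate log above shows a plain left fold per partition with no grouping, which is exactly the ReducerAggregator contract: init() supplies a seed, then reduce(acc, tuple) folds each tuple in. A minimal Storm-free sketch, again assuming the fold is string concatenation (PartitionReduceSketch is illustrative, not Storm API):

```java
import java.util.List;

// Storm-free simulation of partitionAggregate with a ReducerAggregator:
// each partition is folded independently; no grouping is involved.
public class PartitionReduceSketch {

    // init() supplies the seed (""); reduce(acc, tuple) folds one tuple in
    static String reducePartition(List<String> partition) {
        String acc = "";
        for (String word : partition) {
            acc = acc + word; // reduce step: concatenate, as in the log above
        }
        return acc;
    }

    public static void main(String[] args) {
        // partition 1 received "man"; partition 0 received "the", "cow", "the"
        System.out.println(reducePartition(List.of("man")));               // man
        System.out.println(reducePartition(List.of("the", "cow", "the"))); // thecowthe
    }
}
```

Note the contrast with the grouped case: here "the", "cow", "the" fold into "thecowthe" because the whole partition is one unit, whereas with groupBy the two "the" tuples would first be pulled into their own group.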

 

(III) Persistent aggregation and partition persistence: stream-level aggregation used to persist data.

(1) stateQuery

This method queries a State (which can interact with a database) and emits the result alongside each tuple. The implementation steps are as follows.

1. Create the State

import org.apache.storm.trident.state.State;

public class TestState implements State {

    @Override
    public void beginCommit(Long txid) {
        // called before a batch is applied to this state
    }

    @Override
    public void commit(Long txid) {
        // called after the batch has been applied
    }

    // stand-in for a real database lookup
    public String getDBOption(int i) {
        return "success" + i;
    }
}

2. Create the StateFactory

import java.util.Map;
import org.apache.storm.task.IMetricsContext;
import org.apache.storm.trident.state.State;
import org.apache.storm.trident.state.StateFactory;

public class TestStateFactory implements StateFactory {

    @Override
    public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
        // one State instance is created per partition
        return new TestState();
    }
}

3. Create the query function

import java.util.ArrayList;
import java.util.List;

import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.state.BaseQueryFunction;
import org.apache.storm.trident.tuple.TridentTuple;

public class TestQueryLocation extends BaseQueryFunction<TestState, String> {

    @Override
    public List<String> batchRetrieve(TestState state, List<TridentTuple> tuples) {
        // called once per batch: must return exactly one result per input tuple
        List<String> list = new ArrayList<String>();
        for (int i = 0; i < tuples.size(); i++) {
            list.add(state.getDBOption(i));
        }
        return list;
    }

    @Override
    public void execute(TridentTuple tuple, String result, TridentCollector collector) {
        // called once per tuple, with its paired batchRetrieve result
        System.out.println(tuple.getString(0));
        System.out.println(result);
    }
}

4. Pass everything into the stateQuery method

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class TestStateToplogy {
    public static void main(String agrs[]){
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(
            new Fields("sentence"), 2,
            new Values("the cow"),
            new Values("the man"),
            new Values("four score"),
            new Values("many apples"));
        spout.setCycle(false);
        TridentTopology topology = new TridentTopology(); 
        topology.newStream("spout",spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .stateQuery(topology.newStaticState(new TestStateFactory()),new Fields("word"), new TestQueryLocation(), new Fields("test"));
        
        
        StormTopology stormTopology = topology.build();
        LocalCluster cluster = new LocalCluster();
        Config conf = new Config();
        conf.setDebug(false);
        cluster.submitTopology("test", conf,stormTopology);
    }
    
   public static class Split extends BaseFunction {
        
        public void execute(TridentTuple tuple, TridentCollector collector) {
            String sentence = tuple.getString(0);
            for(String word: sentence.split(" ")) {
                collector.emit(new Values(word));                
            }
        }
     }
}

5. Output

the
success0
cow
success1
the
success2
man
success3
four
success0
score
success1
many
success2
apples
success3
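The counter in success0..success3 restarts for each batch because batchRetrieve is invoked once per batch and TestQueryLocation indexes tuples within that batch: with a FixedBatchSpout batch size of 2 sentences, each batch splits into 4 words, so the indexes run 0..3 and then reset. A Storm-free sketch of this pairing (BatchRetrieveSketch is illustrative; it mirrors TestQueryLocation rather than Storm's actual execution code):

```java
import java.util.ArrayList;
import java.util.List;

// Storm-free sketch of how stateQuery pairs tuples with batchRetrieve results:
// one call per batch, one result per tuple, index restarting at 0 each batch.
public class BatchRetrieveSketch {

    // stand-in for TestState.getDBOption(i)
    static String getDBOption(int i) {
        return "success" + i;
    }

    // stand-in for batchRetrieve: one result per tuple in the batch
    static List<String> batchRetrieve(List<String> batch) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            results.add(getDBOption(i));
        }
        return results;
    }

    public static void main(String[] args) {
        // two batches of 4 words each, as in the output above
        for (List<String> batch : List.of(
                List.of("the", "cow", "the", "man"),
                List.of("four", "score", "many", "apples"))) {
            List<String> results = batchRetrieve(batch);
            for (int i = 0; i < batch.size(); i++) {
                System.out.println(batch.get(i));   // execute() prints the tuple ...
                System.out.println(results.get(i)); // ... then its paired result
            }
        }
    }
}
```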

(2)StateUpdater

Typical scenario: writing batches into a database (other use cases are not covered here).

1. With the StateFactory and State in place, create the StateUpdater

import java.util.List;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.state.BaseStateUpdater;
import org.apache.storm.trident.tuple.TridentTuple;

public class TestLocationUpdater extends BaseStateUpdater<TestState> {

    @Override
    public void updateState(TestState state, List<TridentTuple> tuples,
            TridentCollector collector) {
        // called once per batch, between beginCommit and commit
        state.getBatch(tuples);
    }
}

2. The State, extended with a batch-write method

import java.util.List;
import org.apache.storm.trident.state.State;
import org.apache.storm.trident.tuple.TridentTuple;

public class TestState implements State {

    @Override
    public void beginCommit(Long txid) {
        System.out.println("beginCommit");
    }

    @Override
    public void commit(Long txid) {
        System.out.println("commit");
    }

    // stand-in for a real database lookup
    public String getDBOption(int i) {
        return "success" + i;
    }

    // stand-in for a batch insert into a database
    public void getBatch(List<TridentTuple> tuples) {
        for (int i = 0; i < tuples.size(); i++) {
            System.out.println(tuples.get(i).getString(0));
        }
        System.out.println("insert batch over");
    }
}

3. How the topology invokes it

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.trident.TridentState;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class TestStateToplogy {
    public static void main(String agrs[]){
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(
            new Fields("sentence"), 2,
            new Values("the cow"),
            new Values("the man"),
            new Values("four score"),
            new Values("many apples"));
        spout.setCycle(false);
        
        TridentTopology topology = new TridentTopology(); 
        topology.newStream("spout",spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .stateQuery(topology.newStaticState(new TestStateFactory()),new Fields("word"), new TestQueryLocation(), new Fields("test"))
        .parallelismHint(2)
        .partitionPersist(new TestStateFactory(), new Fields("test"), new TestLocationUpdater());

        
        
        StormTopology stormTopology = topology.build();
        LocalCluster cluster = new LocalCluster();
        Config conf = new Config();
        conf.setDebug(false);
        cluster.submitTopology("test", conf,stormTopology);
    }
    
   public static class Split extends BaseFunction {
        
        public void execute(TridentTuple tuple, TridentCollector collector) {
            String sentence = tuple.getString(0);
            for(String word: sentence.split(" ")) {
                collector.emit(new Values(word));                
            }
        }
     }
}

4. Results

the
success0
cow
success1
the
success2
man
success3
beginCommit
the
cow
the
man
insert batch over
commit
four
success0
score
success1
many
success2
apples
success3
beginCommit
four
score
many
apples
insert batch over
commit
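The log above shows the per-batch lifecycle that partitionPersist drives: for each batch, Trident calls State.beginCommit(txid), then the StateUpdater's updateState with the batch's tuples (which here prints each word and "insert batch over"), then State.commit(txid). A Storm-free sketch of that ordering (PersistLifecycleSketch mirrors TestState/TestLocationUpdater; it is not Storm itself):

```java
import java.util.ArrayList;
import java.util.List;

// Storm-free sketch of the partitionPersist commit lifecycle per batch.
public class PersistLifecycleSketch {

    static List<String> persistBatch(List<String> tuples) {
        List<String> log = new ArrayList<>();
        log.add("beginCommit");           // State.beginCommit(txid)
        log.addAll(tuples);               // StateUpdater.updateState -> getBatch
        log.add("insert batch over");
        log.add("commit");                // State.commit(txid)
        return log;
    }

    public static void main(String[] args) {
        // two batches, as in the results above
        for (String line : persistBatch(List.of("the", "cow", "the", "man"))) {
            System.out.println(line);
        }
        for (String line : persistBatch(List.of("four", "score", "many", "apples"))) {
            System.out.println(line);
        }
    }
}
```

Because beginCommit/commit bracket each batch, a real State implementation can use them to make the database write transactional per batch.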

(3) partitionPersist (partition persistence)

To be continued.

(4) persistentAggregate (persistent aggregation)

To be continued.
