Custom Groupings in Storm


Storm offers several grouping schemes.

The Storm wiki describes them as follows:

Stream groupings

Part of defining a topology is specifying for each bolt which streams it should receive as input. A stream grouping defines how that stream should be partitioned among the bolt's tasks.

There are seven built-in stream groupings in Storm, and you can implement a custom stream grouping by implementing the CustomStreamGrouping interface:

  1. Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.
  2. Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.
  3. All grouping: The stream is replicated across all the bolt's tasks. Use this grouping with care.
  4. Global grouping: The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
  5. None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).
  6. Direct grouping: This is a special kind of grouping. A stream grouped this way means that the producer of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams. Tuples emitted to a direct stream must be emitted using one of the emitDirect methods. A bolt can get the task ids of its consumers by either using the provided TopologyContext or by keeping track of the output of the emit method in OutputCollector (which returns the task ids that the tuple was sent to).
  7. Local or shuffle grouping: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.

Resources:

  • TopologyBuilder: use this class to define topologies
  • InputDeclarer: this object is returned whenever setBolt is called on TopologyBuilder and is used for declaring a bolt's input streams and how those streams should be grouped
  • CoordinatedBolt: this bolt is useful for distributed RPC topologies and makes heavy use of direct streams and direct groupings
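To make the task-selection rules of a few built-in groupings more concrete, here is a conceptual sketch in plain Java. It is not Storm's implementation and uses no Storm classes: the class GroupingRulesSketch, its method names, and the task ids are all hypothetical, and the shuffle rule is approximated by a uniformly random pick.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Plain-Java stand-ins for the selection rules described above (not Storm code).
class GroupingRulesSketch {

    // Fields grouping: equal keys always map to the same task.
    static int fieldsGrouping(Object key, List<Integer> tasks) {
        int index = Math.floorMod(key.hashCode(), tasks.size());
        return tasks.get(index);
    }

    // Global grouping: everything goes to the task with the lowest id.
    static int globalGrouping(List<Integer> tasks) {
        return Collections.min(tasks);
    }

    // All grouping: the tuple is replicated to every task.
    static List<Integer> allGrouping(List<Integer> tasks) {
        return new ArrayList<Integer>(tasks);
    }

    // Shuffle grouping, approximated here by a uniformly random pick.
    static int shuffleGrouping(Random random, List<Integer> tasks) {
        return tasks.get(random.nextInt(tasks.size()));
    }

    public static void main(String[] args) {
        List<Integer> tasks = Arrays.asList(3, 4, 5, 6); // hypothetical task ids
        System.out.println("fields(user-42): " + fieldsGrouping("user-42", tasks));
        System.out.println("fields(user-42): " + fieldsGrouping("user-42", tasks));
        System.out.println("global:          " + globalGrouping(tasks));
        System.out.println("all:             " + allGrouping(tasks));
        System.out.println("shuffle:         " + shuffleGrouping(new Random(), tasks));
    }
}
```

The two fields lines always print the same task, while the shuffle line can change from run to run.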

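In our application we needed to route each user's records to a specific task according to a partitioning rule on the uid. We use uid % k, where k is the number of target tasks, so that all records whose uid has the same remainder are processed by the same task. To do this we implemented our own ModStreamGrouping; the code is as follows: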

package storm.starter;

import java.util.Arrays;
import java.util.List;

import backtype.storm.grouping.CustomStreamGrouping;
import backtype.storm.task.TopologyContext;
import backtype.storm.tuple.Fields;

public class ModStreamGrouping implements CustomStreamGrouping {

	// Task ids of the consuming bolt, supplied by Storm in prepare().
	private List<Integer> _targetTasks;

	@Override
	public void prepare(TopologyContext context, Fields outFields,
			List<Integer> targetTasks) {
		_targetTasks = targetTasks;
	}

	@Override
	public List<Integer> chooseTasks(List<Object> values) {
		// The first field of the tuple is the uid used as the grouping key.
		long groupingKey = Long.parseLong(values.get(0).toString());
		int size = _targetTasks.size();
		// Normalize the remainder so that a negative key still yields a valid index.
		int index = (int) (((groupingKey % size) + size) % size);
		return Arrays.asList(_targetTasks.get(index));
	}

}
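One detail of the selection above is easy to miss: uid % k yields a position in the target task list, not a task id, and the task ids handed to prepare need not start at 0. The following plain-Java sketch reproduces just that selection step without any Storm dependency. The class ModChoiceSketch and the task ids 4, 7, 9 are hypothetical, and Math.floorMod (Java 8+) is used here as one way to keep the index valid for negative keys.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-alone illustration of the modulo selection step (not a Storm class).
class ModChoiceSketch {

    // The key modulo the task count picks a position in the list;
    // the value stored at that position is the task id that receives the tuple.
    static List<Integer> chooseTasks(List<Integer> targetTasks, Object firstValue) {
        long key = Long.parseLong(firstValue.toString());
        int index = (int) Math.floorMod(key, (long) targetTasks.size());
        return Collections.singletonList(targetTasks.get(index));
    }

    public static void main(String[] args) {
        List<Integer> targetTasks = Arrays.asList(4, 7, 9); // hypothetical task ids
        System.out.println(chooseTasks(targetTasks, 10)); // 10 % 3 = 1, prints [7]
        System.out.println(chooseTasks(targetTasks, 13)); // 13 % 3 = 1, prints [7]
        System.out.println(chooseTasks(targetTasks, -1)); // floorMod(-1, 3) = 2, prints [9]
    }
}
```

Uids 10 and 13 share the remainder 1 and therefore land on the same task, which is the property the grouping is meant to provide.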

Test code:

package storm.starter;

import java.util.Map;
import java.util.Random;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class ModGroupingTest {
	public static class TestUidSpout extends BaseRichSpout {
	    SpoutOutputCollector _collector;
	    Random _rand;
	        
	    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
	        _collector = collector;
	        _rand = new Random();
	    }
	    
	    public void close() {
	        
	    }
	        
	    public void nextTuple() {
	        // Emit one random non-negative uid roughly every 100 ms.
	        Utils.sleep(100);
	        final int uid = _rand.nextInt(100000000);
	        _collector.emit(new Values(uid));
	    }
	    
	    public void ack(Object msgId) {

	    }

	    public void fail(Object msgId) {
	        
	    }
	    
	    public void declareOutputFields(OutputFieldsDeclarer declarer) {
	    	declarer.declare(new Fields("uid"));
	    }

   
	}
	
	public static class modGroupBolt extends BaseRichBolt {
        OutputCollector _collector;
        String _ComponentId;
        int _TaskId;
        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {

        	_collector = collector;
        	_ComponentId = context.getThisComponentId();
        	_TaskId = context.getThisTaskId();
        }

        @Override
        public void execute(Tuple tuple) {
        	System.out.println(_ComponentId + ":" + _TaskId + " received: " + tuple.getInteger(0));

        	// Re-emit the uid itself (not the Tuple object), anchored to the input tuple.
        	_collector.emit(tuple, new Values(tuple.getInteger(0)));
            _collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("uid"));
        }


    }
	
	public static void main(String args[]){
		TopologyBuilder builder = new TopologyBuilder();
		builder.setSpout("uid", new TestUidSpout());
		builder.setBolt("process", new modGroupBolt(), 10).customGrouping("uid", new ModStreamGrouping());
		
		Config config = new Config();
		config.setDebug(true);
		
		config.setNumWorkers(3);
		LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("test", config, builder.createTopology());
//        Utils.sleep(30000);
//        cluster.killTopology("test");
//        cluster.shutdown();    
	}
}
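As a rough idea of what the test topology's output should look like, the sketch below simulates the same setup in plain Java without Storm: uniformly random uids in the spout's range, distributed over 10 positions by uid % 10. The class ModDistributionSketch, the sample count, and the seed are assumptions chosen for illustration; in the real topology each position corresponds to one task of the "process" bolt.

```java
import java.util.Random;

// Simulates the distribution of random uids over tasks under uid % taskCount.
class ModDistributionSketch {

    // Returns how many of `samples` random uids fall on each of `taskCount` positions.
    static int[] countPerTask(int taskCount, int samples, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[taskCount];
        for (int i = 0; i < samples; i++) {
            int uid = random.nextInt(100000000); // same uid range as the test spout
            counts[uid % taskCount]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerTask(10, 100000, 42L);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("position " + i + ": " + counts[i]);
        }
    }
}
```

With uniformly random uids every position should receive about a tenth of the tuples. Real uids that are not uniform in their last digits would skew this distribution.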


