Storm Source Code Analysis (Part 12)


2021SC@SDUSC

Trident in Storm

Reading the Trident code revolves around two important classes, TridentTopology and Stream. These two classes serve as good entry points for studying the storm-trident source.

Constructing a Trident topology happens in two phases:

1. Build the logical Trident topology. This happens as we chain calls such as TridentTopology.newStream(…).each().groupBy()…, which assembles a logical topology and stores it in a DefaultDirectedGraph&lt;Node, IndexedEdge&gt; _graph object;

2. Translate the logical Trident topology into a Storm topology. This is done in TridentTopology's build() method.
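The two-phase idea above can be illustrated with a minimal, self-contained sketch. All names here are invented for illustration (this is not the Storm API): fluent calls only record nodes into a logical graph, and a separate build() pass later turns that graph into the plan.

```java
import java.util.*;

// Hypothetical mini-model of Trident's two-phase construction:
// phase 1 records nodes into a logical graph, phase 2 "builds" it.
public class MiniTrident {
    private final List<String> graph = new ArrayList<>(); // logical topology, in creation order

    // Phase 1: each fluent call appends a node to the logical graph.
    public Stream newStream(String txId) {
        graph.add("spout:" + txId);
        return new Stream();
    }

    public class Stream {
        public Stream each(String fn)   { graph.add("each:" + fn);   return this; }
        public Stream groupBy(String f) { graph.add("groupBy:" + f); return this; }
    }

    // Phase 2: "build" the recorded graph (here just a read-only view of it).
    public List<String> build() {
        return Collections.unmodifiableList(graph);
    }
}
```

In the real code the graph is a DefaultDirectedGraph with typed Node/IndexedEdge objects and build() does real planning (grouping, partitioning), but the separation of "record the DAG" from "compile the DAG" is the same.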

What is Trident?

Trident is a high-level abstraction built on top of Storm for realtime computation. Besides high-throughput stream processing, it also offers low-latency distributed queries and stateful stream processing. Trident is fully fault-tolerant and provides exactly-once processing semantics; it is essentially a high-level wrapper over Storm's transactional mechanism, which makes realtime data processing straightforward. Trident persists state in some form and, when failures occur, restores that state as needed.

Because Trident wraps the transactional classes, we no longer need to learn the low-level batch APIs, which lowers the learning curve.

Trident processes messages in units of batches, i.e. multiple tuples at a time.

Trident is a higher-level abstraction over Storm and mainly offers three benefits:

(1) Common operations such as count and sum are packaged as methods that can be called directly, with no need to implement them yourself.
(2) Ready-made stream primitives, such as groupBy.
(3) Transaction support, guaranteeing that every piece of data is processed, and processed only once (exactly once).

TridentTopology

newStream()

    public Stream newStream(String txId, IRichSpout spout) {
        return newStream(txId, new RichSpoutBatchExecutor(spout));
    }
    
    public Stream newStream(String txId, IPartitionedTridentSpout spout) {
        return newStream(txId, new PartitionedTridentSpoutExecutor(spout));
    }
    
    public Stream newStream(String txId, IOpaquePartitionedTridentSpout spout) {
        return newStream(txId, new OpaquePartitionedTridentSpoutExecutor(spout));
    }
 
    public Stream newStream(String txId, ITridentDataSource dataSource) {
        if (dataSource instanceof IBatchSpout) {
            return newStream(txId, (IBatchSpout) dataSource);
        } else if (dataSource instanceof ITridentSpout) {
            return newStream(txId, (ITridentSpout) dataSource);
        } else if (dataSource instanceof IPartitionedTridentSpout) {
            return newStream(txId, (IPartitionedTridentSpout) dataSource);
        } else if (dataSource instanceof IOpaquePartitionedTridentSpout) {
            return newStream(txId, (IOpaquePartitionedTridentSpout) dataSource);
        } else {
            throw new UnsupportedOperationException("Unsupported stream");
        }
    }
 
    public Stream newStream(String txId, IBatchSpout spout) {
        Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
        return addNode(n);
    }
    
    public Stream newStream(String txId, ITridentSpout spout) {
        Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
        return addNode(n);
    }
 
    protected Stream addNode(Node n) {
        registerNode(n);
        return new Stream(this, n.name, n);
    }
 
    protected void registerNode(Node n) {
        _graph.addVertex(n);
        if(n.stateInfo!=null) {
            String id = n.stateInfo.id;
            if(!_colocate.containsKey(id)) {
                _colocate.put(id, new ArrayList());
            }
            _colocate.get(id).add(n);
        }
    }

newStream's first parameter is txId; the second is an ITridentDataSource.
ITridentDataSource comes in several flavors: IBatchSpout, ITridentSpout, IPartitionedTridentSpout, and IOpaquePartitionedTridentSpout.

All of these overloads end up creating a SpoutNode, which registerNode then adds to _graph (if the node's stateInfo is non-null it is also added to _colocate, though for a SpoutNode that value is null). Note that the SpoutNode's SpoutType is SpoutNode.SpoutType.BATCH.
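The _colocate bookkeeping in registerNode can be boiled down to the following stdlib-only sketch (node and state names are hypothetical): nodes that share a stateInfo id are collected into the same list, and nodes without state info, such as spout nodes, are skipped.

```java
import java.util.*;

// Sketch of registerNode's _colocate map: group nodes by their state id.
public class ColocateSketch {
    private final Map<String, List<String>> colocate = new HashMap<>();

    // Mirrors registerNode: only nodes with a non-null state id are tracked.
    public void registerNode(String nodeName, String stateId) {
        if (stateId != null) {
            colocate.computeIfAbsent(stateId, k -> new ArrayList<>()).add(nodeName);
        }
    }

    public List<String> nodesForState(String stateId) {
        return colocate.getOrDefault(stateId, Collections.emptyList());
    }
}
```

The original uses the pre-Java-8 containsKey/put pattern; computeIfAbsent expresses the same "create the list on first use" logic.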

build()

    public StormTopology build() {
        DefaultDirectedGraph<Node, IndexedEdge> graph = (DefaultDirectedGraph) _graph.clone();
        
        //......
        
        List<SpoutNode> spoutNodes = new ArrayList<>();
        
        // can be regular nodes (static state) or processor nodes
        Set<Node> boltNodes = new LinkedHashSet<>();
        for(Node n: graph.vertexSet()) {
            if(n instanceof SpoutNode) {
                spoutNodes.add((SpoutNode) n);
            } else if(!(n instanceof PartitionNode)) {
                boltNodes.add(n);
            }
        }
        
        Set<Group> initialGroups = new LinkedHashSet<>();
 
        //......
 
        for(Node n: boltNodes) {
            initialGroups.add(new Group(graph, n));
        }
        
        
        GraphGrouper grouper = new GraphGrouper(graph, initialGroups);
        grouper.mergeFully();
        Collection<Group> mergedGroups = grouper.getAllGroups();
        
        
        
        // add identity partitions between groups
        for(IndexedEdge<Node> e: new HashSet<>(graph.edgeSet())) {
            if(!(e.source instanceof PartitionNode) && !(e.target instanceof PartitionNode)) {                
                Group g1 = grouper.nodeGroup(e.source);
                Group g2 = grouper.nodeGroup(e.target);
                // g1 being null means the source is a spout node
                if(g1==null && !(e.source instanceof SpoutNode))
                    throw new RuntimeException("Planner exception: Null source group must indicate a spout node at this phase of planning");
                if(g1==null || !g1.equals(g2)) {
                    graph.removeEdge(e);
                    PartitionNode pNode = makeIdentityPartition(e.source);
                    graph.addVertex(pNode);
                    graph.addEdge(e.source, pNode, new IndexedEdge(e.source, pNode, 0));
                    graph.addEdge(pNode, e.target, new IndexedEdge(pNode, e.target, e.index));                    
                }
            }
        }
        
        //......
        
        // add in spouts as groups so we can get parallelisms
        for(Node n: spoutNodes) {
            grouper.addGroup(new Group(graph, n));
        }
        
        grouper.reindex();
        mergedGroups = grouper.getAllGroups();
                
        
        Map<Node, String> batchGroupMap = new HashMap<>();
        List<Set<Node>> connectedComponents = new ConnectivityInspector<>(graph).connectedSets();
        for(int i=0; i<connectedComponents.size(); i++) {
            String groupId = "bg" + i;
            for(Node n: connectedComponents.get(i)) {
                batchGroupMap.put(n, groupId);
            }
        }
        
        Map<Group, Integer> parallelisms = getGroupParallelisms(graph, grouper, mergedGroups);
 
        TridentTopologyBuilder builder = new TridentTopologyBuilder();
 
        Map<Node, String> spoutIds = genSpoutIds(spoutNodes);
        Map<Group, String> boltIds = genBoltIds(mergedGroups);
 
        for(SpoutNode sn: spoutNodes) {
            Integer parallelism = parallelisms.get(grouper.nodeGroup(sn));
 
            Map<String, Number> spoutRes = new HashMap<>(_resourceDefaults);
            spoutRes.putAll(sn.getResources());
 
            Number onHeap = spoutRes.get(Config.TOPOLOGY_COMPONENT_RESOURCES_ONHEAP_MEMORY_MB);
            Number offHeap = spoutRes.get(Config.TOPOLOGY_COMPONENT_RESOURCES_OFFHEAP_MEMORY_MB);
            Number cpuLoad = spoutRes.get(Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT);
 
            SpoutDeclarer spoutDeclarer = null;
 
            if(sn.type == SpoutNode.SpoutType.DRPC) {
 
                spoutDeclarer = builder.setBatchPerTupleSpout(spoutIds.get(sn), sn.streamId,
                                                              (IRichSpout) sn.spout, parallelism, batchGroupMap.get(sn));
            } else {
                ITridentSpout s;
                if(sn.spout instanceof IBatchSpout) {
                    s = new BatchSpoutExecutor((IBatchSpout)sn.spout);
                } else if(sn.spout instanceof ITridentSpout) {
                    s = (ITridentSpout) sn.spout;
                } else {
                    throw new RuntimeException("Regular rich spouts not supported yet... try wrapping in a RichSpoutBatchExecutor");
                    // TODO: handle regular rich spout without batches (need lots of updates to support this throughout)
                }
                spoutDeclarer = builder.setSpout(spoutIds.get(sn), sn.streamId, sn.txId, s, parallelism, batchGroupMap.get(sn));
            }
 
            if(onHeap != null) {
                if(offHeap != null) {
                    spoutDeclarer.setMemoryLoad(onHeap, offHeap);
                }
                else {
                    spoutDeclarer.setMemoryLoad(onHeap);
                }
            }
 
            if(cpuLoad != null) {
                spoutDeclarer.setCPULoad(cpuLoad);
            }
        }
 
        for(Group g: mergedGroups) {
            if(!isSpoutGroup(g)) {
                Integer p = parallelisms.get(g);
                Map<String, String> streamToGroup = getOutputStreamBatchGroups(g, batchGroupMap);
                Map<String, Number> groupRes = g.getResources(_resourceDefaults);
 
                Number onHeap = groupRes.get(Config.TOPOLOGY_COMPONENT_RESOURCES_ONHEAP_MEMORY_MB);
                Number offHeap = groupRes.get(Config.TOPOLOGY_COMPONENT_RESOURCES_OFFHEAP_MEMORY_MB);
                Number cpuLoad = groupRes.get(Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT);
 
                BoltDeclarer d = builder.setBolt(boltIds.get(g), new SubtopologyBolt(graph, g.nodes, batchGroupMap), p,
                                                 committerBatches(g, batchGroupMap), streamToGroup);
 
                if(onHeap != null) {
                    if(offHeap != null) {
                        d.setMemoryLoad(onHeap, offHeap);
                    }
                    else {
                        d.setMemoryLoad(onHeap);
                    }
                }
 
                if(cpuLoad != null) {
                    d.setCPULoad(cpuLoad);
                }
 
                Collection<PartitionNode> inputs = uniquedSubscriptions(externalGroupInputs(g));
                for(PartitionNode n: inputs) {
                    Node parent = TridentUtils.getParent(graph, n);
                    String componentId = parent instanceof SpoutNode ?
                            spoutIds.get(parent) : boltIds.get(grouper.nodeGroup(parent));
                    d.grouping(new GlobalStreamId(componentId, n.streamId), n.thriftGrouping);
                }
            }
        }
        HashMap<String, Number> combinedMasterCoordResources = new HashMap<String, Number>(_resourceDefaults);
        combinedMasterCoordResources.putAll(_masterCoordResources);
        return builder.buildTopology(combinedMasterCoordResources);
    }

build() creates a TridentTopologyBuilder and then, for each node in spoutNodes, calls TridentTopologyBuilder.setSpout(String id, String streamName, String txStateId, ITridentSpout spout, Integer parallelism, String batchGroup) to add the spout.
Spouts of type IBatchSpout are wrapped into an ITridentSpout via BatchSpoutExecutor.

Here streamName is the streamId generated by UniqueIdGen.getUniqueStreamId: an "s" prefix followed by the _streamCounter count (e.g. 1), giving s1. txStateId is the txId passed in by the user. batchGroup starts with "bg" followed by the index of the element in connectedComponents (e.g. 0), giving bg0. parallelism is whatever the user set when building the topology.
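The naming scheme described above is simple enough to sketch directly (a hypothetical stand-in for the counter logic, not the real UniqueIdGen class):

```java
// Sketch of the id formats: stream ids are "s" + a running counter,
// batch group ids are "bg" + the connected-component index.
public class IdGenSketch {
    private int streamCounter = 0;

    public String nextStreamId() {
        return "s" + (++streamCounter);   // first call yields "s1", then "s2", ...
    }

    public static String batchGroupId(int componentIndex) {
        return "bg" + componentIndex;     // component 0 yields "bg0"
    }
}
```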

After each spout is set, its resource settings such as memoryLoad and cpuLoad are applied. Then the bolts are set, using SubtopologyBolt, followed by the bolts' resource settings.
Finally, TridentTopologyBuilder.buildTopology is called.

Reference: https://blog.csdn.net/weixin_45366499/article/details/112008176
