Storm and Trident

define_us

Article link: https://blog.csdn.net/define_us/article/details/83413649

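Trident is a feature introduced in Storm 0.8.0. Before that, Storm had the concept of Transactional Topologies (0.7.0); once Trident arrived, that concept was deprecated.
First, Trident turns the high-throughput data stream into a series of batches.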
[Figure: a data stream divided into batches]

A Trident topology compares with an ordinary Storm topology as follows:

[Figure: Trident topology versus ordinary Storm topology]

Word count and query example

package org.ljh.tridentdemo;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.LocalDRPC;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.operation.builtin.Count;
import storm.trident.operation.builtin.FilterNull;
import storm.trident.operation.builtin.MapGet;
import storm.trident.operation.builtin.Sum;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.tuple.TridentTuple;


public class TridentWordCount {
    public static class Split extends BaseFunction {
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            String sentence = tuple.getString(0);
            for (String word : sentence.split(" ")) {
                collector.emit(new Values(word));
            }
        }
    }

    public static StormTopology buildTopology(LocalDRPC drpc) {
        FixedBatchSpout spout =
                new FixedBatchSpout(new Fields("sentence"), 3, new Values(
                        "the cow jumped over the moon"), new Values(
                        "the man went to the store and bought some candy"), new Values(
                        "four score and seven years ago"),
                        new Values("how many apples can you eat"), new Values(
                                "to be or not to be the person"));
        spout.setCycle(true);

        // Create the topology object
        TridentTopology topology = new TridentTopology();

        // This stream counts the words; the result is kept in wordCounts
        TridentState wordCounts =
                topology.newStream("spout1", spout)
                        .parallelismHint(16)
                        .each(new Fields("sentence"), new Split(), new Fields("word"))
                        .groupBy(new Fields("word"))
                        .persistentAggregate(new MemoryMapState.Factory(), new Count(),
                                new Fields("count")).parallelismHint(16);
        // This stream queries the counts computed above
        topology.newDRPCStream("words", drpc)
                .each(new Fields("args"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                .stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
                .each(new Fields("count"), new FilterNull())
                .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
        return topology.build();
    }

    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        conf.setMaxSpoutPending(20);
        if (args.length == 0) {
            LocalDRPC drpc = new LocalDRPC();
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("wordCounter", conf, buildTopology(drpc));
            for (int i = 0; i < 100; i++) {
                System.out.println("DRPC RESULT: " + drpc.execute("words", "cat the dog jumped"));
                Thread.sleep(1000);
            }
        } else {
            conf.setNumWorkers(3);
            StormSubmitter.submitTopologyWithProgressBar(args[0], conf, buildTopology(null));
        }
    }
}

State

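The figure below shows the combinations that can achieve exactly-once semantics.
[Figure: combinations that achieve exactly-once semantics]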

Spout

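A transactional spout has the following properties:

Batches with the same txid are always identical. When the batch for a txid is replayed, it contains exactly the same tuples as the earlier batch with that txid.
Batches do not overlap. Each tuple belongs to only one batch.
Every tuple belongs to some batch, without exception.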
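The three properties above can be illustrated with a minimal plain-Java sketch. It does not use the Storm API: the class `ReplayableSource` and its method `batchFor` are hypothetical names, and the sketch assumes an in-memory source that can be re-read by offset, with txids starting at 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not a Storm or Trident class.
public class ReplayableSource {
    private final List<String> tuples;
    private final int batchSize;

    public ReplayableSource(List<String> tuples, int batchSize) {
        this.tuples = new ArrayList<>(tuples);
        this.batchSize = batchSize;
    }

    // The batch for a txid is a fixed slice of the source, so a replay of the
    // same txid always yields the same tuples (assumption: txids start at 1).
    public List<String> batchFor(long txid) {
        if (txid < 1) {
            throw new IllegalArgumentException("txid must be >= 1");
        }
        long start = (txid - 1) * batchSize;
        int from = (int) Math.min(start, tuples.size());
        int to = Math.min(from + batchSize, tuples.size());
        return new ArrayList<>(tuples.subList(from, to));
    }

    public static void main(String[] args) {
        ReplayableSource source = new ReplayableSource(
                Arrays.asList("man", "man", "dog", "apple", "dog"), 2);
        for (long txid = 1; txid <= 3; txid++) {
            System.out.println("txid=" + txid + " -> " + source.batchFor(txid));
        }
        // Replaying txid=1 returns the same tuples as the first time.
        System.out.println("replay txid=1 -> " + source.batchFor(1));
    }
}
```

Because each txid maps to a disjoint slice and the slices together cover the whole list, the sketch satisfies all three properties; a real source would need some equivalent way to re-read a batch by txid.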

To illustrate the semantics of transactional spouts, suppose the database currently holds the following records:

man => [count=3, txid=1]
dog => [count=4, txid=3]
apple => [count=10, txid=2]

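Now the transaction with txid=3 carries the tuples ["man"] ["man"] ["dog"]. Because the txid stored for man differs from the current txid, the count for man is updated. The txid stored for dog matches, so dog is not updated. The database ends up as follows: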

man => [count=5, txid=3]
dog => [count=4, txid=3]
apple => [count=10, txid=2]
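The update rule in this example can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Trident's own state implementation: `TransactionalCountStore`, `CountWithTxid`, and `applyBatch` are hypothetical names, and a `HashMap` stands in for the database.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a Storm or Trident class.
public class TransactionalCountStore {
    // Stored value: a count plus the txid of the batch that last updated it.
    public static final class CountWithTxid {
        public final long count;
        public final long txid;

        public CountWithTxid(long count, long txid) {
            this.count = count;
            this.txid = txid;
        }
    }

    // Stand-in for the database (assumption: a simple in-memory map).
    private final Map<String, CountWithTxid> db = new HashMap<>();

    public void put(String key, long count, long txid) {
        db.put(key, new CountWithTxid(count, txid));
    }

    public CountWithTxid get(String key) {
        return db.get(key);
    }

    public void applyBatch(long txid, List<String> words) {
        // First compute the partial count of each word within this batch.
        Map<String, Long> partial = new HashMap<>();
        for (String word : words) {
            partial.merge(word, 1L, Long::sum);
        }
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            CountWithTxid current = db.get(e.getKey());
            if (current != null && current.txid == txid) {
                continue; // same txid: this batch was already applied, skip it
            }
            long base = (current == null) ? 0L : current.count;
            db.put(e.getKey(), new CountWithTxid(base + e.getValue(), txid));
        }
    }

    public static void main(String[] args) {
        TransactionalCountStore store = new TransactionalCountStore();
        store.put("man", 3, 1);
        store.put("dog", 4, 3);
        store.put("apple", 10, 2);
        store.applyBatch(3, Arrays.asList("man", "man", "dog"));
        for (String key : Arrays.asList("man", "dog", "apple")) {
            CountWithTxid v = store.get(key);
            System.out.println(key + " => [count=" + v.count + ", txid=" + v.txid + "]");
        }
    }
}
```

Skipping on an equal txid is safe only because a transactional spout guarantees that a replayed batch contains the same tuples, so the stored count already reflects that batch.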

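The drawback of this design is that it requires the data source to be able to return the same batch of data for a given txid. If the source cannot guarantee this, the first basic requirement of a transactional spout cannot be met, and a transactional spout is of no use.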

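An opaque transactional spout has the following property:
Each tuple is successfully processed in exactly one batch. However, a tuple that fails in one batch may later be processed successfully in a different batch.
With opaque transactional state, the store keeps not only the value and the txid but also a prevValue (the value after the previous processing). Taking an update of the word man as an example, suppose man is already stored as follows: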