Storm 1.0.2 - Word Count Case Study

Reposted from: http://www.cnblogs.com/jonyo/p/5861171.html


The word-count topology WordCountTopology continuously reads in sentences, counts every word, and keeps updating the per-word totals printed in the terminal. The topology's data flow is as follows:

   id="iframe_0.24710371619761462" src="data:text/html;charset=utf8,%3Cimg%20id=%22img%22%20src=%22http://wiki.sankuai.com/download/attachments/597748427/WordCountTopology.png?version=1&modificationDate=1473561297000&api=v2&_=5861171%22%20style=%22border:none;max-width:1028px%22%3E%3Cscript%3Ewindow.onload%20=%20function%20()%20%7Bvar%20img%20=%20document.getElementById('img');%20window.parent.postMessage(%7BiframeId:'iframe_0.24710371619761462',width:img.width,height:img.height%7D,%20'http://www.cnblogs.com');%7D%3C/script%3E" frameborder="0" scrolling="no" style="margin: 0px; padding: 0px; border-width: initial; border-style: none; width: 20px; height: 20px;">

  • Sentence spout (SentenceSpout): continuously reads data from the source and emits sentences; output tuple format: {"sentence":"hello world"}
  • Sentence-splitting bolt (SplitSentenceBolt): splits each sentence into individual words; output tuple format: {"word":"hello"}  {"word":"world"}
  • Word-count bolt (WordCountBolt): keeps a count of how many times each word has appeared; for every tuple received from upstream it increments that word's count and emits the word together with its count; output tuple format: {"hello":"1"}  {"world":"3"}
  • Report bolt (ReportBolt): maintains a table with the counts of all words; for every tuple received from upstream it updates that table and prints the results in the terminal.

  Development steps:

    1. Environment

  • OS: Mac OS X 10.10.3
  • JDK: jdk1.8.0_40
  • IDE: IntelliJ IDEA 15.0.3
  • Maven: apache-maven-3.0.3

  2. Project setup

  • Create a new Maven project in IDEA: storm-learning
  • Edit pom.xml to add the Storm core dependency, plus an slf4j dependency for convenient log output
<dependencies>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.6.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.0.2</version>
        </dependency>
</dependencies>
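A note that is not from the original post, just a common Storm packaging convention: when a topology is later packaged and submitted to a real cluster, storm-core is already on the cluster's classpath, so it is usually given provided scope; for local mode, as used in this article, it must stay on the compile classpath. A minimal sketch of the cluster-deployment variant of the dependency:

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>1.0.2</version>
    <!-- provided: supplied by the Storm installation on the cluster, not bundled into the topology jar -->
    <scope>provided</scope>
</dependency>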

 3. Developing the Spout and Bolt components

  • SentenceSpout
  • SplitSentenceBolt
  • WordCountBolt
  • ReportBolt

SentenceSpout.java

import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class SentenceSpout extends BaseRichSpout {

    private SpoutOutputCollector spoutOutputCollector;

    // For simplicity, a static array simulates a continuous stream of incoming data
    private static final String[] sentences = {
            "The logic for a realtime application is packaged into a Storm topology",
            "A Storm topology is analogous to a MapReduce job",
            "One key difference is that a MapReduce job eventually finishes whereas a topology runs forever",
            "A topology is a graph of spouts and bolts that are connected with stream groupings"
    };

    private int index = 0;

    // Initialization
    public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
        this.spoutOutputCollector = spoutOutputCollector;
    }

    // Core logic: emit one sentence per call, cycling through the array
    public void nextTuple() {
        spoutOutputCollector.emit(new Values(sentences[index]));
        ++index;
        if (index >= sentences.length) {
            index = 0;
        }
    }

    // Declare the field emitted downstream
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("sentences"));
    }
}
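One thing to be aware of: Storm calls nextTuple() in a tight loop, so this spout emits as fast as the machine allows and the report log becomes very noisy. The original code leaves it that way; if you want to slow it down for easier observation, a minimal sketch (my addition, not part of the original post) is to pause briefly inside nextTuple() using Storm's Utils helper:

// Sketch: a throttled nextTuple() for SentenceSpout (requires import org.apache.storm.utils.Utils).
public void nextTuple() {
    spoutOutputCollector.emit(new Values(sentences[index]));
    ++index;
    if (index >= sentences.length) {
        index = 0;
    }
    // Sleep 100 ms between emits; purely to keep the demo output readable, not required by Storm.
    Utils.sleep(100);
}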

SplitSentenceBolt.java

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {

    private OutputCollector outputCollector;

    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
        this.outputCollector = outputCollector;
    }

    // Split each incoming sentence on spaces and emit one tuple per word
    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentences");
        String[] words = sentence.split(" ");
        for (String word : words) {
            outputCollector.emit(new Values(word));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("word"));
    }
}
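Because SplitSentenceBolt extends BaseRichBolt, incoming tuples are not acked automatically (BaseBasicBolt would do that), and the emitted words are not anchored to the incoming sentence tuple. For this demo that is fine, since the spout emits unanchored tuples and no reliability is expected. If you wanted at-least-once processing (which would also require the spout to emit with message IDs), a minimal sketch of an anchored, acking execute() (my addition, not from the original post) would look like this:

// Sketch: reliable variant of SplitSentenceBolt.execute().
public void execute(Tuple tuple) {
    String sentence = tuple.getStringByField("sentences");
    for (String word : sentence.split(" ")) {
        // Anchor each emitted word to the incoming sentence tuple so failures can be replayed.
        outputCollector.emit(tuple, new Values(word));
    }
    // Acknowledge the input tuple once all words have been emitted.
    outputCollector.ack(tuple);
}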

WordCountBolt.java

import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseRichBolt {

    // Running count for each word
    private Map<String, Long> wordCount = null;

    private OutputCollector outputCollector;

    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
        this.outputCollector = outputCollector;
        wordCount = new HashMap<String, Long>();
    }

    // Increment the count for the incoming word and emit the word with its new count
    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Long count = wordCount.get(word);
        if (count == null) {
            count = 0L;
        }
        ++count;
        wordCount.put(word, count);
        outputCollector.emit(new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("word", "count"));
    }
}

ReportBolt.java

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReportBolt extends BaseRichBolt {

    private static final Logger log = LoggerFactory.getLogger(ReportBolt.class);

    private Map<String, Long> counts = null;

    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
        counts = new HashMap<String, Long>();
    }

    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Long count = tuple.getLongByField("count");
        counts.put(word, count);
        // Print the updated results
        printReport();
    }

    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        // This bolt emits nothing downstream, so there is nothing to declare
    }

    // Print the current counts so the results are easy to observe
    private void printReport() {
        log.info("--------------------------begin-------------------");
        Set<String> words = counts.keySet();
        for (String word : words) {
            log.info("@report-bolt@: " + word + " ---> " + counts.get(word));
        }
        log.info("--------------------------end---------------------");
    }
}
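In local mode it can also be handy to dump the final table once when the topology is shut down, rather than only on every update. A common pattern (my addition, not part of the original post) is to override cleanup(), which Storm calls when a bolt is torn down in local mode; a minimal sketch to add to ReportBolt:

// Sketch: print the final counts when the local topology shuts down.
@Override
public void cleanup() {
    log.info("---------- FINAL COUNTS ----------");
    for (Map.Entry<String, Long> entry : counts.entrySet()) {
        log.info(entry.getKey() + " ---> " + entry.getValue());
    }
    log.info("----------------------------------");
}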

 4. Topology configuration

  • WordCountTopology
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WordCountTopology {

    private static final Logger log = LoggerFactory.getLogger(WordCountTopology.class);

    // Unique IDs for each component
    private final static String SENTENCE_SPOUT_ID = "sentence-spout";
    private final static String SPLIT_SENTENCE_BOLT_ID = "split-bolt";
    private final static String WORD_COUNT_BOLT_ID = "count-bolt";
    private final static String REPORT_BOLT_ID = "report-bolt";

    // Topology name
    private final static String TOPOLOGY_NAME = "word-count-topology";

    public static void main(String[] args) {

        log.info(".........beginning.......");
        // Instantiate each component
        SentenceSpout sentenceSpout = new SentenceSpout();
        SplitSentenceBolt splitSentenceBolt = new SplitSentenceBolt();
        WordCountBolt wordCountBolt = new WordCountBolt();
        ReportBolt reportBolt = new ReportBolt();

        // Builder used to wire the topology together
        TopologyBuilder topologyBuilder = new TopologyBuilder();

        // First component: sentenceSpout, with a parallelism hint of 2
        topologyBuilder.setSpout(SENTENCE_SPOUT_ID, sentenceSpout, 2);

        // Second component: splitSentenceBolt, fed by sentenceSpout; tuples are distributed randomly via shuffleGrouping
        topologyBuilder.setBolt(SPLIT_SENTENCE_BOLT_ID, splitSentenceBolt).shuffleGrouping(SENTENCE_SPOUT_ID);

        // Third component: wordCountBolt, fed by splitSentenceBolt; fieldsGrouping on "word" routes the same word to the same task (bolt instance)
        topologyBuilder.setBolt(WORD_COUNT_BOLT_ID, wordCountBolt).fieldsGrouping(SPLIT_SENTENCE_BOLT_ID, new Fields("word"));

        // Last component: reportBolt, fed by wordCountBolt; globalGrouping routes every tuple to this single task
        topologyBuilder.setBolt(REPORT_BOLT_ID, reportBolt).globalGrouping(WORD_COUNT_BOLT_ID);

        Config config = new Config();

        // LocalCluster starts an in-process cluster when the program runs, so no external cluster
        // needs to be set up; convenient for local development and debugging
        LocalCluster cluster = new LocalCluster();

        // Create the topology instance and submit it to the local cluster
        cluster.submitTopology(TOPOLOGY_NAME, config, topologyBuilder.createTopology());
    }
}
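As written, main() submits the topology and then keeps running until the process is killed. Two common refinements (my additions, not in the original post): in local mode, let the topology run for a while and then shut the in-process cluster down cleanly; and when targeting a real cluster, submit through StormSubmitter instead of LocalCluster. A minimal sketch of both, assuming the same TOPOLOGY_NAME, config, and topologyBuilder as above:

// Local mode: run for 30 seconds, then tear everything down cleanly.
cluster.submitTopology(TOPOLOGY_NAME, config, topologyBuilder.createTopology());
Utils.sleep(30000);                       // org.apache.storm.utils.Utils
cluster.killTopology(TOPOLOGY_NAME);
cluster.shutdown();

// Cluster mode: package the project as a jar and submit it to a running Storm cluster instead.
// StormSubmitter.submitTopology(TOPOLOGY_NAME, config, topologyBuilder.createTopology());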

  5. Running the topology

  • Option 1: run from IDEA

  Compile the code in IDEA, then run it.

  Watching the console output, you will see that Storm first builds a runtime environment locally: it starts ZooKeeper, then Nimbus and a Supervisor; Nimbus distributes the submitted topology to the Supervisor, the Supervisor starts worker processes, and each worker uses executors to run the topology's components (spouts and bolts). Finally the console keeps printing the word-count results.

  [Screenshots in the original post show the console output at each stage: the ZooKeeper connection being established, Nimbus starting, the Supervisor starting, a worker starting, the executors running, and the results being printed.]

  • Option 2: run with Maven
    • Change into the project's root directory: storm-learning
    • Run mvn compile to make sure the code builds
    • Run the program through Maven (see the plugin sketch below):
      mvn exec:java -Dexec.mainClass="wordCount.WordCountTopology"
    • The console output is the same as with Option 1
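mvn exec:java resolves the exec-maven-plugin on the fly, so the command above works as shown. If you prefer to pin the plugin in pom.xml so the main class does not have to be passed on the command line each time, a minimal sketch follows (my addition; the plugin version is an assumption, and wordCount.WordCountTopology is the package/class name from the original command):

<build>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>1.6.0</version>
            <configuration>
                <!-- Lets you run the topology with just: mvn exec:java -->
                <mainClass>wordCount.WordCountTopology</mainClass>
            </configuration>
        </plugin>
    </plugins>
</build>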


Further reading:

http://www.cnblogs.com/jonyo/p/5861125.html

http://www.cnblogs.com/jonyo/p/5861835.html
