Retrieving TridentWordCount Results from a Storm Cluster

A topology running on a Storm cluster computes continuously and, unlike a Hadoop job, does not leave its results on HDFS: the program itself must write its output somewhere the user specifies, such as a database or a local file. TridentWordCount instead exposes its Trident state through a DRPCStream, so the count of any given word can be queried on demand. However, the storm-starter version only includes local-mode code that queries through a LocalDRPC instance; once the topology is submitted to a real cluster it keeps computing but produces no visible output. The program is therefore modified as follows, so that after submission it starts a DRPCClient to fetch the results:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package storm.starter.trident;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.LocalDRPC;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.DRPCClient; // client used to query the DRPC server
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.operation.builtin.Count;
import storm.trident.operation.builtin.FilterNull;
import storm.trident.operation.builtin.MapGet;
import storm.trident.operation.builtin.Sum;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.tuple.TridentTuple;


public class TridentWordCount {
  public static class Split extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
      String sentence = tuple.getString(0);
      for (String word : sentence.split(" ")) {
        collector.emit(new Values(word));
      }
    }
  }

  public static StormTopology buildTopology(DRPCClient drpc) { // parameter changed from LocalDRPC to DRPCClient
    FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3, new Values("the cow jumped over the moon"),
        new Values("the man went to the store and bought some candy"), new Values("four score and seven years ago"),
        new Values("how many apples can you eat"), new Values("to be or not to be the person"));
    spout.setCycle(true);

    TridentTopology topology = new TridentTopology();
    TridentState wordCounts = topology.newStream("spout1", spout).parallelismHint(16).each(new Fields("sentence"),
        new Split(), new Fields("word")).groupBy(new Fields("word")).persistentAggregate(new MemoryMapState.Factory(),
        new Count(), new Fields("count")).parallelismHint(16);

    topology.newDRPCStream("words").each(new Fields("args"), new Split(), new Fields("word")).groupBy(new Fields(
        "word")).stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count")).each(new Fields("count"),
        new FilterNull()).aggregate(new Fields("count"), new Sum(), new Fields("sum"));
    return topology.build();
  }

  public static void main(String[] args) throws Exception {
    Config conf = new Config();
    conf.setMaxSpoutPending(20);
//    conf.setDebug(true); // log spout and bolt activity
/*    if (args.length == 0) {
      LocalDRPC drpc = new LocalDRPC();
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("wordCounter", conf, buildTopology(drpc));
      for (int i = 0; i < 100; i++) {
        System.out.println("DRPC RESULT: " + drpc.execute("words", "jumped"));
        Thread.sleep(1000);
      }
    }
    else {*/
      conf.setNumWorkers(3); // LocalDRPC and DRPCClient are created differently, so local mode is dropped and cluster mode is assumed
      DRPCClient client = new DRPCClient("drpc.server.location", 3772); // replace with your DRPC server host; 3772 is the default DRPC port
      StormSubmitter.submitTopologyWithProgressBar(args[0], conf, buildTopology(client));
      
      while (true) { // repeatedly print the combined counts of "jumped" and "the"
        System.out.println(client.execute("words", "jumped the"));
        Thread.sleep(1000);
      }
    // }
  }
}
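Before the DRPCClient can connect, the cluster needs a running DRPC server that is declared in storm.yaml. A minimal sketch of the required configuration, assuming the DRPC server runs on the host drpc.server.location (replace it with your actual hostname, the same one passed to the DRPCClient constructor above):

# storm.yaml on every node; 3772 is the default drpc.port
drpc.servers:
  - "drpc.server.location"

Start the DRPC daemon on that host with the "storm drpc" command before submitting the topology; otherwise every client.execute() call will fail to connect.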


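To try the modified program on a cluster, package it and submit it with the storm jar command, passing the topology name as the first argument (it becomes args[0] above). A sketch, assuming the storm-starter project was built into a jar with dependencies (adjust the jar name to match your build):

storm jar storm-starter-with-dependencies.jar storm.starter.trident.TridentWordCount wordCounter

Because the spout cycles through the same five sentences forever, the counts keep growing, so each client.execute("words", "jumped the") call returns an ever-increasing sum, printed as a JSON-style result such as [[42]].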