Storm Example Program (1): Word Count

This post shows how to build a simple distributed real-time word-count application with Apache Storm. A SentenceSpout serves as the data source, a SplitSentenceBolt splits each sentence into words, and a WordCountBolt tallies the words. Running locally prints the count for every word; in cluster mode the topology is submitted from the command line and monitored through the Storm UI.


The classic example from the book 《Storm分布式实时计算模式》.

Word count

A Spout emits sentences as strings; the first bolt splits each sentence into words, and the words are grouped so that identical words always reach the same instance of the second bolt, which keeps the counts.

Step 1: create the Spout data source

Step 2: implement the sentence-splitting bolt

Step 3: implement the word-count bolt

Step 4: build the Topology

[Figure: topology flow, SentenceSpout -> SplitSentenceBolt -> WordCountBolt]

Maven

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>StrormDemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <!-- Storm 1.x requires at least Java 7; build for Java 8 -->
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <storm.version>1.1.0</storm.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>${storm.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-metrics</artifactId>
            <version>${storm.version}</version>
        </dependency>

    </dependencies>

</project>
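One packaging note: when the jar is later submitted to a real cluster (see the cluster-mode section below), the cluster already provides the Storm libraries, so storm-core is usually given provided scope; for running the local-mode main() from the IDE it has to stay on the compile classpath. A hedged sketch of what that dependency entry could look like:

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>${storm.version}</version>
            <!-- provided scope only when building a jar for cluster submission -->
            <scope>provided</scope>
        </dependency>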

Spout data source

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

// Spout data source: emits the sample sentences one at a time
@SuppressWarnings("serial")
public class SentenceSpout extends  BaseRichSpout {

    private SpoutOutputCollector collector;
    private String[] sentences = {
            "Apache Storm is a free and open source distributed realtime computation system",
            "Storm makes it easy to reliably process unbounded streams of data",
            "doing for realtime processing what Hadoop did for batch processing",
            "Storm is simple", "can be used with any programming language",
            "and is a lot of fun to use" };
    private int index = 0;

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the output field read by the downstream bolt
        declarer.declare(new Fields("sentence"));
    }

    public void open(Map config, TopologyContext context,SpoutOutputCollector collector) {
        this.collector = collector;
    }


    public void nextTuple() {
        if(index >= sentences.length){
            return;
        }
        // Emit the next sentence
        this.collector.emit(new Values(sentences[index]));
        index++;
        Utils.sleep(1);
    }
}
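The spout above emits unanchored tuples, so Storm cannot replay a sentence if processing fails downstream. As a sketch only (the pending map and the replay-on-fail policy are illustrative additions, not part of the book's example), nextTuple() could attach a message ID and the spout could override ack() and fail():

    // Illustrative reliable variant of the emit logic; "pending" is an added assumption.
    private final java.util.Map<java.util.UUID, Values> pending =
            new java.util.concurrent.ConcurrentHashMap<java.util.UUID, Values>();

    public void nextTuple() {
        if (index >= sentences.length) {
            return;
        }
        Values values = new Values(sentences[index]);
        java.util.UUID msgId = java.util.UUID.randomUUID();
        pending.put(msgId, values);
        this.collector.emit(values, msgId); // anchored emit enables ack/fail callbacks
        index++;
        Utils.sleep(1);
    }

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId); // the whole tuple tree was processed successfully
    }

    @Override
    public void fail(Object msgId) {
        this.collector.emit(pending.get(msgId), msgId); // naive replay on failure
    }

Because BaseBasicBolt anchors its emits to the input tuple and acks automatically, the two bolts below would take part in this reliability chain without changes.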

Sentence-splitting bolt

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Splits each sentence into individual words
public class SplitSentenceBolt extends BaseBasicBolt {

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the field passed to the next bolt
        declarer.declare(new Fields("word"));
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String sentence = input.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            // Emit each word
            collector.emit(new Values(word));
        }
    }
}
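Splitting on a single space leaves punctuation and case attached to the tokens, so "Storm" and "storm," would be counted separately. A small optional refinement (my own addition, not in the original) normalizes each token before emitting:

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String sentence = input.getStringByField("sentence");
        // Split on any whitespace, lower-case, and strip non-alphanumeric characters.
        for (String token : sentence.split("\\s+")) {
            String word = token.toLowerCase().replaceAll("[^a-z0-9]", "");
            if (!word.isEmpty()) {
                collector.emit(new Values(word));
            }
        }
    }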

Word-count bolt

import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;


// Counts occurrences of each word
@SuppressWarnings("serial")
public class WordCountBolt extends BaseBasicBolt {

    private  Map<String, Long> counts = null;


    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        this.counts = new HashMap<String, Long>();
    }

    @Override
    public void cleanup() {
        // Called when the topology is shut down (only guaranteed in local mode): print the final counts
        for (String key : counts.keySet()) {
            System.out.println(key + " : " + this.counts.get(key));
        }
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = input.getStringByField("word");
        Long count = this.counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        this.counts.put(word, count);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {

    }

}
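cleanup() is only guaranteed to be called in local mode; on a real cluster the in-memory counts disappear when the worker is killed. A common variant (sketched here as an assumption, and it would also need imports for Values and Fields) emits the running count downstream so another bolt or an external store can pick it up:

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = input.getStringByField("word");
        Long count = this.counts.get(word);
        count = (count == null) ? 1L : count + 1;
        this.counts.put(word, count);
        // Emit the running (word, count) pair instead of only printing in cleanup().
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }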

Building the Topology

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

// Wires the spout and bolts together and submits the topology
public class WordCountTopology {

    public static void main(String[] args) throws Exception {

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new SentenceSpout(), 1);
        // shuffleGrouping: sentences are distributed randomly across the splitter tasks
        builder.setBolt("split", new SplitSentenceBolt(), 2).shuffleGrouping("spout");
        // fieldsGrouping on "word": the same word always goes to the same counter task
        builder.setBolt("count", new WordCountBolt(), 2).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(false);

        if (args != null && args.length > 0) {
            // Cluster mode: submit to the Storm cluster
            conf.setNumWorkers(2);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Local mode: run in-process for 10 seconds, then shut down
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", conf, builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
}

Run main() locally

Sample output:

Storm : 3
doing : 1
be : 1
use : 1
for : 2
used : 1
source : 1
lot : 1
and : 2
of : 2
programming : 1
a : 2
realtime : 2
process : 1
Hadoop : 1
streams : 1
makes : 1
distributed : 1
it : 1
computation : 1
system : 1
processing : 2
to : 2
did : 1
fun : 1
data : 1
reliably : 1
batch : 1
language : 1
is : 3
simple : 1
Apache : 1
any : 1
easy : 1
can : 1
with : 1
what : 1
free : 1
unbounded : 1
open : 1
Cluster mode

Package the project into a jar with Maven
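With the standard Maven lifecycle this is just the package goal; given the artifactId and version in the pom above, the jar ends up under target/ (the stormTest.jar used in the commands below is presumably a renamed copy of it):

mvn clean package
# produces target/StrormDemo-1.0-SNAPSHOT.jar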

Once Storm is installed and the nimbus, supervisor, and ui daemons are running, you can submit the topology jar.

Command format: storm jar [jar path] [package.TopologyClass] [topology name]

Example: storm jar stormTest.jar com.demo.WordCountTopology wordcountTop

Stopping a topology

Command format: storm kill [topology name]
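For example, to stop the topology submitted above (the kill command also accepts an optional -w wait time in seconds before the topology is torn down):

storm kill wordcountTop
storm kill wordcountTop -w 10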

Copy the built jar (stormTest.jar in this example) to the Storm installation directory and run:

bin/storm jar stormTest.jar test.WordCountTopology wordcountTop

