Storm example program: word count
The classic example from the book Storm Blueprints: Patterns for Distributed Realtime Computation (《Storm分布式实时计算模式》)
Word count
A spout reads sentences and emits them to the first bolt, which splits each sentence into words; identical words are then routed to the same instance of the second bolt, which tallies the counts.
Step 1: create the spout data source
Step 2: implement the sentence-splitting bolt
Step 3: implement the word-counting bolt
Step 4: build the topology
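Before wiring anything into Storm, the four steps can be sketched end to end in plain Java (no Storm dependency; `WordCountSketch` is a name made up for this illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // What the two bolts compute, collapsed into one loop:
    // SplitSentenceBolt splits each sentence on spaces,
    // WordCountBolt increments a per-word counter.
    public static Map<String, Long> count(String[] sentences) {
        Map<String, Long> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(new String[] {
                "Storm is simple", "Storm makes it easy" });
        System.out.println(counts.get("Storm")); // 2
    }
}
```

Storm's value is not this computation itself but running it in parallel over an unbounded stream; the topology below distributes exactly this logic.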
Maven
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>StrormDemo</artifactId>
<version>1.0-SNAPSHOT</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
</plugins>
</build>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<storm.version>1.1.0</storm.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>${storm.version}</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-metrics</artifactId>
<version>${storm.version}</version>
</dependency>
</dependencies>
</project>
Spout data source
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

// Spout data source
@SuppressWarnings("serial")
public class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private String[] sentences = {
            "Apache Storm is a free and open source distributed realtime computation system",
            "Storm makes it easy to reliably process unbounded streams of data",
            "doing for realtime processing what Hadoop did for batch processing",
            "Storm is simple", "can be used with any programming language",
            "and is a lot of fun to use" };
    private int index = 0;

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // declare the output field emitted by this spout
        declarer.declare(new Fields("sentence"));
    }

    @Override
    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // stop emitting once every sentence has been sent
        if (index >= sentences.length) {
            return;
        }
        // emit the next sentence
        this.collector.emit(new Values(sentences[index]));
        index++;
        Utils.sleep(1);
    }
}
Sentence-splitting bolt
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Split each sentence into words
@SuppressWarnings("serial")
public class SplitSentenceBolt extends BaseBasicBolt {
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // declare the field passed to the downstream bolt
        declarer.declare(new Fields("word"));
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String sentence = input.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            // emit one tuple per word
            collector.emit(new Values(word));
        }
    }
}
Word-counting bolt
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

// Word-counting bolt
@SuppressWarnings("serial")
public class WordCountBolt extends BaseBasicBolt {
    private Map<String, Long> counts = null;

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        this.counts = new HashMap<String, Long>();
    }

    @Override
    public void cleanup() {
        // called when the topology shuts down; print the final counts
        for (String key : counts.keySet()) {
            System.out.println(key + " : " + this.counts.get(key));
        }
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = input.getStringByField("word");
        Long count = this.counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        this.counts.put(word, count);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt is terminal and emits nothing
    }
}
Building the topology
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

// Topology wiring
public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new SentenceSpout(), 1);
        builder.setBolt("split", new SplitSentenceBolt(), 2).shuffleGrouping("spout");
        builder.setBolt("count", new WordCountBolt(), 2).fieldsGrouping("split", new Fields("word"));
        Config conf = new Config();
        conf.setDebug(false);
        if (args != null && args.length > 0) {
            // cluster mode
            conf.setNumWorkers(2);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // local mode: run for 10 seconds, then shut down
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", conf, builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
}
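The fieldsGrouping("split", new Fields("word")) call guarantees that every tuple carrying the same word value reaches the same WordCountBolt task, so the per-task counts never need merging. A plain-Java sketch of that routing invariant (Storm's internal hashing differs; the `floorMod` here is an illustrative assumption):

```java
public class FieldsGroupingSketch {
    // Illustrative only: route a tuple to one of numTasks consumer tasks
    // by hashing the grouping field value. The invariant fields grouping
    // provides: equal values always map to the same task.
    public static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int t1 = taskFor("Storm", 2);
        int t2 = taskFor("Storm", 2);
        System.out.println(t1 == t2); // always true: same word, same task
    }
}
```

By contrast, shuffleGrouping("spout") distributes sentences to the split bolts randomly, which is fine there because splitting is stateless.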
Run the main method directly for local mode.
Sample output:
Storm : 3
doing : 1
be : 1
use : 1
for : 2
used : 1
source : 1
lot : 1
and : 2
of : 2
programming : 1
a : 2
realtime : 2
process : 1
Hadoop : 1
streams : 1
makes : 1
distributed : 1
it : 1
computation : 1
system : 1
processing : 2
to : 2
did : 1
fun : 1
data : 1
reliably : 1
batch : 1
language : 1
is : 3
simple : 1
Apache : 1
any : 1
easy : 1
can : 1
with : 1
what : 1
free : 1
unbounded : 1
open : 1
Cluster mode
Package the project into a jar with Maven.
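One way to build a submittable jar is the maven-shade-plugin; a minimal pom.xml fragment is sketched below (the plugin version is an assumption, check Maven Central for the current release). Note that when submitting to a cluster, storm-core is normally given <scope>provided</scope>, since the Storm runtime already supplies it.

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <!-- bundle the topology classes and dependencies at package time -->
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```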
After installing Storm and starting nimbus, the supervisor(s), and the UI, you can submit the topology jar.
Command format: storm jar <jar path> <package.TopologyClass> <topology name>
Example: storm jar stormTest.jar com.demo.WordCountTopology wordcountTop
Stopping a topology
Command format: storm kill <topology name>
Copy the built jar (stormTest.jar) to the Storm root directory, then run:
bin/storm jar stormTest.jar test.WordCountTopology wordcountTop