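A Different Kind of WordCount
The word count example is to Storm what Hello World is to learning Java. Below, a single word count project is used to walk through all of Storm's basic usage.
The structure of the whole project is shown below:
It consists of five parts. topology, spout, and bolt hold the main word count computation code, source is a custom sentence emitter, and util contains a utility class for parsing logs.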
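1. A custom sentence emitter that keeps count
The main job of word count is to count how many times each word occurs. We use a SentenceEmitter to record how many times each word has been emitted, then compare those numbers with the counts computed by Storm to check whether the two agree.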
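The comparison described above can be sketched roughly as follows. This is only an illustration: CountComparator and its diff() method are hypothetical names that are not part of the original project, and the sketch assumes both sides can be exposed as a plain Map<String, Long> (for the emitter, Guava's AtomicLongMap.asMap() returns such a view).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of the original project. */
final class CountComparator {

    /**
     * Returns every word whose expected and actual counts differ,
     * mapped to the pair {expected, actual}. A word that is missing
     * from one side is treated as having a count of 0 on that side.
     */
    static Map<String, long[]> diff(Map<String, Long> expected, Map<String, Long> actual) {
        Map<String, long[]> mismatches = new TreeMap<>();
        // Words the emitter recorded: compare against the topology's count.
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            long want = e.getValue();
            long got = actual.getOrDefault(e.getKey(), 0L);
            if (want != got) {
                mismatches.put(e.getKey(), new long[] {want, got});
            }
        }
        // Words only the topology reported.
        for (Map.Entry<String, Long> e : actual.entrySet()) {
            if (!expected.containsKey(e.getKey()) && e.getValue() != 0L) {
                mismatches.put(e.getKey(), new long[] {0L, e.getValue()});
            }
        }
        return mismatches;
    }
}
```

An empty result means the two sets of counts agree. If the emitter counts are passed as expected and the topology counts as actual, each entry in the result names a word that was counted differently on the two sides.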
The code for SentenceEmitter is as follows:
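import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

import com.google.common.util.concurrent.AtomicLongMap;

/**
 * @author JasonLin
 * @version V1.0
 */
public class SentenceEmitter {
    // Number of times emit() has been called.
    private final AtomicLong atomicLong = new AtomicLong(0);
    // Per-word count of everything emitted so far.
    private final AtomicLongMap<String> COUNTS = AtomicLongMap.create();
    private static final String[] SENTENCES = {"The logic for a realtime application is packaged into a Storm topology",
            " A Storm topology is analogous to a MapReduce job ",
            "One key difference is that a MapReduce job eventually finishes ",
            "whereas a topology runs forever or until you kill it of course ",
            "A topology is a graph of spouts and bolts that are connected with stream groupings"};

    /**
     * Emits a random sentence and records the count of each word in it. The recorded
     * counts are used to check whether Storm's results are the same.
     * Once emit() has been called 1000 times, the call blocks so that emission effectively
     * stops, which lets the downstream bolts finish counting the data already emitted
     * before the program is stopped.
     *
     * @return a randomly chosen sentence, or null if the thread is interrupted while blocked
     */
    public String emit() {
        if (atomicLong.incrementAndGet() >= 1000) {
            try {
                // Sleep for 10,000 seconds; emission resumes if the sleep ever completes.
                Thread.sleep(10000 * 1000);
            } catch (InterruptedException e) {
                // Preserve the interrupt status for the caller.
                Thread.currentThread().interrupt();
                return null;
            }
        }
        int randomIndex = (int) (Math.random() * SENTENCES.length);
        String sentence = SENTENCES[randomIndex];
        for (String s : sentence.split(" ")) {
            COUNTS.incrementAndGet(s);
        }
        return sentence;
    }

    /** Prints the recorded counts, sorted by word. */
    public void printCount() {
        System.out.println("--- Emitter COUNTS ---");
        List<String> keys = new ArrayList<String>(COUNTS.asMap().keySet());
        Collections.sort(keys);
        for (String key : keys) {
            System.out.println(key + " : " + this.COUNTS.get(key));
        }
        System.out.println("--------------");
    }

    public AtomicLongMap<String> getCount() {
        return COUNTS;
    }

    public static void main(String[] args) {
        SentenceEmitter sentenceEmitter = new SentenceEmitter();
        for (int i = 0; i < 20; i++) {
            System.out.println(sentenceEmitter.emit());
        }
        sentenceEmitter.printCount();
    }
}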
This class defines an emit() method that emits a random sentence, and a printCount() method that prints the per-word counts recorded by the emitter.
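One detail of emit() is worth seeing in isolation: some of the sentences begin or end with a space, and splitting on a single space keeps a leading empty token. The sketch below is an illustration only. SplitDemo, tokens(), and words() are hypothetical names, and words() shows one possible alternative rather than what the project does.

```java
/** Hypothetical demo class, not part of the original project. */
final class SplitDemo {

    /** Same tokenization as emit(): split on a single space character. */
    static String[] tokens(String sentence) {
        return sentence.split(" ");
    }

    /** Alternative: trim first, then split on runs of whitespace. */
    static String[] words(String sentence) {
        return sentence.trim().split("\\s+");
    }
}
```

For the sentence " A Storm topology is analogous to a MapReduce job ", tokens() returns 10 elements, the first being an empty string (Java drops trailing empty strings but keeps the leading one), while words() returns the 9 actual words. With the single-space split, the emitter therefore records a count under the empty-string key each time that sentence is chosen. The two sets of counts can only agree if the bolt that splits sentences applies the same rule, so any change to the tokenization would have to be made on both sides.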
2. The Spout component
A spout is the data source of a topology. SentenceSpout obtains a random sentence through SentenceEmitter's emit() method and then emits it.
/**
 * @author JasonLin
 * @version V1.0
 */
public class SentenceSpout extends BaseRichSpout {
    private static final long serialVersionUID = -5335326175089829338L;
    private SpoutOutputCollector collector;
    private SentenceEmitter sentenceEmitter;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.sentenceEmitter = new SentenceEmitter();
    }

    @Override
pub