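A reliable wordcount
1. Implementing Storm's reliability API
Making a topology reliable roughly takes the following steps:
- Implement the spout's ack and fail methods
- When the spout emits a tuple, bind that tuple to a unique messageId
- When a bolt emits a new tuple, anchor the new tuple to the tuple currently being processed
- In the bolt, call collector.fail when processing fails and collector.ack when it succeeds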
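To make the interaction between these steps concrete, here is a minimal, Storm-free sketch of the bookkeeping they imply. The class TupleTreeTracker and its methods are hypothetical names invented for illustration. It is a simplified model that counts outstanding tuples per messageId, not Storm's actual acker implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of tuple-tree tracking (not Storm code).
class TupleTreeTracker {
    // messageId -> number of tuples in that tree still waiting for an ack
    private final Map<String, Integer> pending = new HashMap<>();
    final List<String> acked = new ArrayList<>();   // roots reported to spout.ack
    final List<String> failed = new ArrayList<>();  // roots reported to spout.fail

    // The spout emits a root tuple bound to a unique messageId.
    void emitRoot(String messageId) {
        pending.put(messageId, 1);
    }

    // A bolt emits a new tuple anchored to a tuple of this tree.
    void anchor(String messageId) {
        pending.computeIfPresent(messageId, (k, v) -> v + 1);
    }

    // A bolt acks one tuple of the tree; the root is acked once none are left.
    void ack(String messageId) {
        Integer left = pending.computeIfPresent(messageId, (k, v) -> v - 1);
        if (left != null && left == 0) {
            pending.remove(messageId);
            acked.add(messageId);
        }
    }

    // A bolt fails any tuple of the tree; the whole tree fails immediately.
    void fail(String messageId) {
        if (pending.remove(messageId) != null) {
            failed.add(messageId);
        }
    }
}

class TupleTreeDemo {
    public static void main(String[] args) {
        TupleTreeTracker tracker = new TupleTreeTracker();
        tracker.emitRoot("m1");   // spout emits a sentence
        tracker.anchor("m1");     // split bolt emits word 1, anchored
        tracker.anchor("m1");     // split bolt emits word 2, anchored
        tracker.ack("m1");        // split bolt acks the sentence
        tracker.ack("m1");        // count bolt acks word 1
        System.out.println("acked after two acks: " + tracker.acked);
        tracker.ack("m1");        // count bolt acks word 2
        System.out.println("acked after three acks: " + tracker.acked);
    }
}
```

The point of the model is that the spout only hears about a messageId once every anchored tuple has been acked, or as soon as any one of them fails. A tuple emitted without anchoring would never be counted, so its failure could not trigger a replay.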
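2. Implementing a reliable wordcount
2.1 A custom sentence emitter
An emitter that records how many times each word has been emitted so far. It is similar to the one used in the earlier, plain wordcount implementation.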
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

import com.google.common.util.concurrent.AtomicLongMap;

public class SentenceEmitter {
    private AtomicLong atomicLong = new AtomicLong(0);
    // Per-word tally of everything emitted so far (Guava AtomicLongMap).
    private final AtomicLongMap<String> COUNTS = AtomicLongMap.create();
    private final String[] SENTENCES = {"The logic for a realtime application is packaged into a Storm topology",
            " A Storm topology is analogous to a MapReduce job ",
            "One key difference is that a MapReduce job eventually finishes ",
            "whereas a topology runs forever or until you kill it of course ",
            "A topology is a graph of spouts and bolts that are connected with stream groupings"};

    /**
     * Emits a random sentence and records its word counts. The tally is used to
     * check whether the counts produced by Storm match.
     * The emitter itself never stops; the caller (the spout) stops after 1000
     * emits so that downstream bolts can finish counting before shutdown.
     *
     * @return the sentence that was emitted
     */
    public String emit() {
        int randomIndex = (int) (Math.random() * SENTENCES.length);
        String sentence = SENTENCES[randomIndex];
        // split(" ") keeps the empty token produced by a leading space, so "" is counted too
        for (String s : sentence.split(" ")) {
            COUNTS.incrementAndGet(s);
        }
        return sentence;
    }

    /** Prints the word counts in alphabetical order. */
    public void printCount() {
        System.out.println("--- Emitter COUNTS ---");
        List<String> keys = new ArrayList<String>();
        keys.addAll(COUNTS.asMap().keySet());
        Collections.sort(keys);
        for (String key : keys) {
            System.out.println(key + " : " + this.COUNTS.get(key));
        }
        System.out.println("--------------");
    }

    public AtomicLongMap<String> getCount() {
        return COUNTS;
    }

    public static void main(String[] args) {
        SentenceEmitter sentenceEmitter = new SentenceEmitter();
        for (int i = 0; i < 20; i++) {
            System.out.println(sentenceEmitter.emit());
        }
        sentenceEmitter.printCount();
    }
}
The emitter returns a random sentence and counts its words; printCount prints the resulting counts.
2.2 A reliable spout: SentenceSpout
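This class overrides the ack and fail methods of BaseRichSpout. A ConcurrentHashMap<UUID, Values> named emitted caches the tuples that have been sent. nextTuple() stops emitting once the emit count reaches 1000, which makes the results easier to verify. For each tuple it generates a UUID, stores the UUID and the tuple in emitted, and emits the UUID together with the tuple as its messageId. A call to ack means the tuple and all of its downstream tuples were processed successfully, so the tuple is removed from emitted. A call to fail is the counterpart: processing failed and the tuple has to be replayed, so the spout looks up the tuple for the failed UUID in the cache and emits it again. When the topology is shut down, the close method prints everything the spout has emitted.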
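Before the full listing, the caching logic can be looked at in isolation. The following is a minimal sketch that runs without Storm; PendingTupleCache and the wire list (a stand-in for the collector) are hypothetical names used only for illustration, and the payload is a plain String rather than Values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the spout's emitted cache (not Storm code).
class PendingTupleCache {
    // messageId -> payload, kept until the tuple tree is fully acked
    private final Map<UUID, String> emitted = new ConcurrentHashMap<>();
    // Stand-in for the collector: everything that was sent downstream, in order
    final List<String> wire = new ArrayList<>();

    // Emit a payload bound to a fresh messageId and remember it for replay.
    UUID emit(String sentence) {
        UUID messageId = UUID.randomUUID();
        emitted.put(messageId, sentence);
        wire.add(sentence);
        return messageId;
    }

    // Fully processed: the payload no longer needs to be kept.
    void ack(UUID messageId) {
        emitted.remove(messageId);
    }

    // Processing failed: send the cached payload again under the same id.
    void fail(UUID messageId) {
        String sentence = emitted.get(messageId);
        if (sentence != null) {
            wire.add(sentence);
        }
    }

    int pendingCount() {
        return emitted.size();
    }
}

class PendingTupleCacheDemo {
    public static void main(String[] args) {
        PendingTupleCache cache = new PendingTupleCache();
        UUID id = cache.emit("A Storm topology is analogous to a MapReduce job");
        cache.fail(id);  // replayed: the sentence is sent a second time
        cache.ack(id);   // acknowledged: the cache entry is dropped
        System.out.println("sent=" + cache.wire.size() + " pending=" + cache.pendingCount());
    }
}
```

Keeping the entry in the cache on fail, and reusing the same id for the replay, means a tuple that fails repeatedly can be resent each time until it is finally acked.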
public class SentenceSpout extends BaseRichSpout {
private static final long serialVersionUID = -5335326175089829338L;
private static final Logger LOGGER = Logger.getLogger(SentenceSpout.class);
private AtomicLong atomicLong = new AtomicLong(0);
private SpoutOutputCollector collector;
private SentenceEmitter sentenceEmitter;
private Concu