Learning all the basics of Storm through wordcount

A different kind of wordcount

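The wordcount example is to Storm what HelloWorld is to someone first learning Java. Below, a single wordcount project is used to walk through all of Storm's basic usage.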

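The structure of the whole project is shown below:
[Figure: project structure diagram]
The project has five parts. topology, spout, and bolt hold the main wordcount computation code, source is a custom sentence emitter, and util contains utility classes for parsing logs.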
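To make the division of labor concrete, here is a minimal plain-Java sketch of the dataflow those parts implement: a source produces sentences, a split step breaks each sentence into words, and a count step keeps a running total per word. It deliberately uses no Storm API, and the names `PipelineSketch`, `split`, `count`, and `run` are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Storm-free illustration of the sentence -> words -> counts flow.
// All names here are hypothetical; they are not part of the original project.
class PipelineSketch {

    // Stand-in for the split step: one sentence in, many words out.
    static List<String> split(String sentence) {
        return new ArrayList<>(Arrays.asList(sentence.split(" ")));
    }

    // Stand-in for the count step: adds 1 to the running total of a word.
    static void count(Map<String, Long> totals, String word) {
        totals.merge(word, 1L, Long::sum);
    }

    // Stand-in for the topology: wires source -> split -> count.
    static Map<String, Long> run(List<String> sentences) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys for readable output
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                count(totals, word);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "A topology is a graph of spouts and bolts",
                "A topology runs forever");
        for (Map.Entry<String, Long> e : run(source).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

In a real topology each of these steps runs as a separate component, possibly on different machines, which is why the project keeps them in separate packages.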

1. A custom sentence emitter that keeps count

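The main job of wordcount is to count how often each word occurs. We use a SentenceEmitter to record how many times each word has been emitted, then compare those totals against the counts Storm produces to check whether the two agree.
The code for SentenceEmitter is as follows: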


import com.google.common.util.concurrent.AtomicLongMap;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/**
 * @author JasonLin
 * @version V1.0
 */
public class SentenceEmitter {
    // Number of times emit() has been called.
    private final AtomicLong atomicLong = new AtomicLong(0);

    // Running total for every word emitted so far.
    private final AtomicLongMap<String> COUNTS = AtomicLongMap.create();

    private final String[] SENTENCES = {"The logic for a realtime application is packaged into a Storm topology",
            " A Storm topology is analogous to a MapReduce job ",
            "One key difference is that a MapReduce job eventually finishes ",
            "whereas a topology runs forever or until you kill it of course ",
            "A topology is a graph of spouts and bolts that are connected with stream groupings"};


    /**
     * Emits a random sentence and records the count of each word in it. These totals are
     * used to verify that Storm's results are the same.
     * Once emit() has been called 1000 times, the call blocks in a long sleep, which
     * effectively stops emission so that the other bolts can finish counting the data
     * already emitted before the program is stopped.
     *
     * @return the emitted sentence, or null if the sleep is interrupted
     */
    public String emit() {
        if (atomicLong.incrementAndGet() >= 1000) {
            try {
                Thread.sleep(10000 * 1000);
            } catch (InterruptedException e) {
                return null;
            }
        }
        int randomIndex = (int) (Math.random() * SENTENCES.length);
        String sentence = SENTENCES[randomIndex];
        // Note: a sentence that starts with a space yields an empty first token,
        // which is counted like any other word.
        for (String s : sentence.split(" ")) {
            COUNTS.incrementAndGet(s);
        }
        return sentence;
    }

    /** Prints the recorded per-word totals, sorted by word. */
    public void printCount() {
        System.out.println("--- Emitter COUNTS ---");
        List<String> keys = new ArrayList<String>(COUNTS.asMap().keySet());
        Collections.sort(keys);
        for (String key : keys) {
            System.out.println(key + " : " + this.COUNTS.get(key));
        }
        System.out.println("--------------");
    }

    public AtomicLongMap<String> getCount() {
        return COUNTS;
    }

    public static void main(String[] args) {
        SentenceEmitter sentenceEmitter = new SentenceEmitter();
        for (int i = 0; i < 20; i++) {
            System.out.println(sentenceEmitter.emit());
        }
        sentenceEmitter.printCount();
    }
}

This class defines an emit() method that emits a random sentence, and a printCount() method that prints the per-word totals the emitter has recorded.
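The check this emitter enables can be sketched in plain Java: tally the words on the emitting side, tally them again on the receiving side, and compare the two maps. In the sketch below a `HashMap` stands in for Guava's `AtomicLongMap`, and a reversed word list stands in for whatever order the words arrive in downstream. `CountCheckSketch`, `tallySentences`, and `tallyWords` are hypothetical names, not part of the project. One detail worth noticing is that `split(" ")` keeps a leading empty string when a sentence begins with a space, so both sides have to split the same way for the totals to agree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Storm-free illustration of comparing emitter-side and receiver-side counts.
// All names here are hypothetical.
class CountCheckSketch {

    // Emitter side: count words sentence by sentence, as emit() does.
    static Map<String, Long> tallySentences(List<String> sentences) {
        Map<String, Long> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Receiver side: count words that arrive one at a time, in any order.
    static Map<String, Long> tallyWords(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                " A Storm topology is analogous to a MapReduce job ",
                "A topology runs forever");
        Map<String, Long> emitted = tallySentences(sentences);

        // Simulate delivery: the same words, but in a different order.
        List<String> delivered = new ArrayList<>();
        for (String sentence : sentences) {
            delivered.addAll(Arrays.asList(sentence.split(" ")));
        }
        Collections.reverse(delivered);
        Map<String, Long> counted = tallyWords(delivered);

        System.out.println("counts match: " + emitted.equals(counted));
        System.out.println("empty token counted: " + emitted.containsKey(""));
    }
}
```

Because the comparison is on the final maps, it is insensitive to the order in which words are processed; it only fails if a word is lost, duplicated, or split differently on one side.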

2. The Spout component

The Spout component is the topology's data source. SentenceSpout obtains a random sentence through SentenceEmitter's emit() method and then emits it.

/**
 * @author JasonLin
 * @version V1.0
 */
public class SentenceSpout extends BaseRichSpout {

    private static final long serialVersionUID = -5335326175089829338L;
    private SpoutOutputCollector collector;
    private SentenceEmitter sentenceEmitter;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.sentenceEmitter = new SentenceEmitter();
    }

    @Override
    pub