Storm [Hands-on Series: How to Write a Crawler, Part 4] - IndexBolt

weixin_33975951

Original link: https://my.oschina.net/infiniteSpace/blog/305191

package com.digitalpebble.storm.crawler.bolt.indexing;

import java.util.Map;

import org.slf4j.LoggerFactory;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

import com.digitalpebble.storm.crawler.StormConfiguration;
import com.digitalpebble.storm.crawler.util.Configuration;

/**
 * A generic bolt for indexing documents which determines which endpoint to use
 * based on the configuration and delegates the indexing to it.
 */
@SuppressWarnings("serial")
public class IndexerBolt extends BaseRichBolt {

    private Configuration config;
    private BaseRichBolt endpoint;

    private static final org.slf4j.Logger LOG = LoggerFactory
            .getLogger(IndexerBolt.class);

    public void prepare(Map conf, TopologyContext context,
            OutputCollector collector) {
        config = StormConfiguration.create();

        // look up the configured indexer implementation and instantiate it
        String className = config.get("stormcrawler.indexer.class");

        if (className == null) {
            throw new RuntimeException("No configuration found for indexing");
        }

        try {
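            // asSubclass checks the type up front instead of relying on an
            // unchecked cast
            final Class<? extends BaseRichBolt> implClass = Class
                    .forName(className).asSubclass(BaseRichBolt.class);
            endpoint = implClass.newInstance();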
        } catch (final Exception e) {
            throw new RuntimeException("Couldn't create " + className, e);
        }

        if (endpoint != null)
            endpoint.prepare(conf, context, collector);
    }

    public void execute(Tuple tuple) {
        if (endpoint != null)
            endpoint.execute(tuple);
    }

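    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Storm calls this while the topology is being built, before
        // prepare() has run, so endpoint may still be null at that point
        // and no output fields would be declared.
        if (endpoint != null)
            endpoint.declareOutputFields(declarer);
    }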

}

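Simple tips.

The code is about 60 lines, so it should be easy for everyone to follow: IndexerBolt reads a class name from the stormcrawler.indexer.class setting, instantiates that class, and forwards prepare, execute, and declareOutputFields to the resulting instance.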
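The delegation pattern used by IndexerBolt can be tried without a Storm cluster. The following is a minimal sketch in plain Java, not the storm-crawler implementation: the Indexer interface, InMemoryIndexer, and DelegatingIndexer are hypothetical names introduced only for illustration, and the class name is passed in directly instead of being read from the configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an indexing endpoint; not part of storm-crawler.
interface Indexer {
    void index(String document);
}

// Hypothetical endpoint that simply records documents in memory.
class InMemoryIndexer implements Indexer {
    static final List<String> INDEXED = new ArrayList<>();

    public InMemoryIndexer() {
    }

    @Override
    public void index(String document) {
        INDEXED.add(document);
    }
}

// Same shape as IndexerBolt: resolve a class name, instantiate it, delegate.
class DelegatingIndexer implements Indexer {
    private final Indexer endpoint;

    DelegatingIndexer(String className) {
        this.endpoint = load(className);
    }

    // Loads the named class reflectively and checks that it is an Indexer.
    private static Indexer load(String className) {
        if (className == null) {
            throw new IllegalStateException("No configuration found for indexing");
        }
        try {
            return Class.forName(className)
                    .asSubclass(Indexer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Couldn't create " + className, e);
        }
    }

    @Override
    public void index(String document) {
        // Every call is forwarded to the configured endpoint.
        endpoint.index(document);
    }
}
```

Constructing new DelegatingIndexer(InMemoryIndexer.class.getName()) and calling index("doc-1") appends the document to InMemoryIndexer.INDEXED, while an unknown class name fails at construction time with an IllegalStateException. Failing early mirrors how IndexerBolt throws from prepare when the class cannot be created.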

Reposted from: https://my.oschina.net/infiniteSpace/blog/305191