Implementing StormFirehose

Author: 梦想成真那天

Categories: storm, StormFirehose

Article link: https://blog.csdn.net/u012164361/article/details/79885396

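To implement StormFirehose, we first need to register a StormFirehoseFactory on the Druid side by implementing the FirehoseFactory interface. Note that Druid creates the factory from a JSON description, so the class must carry the annotation @JsonTypeName("storm").
As the code shows, StormFirehoseFactory does little more than create the Firehose object and return it.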

@JsonTypeName("storm")
public class StormFirehoseFactory implements FirehoseFactory {
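    // Single shared instance: Druid receives it from connect(), Storm fetches it through getFirehose().
    private static final StormFirehose FIREHOSE = new StormFirehose();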

    @JsonCreator
    public StormFirehoseFactory() {
    }

    public static StormFirehose getFirehose() {
        return FIREHOSE;
    }

    @Override
    public Firehose connect(InputRowParser inputRowParser) throws IOException, ParseException {
        return FIREHOSE;
    }
}
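
To make the two ideas in this factory concrete (a type name read from JSON selects the factory, and every caller receives the same firehose object), here is a minimal plain-Java sketch. All `Sketch*` names are hypothetical, and the map-based registry is only a stand-in for the subtype lookup that Jackson performs; it is not Druid or Jackson code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins: none of these types come from Druid or Jackson.
interface SketchFirehose {}

interface SketchFirehoseFactory {
    SketchFirehose connect();
}

final class SketchStormFirehoseFactory implements SketchFirehoseFactory {
    // One shared instance, so the consumer (connect) and the producer
    // (getFirehose) always meet on the same object.
    private static final SketchFirehose FIREHOSE = new SketchFirehose() {};

    static SketchFirehose getFirehose() {
        return FIREHOSE;
    }

    @Override
    public SketchFirehose connect() {
        return FIREHOSE;
    }
}

// Simplified model of mapping a type name from configuration to a factory.
final class SketchTypeRegistry {
    private final Map<String, Supplier<SketchFirehoseFactory>> byName = new HashMap<>();

    void register(String typeName, Supplier<SketchFirehoseFactory> constructor) {
        byName.put(typeName, constructor);
    }

    SketchFirehoseFactory create(String typeName) {
        Supplier<SketchFirehoseFactory> constructor = byName.get(typeName);
        if (constructor == null) {
            throw new IllegalArgumentException("Unknown firehose type: " + typeName);
        }
        return constructor.get();
    }
}
```

Because `connect()` and `getFirehose()` return the same object, whatever the Storm side hands over is visible to the Druid side without any extra wiring.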

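StormFirehose implements the Firehose interface; the methods that matter are hasMore, nextRow, and commit.
The class defines its queues as static fields. BLOCKING_QUEUE holds the data that DruidState needs to write to Druid; in other words, the data that Storm wants to persist is first placed in StormFirehose's BLOCKING_QUEUE. Druid calls hasMore to check whether the queue is empty. If it is empty, the method calls START.wait() and blocks there. If BLOCKING_QUEUE is not empty, hasMore returns true and Druid goes on to call nextRow. Once the queue has been drained, the submitted batch has been fully consumed. At that point nextRow saves the ID of the current batch in the LIMBO_TRANSACTIONS queue and, because BLOCKING_QUEUE is now empty, releases the block in sendMessages so that new data can be written to the queue. Druid then calls commit, which must return a Runnable. Druid runs that Runnable, which moves the partition IDs held in LIMBO_TRANSACTIONS to the COMPLETED state.
The sendMessages method is our own addition to StormFirehose. It writes the data that DruidState needs to send to Druid into BLOCKING_QUEUE, releases the block in hasMore, and then blocks itself so that no further data can be submitted until the batch has been consumed.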
public class StormFirehose implements Firehose {
private static final Logger LOG = LoggerFactory.getLogger(StormFirehose.class);
private static final Object START = new Object();
private static final Object FINISHED = new Object();
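// Data queue holding the rows of the current batch
private static BlockingQueue<FixMessageDto> BLOCKING_QUEUE;
public static final DruidPartitionStatus STATUS = new DruidPartitionStatus();
private static String TRANSACTION_ID = null;
// IDs of batches that Druid has consumed but not yet committed
private static BlockingQueue<String> LIMBO_TRANSACTIONS = new ArrayBlockingQueue<String>(99999);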

@Override
public boolean hasMore() {
    if (BLOCKING_QUEUE != null && !BLOCKING_QUEUE.isEmpty()) {
        return true;
    }
    try {
        synchronized (START) {
            START.wait();
        }
    } catch (InterruptedException e) {
        LOG.error("hasMore() blocking was interrupted.", e);
    }
    return true;
}

@Override
public InputRow nextRow() {
    final Map<String, Object> theMap = Maps.newTreeMap(String.CASE_INSENSITIVE_ORDER);
    try {
        FixMessageDto message;
        message = BLOCKING_QUEUE.poll();
        if (message != null) {
            // LOG.info("[" + message.symbol + "] @ [" + message.price + "] for [" + message.uid + "]");
            theMap.put("symbol", message.symbol);
            theMap.put("price", message.price);
        }

        if (BLOCKING_QUEUE.isEmpty()) {
            STATUS.putInLimbo(TRANSACTION_ID);
            LIMBO_TRANSACTIONS.add(TRANSACTION_ID);
            LOG.info("Batch is fully consumed by Druid. Unlocking [FINISH]");
            synchronized (FINISHED) {
                FINISHED.notify();
            }
        }
    } catch (Exception e) {
        LOG.error("Error occurred in nextRow.", e);
    }
    final LinkedList<String> dimensions = new LinkedList<String>();
    dimensions.add("symbol");
    dimensions.add("price");
    return new MapBasedInputRow(System.currentTimeMillis(), dimensions, theMap);
}

@Override
public Runnable commit() {
    List<String> limboTransactions = new ArrayList<String>();
    LIMBO_TRANSACTIONS.drainTo(limboTransactions);
    return new StormCommitRunnable(limboTransactions);
}

public synchronized void sendMessages(String partitionId, List<FixMessageDto> messages) {
    BLOCKING_QUEUE = new ArrayBlockingQueue<FixMessageDto>(messages.size(), false, messages);
    TRANSACTION_ID = partitionId;
    LOG.info("Beginning commit to Druid. [" + messages.size() + "] messages, unlocking [START]");
    // Monitor lock: wake up the thread blocked in hasMore()
    synchronized (START) {
        START.notify();
    }
    try {
        synchronized (FINISHED) {
            FINISHED.wait();
        }
    } catch (InterruptedException e) {
        LOG.error("Commit to Druid interrupted.");
    }
    LOG.info("Returning control to Storm.");
}

@Override
public void close() throws IOException {
    // do nothing
}

}
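
The handshake described above can be sketched with plain Java monitors. This is a sketch under assumptions, not the author's code: `SketchHandoff` and `SketchHandoffDemo` are hypothetical names, and a single monitor replaces the separate START and FINISHED objects. Every `wait()` sits in a loop that rechecks its condition, because a `notify()` that arrives before the matching `wait()` is otherwise lost, and `wait()` may also return spuriously.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the producer/consumer handoff between Storm and Druid.
final class SketchHandoff<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private long sent = 0;     // number of batches published so far
    private long drained = 0;  // number of batches fully consumed so far

    // Producer side (the role of sendMessages): publish a batch,
    // then block until the consumer has drained it.
    synchronized void sendBatch(List<T> batch) throws InterruptedException {
        if (batch.isEmpty()) {
            return; // nothing to hand over, so nothing to wait for
        }
        while (sent != drained) {
            wait(); // a previous batch is still being consumed
        }
        queue.addAll(batch);
        long mine = ++sent;
        notifyAll(); // wake a consumer blocked in take()
        while (drained < mine) {
            wait(); // block until this batch has been drained
        }
    }

    // Consumer side (the role of hasMore + nextRow): block until a row exists.
    synchronized T take() throws InterruptedException {
        while (queue.isEmpty()) {
            wait();
        }
        T row = queue.poll();
        if (queue.isEmpty()) {
            drained = sent; // batch fully consumed
            notifyAll();    // release the producer
        }
        return row;
    }
}

final class SketchHandoffDemo {
    // Runs one producer and one consumer and returns the rows the consumer saw.
    static List<String> run(List<String> batch) {
        SketchHandoff<String> handoff = new SketchHandoff<>();
        List<String> seen = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < batch.size(); i++) {
                    seen.add(handoff.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            handoff.sendBatch(batch); // returns only after the batch is drained
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```

The batch counters let each producer wait for its own batch, so a late wake-up cannot be mistaken for completion of a different batch.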

StormCommitRunnable is the Runnable returned by StormFirehose's commit method. When it runs, it calls the complete method of DruidPartitionStatus, which marks the batches that were in limbo as completed.

public class StormCommitRunnable implements Runnable {
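private static final Logger LOG = LoggerFactory.getLogger(StormCommitRunnable.class);
private List<String> partitionIds = null;

public StormCommitRunnable(List<String> partitionIds) {
    this.partitionIds = partitionIds;
}

@Override
public void run() {
    try {
        // Move every partition in the snapshot from limbo to completed
        StormFirehose.STATUS.complete(partitionIds);
    } catch (Exception e) {
        LOG.error("Could not complete transactions.", e);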
    }
}

}
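
The interplay between commit and the returned Runnable can be illustrated in isolation. The following is a hedged sketch with hypothetical names (`SketchCommitter` and its methods); an in-memory set stands in for the ZooKeeper-backed status. It shows that `drainTo` takes a snapshot at commit time, so IDs that arrive afterwards are left for the next commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: commit() snapshots the limbo queue, the task completes it later.
final class SketchCommitter {
    private final BlockingQueue<String> limbo = new LinkedBlockingQueue<>();
    private final Set<String> completed = ConcurrentHashMap.newKeySet();

    // Called when a batch has been fully consumed but not yet committed.
    void markLimbo(String partitionId) {
        limbo.add(partitionId);
    }

    // Drains the IDs that are in limbo right now and returns a task that
    // marks exactly those IDs as completed when it is run.
    Runnable commit() {
        List<String> snapshot = new ArrayList<>();
        limbo.drainTo(snapshot);
        return () -> completed.addAll(snapshot);
    }

    boolean isCompleted(String partitionId) {
        return completed.contains(partitionId);
    }

    int pending() {
        return limbo.size();
    }
}
```

Nothing is marked as completed until the returned task actually runs, which mirrors how the firehose leaves it to Druid to decide when the commit happens.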

The methods of DruidPartitionStatus update the state that is recorded in ZooKeeper (zk) for each batch.

public class DruidPartitionStatus {
private static final Logger LOG = LoggerFactory.getLogger(DruidPartitionStatus.class);
final String COMPLETED_PATH = "completed";
final String LIMBO_PATH = "limbo";
final String CURRENT_PATH = "current";
private CuratorFramework curatorFramework;

public DruidPartitionStatus() {
    try {
        curatorFramework = CuratorFrameworkFactory.builder().namespace("stormdruid")
                .connectString("localhost:2181").retryPolicy(new RetryNTimes(1, 1000)).connectionTimeoutMs(5000)
                .build();

        curatorFramework.start();
        if (curatorFramework.checkExists().forPath(COMPLETED_PATH) == null) {
            curatorFramework.create().forPath(COMPLETED_PATH);
        }

        if (curatorFramework.checkExists().forPath(CURRENT_PATH) == null) {
            curatorFramework.create().forPath(CURRENT_PATH);
        }

        if (curatorFramework.checkExists().forPath(LIMBO_PATH) == null) {
            curatorFramework.create().forPath(LIMBO_PATH);
        }
    } catch (Exception e) {
        LOG.error("Could not establish connection to Zookeeper", e);
    }
}

public boolean isCompleted(String partitionId) throws Exception {
    return (curatorFramework.checkExists().forPath(COMPLETED_PATH + "/" + partitionId) != null);
}

public boolean isInLimbo(String partitionId) throws Exception {
    return (curatorFramework.checkExists().forPath(LIMBO_PATH + "/" + partitionId) != null);
}

public boolean isInProgress(String partitionId) throws Exception {
    return (curatorFramework.checkExists().forPath(CURRENT_PATH + "/" + partitionId) != null);
}

public void putInProgress(String partitionId) throws Exception {
    curatorFramework.create().forPath(CURRENT_PATH + "/" + partitionId);
}

public void putInLimbo(String partitionId) throws Exception {
    curatorFramework.inTransaction().
            delete().forPath(CURRENT_PATH + "/" + partitionId)
            .and().create().forPath(LIMBO_PATH + "/" + partitionId).and().commit();
}

public void complete(List<String> partitionIds) throws Exception {
    Iterator<String> iterator = partitionIds.iterator();
    CuratorTransaction transaction = curatorFramework.inTransaction();
    while (iterator.hasNext()) {
        String partitionId = iterator.next();
        transaction = transaction.delete().forPath(LIMBO_PATH + "/" + partitionId)
                .and().create().forPath(COMPLETED_PATH + "/" + partitionId).and();
    }
    CuratorTransactionFinal tx = (CuratorTransactionFinal) transaction;
    tx.commit();
}

}
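
The life cycle that these methods enforce (current, then limbo, then completed) can be modeled without ZooKeeper. The sketch below is an in-memory approximation with hypothetical names; it assumes that a failed create or delete should abort the whole operation, which is how it imitates the all-or-nothing behavior of the transaction used in complete.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the partition life cycle kept in ZooKeeper.
final class SketchPartitionStatus {
    private final Set<String> current = new HashSet<>();
    private final Set<String> limbo = new HashSet<>();
    private final Set<String> completed = new HashSet<>();

    synchronized void putInProgress(String id) {
        // Mirrors a create that fails when the node already exists.
        if (!current.add(id)) {
            throw new IllegalStateException("Already in progress: " + id);
        }
    }

    synchronized void putInLimbo(String id) {
        // Delete from "current" and create under "limbo" as one step.
        if (!current.remove(id)) {
            throw new IllegalStateException("Not in progress: " + id);
        }
        limbo.add(id);
    }

    synchronized void complete(List<String> ids) {
        // All or nothing: reject the whole list if any ID is not in limbo.
        if (!limbo.containsAll(ids)) {
            throw new IllegalStateException("Not all partitions are in limbo: " + ids);
        }
        limbo.removeAll(ids);
        completed.addAll(ids);
    }

    synchronized boolean isInProgress(String id) {
        return current.contains(id);
    }

    synchronized boolean isInLimbo(String id) {
        return limbo.contains(id);
    }

    synchronized boolean isCompleted(String id) {
        return completed.contains(id);
    }
}
```

A partition ID is in at most one of the three sets at any time, which is the invariant the real class maintains by pairing each delete with a create inside one transaction.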
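
That completes the StormFirehose implementation. So how does StormFirehose actually get data ingested? In the earlier Trident topology we used the topology's persist method to store data in Druid through DruidBeam. StormFirehose plays a similar role here. The only difference is the way Druid ingests the data: with DruidBeam the data is pushed to Druid, whereas with StormFirehose Druid pulls it. On the Storm side the two approaches are basically the same. Let's look at how it is implemented.
First we need a TridentTopology stream that calls partitionPersist. The first argument is a StateFactory instance, which creates the DruidState instance whose methods persist the data to the external system. The second argument names the fields to receive. The third argument is the StateUpdater, which parses the tuples it receives and calls methods on the state created by the StateFactory to store the data in the external system.
The code is as follows.
The core of FinancialAnalyticsTopology is: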

inputStream.each(new Fields("message"), new MessageTypeFilter())
        .partitionPersist(new DruidStateFactory(), new Fields("message"), new DruidStateUpdater());
return topology.build();

The partitionPersist method persists the data to an external system.

public class FinancialAnalyticsTopology {
private static final Logger LOG = LoggerFactory.getLogger(FinancialAnalyticsTopology.class);

public static StormTopology buildTopology() {
    LOG.info("Building topology.");
    TridentTopology topology = new TridentTopology();
    FixEventSpout spout = new FixEventSpout();
    Stream inputStream = topology.newStream("message", spout);

    inputStream.each(new Fields("message"), new MessageTypeFilter())
            .partitionPersist(new DruidStateFactory(), new Fields("message"), new DruidStateUpdater());
    return topology.build();
}

public static void main(String[] args) throws Exception {
    /*LogLevelAdjuster.register();*/

    final Config conf = new Config();
    final LocalCluster cluster = new LocalCluster();

    LOG.info("Submitting topology.");

    cluster.submitTopology("financial", conf, buildTopology());
    LOG.info("Topology submitted.");

    Thread.sleep(600000);
}

}

DruidStateFactory must implement StateFactory and its makeState method, which creates the DruidState object that implements the State interface.

@SuppressWarnings("rawtypes")
public class DruidStateFactory implements StateFactory {
private static final long serialVersionUID = 1L;
private static final Logger LOG = LoggerFactory.getLogger(DruidStateFactory.class);

@Override
public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
    return new DruidState(partitionIndex);
}

}

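DruidState is where Storm hands the data over to the external storage system in a transactional manner. Its commit method builds a partition ID from the batch ID and the partition index, skips the batch if ZooKeeper already lists that ID as completed, in limbo, or in progress, and otherwise marks it as in progress and passes the buffered messages to the firehose.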

public class DruidState implements State {
private static final Logger LOG = LoggerFactory.getLogger(DruidState.class);
private Vector<FixMessageDto> messages = new Vector<FixMessageDto>();
private int partitionIndex;

public DruidState(int partitionIndex){
    this.partitionIndex = partitionIndex;
}

@Override
public void beginCommit(Long batchId) {
}

@Override
public void commit(Long batchId) {
    String partitionId = batchId.toString() + "-" + partitionIndex;
    LOG.info("Committing partition [" + partitionIndex + "] of batch [" + batchId + "]");
    try {
        if (StormFirehose.STATUS.isCompleted(partitionId)) {
            LOG.warn("Encountered completed partition [" + partitionIndex + "] of batch [" + batchId + "]");
            return;
        } else if (StormFirehose.STATUS.isInLimbo(partitionId)) {
            LOG.warn("Encountered limbo partition [" + partitionIndex + "] of batch [" + batchId + "] : NOTIFY THE AUTHORITIES!");
            return;
        } else if (StormFirehose.STATUS.isInProgress(partitionId)) {
            LOG.warn("Encountered in-progress partition [" + partitionIndex + "] of batch [" + batchId + "] : NOTIFY THE AUTHORITIES!");
            return;
        }
        StormFirehose.STATUS.putInProgress(partitionId);
        StormFirehoseFactory.getFirehose().sendMessages(partitionId, messages);
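    } catch (Exception e) {
        LOG.error("Could not start firehose for [" + partitionIndex + "] of batch [" + batchId + "]", e);
    } finally {
        // The State instance is reused across batches, so drop the buffered rows once this batch is handled.
        messages.clear();
    }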
}

public void aggregateMessage(FixMessageDto message) {
    // LOG.info("Aggregating [" + message + "]");
    messages.add(message);
}

}
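
The checks at the top of commit are what make a replayed batch harmless. As a rough illustration, the sketch below keeps the status in a map instead of ZooKeeper and uses hypothetical names throughout (`SketchReplayGuard`, `delivered`); it only demonstrates the idea that a partition ID seen before is never handed off a second time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the replay guard performed at the start of commit.
final class SketchReplayGuard {
    enum Status { IN_PROGRESS, LIMBO, COMPLETED }

    private final Map<String, Status> statusById = new HashMap<>();
    private final List<String> delivered = new ArrayList<>();

    // Same ID scheme as the article: "<batchId>-<partitionIndex>".
    static String partitionId(long batchId, int partitionIndex) {
        return batchId + "-" + partitionIndex;
    }

    // Returns true if the rows were handed off, false if the partition
    // was seen before (in any state) and the call was skipped.
    synchronized boolean commit(long batchId, int partitionIndex, List<String> rows) {
        String id = partitionId(batchId, partitionIndex);
        if (statusById.containsKey(id)) {
            return false;
        }
        statusById.put(id, Status.IN_PROGRESS);
        delivered.addAll(rows);
        return true;
    }

    // Returns a copy so callers cannot modify the internal list.
    synchronized List<String> delivered() {
        return new ArrayList<>(delivered);
    }
}
```

Keying the status on both the batch ID and the partition index matters because each partition commits its share of the same batch separately.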

The StateUpdater processes each batch of tuples it receives and calls DruidState's method to buffer them for delivery to the external storage system.

public class DruidStateUpdater implements StateUpdater<DruidState> {
private static final long serialVersionUID = 1L;
// private static final Logger LOG = LoggerFactory.getLogger(DruidStateUpdater.class);

@SuppressWarnings("rawtypes")
@Override
public void prepare(Map conf, TridentOperationContext context) {
}

@Override
public void cleanup() {
}

@Override
public void updateState(DruidState state, List<TridentTuple> tuples, TridentCollector collector) {
    //LOG.info("Updating [" + state + "]");
    for (TridentTuple tuple : tuples) {
        FixMessageDto message = (FixMessageDto) tuple.getValue(0);
        state.aggregateMessage(message);
    }
}

}