Trident spouts - Storm

Reposted 2013-12-03 21:23:37

Trident spouts

As in the vanilla Storm API, spouts are the source of streams in a Trident topology. On top of the vanilla Storm spouts, Trident exposes additional APIs for more sophisticated spouts.

There is an inextricable link between how you source your data streams and how you update state (e.g. databases) based on those data streams. See the Trident state doc for an explanation of this link; understanding it is essential for understanding the spout options available.

Regular Storm spouts will be non-transactional spouts in a Trident topology. To use a regular Storm IRichSpout, create the stream like this in a TridentTopology:

TridentTopology topology = new TridentTopology();
topology.newStream("myspoutid", new MyRichSpout());

Every spout in a Trident topology must be given a unique identifier for its stream, and this identifier must be unique across all topologies run on the cluster. Trident uses this identifier to store metadata in Zookeeper about what the spout has consumed, including the txid and any metadata associated with the spout.

You can configure the Zookeeper storage of spout metadata via the following configuration options:

  1. transactional.zookeeper.servers: A list of Zookeeper hostnames
  2. transactional.zookeeper.port: The port of the Zookeeper cluster
  3. transactional.zookeeper.root: The root dir in Zookeeper where metadata is stored. Metadata will be stored at the path /
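As a rough illustration of the three options above, the sketch below collects them into a plain Java map. It uses java.util.Map rather than Storm's own Config class so that it runs without Storm on the classpath. The class name, method name, hostnames, port, and root path are all placeholders chosen for this example, not recommended values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpoutMetadataConfig {
    // Collects the three Zookeeper settings named in the list above.
    // All values passed in are illustrative placeholders.
    static Map<String, Object> zookeeperSettings(List<String> servers, int port, String root) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("transactional.zookeeper.servers", servers); // list of Zookeeper hostnames
        conf.put("transactional.zookeeper.port", port);       // port of the Zookeeper cluster
        conf.put("transactional.zookeeper.root", root);       // root dir for spout metadata
        return conf;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = zookeeperSettings(
                Arrays.asList("zk1.example.com", "zk2.example.com"), 2181, "/transactional");
        System.out.println(conf);
    }
}
```

In a real topology these entries would go into the configuration map that is submitted together with the topology.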


By default, Trident processes a single batch at a time, waiting for the batch to succeed or fail before trying another batch. You can get significantly higher throughput, and lower processing latency for each batch, by pipelining the batches. You configure the maximum number of batches to be processed simultaneously with the "topology.max.spout.pending" property.
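To make the effect of a pending limit concrete, here is a toy model in plain Java. It is not how Trident implements pipelining. A semaphore with maxPending permits stands in for "topology.max.spout.pending", and a short sleep stands in for batch processing. The class and method names are made up for this sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PendingLimitDemo {
    // Runs `batches` simulated batches while allowing at most `maxPending` in flight.
    // Returns the highest number of batches observed in flight at the same time.
    static int run(int batches, int maxPending) throws InterruptedException {
        Semaphore slots = new Semaphore(maxPending);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int txid = 1; txid <= batches; txid++) {
            slots.acquire(); // the simulated spout waits here while the pipeline is full
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // stand-in for processing the batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    slots.release(); // batch finished, so another may start
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak with limit 1: " + run(20, 1));
        System.out.println("peak with limit 4: " + run(20, 4));
    }
}
```

With a limit of 1 the model behaves like the default described above: one batch at a time. A larger limit lets several batches overlap, which is where the extra throughput comes from.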

Even while processing multiple batches simultaneously, Trident will order any state updates taking place in the topology among batches. For example, suppose you're doing a global count aggregation into a database. The idea is that while you're updating the count in the database for batch 1, you can still be computing the partial counts for batches 2 through 10. Trident won't move on to the state updates for batch 2 until the state updates for batch 1 have succeeded. This is essential for achieving exactly-once processing semantics, as outlined in the Trident state doc.
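The global count example can be sketched as a small simulation. This is a simplified model under stated assumptions, not Trident's implementation: a thread pool computes the partial count of every batch in parallel, while a single loop applies the updates to a local variable (standing in for the database) strictly in batch order. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderedCommitDemo {
    // Computes partial counts for all batches concurrently, then commits them
    // in batch order. Returns the running global count after each commit.
    static List<Long> commitInOrder(List<List<String>> batches)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Partial counts may finish in any order.
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Integer> countTask = batch::size;
                partials.add(pool.submit(countTask));
            }
            // State updates are applied one batch at a time, in submission order:
            // batch N+1 is not committed until batch N has been committed.
            long globalCount = 0; // stand-in for the count stored in the database
            List<Long> committed = new ArrayList<>();
            for (Future<Integer> partial : partials) {
                globalCount += partial.get(); // waits for this batch's partial count
                committed.add(globalCount);
            }
            return committed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d"),
                Arrays.asList("e", "f"));
        System.out.println(commitInOrder(batches));
    }
}
```

The model leaves out failure and replay. It only shows that the computation can overlap across batches while the updates to the shared state stay serialized.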

Trident spout types

The following spout APIs are available:

  1. ITridentSpout: The most general API that can support transactional or opaque transactional semantics. Generally you’ll use one of the partitioned flavors of this API rather than this one directly.
  2. IBatchSpout: A non-transactional spout that emits tuples one batch at a time
  3. IPartitionedTridentSpout: A transactional spout that reads from a partitioned data source (like a cluster of Kafka servers)
  4. IOpaquePartitionedTridentSpout: An opaque transactional spout that reads from a partitioned data source

And, as mentioned at the beginning of this tutorial, you can use regular IRichSpouts as well.
