Trident spouts - Storm

Trident spouts

As in the vanilla Storm API, spouts are the source of streams in a Trident topology. On top of the vanilla Storm spouts, Trident exposes additional APIs for more sophisticated spouts.

There is an inextricable link between how you source your data streams and how you update state (e.g. databases) based on those data streams. See the Trident state doc for an explanation of this link; understanding it is imperative for understanding the spout options available.

Regular Storm spouts act as non-transactional spouts in a Trident topology. To use a regular Storm IRichSpout, create the stream like this in a TridentTopology:

TridentTopology topology = new TridentTopology();
topology.newStream("myspoutid", new MyRichSpout());

Every spout in a Trident topology must be given a unique identifier for its stream, and this identifier must be unique across all topologies run on the cluster. Trident uses the identifier to store metadata in Zookeeper about what the spout has consumed, including the txid and any other metadata associated with the spout.

You can configure the Zookeeper storage of spout metadata via the following configuration options:

  1. transactional.zookeeper.servers: A list of Zookeeper hostnames
  2. transactional.zookeeper.port: The port of the Zookeeper cluster
  3. transactional.zookeeper.root: The root dir in Zookeeper where metadata is stored. Metadata will be stored at the path /
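As a rough illustration, the three options above can be collected into a configuration map. This is only a sketch: it uses a plain java.util.HashMap so that it runs without the Storm dependency (Storm's own Config class is a map of string keys to values), and the host names, port number, and root path are placeholder values, not defaults taken from the documentation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpoutMetadataConfig {

    // Builds a map holding the three Zookeeper options listed above.
    // Every value here is a made-up placeholder for illustration.
    public static Map<String, Object> build() {
        Map<String, Object> conf = new HashMap<>();
        // Hypothetical Zookeeper host names.
        List<String> servers = Arrays.asList("zk1.example.com", "zk2.example.com");
        conf.put("transactional.zookeeper.servers", servers);
        // Hypothetical port of the Zookeeper cluster.
        conf.put("transactional.zookeeper.port", 2181);
        // Hypothetical root dir under which spout metadata is stored.
        conf.put("transactional.zookeeper.root", "/trident-meta");
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

In a real deployment the same keys would go into whatever configuration your topology or cluster actually reads.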


By default, Trident processes a single batch at a time, waiting for the batch to succeed or fail before trying another batch. You can get significantly higher throughput, and lower processing latency for each batch, by pipelining the batches. You configure the maximum number of batches to be processed simultaneously with the “topology.max.spout.pending” property.

Even while processing multiple batches simultaneously, Trident orders any state updates taking place in the topology among batches. For example, suppose you’re doing a global count aggregation into a database. The idea is that while you’re updating the count in the database for batch 1, you can still be computing the partial counts for batches 2 through 10. Trident won’t move on to the state updates for batch 2 until the state updates for batch 1 have succeeded. This is essential for achieving exactly-once processing semantics, as outlined in the Trident state doc.
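The global count example can be modelled in a few lines of plain Java. This is a sketch of the idea only, not of how Trident is implemented: partial counts for all batches are computed concurrently on a thread pool, while the "state update" (adding to the running total) is applied strictly in batch order. The class name, the workers parameter, and the in-memory counter standing in for the database are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderedBatchCommits {

    // Returns the value of the global count after each batch's state update,
    // in the order the updates were applied.
    public static List<Long> run(List<List<String>> batches, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Long> totalsAfterEachCommit = new ArrayList<>();
        try {
            // Phase 1: partial counts are computed concurrently, in any order.
            List<Future<Long>> partials = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Long> task = () -> (long) batch.size();
                partials.add(pool.submit(task));
            }
            // Phase 2: state updates are applied one batch at a time, in
            // batch order; batch N+1 waits until batch N has been applied.
            long globalCount = 0; // stand-in for the count stored in a database
            for (Future<Long> partial : partials) {
                globalCount += partial.get();
                totalsAfterEachCommit.add(globalCount);
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("batch computation failed", e);
        } finally {
            pool.shutdown();
        }
        return totalsAfterEachCommit;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"),
                Arrays.asList("f"));
        System.out.println(run(batches, 2));
    }
}
```

Because the updates are applied in a fixed order, the sequence of running totals is the same on every run, regardless of which partial count finishes first.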

Trident spout types

The following spout APIs are available:

  1. ITridentSpout: The most general API that can support transactional or opaque transactional semantics. Generally you’ll use one of the partitioned flavors of this API rather than this one directly.
  2. IBatchSpout: A non-transactional spout that emits tuples a batch at a time
  3. IPartitionedTridentSpout: A transactional spout that reads from a partitioned data source (like a cluster of Kafka servers)
  4. IOpaquePartitionedTridentSpout: An opaque transactional spout that reads from a partitioned data source

And, as mentioned at the beginning of this tutorial, you can use regular IRichSpouts as well.








