Trident spouts - Storm

Reposted on 2013-12-03 21:23:37

Trident spouts

As in the vanilla Storm API, spouts are the source of streams in a Trident topology. On top of the vanilla Storm spouts, Trident exposes additional APIs for more sophisticated spouts.

There is an inextricable link between how you source your data streams and how you update state (e.g. databases) based on those data streams. See the Trident state doc for an explanation of this link; understanding it is imperative for understanding the spout options available.

Regular Storm spouts act as non-transactional spouts in a Trident topology. To use a regular Storm IRichSpout, create the stream like this in a TridentTopology:

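// "myspoutid" is the stream's unique identifier; MyRichSpout stands for any IRichSpout implementation.
TridentTopology topology = new TridentTopology();
topology.newStream("myspoutid", new MyRichSpout());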

Every spout in a Trident topology must be given a unique identifier for its stream, and this identifier must be unique across all topologies run on the cluster. Trident uses this identifier to store metadata in Zookeeper about what the spout has consumed, including the txid and any metadata associated with the spout.

You can configure the Zookeeper storage of spout metadata via the following configuration options:

  1. transactional.zookeeper.servers: A list of Zookeeper hostnames
  2. transactional.zookeeper.port: The port of the Zookeeper cluster
  3. transactional.zookeeper.root: The root dir in Zookeeper where metadata is stored. Metadata will be stored at the path /
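
As a rough illustration of the three options above, the sketch below collects them in a plain map of string keys to values, which is the general shape of a Storm configuration. The class name, host names, port, and root path are all made-up values for illustration, not recommendations, and the sketch does not touch any Storm classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the class name, host names, port, and root path are assumptions.
final class SpoutMetadataConfig {

    /** Builds a map holding the three Zookeeper options for spout metadata. */
    static Map<String, Object> build() {
        Map<String, Object> conf = new HashMap<>();
        // A list of Zookeeper hostnames (hypothetical hosts)
        conf.put("transactional.zookeeper.servers",
                 Arrays.asList("zk1.example.com", "zk2.example.com"));
        // The port of the Zookeeper cluster (assumed value)
        conf.put("transactional.zookeeper.port", 2181);
        // The root dir in Zookeeper under which metadata is stored (assumed value)
        conf.put("transactional.zookeeper.root", "/transactional");
        return conf;
    }

    public static void main(String[] args) {
        // Print each option so the resulting settings can be inspected
        build().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a real deployment these keys would be placed in the topology or cluster configuration rather than in a standalone map like this one.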

Pipelining

By default, Trident processes a single batch at a time, waiting for the batch to succeed or fail before trying another batch. You can get significantly higher throughput, and lower processing latency for each batch, by pipelining the batches. You configure the maximum number of batches to be processed simultaneously with the "topology.max.spout.pending" property.

Even while processing multiple batches simultaneously, Trident orders any state updates taking place in the topology by batch. For example, suppose you're doing a global count aggregation into a database. The idea is that while you're updating the count in the database for batch 1, you can still be computing the partial counts for batches 2 through 10. Trident won't move on to the state updates for batch 2 until the state updates for batch 1 have succeeded. This is essential for achieving exactly-once processing semantics, as outlined in the Trident state doc.
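
The global count example can be modelled in plain Java to show the idea. The sketch below is a toy model, not Trident code: the class PipelinedBatches, its run method, and the maxPending parameter are hypothetical names, and a fixed-size thread pool is only a loose stand-in for the "topology.max.spout.pending" limit. Partial counts are computed concurrently, while the running total (standing in for the database) is updated strictly in batch order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: plain Java, no Storm classes. All names are hypothetical.
final class PipelinedBatches {

    /**
     * Computes the partial count of each batch on a pool of maxPending threads,
     * then applies the partial counts to a running total in batch order.
     * Returns the total as it stands after each batch's update.
     */
    static List<Long> run(List<List<String>> batches, int maxPending) {
        ExecutorService pool = Executors.newFixedThreadPool(maxPending);
        try {
            // Partial counts may be computed concurrently and finish in any order
            List<Future<Long>> partials = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Long> countTask = () -> (long) batch.size();
                partials.add(pool.submit(countTask));
            }
            // "State updates" happen one batch at a time, in submission order:
            // batch 2 is not applied until batch 1 has been applied
            long total = 0;
            List<Long> totals = new ArrayList<>();
            for (Future<Long> partial : partials) {
                total += partial.get(); // waits for this batch's partial count
                totals.add(total);
            }
            return totals;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e", "f"));
        System.out.println(run(batches, 2)); // running total after each batch
    }
}
```

The model leaves out failure and replay entirely; it only illustrates that concurrency in the computation step does not change the order in which updates are applied.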

Trident spout types

The following spout APIs are available:

  1. ITridentSpout: The most general API, which can support transactional or opaque transactional semantics. Generally you'll use one of the partitioned flavors of this API rather than this one directly.
  2. IBatchSpout: A non-transactional spout that emits a batch of tuples at a time
  3. IPartitionedTridentSpout: A transactional spout that reads from a partitioned data source (like a cluster of Kafka servers)
  4. IOpaquePartitionedTridentSpout: An opaque transactional spout that reads from a partitioned data source

And, as mentioned at the beginning of this tutorial, you can use regular IRichSpouts as well.


