Stream groupings: the available groupings and what they mean

Reposted 2015-07-10 16:37:33

Stream groupings

Part of defining a topology is specifying for each bolt which streams it should receive as input. A stream grouping defines how that stream should be partitioned among the bolt's tasks.

There are eight built-in stream groupings in Storm, and you can implement a custom stream grouping by implementing the CustomStreamGrouping interface:

  1. Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in such a way that each task is guaranteed to get an equal number of tuples.
  2. Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id" values may go to different tasks.
  3. Partial Key grouping: The stream is partitioned by the fields specified in the grouping, like the Fields grouping, but is load balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed. This paper provides a good explanation of how it works and the advantages it provides.
  4. All grouping: The stream is replicated across all the bolt's tasks. Use this grouping with care.
  5. Global grouping: The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
  6. None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).
  7. Direct grouping: This is a special kind of grouping. A stream grouped this way means that the producer of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams. Tuples emitted to a direct stream must be emitted using one of the [emitDirect](/javadoc/apidocs/backtype/storm/task/OutputCollector.html#emitDirect(int, int, java.util.List)) methods. A bolt can get the task ids of its consumers by either using the provided TopologyContext or by keeping track of the output of the emit method in OutputCollector (which returns the task ids that the tuple was sent to).
  8. Local or shuffle grouping: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.
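
To make the difference between the fields grouping and the partial key grouping more concrete, here is a minimal plain-Java sketch that does not use the Storm API at all. The class GroupingSketch, its method names, and the "salt" used for the second hash are all hypothetical and chosen only for illustration. Storm's own implementations use different hash functions and bookkeeping, so treat this as a model of the idea rather than of the actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only: not part of Storm and not its real hashing.
class GroupingSketch {

    // Fields grouping idea: the task index depends only on the key,
    // so equal keys always land on the same task.
    static int fieldsTask(Object key, int numTasks) {
        return Math.floorMod(Objects.hashCode(key), numTasks);
    }

    // A second candidate task for the same key, derived from a salted hash.
    // The salt is an assumption made for this sketch.
    static int secondChoice(Object key, int numTasks) {
        return Math.floorMod(Objects.hash(key, "salt"), numTasks);
    }

    // Partial key grouping idea: each key has two candidate tasks, and every
    // tuple goes to whichever candidate has received fewer tuples so far.
    // load[i] counts the tuples this sender has routed to task i.
    static int partialKeyTask(Object key, long[] load) {
        int a = fieldsTask(key, load.length);
        int b = secondChoice(key, load.length);
        int chosen = load[a] <= load[b] ? a : b;
        load[chosen]++;
        return chosen;
    }

    public static void main(String[] args) {
        int numTasks = 4;
        long[] load = new long[numTasks];
        // A skewed stream: "alice" appears far more often than the others.
        List<String> users = Arrays.asList("alice", "bob", "alice", "alice", "carol", "alice");
        for (String user : users) {
            System.out.println(user
                    + " fields -> task " + fieldsTask(user, numTasks)
                    + ", partial key -> task " + partialKeyTask(user, load));
        }
        System.out.println("tuples per task under partial key grouping: " + Arrays.toString(load));
    }
}
```

With the fields grouping, every tuple for a hot key piles onto a single task. With the partial key variant, the same key is spread over at most two tasks, which is why a consumer downstream of a partial key grouping generally has to merge the partial results for each key.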
