Storm Compatibility (beta)


Flink streaming is compatible with Storm's API, so projects written for Storm can be reused.

You can:

  • execute a whole Storm topology in Flink.
  • use Storm Spouts and Bolts as Flink sources and operators.

This document shows how to reuse Storm code in Flink.

Project Configuration

Add the flink-storm dependency in order to run Storm code with Flink:

<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-storm</artifactId>
	<version>1.0-SNAPSHOT</version>
</dependency>

Please note: in addition to flink-storm, further Flink dependencies are required as well. See WordCount Storm within flink-storm-examples/pom.xml for an example of how to package a jar correctly.

Execute a Storm Topology

Flink's Storm-compatible API (org.apache.flink.storm.api) offers the following replacement classes:

  • TopologyBuilder is replaced by FlinkTopologyBuilder
  • StormSubmitter is replaced by FlinkSubmitter
  • NimbusClient and Client are replaced by FlinkClient
  • LocalCluster is replaced by FlinkLocalCluster

In order to submit a Storm topology to Flink, use these classes in place of their Storm counterparts. The code that actually runs, i.e., the Spouts and Bolts, can be used unmodified.

If the topology is to be executed on a remote cluster, the following parameters need to be configured: nimbus.host and nimbus.thrift.port are used as jobmanager.rpc.address and jobmanager.rpc.port, respectively. If a parameter is not specified, the value is taken from flink-conf.yaml.

FlinkTopologyBuilder builder = new FlinkTopologyBuilder(); // replaces: TopologyBuilder builder = new TopologyBuilder();

// actual topology assembling code and used Spouts/Bolts can be used as-is
builder.setSpout("source", new FileSpout(inputFilePath));
builder.setBolt("tokenizer", new BoltTokenizer()).shuffleGrouping("source");
builder.setBolt("counter", new BoltCounter()).fieldsGrouping("tokenizer", new Fields("word"));
builder.setBolt("sink", new BoltFileSink(outputFilePath)).shuffleGrouping("counter");

Config conf = new Config();
if(runLocal) { // submit to test cluster
	FlinkLocalCluster cluster = new FlinkLocalCluster(); // replaces: LocalCluster cluster = new LocalCluster();
	cluster.submitTopology("WordCount", conf, builder.createTopology());
} else { // submit to remote cluster
	// optional
	// conf.put(Config.NIMBUS_HOST, "remoteHost");
	// conf.put(Config.NIMBUS_THRIFT_PORT, 6123);
	FlinkSubmitter.submitTopology("WordCount", conf, builder.createTopology()); // replaces: StormSubmitter.submitTopology(topologyId, conf, builder.createTopology());
}

Embed Storm Operators in Flink Streaming Programs

Alternatively, Spouts and Bolts can be embedded into regular Flink streaming programs. The Storm compatibility layer offers a wrapper class for each, namely SpoutWrapper and BoltWrapper (org.apache.flink.storm.wrappers). Both wrappers convert Storm output tuples to Flink's Tuple types (ie, Tuple0 to Tuple25 depending on the number of output fields of the Storm tuples). For single-field output tuples, a conversion to the field's own data type is also possible (eg, String instead of Tuple1<String>).

Because Flink cannot infer the output types of Storm operators, it is required to specify the output type manually. In order to get the correct type, Flink's TypeExtractor can be used.

Embed Spouts

In order to use a Spout as a Flink source, use StreamExecutionEnvironment.addSource(SourceFunction, TypeInformation). The Spout object is handed to the constructor of SpoutWrapper<OUT>, which serves as the first argument to addSource(...). The generic type OUT declares the data type of the Spout's output fields.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// stream has `raw` type (single field output streams only)
DataStream<String> rawInput = env.addSource(
	new SpoutWrapper<String>(new FileSpout(localFilePath), new String[] { Utils.DEFAULT_STREAM_ID }), // emit default output stream as raw type
	TypeExtractor.getForClass(String.class)); // output type

// process data stream
[...]

If a Spout emits a finite number of tuples, SpoutWrapper can be configured to terminate automatically via its numberOfInvocations parameter. This lets the Flink program shut down automatically once all data has been processed. Per default, the program runs until it is terminated manually.
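
For example, a minimal sketch (reusing FileSpout and localFilePath from the example above, and assuming the SpoutWrapper constructor variant that takes both the raw-output declaration and the invocation limit):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// emit the default output stream as raw type, but stop after (at most) 1000 calls to nextTuple()
DataStream<String> finiteInput = env.addSource(
	new SpoutWrapper<String>(new FileSpout(localFilePath),
		new String[] { Utils.DEFAULT_STREAM_ID }, 1000), // numberOfInvocations = 1000
	TypeExtractor.getForClass(String.class)); // output type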

Embed Bolts

In order to use a Bolt as a Flink operator, use DataStream.transform(String, TypeInformation, OneInputStreamOperator). The Bolt object is handed to the constructor of BoltWrapper<IN,OUT>, which serves as the last argument to transform(...). The generics IN and OUT declare the input and output data types of the operator, respectively.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.readTextFile(localFilePath);

DataStream<Tuple2<String, Integer>> counts = text.transform(
	"tokenizer", // operator name
	TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)), // output type
	new BoltWrapper<String, Tuple2<String, Integer>>(new BoltTokenizer())); // Bolt operator

// do further processing
[...]

Named Attribute Access for Embedded Bolts

Bolts can access input tuple fields via name (in addition to access via index). To use this feature with embedded Bolts, you need to have either a

  1. POJO type input stream or
  2. Tuple type input stream and specify the input schema (ie, name-to-index-mapping)

For POJO input types, Flink accesses the fields via reflection. For this case, Flink expects either a corresponding public member variable or public getter method. For example, if a Bolt accesses a field via name sentence (eg, String s = input.getStringByField("sentence");), the input POJO class must have a member variable public String sentence; or method public String getSentence() { ... }; (pay attention to camel-case naming).
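
For illustration, a minimal sketch of such an input POJO (the class name Sentence is hypothetical) matching a Bolt that calls input.getStringByField("sentence"):

public class Sentence {
	public String sentence; // accessed via name through reflection

	public Sentence() {} // Flink POJOs need a public no-argument constructor

	public Sentence(String sentence) {
		this.sentence = sentence;
	}
}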

For Tuple input types, it is required to specify the input schema using Storm’s Fields class. For this case, the constructor of BoltWrapper takes an additional argument: new BoltWrapper<Tuple1<String>, ...>(..., new Fields("sentence")). The input type is Tuple1<String> and Fields("sentence") specifies that input.getStringByField("sentence") is equivalent to input.getString(0).
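
A minimal sketch of wrapping such a Bolt (the class name BoltTokenizerByName is hypothetical; it is assumed to read its input via input.getStringByField("sentence")):

DataStream<Tuple1<String>> sentences = ...; // Tuple type input stream

DataStream<Tuple2<String, Integer>> tokens = sentences.transform(
	"tokenizer", // operator name
	TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)), // output type
	new BoltWrapper<Tuple1<String>, Tuple2<String, Integer>>(
		new BoltTokenizerByName(), new Fields("sentence"))); // declare input schema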

See BoltTokenizerWordCountPojo and BoltTokenizerWordCountWithNames for examples.

Configuring Spouts and Bolts

In Storm, Spouts and Bolts can be configured with a globally distributed Map object that is given to the submitTopology(...) method of LocalCluster or StormSubmitter. This Map is provided by the user next to the topology and gets forwarded as a parameter to the calls Spout.open(...) and Bolt.prepare(...). If a whole topology is executed in Flink using FlinkTopologyBuilder etc., there is no special attention required – it works as in regular Storm.

For embedded usage, Flink’s configuration mechanism must be used. A global configuration can be set in a StreamExecutionEnvironment via .getConfig().setGlobalJobParameters(...). Flink’s regular Configuration class can be used to configure Spouts and Bolts. However, Configuration does not support arbitrary key data types as Storm does (only String keys are allowed). Thus, Flink additionally provides the StormConfig class that can be used like a raw Map to provide full compatibility to Storm.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

StormConfig config = new StormConfig();
// set config values
[...]

// set global Storm configuration
env.getConfig().setGlobalJobParameters(config);

// assemble program with embedded Spouts and/or Bolts
[...]
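
For illustration, a minimal sketch of a (hypothetical) embedded Bolt that reads a value from the forwarded configuration in prepare(...); the key "prefix" is assumed to have been set in the StormConfig above:

public class ConfiguredBolt extends BaseRichBolt {
	private String prefix;
	private OutputCollector collector;

	@Override
	public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
		this.collector = collector;
		// the global job parameters set above are forwarded here as stormConf
		this.prefix = (String) stormConf.get("prefix");
	}

	@Override
	public void execute(Tuple input) {
		collector.emit(new Values(prefix + input.getString(0)));
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("prefixed"));
	}
}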

Multiple Output Streams

Flink can also handle the declaration of multiple output streams for Spouts and Bolts. If a whole topology is executed in Flink using FlinkTopologyBuilder etc., there is no special attention required – it works as in regular Storm.

For embedded usage, the output stream will be of data type SplitStreamType<T> and must be split by using DataStream.split(...) and SplitStream.select(...). Flink provides the predefined output selector StormStreamSelector<T> for .split(...) already. Furthermore, the wrapper type SplitStreamType<T> can be removed using SplitStreamMapper<T>.

[...]

// get DataStream from Spout or Bolt which declares two output streams s1 and s2 with output type SomeType
DataStream<SplitStreamType<SomeType>> multiStream = ...

SplitStream<SplitStreamType<SomeType>> splitStream = multiStream.split(new StormStreamSelector<SomeType>());

// remove SplitStreamType using SplitStreamMapper to get data stream of type SomeType
DataStream<SomeType> s1 = splitStream.select("s1").map(new SplitStreamMapper<SomeType>()).returns(SomeType.class);
DataStream<SomeType> s2 = splitStream.select("s2").map(new SplitStreamMapper<SomeType>()).returns(SomeType.class);

// do further processing on s1 and s2
[...]

See SpoutSplitExample.java for a full example.

Flink Extensions

Finite Spouts

In Flink, streaming sources can be finite, ie, emit a finite number of records and stop after emitting the last record. However, Spouts usually emit infinite streams. The bridge between the two approaches is the FiniteSpout interface which, in addition to IRichSpout, contains a reachedEnd() method, where the user can specify a stopping condition. The user can create a finite Spout by implementing this interface (in addition to IRichSpout) and implementing the reachedEnd() method. In contrast to a SpoutWrapper that is configured to emit a finite number of tuples, the FiniteSpout interface allows implementing more complex termination criteria.

Although finite Spouts are not necessary to embed Spouts into a Flink streaming program or to submit a whole Storm topology to Flink, there are cases where they may come in handy:

  • to make a native Spout behave the same way as a finite Flink source with minimal modifications
  • the user wants to process a stream only for some time; after that, the Spout can stop automatically
  • reading a file into a stream
  • for testing purposes

An example of a finite Spout that emits records for 10 seconds only:

public class TimedFiniteSpout extends BaseRichSpout implements FiniteSpout {
	[...] // implement open(), nextTuple(), ...

	private long starttime = System.currentTimeMillis();

	public boolean reachedEnd() {
		return System.currentTimeMillis() - starttime > 10000L;
	}
}

Storm Compatibility Examples

More examples can be found in flink-storm-examples. For the different versions of WordCount, see README.md. You need to assemble a properly packaged jar file, such as flink-storm-examples-1.0-SNAPSHOT.jar, to run the examples.

Besides the examples that embed individual Spouts and Bolts, there are also examples that wrap whole topologies.

You can run each example via bin/flink run <jarname>.jar.
