Flink: DataStream API

For the development workflow of a Flink program and a worked example, see: Flink: Building a Flink Application from Scratch

DataSource

A DataSource is how a Flink program reads its input data. Sources are configured through methods of StreamExecutionEnvironment.

Built-in sources

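  • File sources:
    • readTextFile(path): reads a text file line by line;
    • readFile(fileInputFormat, path): reads a file using the given input format;
    • readFile(fileInputFormat, path, watchType, interval, pathFilter): also sets how the path is monitored (watchType), how often it is scanned for changes (interval), and a filter that excludes paths (pathFilter). watchType has two modes:
      • PROCESS_CONTINUOUSLY: the path is scanned periodically. When a file is modified, its entire content is processed again. Because data that was already read is emitted a second time, this mode cannot provide Exactly-Once semantics.
      • PROCESS_ONCE: the path is scanned once, the data present at that moment is processed, and the source then exits. Later changes to the files are not picked up.
  • Socket source:
    • socketTextStream(hostname, port): reads text data from a socket;
  • Collection sources:
    • fromCollection(Seq)
    • fromCollection(Iterator)
    • fromElements(elements: _*)
    • fromParallelCollection(SplittableIterator)
    • generateSequence(from, to)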
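The difference between the two watchType modes is easiest to see on a file that is modified after its first read. The following is a plain-Scala model, not Flink code. FileState, Scan, processOnce and processContinuously are hypothetical names. The model assumes a monitored directory can be represented as a sequence of snapshots mapping file name to modification time and content.

```scala
// Plain-Scala model (no Flink dependency) of the two file-monitoring modes.
object FileModesSketch {
  // State of one file at the moment the directory is scanned (hypothetical type).
  final case class FileState(modTime: Long, lines: List[String])
  // One scan of the monitored path: file name -> state.
  type Scan = Map[String, FileState]

  // PROCESS_ONCE: read what exists at the first scan, then stop.
  def processOnce(scans: List[Scan]): List[String] =
    scans.headOption.toList.flatMap(scan => scan.toList.sortBy(_._1).flatMap(_._2.lines))

  // PROCESS_CONTINUOUSLY: on every scan, re-read the WHOLE content of any
  // file whose modification time is newer than the last one processed.
  def processContinuously(scans: List[Scan]): List[String] = {
    var lastSeen = Map.empty[String, Long] // file name -> last processed modTime
    val out = List.newBuilder[String]
    for (scan <- scans; (name, state) <- scan.toList.sortBy(_._1)) {
      if (lastSeen.get(name).forall(_ < state.modTime)) {
        out ++= state.lines // already-emitted lines are emitted again
        lastSeen += name -> state.modTime
      }
    }
    out.result()
  }

  val scans: List[Scan] = List(
    Map("a.txt" -> FileState(1L, List("x", "y"))),
    Map("a.txt" -> FileState(2L, List("x", "y", "z"))) // "z" appended, modTime bumped
  )

  def main(args: Array[String]): Unit = {
    println(processOnce(scans))         // List(x, y)
    println(processContinuously(scans)) // List(x, y, x, y, z)
  }
}
```

The duplicated "x" and "y" in the continuous case show why that mode cannot give Exactly-Once results.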

External sources

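In streaming applications, most data comes from external third-party systems. Flink therefore provides a rich set of connectors to such systems, built on the SourceFunction interface. The same interface can be implemented to define a custom source.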
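A custom source follows a run/cancel contract. run() loops and emits records until cancel() tells it to stop. The sketch below models that contract in plain Scala without the Flink dependency. SimpleSource, CounterSource and take are hypothetical names. Flink's real SourceFunction passes a SourceContext to run(), and records are emitted with ctx.collect(...). In this sketch, an emit callback takes that role.

```scala
// Plain-Scala model of the run/cancel contract of a custom source.
object CustomSourceSketch {
  // Hypothetical stand-in for Flink's SourceFunction.
  trait SimpleSource[T] {
    def run(emit: T => Unit): Unit // emits records until cancelled
    def cancel(): Unit             // asks run() to return
  }

  // Emits 0, 1, 2, ... until cancel() is called.
  final class CounterSource extends SimpleSource[Long] {
    @volatile private var running = true // written by cancel(), read by run()
    override def run(emit: Long => Unit): Unit = {
      var n = 0L
      while (running) {
        emit(n)
        n += 1
      }
    }
    override def cancel(): Unit = running = false
  }

  // Drives the source and cancels it once `limit` records have arrived.
  def take(limit: Int): List[Long] = {
    val source = new CounterSource
    val buf = scala.collection.mutable.ListBuffer.empty[Long]
    source.run { v =>
      buf += v
      if (buf.size >= limit) source.cancel()
    }
    buf.toList
  }

  def main(args: Array[String]): Unit = println(take(5)) // List(0, 1, 2, 3, 4)
}
```

The running flag is @volatile because, in a real job, cancel() is called from a different thread than run().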

DataStream Transformations

Each entry gives the operator, its input → output type, and a Scala example.

Basic and keyed transformations

  • map (DataStream → DataStream):
      dataStream.map { x => x * 2 }
  • flatMap (DataStream → DataStream):
      dataStream.flatMap { str => str.split(" ") }
  • filter (DataStream → DataStream):
      dataStream.filter { _ != 0 }
  • keyBy (DataStream → KeyedStream):
      dataStream.keyBy("someKey") // Key by field "someKey"
      dataStream.keyBy(0) // Key by the first element of a Tuple
  • reduce (KeyedStream → DataStream):
      keyedStream.reduce { _ + _ }
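A reduce on a KeyedStream is a rolling operation. For every incoming element, it emits the updated result for that element's key, not one final value. The plain-Scala model below shows this on a finite input. It has no Flink dependency, and keyedRollingReduce is a hypothetical helper. A Map plays the role that keyed state plays in Flink.

```scala
// Plain-Scala model of keyBy + reduce: one output per input element.
object RollingReduceSketch {
  def keyedRollingReduce[K, V](input: Seq[(K, V)])(f: (V, V) => V): List[(K, V)] = {
    var state = Map.empty[K, V] // per-key running result
    input.toList.map { case (k, v) =>
      val updated = state.get(k).fold(v)(prev => f(prev, v))
      state += k -> updated
      k -> updated // emitted for every element, not only at the end
    }
  }

  def main(args: Array[String]): Unit = {
    // flatMap + map as in a word count, then keyBy(word) + reduce
    val words = "a b a c b a".split(" ").toSeq.map(w => (w, 1))
    println(keyedRollingReduce(words)(_ + _))
    // List((a,1), (b,1), (a,2), (c,1), (b,2), (a,3))
  }
}
```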
Rolling aggregations

  • fold (KeyedStream → DataStream), deprecated in later Flink releases:
      val result: DataStream[String] = keyedStream.fold("start")((str, i) => { str + "-" + i })
  • aggregations (KeyedStream → DataStream):
      keyedStream.sum(0)
      keyedStream.sum("key")
      keyedStream.min(0)
      keyedStream.min("key")
      keyedStream.max(0)
      keyedStream.max("key")
      keyedStream.minBy(0)
      keyedStream.minBy("key")
      keyedStream.maxBy(0)
      keyedStream.maxBy("key")
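min and minBy differ in what they return. min tracks the minimum value of the given field. minBy returns the whole element that holds that minimum. The same applies to max and maxBy. The plain-Scala sketch below models both as rolling aggregations over one key. Reading is a hypothetical record type, and there is no Flink dependency. The sketch assumes min keeps the other fields of the first element seen. Do not rely on the non-aggregated fields of a min result.

```scala
// Plain-Scala model of rolling min vs. minBy on the `temp` field (single key).
object MinVsMinBySketch {
  final case class Reading(sensor: String, temp: Int, ts: Long)

  // min: only the aggregated field changes; other fields come from the first element.
  def rollingMin(in: List[Reading]): List[Reading] =
    in.scanLeft(Option.empty[Reading]) {
      case (None, r)      => Some(r)
      case (Some(acc), r) => Some(acc.copy(temp = math.min(acc.temp, r.temp)))
    }.flatten

  // minBy: the entire element with the smallest temp is kept (earlier one wins ties).
  def rollingMinBy(in: List[Reading]): List[Reading] =
    in.scanLeft(Option.empty[Reading]) {
      case (None, r)      => Some(r)
      case (Some(acc), r) => Some(if (r.temp < acc.temp) r else acc)
    }.flatten

  val data = List(Reading("s1", 20, 1L), Reading("s1", 15, 2L), Reading("s1", 18, 3L))

  def main(args: Array[String]): Unit = {
    println(rollingMin(data).last)   // Reading(s1,15,1): ts still from the first element
    println(rollingMinBy(data).last) // Reading(s1,15,2): the element that had temp 15
  }
}
```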
Window assignment

  • window (KeyedStream → WindowedStream):
      dataStream.keyBy(0).window(TumblingEventTimeWindows.of(Time.seconds(5))) // 5-second tumbling event-time windows
  • windowAll (DataStream → AllWindowedStream):
      dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5))) // 5-second tumbling event-time windows
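A tumbling window assigner places each element in exactly one window, [start, start + size), based on the element's timestamp. The sketch below assigns windows with the usual arithmetic. It is meant to mirror what Flink computes in TimeWindow.getWindowStartWithOffset, but it is plain Scala without Flink. windowStart and assign are hypothetical helpers, and timestamps are assumed to be non-negative milliseconds.

```scala
// Plain-Scala model of tumbling event-time window assignment.
object TumblingWindowSketch {
  // Start of the tumbling window containing `ts` (assumes non-negative timestamps).
  def windowStart(ts: Long, size: Long, offset: Long = 0L): Long =
    ts - (ts - offset + size) % size

  // Groups (key, timestamp) events by key and window start.
  def assign(events: List[(String, Long)], size: Long): Map[(String, Long), List[Long]] =
    events
      .groupBy { case (key, ts) => (key, windowStart(ts, size)) }
      .map { case (window, es) => (window, es.map(_._2)) }

  def main(args: Array[String]): Unit = {
    val size = 5000L // Time.seconds(5) expressed in milliseconds
    val events = List(("a", 1000L), ("a", 4999L), ("a", 5000L), ("b", 12000L))
    assign(events, size).toList.sortBy(_._1).foreach(println)
    // ((a,0),List(1000, 4999))
    // ((a,5000),List(5000))
    // ((b,10000),List(12000))
  }
}
```

A timestamp equal to a window's end (5000 here) already belongs to the next window, because the end bound is exclusive.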
Window functions

  • Window Apply (WindowedStream → DataStream, AllWindowedStream → DataStream):
      windowedStream.apply { WindowFunction }
      // applying an AllWindowFunction on non-keyed window stream
      allWindowedStream.apply { AllWindowFunction }
  • Window Reduce (WindowedStream → DataStream):
      windowedStream.reduce { _ + _ }
  • Window Fold (WindowedStream → DataStream), deprecated in later Flink releases:
      val result: DataStream[String] = windowedStream.fold("start", (str, i) => { str + "-" + i })
  • Aggregations on windows (WindowedStream → DataStream):
      windowedStream.sum(0)
      windowedStream.sum("key")
      windowedStream.min(0)
      windowedStream.min("key")
      windowedStream.max(0)
      windowedStream.max("key")
      windowedStream.minBy(0)
      windowedStream.minBy("key")
      windowedStream.maxBy(0)
      windowedStream.maxBy("key")
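A rolling reduce on a KeyedStream emits one result per element. A reduce on a WindowedStream behaves differently: it emits one result per key and window, when that window fires. The plain-Scala model below computes windowedStream.reduce { _ + _ } over 5-second tumbling windows. It assumes a finite input, so every window is complete at the end. windowStart and windowedSum are hypothetical helpers, and there is no Flink dependency.

```scala
// Plain-Scala model of keyBy + tumbling window + reduce(_ + _).
object WindowReduceSketch {
  // Tumbling window start for non-negative timestamps and window offset 0.
  def windowStart(ts: Long, size: Long): Long = ts - ts % size

  // One summed value per (key, window start).
  def windowedSum(events: List[(String, Long, Int)], size: Long): Map[(String, Long), Int] =
    events
      .groupBy { case (key, ts, _) => (key, windowStart(ts, size)) }
      .map { case (window, es) => (window, es.map(_._3).reduce(_ + _)) }

  def main(args: Array[String]): Unit = {
    val events = List(("a", 1000L, 1), ("a", 2000L, 2), ("a", 6000L, 5), ("b", 3000L, 4))
    windowedSum(events, 5000L).toList.sortBy(_._1).foreach(println)
    // ((a,0),3)
    // ((a,5000),5)
    // ((b,0),4)
  }
}
```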
Combining streams

  • union (DataStream* → DataStream):
      dataStream.union(otherStream1, otherStream2, …)
  • Window Join (DataStream, DataStream → DataStream):
      dataStream.join(otherStream)
        .where(<key selector>).equalTo(<key selector>)
        .window(TumblingEventTimeWindows.of(Time.seconds(3)))
        .apply { … }
  • Window CoGroup (DataStream, DataStream → DataStream):
      dataStream.coGroup(otherStream)
        .where(0).equalTo(1)
        .window(TumblingEventTimeWindows.of(Time.seconds(3)))
        .apply {}
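A window join and a window coGroup both match elements by key inside the same window, but they produce different output. The join is an inner join: it produces each matching pair. The coGroup hands over both sides' groups for every key and window, even when one side is empty. The plain-Scala model below shows the difference. It has no Flink dependency. windowJoin and windowCoGroup are hypothetical helpers, elements are (key, timestamp, value) triples, and timestamps are non-negative.

```scala
// Plain-Scala model of window join vs. window coGroup on tumbling windows.
object WindowJoinSketch {
  def windowStart(ts: Long, size: Long): Long = ts - ts % size // ts >= 0

  // Inner join: one output per pair with equal key in the same window.
  def windowJoin[A, B](left: List[(String, Long, A)], right: List[(String, Long, B)],
                       size: Long): List[(String, A, B)] =
    for {
      (k1, t1, a) <- left
      (k2, t2, b) <- right
      if k1 == k2 && windowStart(t1, size) == windowStart(t2, size)
    } yield (k1, a, b)

  // coGroup: for every (key, window) seen on either side, both groups of values.
  def windowCoGroup[A, B](left: List[(String, Long, A)], right: List[(String, Long, B)],
                          size: Long): Map[(String, Long), (List[A], List[B])] = {
    val l = left.groupBy { case (k, t, _) => (k, windowStart(t, size)) }
    val r = right.groupBy { case (k, t, _) => (k, windowStart(t, size)) }
    (l.keySet ++ r.keySet).map { w =>
      (w, (l.getOrElse(w, Nil).map(_._3), r.getOrElse(w, Nil).map(_._3)))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val left = List(("k", 1000L, "L1"), ("x", 1000L, "L2"))
    val right = List(("k", 2000L, 1), ("k", 7000L, 2))
    println(windowJoin(left, right, 5000L))    // List((k,L1,1))
    println(windowCoGroup(left, right, 5000L)) // also contains (x,0) -> (List(L2),List())
  }
}
```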
Connected streams

  • connect (DataStream, DataStream → ConnectedStreams):
      val someStream: DataStream[Int] = …
      val otherStream: DataStream[String] = …
      val connectedStreams = someStream.connect(otherStream)
  • CoMap, CoFlatMap (ConnectedStreams → DataStream):
      connectedStreams.map(
        (_ : Int) => true,
        (_ : String) => false
      )
      connectedStreams.flatMap(
        (_ : Int) => Seq(true),
        (_ : String) => Seq(false)
      )
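union only accepts streams of the same type. connect keeps the two element types apart, and a CoMap or CoFlatMap applies a separate function to each side, mapping both into one output type. The plain-Scala model below represents the connected input as one interleaved list of Either values: Left for the first stream and Right for the second. It has no Flink dependency, and coMap and coFlatMap are hypothetical helpers.

```scala
// Plain-Scala model of ConnectedStreams with CoMap / CoFlatMap.
object CoMapSketch {
  // CoMap: map1 handles first-stream elements, map2 handles second-stream elements.
  def coMap[A, B, R](connected: List[Either[A, B]])(map1: A => R, map2: B => R): List[R] =
    connected.map {
      case Left(a)  => map1(a)
      case Right(b) => map2(b)
    }

  // CoFlatMap: each side may emit zero or more outputs per element.
  def coFlatMap[A, B, R](connected: List[Either[A, B]])(f1: A => List[R], f2: B => List[R]): List[R] =
    connected.flatMap {
      case Left(a)  => f1(a)
      case Right(b) => f2(b)
    }

  def main(args: Array[String]): Unit = {
    val connected: List[Either[Int, String]] = List(Left(1), Right("a b"), Left(2))
    println(coMap[Int, String, String](connected)(i => s"int:$i", s => s"str:$s"))
    // List(int:1, str:a b, int:2)
    println(coFlatMap[Int, String, String](connected)(i => List(i.toString), s => s.split(" ").toList))
    // List(1, a, b, 2)
  }
}
```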
Splitting streams

split and select are deprecated in later Flink releases, where side outputs replace them.

  • split (DataStream → SplitStream):
      val split = someDataStream.split(
        (num: Int) =>
          (num % 2) match {
            case 0 => List("even")
            case _ => List("odd")
          }
      )
  • select (SplitStream → DataStream):
      val even = split select "even"
      val odd = split select "odd"
      val all = split.select("even", "odd")
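split does not copy data into separate streams up front. It attaches one or more output names to each element, and select then picks out the elements carrying the requested names. The plain-Scala model below follows that tagging view. It has no Flink dependency, and split and select here are hypothetical helpers.

```scala
// Plain-Scala model of split (tag elements) and select (filter by tag).
object SplitSelectSketch {
  // split: attach output names to every element.
  def split[T](in: List[T])(selector: T => List[String]): List[(T, List[String])] =
    in.map(x => (x, selector(x)))

  // select: keep elements carrying at least one of the requested names.
  def select[T](tagged: List[(T, List[String])], names: String*): List[T] =
    tagged.collect { case (x, tags) if tags.exists(t => names.contains(t)) => x }

  def main(args: Array[String]): Unit = {
    val tagged = split(List(1, 2, 3, 4))(n => if (n % 2 == 0) List("even") else List("odd"))
    println(select(tagged, "even"))        // List(2, 4)
    println(select(tagged, "even", "odd")) // List(1, 2, 3, 4)
  }
}
```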
Iteration and timestamps

  • iterate (DataStream → IterativeStream → DataStream):
      initialStream.iterate {
        iteration => {
          val iterationBody = iteration.map { /* do something */ }
          (iterationBody.filter(_ > 0), iterationBody.filter(_ <= 0))
        }
      }
  • Extract Timestamps (DataStream → DataStream):
      stream.assignTimestamps { timestampExtractor }
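In the iterate example, the step function returns a pair. Elements in the first stream (_ > 0) are fed back into the loop, and elements in the second (_ <= 0) leave the iteration as output. The plain-Scala model below runs that loop in synchronous passes. It has no Flink dependency, and iterate here is a hypothetical helper. The model assumes the body eventually drives every value to <= 0. A real Flink iteration runs asynchronously, so output order is not guaranteed.

```scala
// Plain-Scala model of a feedback loop like initialStream.iterate { ... }.
object IterateSketch {
  // body is applied on every pass; results > 0 go around again, <= 0 are emitted.
  def iterate(initial: List[Int])(body: Int => Int): List[Int] = {
    val output = List.newBuilder[Int]
    var inFlight = initial
    while (inFlight.nonEmpty) {
      val next = inFlight.map(body)
      output ++= next.filter(_ <= 0) // second stream of the pair: leaves the loop
      inFlight = next.filter(_ > 0)  // first stream of the pair: fed back
    }
    output.result()
  }

  def main(args: Array[String]): Unit =
    println(iterate(List(5, 2, 7))(_ - 2)) // List(0, -1, -1)
}
```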

DataSink

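After the transformations, the program holds its final result data. Usually this result must be written to external storage or passed on to a downstream message broker. In Flink, writing a DataStream to an external system is called a DataSink operation. Sinks are attached to a DataStream, either with built-in methods such as print() or with addSink(...).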

Built-in sinks

  • writeAsText() / TextOutputFormat
  • writeAsCsv(...) / CsvOutputFormat
  • print() / printToErr()
  • writeUsingOutputFormat() / FileOutputFormat
  • writeToSocket
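A sink receives records one at a time. When writing to an external system, a common pattern is to buffer records and write them in batches. The plain-Scala sketch below models that pattern. It has no Flink dependency, and SimpleSink, BatchingSink and run are hypothetical names. SimpleSink is only loosely modelled on Flink's SinkFunction, whose invoke method is called once per record. The sketch ignores checkpointing, so a real sink built this way would lose buffered records on failure unless it also flushed when a checkpoint is taken.

```scala
// Plain-Scala model of a batching sink (hypothetical interface, not Flink's).
object BatchingSinkSketch {
  trait SimpleSink[T] {
    def invoke(value: T): Unit // called once per record
    def close(): Unit          // called when the job finishes
  }

  // Buffers records and passes them to `flush` in batches of `batchSize`.
  final class BatchingSink[T](batchSize: Int, flush: List[T] => Unit) extends SimpleSink[T] {
    private val buffer = scala.collection.mutable.ListBuffer.empty[T]
    override def invoke(value: T): Unit = {
      buffer += value
      if (buffer.size >= batchSize) flushBuffer()
    }
    override def close(): Unit = if (buffer.nonEmpty) flushBuffer() // write the remainder
    private def flushBuffer(): Unit = {
      flush(buffer.toList)
      buffer.clear()
    }
  }

  // Feeds `values` through a sink and returns the batches it wrote.
  def run(values: List[Int], batchSize: Int): List[List[Int]] = {
    val batches = List.newBuilder[List[Int]]
    val sink = new BatchingSink[Int](batchSize, batch => { batches += batch; () })
    values.foreach(sink.invoke)
    sink.close()
    batches.result()
  }

  def main(args: Array[String]): Unit =
    println(run(List(1, 2, 3, 4, 5), 2)) // List(List(1, 2), List(3, 4), List(5))
}
```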

External sinks
