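Custom Source
A custom Source is useful in two situations:
- The system has already been built, but there is not enough data to run it against
- No existing implementation is available for reading from the data source you need
In either case we have to implement a custom Source ourselves.
Implementing a Custom Source
Requirement: simulate the data produced by temperature sensors.
When no existing SourceFunction implementation fits, define your own: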
class mySource() extends SourceFunction[SensorReading]{}
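The type parameter of SourceFunction[T] is the type of the elements the source emits:
SourceFunction first passes T on to SourceContext[T],
and the collect method of SourceContext then takes an argument of type T and sends it downstream.
That is why the type parameter here is SensorReading.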
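The way the type parameter travels from the source to collect can be sketched without Flink on the classpath. MiniSourceContext, MiniSourceFunction and OneShotSource below are hypothetical stand-ins that only mirror the shape of the two Flink interfaces; they are not part of Flink.

```scala
import scala.collection.mutable.ListBuffer

object TypeFlowSketch {
  case class SensorReading(id: String, timeStamp: Long, temperature: Double)

  // Hypothetical stand-in for SourceFunction.SourceContext[T]: collect only accepts a T
  trait MiniSourceContext[T] {
    def collect(element: T): Unit
  }

  // Hypothetical stand-in for SourceFunction[T]: T is handed on to the context type
  trait MiniSourceFunction[T] {
    def run(ctx: MiniSourceContext[T]): Unit
    def cancel(): Unit
  }

  // Fixing T = SensorReading here fixes the argument type of ctx.collect as well
  class OneShotSource extends MiniSourceFunction[SensorReading] {
    override def run(ctx: MiniSourceContext[SensorReading]): Unit = {
      ctx.collect(SensorReading("Sensor_1", 0L, 36.5))
      // ctx.collect("Sensor_1") would be rejected by the compiler: a String is not a SensorReading
    }
    override def cancel(): Unit = ()
  }

  // Runs the source against a context that simply buffers what it receives
  def emitted(): List[SensorReading] = {
    val buffer = ListBuffer.empty[SensorReading]
    val ctx = new MiniSourceContext[SensorReading] {
      override def collect(element: SensorReading): Unit = { buffer += element; () }
    }
    new OneShotSource().run(ctx)
    buffer.toList
  }
}
```

Calling TypeFlowSketch.emitted() returns the single reading that the source passed to collect.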
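SourceFunction is an interface, so its two methods have to be implemented:
cancel: signals the source to stop emitting data.
run: produces or reads the data and emits it through SourceContext.collect.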
Cancel
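Define a flag that indicates whether the source should keep emitting, and set it to false when cancel is called. cancel is normally invoked from a different thread than the one executing run, so the flag is marked @volatile to make the change visible to the loop.
@volatile var running: Boolean = true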
override def cancel(): Unit = running = false
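The flag pattern can be tried out in isolation with plain threads. This is a minimal sketch, assuming a hypothetical Worker class in place of the real source; the sleep durations are arbitrary.

```scala
object CancelFlagSketch {
  // Hypothetical worker: loops until cancel() flips the flag
  final class Worker {
    // @volatile so that a write from the cancelling thread is seen by the looping thread
    @volatile private var running: Boolean = true

    def cancel(): Unit = running = false

    def run(): Unit = {
      while (running) {
        Thread.sleep(10) // stands in for "produce one batch of data"
      }
    }
  }

  // Starts the loop on its own thread, cancels it from the caller,
  // and reports whether the loop thread has finished.
  def runThenCancel(): Boolean = {
    val worker = new Worker
    val thread = new Thread(new Runnable {
      override def run(): Unit = worker.run()
    })
    thread.start()
    Thread.sleep(100) // let the loop spin a few times
    worker.cancel()   // called from a different thread than the one in run()
    thread.join(2000) // wait up to two seconds for the loop to exit
    !thread.isAlive
  }
}
```

runThenCancel() returns true once the loop has observed the flag and exited.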
Run
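First generate initial values for ten temperature sensors:
val random = new Random()
// generate the initial readings
val originData = 1.to(10).map(data => ("Sensor_" + data, random.nextDouble() * 100))
Then add a Gaussian offset to each initial value to mimic readings from a real environment:
val mapedData = originData.map(data => (data._1, data._2 + random.nextGaussian()))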
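To see what the offset does to the values, the two steps can be run on their own. The sketch below is only an illustration: the helper names baseline and withNoise and the fixed seed are assumptions added here, not part of the original source. nextGaussian draws from a normal distribution with mean 0 and standard deviation 1, so most readings stay within a few degrees of their initial value.

```scala
import scala.util.Random

object NoiseSketch {
  // Ten sensors with a uniformly distributed initial temperature in [0, 100)
  def baseline(random: Random): IndexedSeq[(String, Double)] =
    1.to(10).map(i => ("Sensor_" + i, random.nextDouble() * 100))

  // Adds an independent Gaussian offset (mean 0, standard deviation 1) to every reading
  def withNoise(data: IndexedSeq[(String, Double)], random: Random): IndexedSeq[(String, Double)] =
    data.map { case (id, temperature) => (id, temperature + random.nextGaussian()) }

  def main(args: Array[String]): Unit = {
    val random = new Random(42L) // fixed seed only to make repeated runs comparable
    val origin = baseline(random)
    val noisy = withNoise(origin, random)
    origin.zip(noisy).foreach { case ((id, base), (_, temperature)) =>
      println(f"$id%-10s base=$base%.2f noisy=$temperature%.2f")
    }
  }
}
```

Because the offset is always added to the initial value rather than to the previous reading, each sensor fluctuates around a fixed baseline instead of drifting over time.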
Then wrap each result in a SensorReading and emit it:
// get the current timestamp
val timeStamp = System.currentTimeMillis()
// emit the data
mapedData.foreach(data => sourceContext.collect(SensorReading(data._1, timeStamp, data._2)))
Because the source has to keep emitting data, wrap the logic in a loop:
override def run(sourceContext: SourceFunction.SourceContext[SensorReading]): Unit = {
  val random = new Random()
  // generate the initial readings once, outside the loop
  val originData = 1.to(10).map(data => ("Sensor_" + data, random.nextDouble() * 100))
  while (running) {
    // add the Gaussian offset
    val mapedData = originData.map(data => (data._1, data._2 + random.nextGaussian()))
    // get the current timestamp
    val timeStamp = System.currentTimeMillis()
    // emit the data
    mapedData.foreach(data => sourceContext.collect(SensorReading(data._1, timeStamp, data._2)))
    // pause between batches
    Thread.sleep(100)
  }
}
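The loop can be checked without a Flink cluster by replacing the SourceContext with a plain callback. This is a rough sketch under that assumption: SensorLoop and collectBatches are hypothetical helpers, the sleep is left out so the check finishes immediately, and the seed is arbitrary.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

object LoopSketch {
  case class SensorReading(id: String, timeStamp: Long, temperature: Double)

  // Hypothetical helper with the same loop shape as run, emitting through a callback
  final class SensorLoop(seed: Long) {
    @volatile private var running: Boolean = true

    def cancel(): Unit = running = false

    def run(emit: SensorReading => Unit): Unit = {
      val random = new Random(seed)
      val originData = 1.to(10).map(i => ("Sensor_" + i, random.nextDouble() * 100))
      while (running) {
        // one timestamp per pass, shared by all ten readings of that pass
        val timeStamp = System.currentTimeMillis()
        originData.foreach { case (id, base) =>
          emit(SensorReading(id, timeStamp, base + random.nextGaussian()))
        }
      }
    }
  }

  // Collects `batches` full passes of the loop (expects batches >= 1), then cancels it
  def collectBatches(batches: Int): Vector[SensorReading] = {
    val loop = new SensorLoop(42L)
    val out = ArrayBuffer.empty[SensorReading]
    loop.run { reading =>
      out += reading
      // the target is reached on the last reading of a pass, so the loop exits cleanly
      if (out.size >= batches * 10) loop.cancel()
    }
    out.toVector
  }
}
```

Each pass of the loop contributes exactly one reading per sensor, so collectBatches(3) yields 30 readings spread over 10 distinct sensor ids.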
Complete code
package com.erke

import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala._
import scala.util.Random

case class SensorReading(id: String, timeStamp: Long, temperature: Double)

class mySource() extends SourceFunction[SensorReading] {
  // cancel is called from another thread, so the flag is volatile
  @volatile var running: Boolean = true

  override def cancel(): Unit = running = false

  override def run(sourceContext: SourceFunction.SourceContext[SensorReading]): Unit = {
    val random = new Random()
    // generate the initial readings once, outside the loop
    val originData = 1.to(10).map(data => ("Sensor_" + data, random.nextDouble() * 100))
    while (running) {
      // add the Gaussian offset
      val mapedData = originData.map(data => (data._1, data._2 + random.nextGaussian()))
      // get the current timestamp
      val timeStamp = System.currentTimeMillis()
      // emit the data
      mapedData.foreach(data => sourceContext.collect(SensorReading(data._1, timeStamp, data._2)))
      // pause between batches
      Thread.sleep(100)
    }
  }
}

object soureApiTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // register the custom source and print every reading it emits
    val mySouceDataStream = env.addSource(new mySource())
    mySouceDataStream.print()
    env.execute("SourceTest")
  }
}