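1. Simple transformation operators (map, flatMap, filter, and so on) are available on both DataStream and KeyedStream. A plain DataStream has no aggregation operators; only a KeyedStream does.
2. The rolling aggregation operators sum(), min(), max(), minBy(), maxBy() and reduce() are available only after keyBy.
min compares each incoming record with the aggregated history for its key and emits the current minimum. The output is still a SensorReading: min("temperature") emits the minimum temperature, while the other fields come from the first record seen for that key.
minBy also compares each incoming record with the aggregated history for its key and emits a SensorReading: minBy("temperature") emits the minimum temperature, and the other fields come from the record that holds that minimum.
sum adds each incoming record's field to the running total for its key and emits the running sum.
3. reduce: the requirement here is the minimum temperature together with the maximum timestamp. In the lambda passed to reduce, the first argument is the previously aggregated result and the second is the newest record, so the operator is stateful. The code below takes the newest record's timestamp, which is the maximum only when records arrive in timestamp order.
4. ConnectedStreams.map is a CoMap internally: it applies a separate function to each of the two connected streams. The two streams passed to connect may have different element types.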
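The difference between min, minBy, sum and reduce is easier to see on a tiny input. The sketch below is plain Scala with no Flink dependency: it only simulates the per-key rolling behaviour described in the notes above, using scanLeft. The object RollingAggregationSketch, the helper rolling and the three sample readings are made up for illustration, and the SensorReading case class is an assumed shape that matches how the job below uses it.

```scala
// Assumed shape of the record type used by the Flink job in this file.
case class SensorReading(id: String, timestamp: Long, temperature: Double)

object RollingAggregationSketch {
  // Records of a single key, in arrival order (made-up sample data).
  val input: List[SensorReading] = List(
    SensorReading("sensor_1", 1L, 35.0),
    SensorReading("sensor_1", 2L, 30.0),
    SensorReading("sensor_1", 3L, 33.0)
  )

  // Emits one result per input record: the first record as-is, then every
  // later record combined with the previous result (a rolling aggregation).
  def rolling(in: List[SensorReading])(
      f: (SensorReading, SensorReading) => SensorReading): List[SensorReading] =
    in match {
      case Nil          => Nil
      case head :: tail => tail.scanLeft(head)(f)
    }

  // min("temperature"): only the aggregated field changes,
  // the other fields stay those of the first record.
  val minResult: List[SensorReading] =
    rolling(input)((acc, cur) => acc.copy(temperature = math.min(acc.temperature, cur.temperature)))

  // minBy("temperature"): the whole record that holds the minimum is kept.
  val minByResult: List[SensorReading] =
    rolling(input)((acc, cur) => if (cur.temperature < acc.temperature) cur else acc)

  // sum("temperature"): running total. Keeping the other fields of the first
  // record is an assumption of this sketch; the notes only describe the sum itself.
  val sumResult: List[SensorReading] =
    rolling(input)((acc, cur) => acc.copy(temperature = acc.temperature + cur.temperature))

  // reduce: minimum temperature so far, timestamp of the newest record.
  val reduceResult: List[SensorReading] =
    rolling(input)((acc, cur) => SensorReading(acc.id, cur.timestamp, acc.temperature.min(cur.temperature)))

  def main(args: Array[String]): Unit = {
    println(minResult.last)    // SensorReading(sensor_1,1,30.0)
    println(minByResult.last)  // SensorReading(sensor_1,2,30.0)
    println(sumResult.last)    // SensorReading(sensor_1,1,98.0)
    println(reduceResult.last) // SensorReading(sensor_1,3,30.0)
  }
}
```

After the third record, min and minBy agree on the temperature (30.0) but differ in the timestamp: min still carries the first record's timestamp 1, minBy carries timestamp 2 from the record that produced the minimum, and the custom reduce carries timestamp 3 from the newest record.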
package flinkSourse
import org.apache.flink.api.common.functions.{MapFunction, ReduceFunction, RichMapFunction}
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.CoMapFunction
import org.apache.flink.streaming.api.scala.{ConnectedStreams, _}
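// Simple transformation operators (map, flatMap, filter, ...) are available on both DataStream and KeyedStream.
// A plain DataStream has no aggregation operators; the rolling aggregations sum(), min(), max(), minBy(),
// maxBy() and reduce() are available only on a KeyedStream, i.e. after keyBy.
// Multi-stream transformations: split, select, connect, CoMap, union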
object FlinkTransform {
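  def main(args: Array[String]): Unit = {
    val executionEnvironment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    executionEnvironment.setParallelism(1)
    // Bounded stream: env.readTextFile
    val stream2: DataStream[String] = executionEnvironment.readTextFile("src/main/resources/sensorReading.txt")
    // Only a KeyedStream (the result of keyBy) supports aggregations such as min, max and sum
    // 1. Simple transformation: flatMap
    // stream2.flatMap(data => data.split(",")).print()
    // 2. The rolling aggregations sum(), min(), max(), minBy(), maxBy() are available only after keyBy
    // min: each incoming record is compared with the aggregated history for its key and the minimum is emitted.
    //   The output is still a SensorReading: min("temperature") emits the minimum temperature,
    //   while the other fields come from the first record seen for that key.
    // minBy: same comparison, but the other fields come from the record that holds the minimum temperature.
    // sum: each incoming record's field is added to the running total for its key and the running sum is emitted.
    // SensorReading is assumed to be defined elsewhere in this package, with the shape
    // case class SensorReading(id: String, timestamp: Long, temperature: Double)
    val transforStream: DataStream[SensorReading] = stream2.map(data => {
      // Trim each field so that lines with spaces after the commas also parse
      val tmpList: Array[String] = data.split(",").map(_.trim)
      SensorReading(tmpList(0), tmpList(1).toLong, tmpList(2).toDouble)
    })
    val keyedStream: KeyedStream[SensorReading, Tuple] = transforStream.keyBy("id")
    // keyedStream.minBy("temperature").print()
    // 3. reduce: the requirement is the minimum temperature together with the maximum timestamp.
    // In the lambda passed to reduce, the first argument is the previously aggregated result
    // and the second is the newest record.
    // Option 1: a lambda expression
    // keyedStream.reduce((valueState, newData) => {
    //   SensorReading(valueState.id, newData.timestamp, valueState.temperature.min(newData.temperature))
    // }).print()
    // Option 2: a custom function class. ReduceFunction is a Java interface, so the Scala class
    // implements it (via `extends`) rather than subclassing a class.
    // keyedStream.reduce(new MyReduceFunction()).print()
    // aggregate cannot be called on a KeyedStream directly, because it is declared as `private def aggregate`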
    // 4.1 Splitting: divide the sensor stream into two streams by temperature.
    // split tags each record with a name; select picks the records carrying the given tag(s).
    // Note: split/select are deprecated and were removed in Flink 1.12 in favor of side outputs.
    val splitStream: SplitStream[SensorReading] = transforStream.split(data => {
      if (data.temperature >= 32) Seq("high") else Seq("low")
    })
    val highStream: DataStream[SensorReading] = splitStream.select("high")
    val lowStream: DataStream[SensorReading] = splitStream.select("low")
    val allStream: DataStream[SensorReading] = splitStream.select("high", "low")
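    // Sketch (not part of the original job): the same high/low routing expressed with a side output,
    // which is the mechanism Flink offers in place of split/select. The names highTag, lowViaProcess
    // and highViaSideOutput are hypothetical; the classes are referenced by their fully qualified
    // names so that no extra imports are needed. Records with temperature >= 32 go to the side
    // output, all other records stay on the main output.
    val highTag: OutputTag[SensorReading] = OutputTag[SensorReading]("high-side-output")
    val lowViaProcess: DataStream[SensorReading] = transforStream.process(
      new org.apache.flink.streaming.api.functions.ProcessFunction[SensorReading, SensorReading] {
        override def processElement(
            value: SensorReading,
            ctx: org.apache.flink.streaming.api.functions.ProcessFunction[SensorReading, SensorReading]#Context,
            out: org.apache.flink.util.Collector[SensorReading]): Unit = {
          if (value.temperature >= 32) ctx.output(highTag, value) else out.collect(value)
        }
      })
    val highViaSideOutput: DataStream[SensorReading] = lowViaProcess.getSideOutput(highTag)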
    // 4.2 connect + CoMap: ConnectedStreams.map takes a CoMapFunction, which handles each of the two
    // connected streams with its own method (map1 / map2). The two streams may have different element types.
    val warningStream: DataStream[(String, Double)] = highStream.map(data => (data.id, data.temperature))
    val connectedStreams: ConnectedStreams[(String, Double), SensorReading] = warningStream.connect(lowStream)
    val comapDataStream: DataStream[Product] = connectedStreams.map(new MyCoMapFunction("warn you"))
    // comapDataStream.print("comapDataStream")
    // 4.3 union: the streams being merged must have the same element type
    val unionStream: DataStream[SensorReading] = highStream.union(lowStream)
    unionStream.print("union")
    executionEnvironment.execute("transform")
  }
}
class MyReduceFunction extends ReduceFunction[SensorReading] {
  override def reduce(t: SensorReading, t1: SensorReading): SensorReading = {
    SensorReading(t.id, t1.timestamp, t.temperature.min(t1.temperature))
  }
}
// Constructor parameters (here: info) can be passed in, but they are optional
class MyCoMapFunction(val info: String) extends CoMapFunction[(String, Double), SensorReading, Product] {
  // Handles records of the first connected stream
  override def map1(in1: (String, Double)): (String, Double, String) = {
    (in1._1, in1._2, info)
  }
  // Handles records of the second connected stream
  override def map2(in2: SensorReading): (String, String) = {
    (in2.id, "healthy")
  }
}
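// A rich function adds lifecycle methods (open/close) and access to the runtime context,
// through which state can be obtained
class MyRichMapFunction extends RichMapFunction[SensorReading, String] {
  override def open(parameters: Configuration): Unit = {
    // One-time setup such as opening a database connection; called once per parallel instance
    // State is obtained through a descriptor and, being keyed state, requires a KeyedStream, e.g.
    // getRuntimeContext.getListState(new ListStateDescriptor[String]("my-list", classOf[String]))
    // ("my-list" is a hypothetical state name)
  }
  override def close(): Unit = {
    // Cleanup such as closing the connection
  }
  override def map(in: SensorReading): String = {
    in.id + in.temperature + in.timestamp
  }
}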
class MyMapFunction extends MapFunction[SensorReading, String] {
  override def map(t: SensorReading): String = "D"
}