I. Three types of Environment
1.getExecutionEnvironment
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
val env1: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
2.createLocalEnvironment
val env2: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironment(1)
3. createRemoteEnvironment (rarely used)
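A minimal sketch of createRemoteEnvironment; the host, port, and jar path below are placeholders, not values from these notes:
val env3: StreamExecutionEnvironment = StreamExecutionEnvironment.createRemoteEnvironment("node01", 6123, "D:\\ideaProject\\flink-base\\target\\flink-base.jar") // placeholder host/port/jar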
II. Four kinds of data sources
1. Collection source
val CollectionDS: DataStream[String] = env.fromCollection(List("hadoop","spark","flink","hive"))
2. File source
val fileDS: DataStream[String] = env.readTextFile("D:\\ideaProject\\flink-base\\test.txt")
3. Kafka source
val prop= new Properties()
prop.setProperty("bootstrap.servers", "node01:9092,node01:9092,node01:9092")
prop.setProperty("group.id", "consumer-group")
prop.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
prop.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
prop.setProperty("auto.offset.reset", "latest")
val kafkaDS: DataStream[String] = env.addSource(new FlinkKafkaConsumer011[String]("sensor",new SimpleStringSchema(),prop))
4. Custom source
class MySensorSource extends SourceFunction[String]{
  // flag used by cancel() to stop the run loop
  var running: Boolean = true
  override def run(sourceContext: SourceFunction.SourceContext[String]): Unit = {
    // produce data here, e.g. sourceContext.collect(...) while running is true
  }
  override def cancel(): Unit = {
    // stop producing data
    running = false
  }
}
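To use the custom source, pass an instance to addSource in the main program (the variable name is illustrative):
val customDS: DataStream[String] = env.addSource(new MySensorSource())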
III. Transformation operators
1.map
var envs= fileDS.map{ x=>x.split(" ") (x,1) }.print()
2.flatMap
fileDS.flatMap{ x =>
  x.split(" ").map((_, 1))
}.print()
3. Filter
var envs= fileDS.flatMap{ x=>x.split(" ") .filter(t=>t.equals("hello")) }.print()
4. KeyBy splits the stream into disjoint partitions; all elements with the same key end up in the same partition
fileDS.flatMap{ x =>
  x.split(" ").filter(t => t.equals("kylin")).map((_, 1))
}.keyBy(0).sum(1).print()
5. Connect: after two data streams are connected they are merely placed inside one shared stream; internally each keeps its own data and form unchanged, and the two streams remain independent of each other
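The lines below assume high and low are two DataStream[SensorReading] produced earlier, for example by filtering one sensor stream on a temperature threshold (sensorDS and the 30.0 cutoff are illustrative):
val high: DataStream[SensorReading] = sensorDS.filter(_.temperature > 30.0)  // sensorDS is a hypothetical source stream
val low: DataStream[SensorReading] = sensorDS.filter(_.temperature <= 30.0)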
val warning = high.map( sensorData => (sensorData.id, sensorData.temperature) )
val connected = warning.connect(low)
6. CoMap / CoFlatMap work like map and flatMap, applying a map or flatMap to each stream of a ConnectedStreams separately
val coMap = connected.map(
warningData => (warningData._1, warningData._2, "warning"),
lowData => (lowData.id, "healthy")
)
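coFlatMap works the same way, except each function returns a collection of results; a sketch on the same connected stream (the output values chosen here are illustrative):
val coFlatMap = connected.flatMap(
  warningData => List(warningData._1),  // emit only the sensor id of the warning tuple
  lowData => List(lowData.id)
)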
7. Union combines two or more DataStreams, producing a new DataStream that contains the elements of all of them
val fileDS: DataStream[String] = env.readTextFile("D:\\ideaProject\\flink-base\\test.txt")
val fileDS1: DataStream[String] = env.readTextFile("D:\\ideaProject\\flink-base\\test.txt")
fileDS.union(fileDS1).print()
The streams being unioned must have the same type; Connect allows different types, which are reconciled afterwards in the coMap.
Connect can only combine two streams, while Union can combine more than two.
IV. Sinks
1. Kafka sink
fileDS1.addSink(new FlinkKafkaProducer011[String]("localhost:9092", "test", new SimpleStringSchema()))
2. Redis sink
class MyRedisMapper extends RedisMapper[SensorReading]{
  // write each reading into a Redis hash named "sensor" via HSET
  override def getCommandDescription: RedisCommandDescription = {
    new RedisCommandDescription(RedisCommand.HSET, "sensor")
  }
  override def getKeyFromData(t: SensorReading): String = t.id
  override def getValueFromData(t: SensorReading): String = t.temperature.toString
}
Called from the main function:
val conf = new FlinkJedisPoolConfig.Builder().setHost("localhost").setPort(6379).build()
dataStream.addSink( new RedisSink[SensorReading](conf, new MyRedisMapper) )
3. Custom JDBC sink
class MyJdbcSink() extends RichSinkFunction[SensorReading]{
var conn: Connection = _
var insertStmt: PreparedStatement = _
var updateStmt: PreparedStatement = _
// open() mainly creates the JDBC connection and prepares the statements
override def open(parameters: Configuration): Unit = {
super.open(parameters)
conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "123456")
insertStmt = conn.prepareStatement("INSERT INTO temperatures (sensor, temp) VALUES (?, ?)")
updateStmt = conn.prepareStatement("UPDATE temperatures SET temp = ? WHERE sensor = ?")
}
// invoked for every element: try an update first, insert if nothing was updated
override def invoke(value: SensorReading, context: SinkFunction.Context[_]): Unit = {
updateStmt.setDouble(1, value.temperature)
updateStmt.setString(2, value.id)
updateStmt.execute()
if (updateStmt.getUpdateCount == 0) {
insertStmt.setString(1, value.id)
insertStmt.setDouble(2, value.temperature)
insertStmt.execute()
}
}
override def close(): Unit = {
insertStmt.close()
updateStmt.close()
conn.close()
}
}
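The custom sink is then attached the same way as the built-in ones:
dataStream.addSink(new MyJdbcSink())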