0 Overview
Flink has no equivalent of Spark's foreach method that lets users write arbitrary per-record output logic. All output to external systems must go through a Sink, and the final output step of a job is typically wired up like this:
stream.addSink(new MySink(xxxx))
The framework ships with sinks for a number of common systems; for anything beyond those, users need to implement a custom sink themselves.
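A minimal sketch of what such a custom sink might look like, matching the hypothetical MySink from the snippet above (the prefix parameter is purely illustrative):
import org.apache.flink.streaming.api.functions.sink.SinkFunction

// Hypothetical custom sink: invoke() is called once for every record
// that reaches the sink.
class MySink(prefix: String) extends SinkFunction[String] {
  override def invoke(value: String, context: SinkFunction.Context[_]): Unit = {
    println(prefix + value)
  }
}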
1 Kafka
1.1 Add the dependency
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
    <version>1.10.0</version>
</dependency>
1.2 Scala code
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer011, FlinkKafkaProducer011}

// Case class for sensor records, as used throughout this example
case class SensorReading(id: String, timestamp: Long, temperature: Double)

object KafkaSinkTest {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Kafka consumer configuration
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "192.168.199.101:9092")
    properties.setProperty("group.id", "bigdata")
    properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    properties.setProperty("auto.offset.reset", "latest")

    // Kafka consumer as the source
    val inputStream: DataStream[String] = env.addSource(
      new FlinkKafkaConsumer011[String]("sensor", new SimpleStringSchema(), properties))

    // Parse each CSV record into a SensorReading and render it back to a string
    val resultStream: DataStream[String] = inputStream.map(
      streamData => {
        val strings = streamData.split(",")
        SensorReading(strings(0), strings(1).toLong, strings(2).toDouble).toString
      })

    resultStream.print("result")
    env.execute("kafka sink test")
  }
}
1.3 Start the Kafka cluster
bin/kafka-server-start.sh -daemon config/server.properties
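If ZooKeeper is not already running, start it first; the command below assumes the ZooKeeper bundled with the Kafka distribution (a separately managed quorum, as suggested by the hosts in the next step, would use its own start script):
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties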
1.4 Create the topic
bin/kafka-topics.sh --zookeeper master-1:2181,master-2:2181,slave-1:2181 --create --replication-factor 3 --partitions 2 --topic sensor
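To confirm the topic was created, you can list the topics against the same ZooKeeper quorum:
bin/kafka-topics.sh --zookeeper master-1:2181,master-2:2181,slave-1:2181 --list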
1.5 Start a producer
1.6 Run the program
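The console producer shipped with Kafka can be used to feed test records into the sensor topic (the broker address is assumed to match the bootstrap.servers setting above):
bin/kafka-console-producer.sh --broker-list 192.168.199.101:9092 --topic sensor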
You can see that Flink processes the data consumed from Kafka and prints it.
Next, we read data from Kafka, process it in Flink, and write the results back to Kafka by adding a producer sink before env.execute:
// Producer sink: write the processed data back to Kafka
resultStream.addSink(new FlinkKafkaProducer011[String]("192.168.199.101:9092", "sinktest2", new SimpleStringSchema()))
env.execute("kafka sink test")
For this round-trip test, the processing logic converts each incoming string to a long and adds 1 to it.
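A minimal sketch of that transformation, replacing the parsing map in the program above and assuming each Kafka record is a plain numeric string:
// Increment each numeric record by 1
val resultStream: DataStream[String] = inputStream.map(
  data => (data.toLong + 1).toString
)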
Start the Kafka producer and send data to the sensor topic.
You can see that each value has been incremented by 1.
Start a consumer in the Kafka cluster to consume the sinktest2 topic.
You can see that the data processed by Flink has been written back into Kafka.
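For example, with the console consumer (broker address assumed to be the same as above):
bin/kafka-console-consumer.sh --bootstrap-server 192.168.199.101:9092 --topic sinktest2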
2 JDBC Sink
2.1 Add the dependency
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.44</version>
</dependency>
2.2 Add MyJdbcSink
First, create the temperatures table in the MySQL database.
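A minimal table definition consistent with the INSERT and UPDATE statements used below (the column types are assumptions):
CREATE TABLE temperatures (
    sensor VARCHAR(32),
    temp   DOUBLE
);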
import java.sql.{Connection, DriverManager, PreparedStatement}

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction}
import org.apache.flink.streaming.api.scala._

object MyJDBCSink {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    val dataStream: DataStream[String] = env.readTextFile("E:\\bigdata\\Flink2\\src\\main\\resources\\sensor.txt")

    // Parse each CSV line into a SensorReading (the case class defined earlier)
    val resultStream: DataStream[SensorReading] = dataStream.map(
      line => {
        val strings = line.split(",")
        SensorReading(strings(0), strings(1).toLong, strings(2).toDouble)
      }
    )
    resultStream.addSink(new MyJdbcSink())
    env.execute("MyJDBCSink test")
  }
}

class MyJdbcSink() extends RichSinkFunction[SensorReading] {
  var conn: Connection = _
  var insertStmt: PreparedStatement = _
  var updateStmt: PreparedStatement = _

  // Called once when the sink is initialized: open the connection and prepare the statements
  override def open(parameters: Configuration): Unit = {
    super.open(parameters)
    conn = DriverManager.getConnection("jdbc:mysql://192.168.199.101:3306/test", "root", "root")
    insertStmt = conn.prepareStatement("INSERT INTO temperatures (sensor, temp) VALUES (?, ?)")
    updateStmt = conn.prepareStatement("UPDATE temperatures SET temp = ? WHERE sensor = ?")
  }

  // Called once per record: try an update first; if no row was updated, insert instead (a simple upsert)
  override def invoke(value: SensorReading, context: SinkFunction.Context[_]): Unit = {
    updateStmt.setDouble(1, value.temperature)
    updateStmt.setString(2, value.id)
    updateStmt.execute()
    if (updateStmt.getUpdateCount == 0) {
      insertStmt.setString(1, value.id)
      insertStmt.setDouble(2, value.temperature)
      insertStmt.execute()
    }
  }

  // Called when the sink shuts down: close the statements and the connection
  override def close(): Unit = {
    updateStmt.close()
    insertStmt.close()
    conn.close()
  }
}
Run the program and the data is written into MySQL.
You may hit this error when executing the program: No suitable driver found for jdbc:mysql://xxx
Cause: the MySQL JAR has not been imported. Import the corresponding MySQL JAR into the project. Note that you must choose Add as Library; only once an expand triangle appears next to the JAR has it been imported successfully.