Having Flink consume data from Kafka and run real-time computations on it is one of the most common scenarios in stream processing. This article walks through how to connect Flink to Kafka and consume its data for real-time computation (to keep the program simple, WordCount is used as the example).
1. First, the pom file. The Flink and Kafka versions here must match the versions installed on the cluster (note that the Scala DataStream API also requires flink-streaming-scala).
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-jdbc_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
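The dependencies above reference ${flink.version} and ${scala.binary.version}, which must be defined in the pom's <properties> section. A minimal sketch is shown below; the version numbers are assumptions and should be adjusted to match your cluster.
<properties>
    <!-- Assumed versions: use the Flink/Scala versions installed on your cluster -->
    <flink.version>1.11.2</flink.version>
    <scala.binary.version>2.11</scala.binary.version>
</properties>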
2. Next, write the Scala code that builds the standard Flink Source -> Transformation -> Sink pipeline:
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.{CheckpointingMode, TimeCharacteristic}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011

object KafkaOtherWriteTS {
  // Connection information for the ZooKeeper and Kafka clusters
  private val ZOOKEEPER_HOST = "10.10.0.104:2181,10.10.0.111:2181,10.10.0.116:2181"
  private val KAFKA_BROKER = "10.10.0.104:9092,10.10.0.111:9092,10.10.0.116:9092"

  def main(args: Array[String]): Unit = {
    // 1. Create the streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.enableCheckpointing(5000)
    env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)

    val kafkaProps: Properties = new Properties()
    kafkaProps.setProperty("bootstrap.servers", KAFKA_BROKER)
    kafkaProps.setProperty("zookeeper.connect", ZOOKEEPER_HOST)
    kafkaProps.setProperty("group.id", "test")
    kafkaProps.setProperty("auto.offset.reset", "earliest")

    // 2. Build the Source from the FlinkKafkaConsumer011 class provided by the Kafka connector
    val consumer = new FlinkKafkaConsumer011[String]("topic_test1", new SimpleStringSchema, kafkaProps)
    val transaction: DataStream[String] = env.addSource(consumer)

    // 3. Transformation: split each line into words and keep a running count per word
    val result = transaction.flatMap(_.split(" "))
      .filter(_ != null)
      .map((_, 1))
      .keyBy(0)
      .sum(1)

    // 4. Sink: print the results to the console
    result.print()

    // 5. Trigger job execution
    env.execute(this.getClass.getSimpleName)
  }
}
3. On a Kafka client, start a console producer to produce data for Flink to consume (ZooKeeper and the Kafka service must be started first):
bin/kafka-console-producer.sh --broker-list node4:9092 --topic topic_test1
4. This article prints the real-time results directly to the console. Every time data is written on the Kafka side, the console prints the accumulated counts, i.e. the computation is stateful.
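As an illustration only: if the producer sends the line hello flink hello kafka twice, the WordCount logic above would emit running counts roughly like the following (the exact ordering and the subtask prefix such as "3>" before each tuple depend on parallelism):
(hello,1)
(flink,1)
(hello,2)
(kafka,1)
(hello,3)
(flink,2)
(hello,4)
(kafka,2)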
5. At this point, Flink has successfully connected to Kafka and performed stateful computation on the consumed topic data. The results can of course also be written to databases such as MySQL or Redis for downstream metric analysis, as sketched below.
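For example, here is a minimal sketch of a MySQL sink implemented as a custom RichSinkFunction. The JDBC URL, table word_count and credentials are assumptions, and the MySQL JDBC driver (mysql-connector-java) must be added to the pom. To use it, replace result.print() with result.addSink(new MysqlWordCountSink).
import java.sql.{Connection, DriverManager, PreparedStatement}

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

class MysqlWordCountSink extends RichSinkFunction[(String, Int)] {
  private var conn: Connection = _
  private var stmt: PreparedStatement = _

  override def open(parameters: Configuration): Unit = {
    // Hypothetical connection info -- replace with your own database and credentials
    conn = DriverManager.getConnection("jdbc:mysql://10.10.0.104:3306/test", "root", "123456")
    // Upsert so that newer counts for the same word overwrite the old value
    stmt = conn.prepareStatement(
      "INSERT INTO word_count(word, cnt) VALUES (?, ?) ON DUPLICATE KEY UPDATE cnt = ?")
  }

  override def invoke(value: (String, Int)): Unit = {
    stmt.setString(1, value._1)
    stmt.setInt(2, value._2)
    stmt.setInt(3, value._2)
    stmt.executeUpdate()
  }

  override def close(): Unit = {
    if (stmt != null) stmt.close()
    if (conn != null) conn.close()
  }
}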