Kafka作为Flume 的 Channel,将数据保存到topic中,Flink作为Kafka的消费者,消费topic中的数据,实现实时数据的分析。
Flink 程序:
import org.apache.flink.api.common.serialization.SimpleStringSchema import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, createTypeInformation} import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer import java.util.Properties /** * DATE:2022/10/3 21:49 * AUTHOR:GX */ object SourceKafkaTest { def main(args: Array[String]): Unit = { val env = StreamExecutionEnvironment.getExecutionEnvironment env.setParallelism(1) //保存Kafka连接的相关配置 val properties = new Properties() properties.setProperty("bootstrap.servers","master:9092") properties.setProperty("group.id","consumer-group") val stream = env.addSource(new FlinkKafkaConsumer[String]("clicks", new SimpleStringSchema(), properties)) stream.print() env.execute() } }
Kafka作为Flume的Channels
采集方案:
a1.sources=s1
a1.channels=c1
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/flinkDemo/data/logs/
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers=master:9092
a1.channels.c1.kafka.topic=clicks
a1.sources.s1.channels=c1
定时向文件中插入数据(模拟日志文件的生成,向指定文件中插入当前时间戳)
while true;do echo $(date "+%Y%m%d%H%M%S") >> /opt/flinkDemo/data/logs/logs.log;sleep 0.5;done
可以创建一个生产者,手动向topic中插入数据:
bin/kafka-console-producer.sh --bootstrap-server master:9092 --topic clicks
打开消费者,查看是否有数据实时产生:
bin/kafka-console-consumer.sh --bootstrap-server master:9092 --topic clicks