Overview
- This blog is updated every Friday.
- In real-time stream processing, Kafka is a key distributed message queue and commonly serves as both the input and the output of Flink jobs. This post uses Flink 1.12 to read data from Kafka and write results back to Kafka.
References
- Official Flink 1.12 Table API & SQL Kafka connector documentation
Process
- Consume data from the Kafka topic input_kafka into a Table, filter for records whose status is success, and write the filtered result back to the Kafka topic output_kafka.
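Before wiring this up in Flink, the core transformation can be sketched in plain Java (no Flink dependency). This is only an illustration of the filter semantics the SQL query expresses; `Event` and `FilterSketch` are hypothetical names, not part of the job below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the filtering step the Flink SQL job performs:
// keep only events whose status field equals "success".
public class FilterSketch {
    static class Event {
        final long userId;
        final long pageId;
        final String status;

        Event(long userId, long pageId, String status) {
            this.userId = userId;
            this.pageId = pageId;
            this.status = status;
        }
    }

    // Equivalent of: SELECT * FROM input_kafka WHERE status = 'success'
    static List<Event> filterSuccess(List<Event> events) {
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            if ("success".equals(e.status)) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> in = Arrays.asList(
                new Event(1L, 1L, "success"),
                new Event(2L, 1L, "fail"));
        // Only the first event survives the filter.
        System.out.println(filterSuccess(in).size());
    }
}
```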
Code
- The code targets Java 1.8 + Flink 1.12; see the previous posts for setting up Kafka and the rest of the environment.
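Assuming a Maven build, the job below needs roughly the following dependencies (the Scala suffix `_2.11` and the exact 1.12.x patch version depend on your setup; adjust to match your cluster):

```xml
<!-- Assumed coordinates for Flink 1.12 with the Blink planner,
     the Kafka SQL connector, and the JSON format. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_2.11</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_2.11</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-json</artifactId>
    <version>1.12.0</version>
</dependency>
```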
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
public class KafkaDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);
        // Source table backed by the input_kafka topic
        TableResult inputTable = tableEnv.executeSql("CREATE TABLE input_kafka (\n"
                + "  `user_id` BIGINT,\n"
                + "  `page_id` BIGINT,\n"
                + "  `status` STRING\n"
                + ") WITH (\n"
                + "  'connector' = 'kafka',\n"
                + "  'topic' = 'input_kafka',\n"
                + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                + "  'properties.group.id' = 'testGroup',\n"
                + "  'scan.startup.mode' = 'latest-offset',\n"
                + "  'format' = 'json'\n"
                + ")"
        );
        // Keep only records whose status is 'success'
        String sql = "SELECT * FROM input_kafka WHERE status = 'success'";
        Table filtered = tableEnv.sqlQuery(sql);
        // Print the retract stream for debugging; the Boolean flag marks add/retract messages
        DataStream<Tuple2<Boolean, Row>> resultDs = tableEnv.toRetractStream(filtered, Row.class);
        resultDs.print();
        // Sink table backed by the output_kafka topic
        TableResult outputTable = tableEnv.executeSql("CREATE TABLE output_kafka (\n"
                + "  `user_id` BIGINT,\n"
                + "  `page_id` BIGINT,\n"
                + "  `status` STRING\n"
                + ") WITH (\n"
                + "  'connector' = 'kafka',\n"
                + "  'topic' = 'output_kafka',\n"
                + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                + "  'format' = 'json',\n"
                + "  'sink.partitioner' = 'round-robin'\n"
                + ")"
        );
        // Register the filtered result under an explicit name rather than relying on
        // string-concatenating the Table object, then insert it into the sink
        tableEnv.createTemporaryView("filtered_result", filtered);
        tableEnv.executeSql("INSERT INTO output_kafka SELECT * FROM filtered_result");
        env.execute("kafka-table-demo");
    }
}
Kafka operations
- Create the input and output topics:
- input topic: bin\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic input_kafka
- output topic: bin\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic output_kafka
- Start a console producer on the input topic: kafka-console-producer.bat --broker-list localhost:9092 --topic input_kafka
- Send a record: {"user_id":1,"page_id":1,"status":"success"}
- Consume the output topic to verify the result: kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic output_kafka --from-beginning
Summary
- The Table API is highly customizable, but its interfaces change quickly and adapt poorly across releases; if your code needs to survive version upgrades, prefer SQL.
- Pick a direction and work toward becoming an expert in that field.