1. Start a Kafka producer and generate test data
bin/kafka-console-producer.sh --broker-list node1:9092 --topic t_kafka_03
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:12:12"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:15:00"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:20:00"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:30:00"}
{"user_id":"a1111","order_amount":13.0,"log_ts":"2020-06-29 12:32:00"}
{"user_id":"a1112","order_amount":15.0,"log_ts":"2020-11-26 12:12:13"}
Note: once the job is running, keep sending batches of records like the following so you can test that new data flows through.
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:12:12"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:15:00"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:20:00"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:30:00"}
{"user_id":"a1111","order_amount":13.0,"log_ts":"2020-06-29 12:32:00"}
{"user_id":"a1112","order_amount":15.0,"log_ts":"2020-11-26 12:12:13"}
2. Map the Kafka topic with the Flink Kafka connector
1. Table DDL:
CREATE TABLE hudi_source(
user_id STRING,
order_amount BIGINT,
log_ts TIMESTAMP(3)
)WITH(
'connector' = 'kafka',
'topic' = 't_kafka_03',
'properties.bootstrap.servers' = 'node1:9092',
'scan.startup.mode'='earliest-offset',
'properties.group.id' = 'testGroup',
'format' = 'json'
);
2. Query to check whether the topic data has been mapped into the table:
select * from hudi_source;
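If the console producer sends a malformed line (easy to do when typing JSON by hand), the default JSON format settings fail the job on the bad record. A hedged sketch of a more tolerant mapping table (the name hudi_source_tolerant is hypothetical; json.ignore-parse-errors is a standard Flink JSON format option that skips unparsable records):
CREATE TABLE hudi_source_tolerant(
user_id STRING,
order_amount BIGINT,
log_ts TIMESTAMP(3)
)WITH(
'connector' = 'kafka',
'topic' = 't_kafka_03',
'properties.bootstrap.servers' = 'node1:9092',
'scan.startup.mode'='earliest-offset',
'properties.group.id' = 'testGroup',
'format' = 'json',
'json.ignore-parse-errors' = 'true'
);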
3. Create the Hudi table with the Hudi connector
CREATE TABLE hudi_source_hudi(
user_id STRING,
order_amount BIGINT,
log_ts TIMESTAMP(3)
) WITH (
'connector' = 'hudi',
'path' = 'hdfs://node1:8020/hudi/hudi_source_hudi',
'table.type' = 'MERGE_ON_READ',
'changelog.enabled' = 'true',
'write.precombine.field' = 'log_ts',
'hoodie.datasource.write.recordkey.field' = 'user_id',
'compaction.async.enabled' = 'false'
);
Notes:
1. 'write.precombine.field' = 'log_ts',
This option must be set; when several records arrive for the same key, they are merged down to a single record (the record with the largest precombine value is kept).
2. 'hoodie.datasource.write.recordkey.field' = 'user_id',
This has the same effect as declaring a primary key on the table directly.
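As a sketch of that primary-key style (the table name hudi_source_hudi_pk is hypothetical, used only to avoid clashing with the table created above; verify against your Flink/Hudi versions), the same table could be declared with an explicit PRIMARY KEY instead of the recordkey option:
CREATE TABLE hudi_source_hudi_pk(
user_id STRING,
order_amount BIGINT,
log_ts TIMESTAMP(3),
PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
'connector' = 'hudi',
'path' = 'hdfs://node1:8020/hudi/hudi_source_hudi_pk',
'table.type' = 'MERGE_ON_READ',
'changelog.enabled' = 'true',
'write.precombine.field' = 'log_ts',
'compaction.async.enabled' = 'false'
);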
4. Write the Kafka table data into the Hudi table
insert into hudi_source_hudi select * from hudi_source;
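Once the insert job has made its first commit, the Hudi table directory should show up on HDFS; a quick sanity check from the shell (using the path configured above):
hdfs dfs -ls /hudi/hudi_source_hudi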
5. Query the Hudi table
select * from hudi_source_hudi;
If the query returns the records sent above, the data has been synced into the Hudi table.
6. Send more data with the Kafka producer, then query the Hudi table again from the client; the new data shows up as well.
bin/kafka-console-producer.sh --broker-list node1:9092 --topic t_kafka_03
{"user_id":"a2222","order_amount":11.0,"log_ts":"2020-06-29 12:12:12"}
{"user_id":"a2222","order_amount":11.0,"log_ts":"2020-06-29 12:15:00"}
{"user_id":"a2222","order_amount":11.0,"log_ts":"2020-06-29 12:20:00"}
{"user_id":"a2222","order_amount":11.0,"log_ts":"2020-06-29 12:30:00"}
{"user_id":"a2222","order_amount":13.0,"log_ts":"2020-06-29 12:32:00"}
{"user_id":"a2222","order_amount":15.0,"log_ts":"2020-11-26 12:12:13"}
If these new records show up in the query result, the incremental data has been synced over as well.
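The snapshot query above has to be re-run to pick up new commits. If you want a single query that keeps emitting new rows, the Hudi source can be mapped again with streaming read enabled; a hedged sketch (the table name hudi_source_hudi_stream is hypothetical, and the read.streaming.* option names should be checked against your Hudi version):
CREATE TABLE hudi_source_hudi_stream(
user_id STRING,
order_amount BIGINT,
log_ts TIMESTAMP(3)
) WITH (
'connector' = 'hudi',
'path' = 'hdfs://node1:8020/hudi/hudi_source_hudi',
'table.type' = 'MERGE_ON_READ',
'read.streaming.enabled' = 'true',
'read.streaming.check-interval' = '4'
);
select * from hudi_source_hudi_stream;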
7. Join tables directly in the Flink SQL client
7.1 Start Kafka producers and generate data
1.1 Start a producer for the user topic and send data
bin/kafka-console-producer.sh --broker-list node1:9092 --topic user
{"user_id":"a0001","order_amount":11.0,"log_ts":"2020-06-29 12:12:12"}
{"user_id":"a0002","order_amount":12.0,"log_ts":"2020-06-29 12:15:00"}
{"user_id":"a0003","order_amount":13.0,"log_ts":"2020-06-29 12:20:00"}
{"user_id":"a0004","order_amount":14.0,"log_ts":"2020-06-29 12:30:00"}
{"user_id":"a0005","order_amount":15.0,"log_ts":"2020-06-29 12:32:00"}
{"user_id":"a0006","order_amount":16.0,"log_ts":"2020-11-26 12:12:13"}
1.2 Start a producer for the user_hobby topic and send data
bin/kafka-console-producer.sh --broker-list node1:9092 --topic user_hobby
{"user_id":"a0001","name":"yangge","hobby":"足球"}
{"user_id":"a0002","name":"baba","hobby":"电影"}
{"user_id":"a0003","name":"mama","hobby":"游戏"}
{"user_id":"a0004","name":"dudu","hobby":"动画片"}
{"user_id":"a0005","name":"gege","hobby":"手机"}
{"user_id":"a0006","name":"jiejie","hobby":"睡觉"}
7.2 Create the corresponding mapping tables in the Flink SQL client
2.1 Create the user table in the Flink SQL client
CREATE TABLE user_ODS(
user_id STRING,
order_amount BIGINT,
log_ts TIMESTAMP(3)
)WITH(
'connector' = 'kafka',
'topic' = 'user',
'properties.bootstrap.servers' = 'node1:9092',
'scan.startup.mode'='earliest-offset',
'properties.group.id' = 'testGroup',
'format' = 'json'
);
select * from user_ODS;
2.2 Create the user_hobby table in the Flink SQL client
CREATE TABLE user_hobby_ODS(
user_id STRING,
name STRING,
hobby STRING
)WITH(
'connector' = 'kafka',
'topic' = 'user_hobby',
'properties.bootstrap.servers' = 'node1:9092',
'scan.startup.mode'='earliest-offset',
'properties.group.id' = 'testGroup',
'format' = 'json'
);
select * from user_hobby_ODS;
7.3 Create the Hudi tables with the Hudi connector
3.1 Create the hudi_user table with the Hudi connector
CREATE TABLE hudi_user(
user_id STRING,
order_amount BIGINT,
log_ts TIMESTAMP(3)
) WITH (
'connector' = 'hudi',
'path' = 'hdfs://node1:8020/hudi/hudi_user',
'table.type' = 'MERGE_ON_READ',
'changelog.enabled' = 'true',
'write.precombine.field' = 'log_ts',
'hoodie.datasource.write.recordkey.field' = 'user_id',
'compaction.async.enabled' = 'false'
);
insert into hudi_user select * from user_ODS;
select * from hudi_user ;
select * from user_ODS;
3.2 Create the hudi_user_hobby table with the Hudi connector
CREATE TABLE hudi_user_hobby(
user_id STRING,
name STRING,
hobby STRING
) WITH (
'connector' = 'hudi',
'path' = 'hdfs://node1:8020/hudi/hudi_user_hobby',
'table.type' = 'MERGE_ON_READ',
'changelog.enabled' = 'true',
'hoodie.datasource.write.recordkey.field' = 'user_id',
'write.precombine.field' = 'user_id',
'compaction.async.enabled' = 'false'
);
insert into hudi_user_hobby select * from user_hobby_ODS;
select * from hudi_user_hobby;
7.4 Create the Hudi DWD table with the Hudi connector
4.1 Create the DWD output table in Flink SQL
CREATE TABLE user_hobby_DWD(
user_id STRING,
name STRING,
hobby STRING,
order_amount BIGINT,
log_ts TIMESTAMP(3)
)WITH(
'connector' = 'hudi',
'path' = 'hdfs://node1:8020/hudi/user_hobby_DWD',
'table.type' = 'MERGE_ON_READ',
'changelog.enabled' = 'true',
'hoodie.datasource.write.recordkey.field' = 'user_id',
'write.precombine.field' = 'user_id',
'compaction.async.enabled' = 'false'
);
insert into user_hobby_DWD
select
A.user_id,
B.name,
B.hobby,
A.order_amount,
A.log_ts
from
(select * from hudi_user) A
join
(select * from hudi_user_hobby) B
on A.user_id = B.user_id;
Note: the order of the fields in the SELECT must match the column order of the target table; otherwise the insert fails.
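To check the joined result, query the DWD table in the SQL client:
select * from user_hobby_DWD;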
If the joined rows show up, the join pipeline is complete.