Test environment: Flink 1.10.0, Kafka kafka_2.10-0.10.0.0.
For environment setup, please refer to 云邪's article: Demo: Building a Streaming Application with Flink SQL.
Straight to the code:
CREATE TABLE table_kafka (
    user_id BIGINT,
    item_id BIGINT,
    category_id BIGINT,
    behavior STRING,
    ts TIMESTAMP(3),
    proctime AS PROCTIME(),
    WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'flink_im02',
    'connector.properties.group.id' = 'flink_im02_new',
    'connector.startup-mode' = 'earliest-offset',
    'connector.properties.zookeeper.connect' = 'localhost:2181',
    'connector.properties.bootstrap.servers' = 'localhost:9092',
    'format.type' = 'csv',
    'format.field-delimiter' = '|'
);
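Once the table is registered, a simple continuous query can confirm that events flow through. The tumbling-window aggregation below is a sketch I added for illustration (not from the original setup); it counts `pv` events per minute using the `ts` event-time column and the watermark declared above:

```sql
-- Count page views per one-minute event-time window (Flink 1.10 group-window syntax)
SELECT
    TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
    COUNT(*) AS pv_count
FROM table_kafka
WHERE behavior = 'pv'
GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE);
```

Because the watermark is `ts - INTERVAL '5' SECOND`, each window fires once the watermark passes its end, i.e. events up to 5 seconds late are still counted.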
Sample test data format:
952483|310884|4580532|pv|2017-11-27 00:00:00
794777|5119439|982926|pv|2017-11-27 00:00:00
...
Two pitfalls worth calling out:
1. The format and field types of the Kafka messages must match the table schema exactly, otherwise deserialization will fail.
2. You must add flink-csv-1.10.0.jar to Flink's lib/ directory; otherwise, consuming the Kafka source with 'format.type' = 'csv' will throw an error.
wget -P ./lib/ https://repo1.maven.org/maven2/org/apache/flink/flink-csv/1.10.0/flink-csv-1.10.0.jar
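To catch pitfall 1 before wiring up Kafka, you can sanity-check that each CSV line parses into the declared column types. The helper below is my own illustrative sketch (the name `validate_row` is not part of Flink); it mirrors the `table_kafka` schema: three BIGINT columns, a STRING, and a TIMESTAMP(3), separated by `|`:

```python
from datetime import datetime

# Delimiter declared in the DDL: 'format.field-delimiter' = '|'
FIELD_DELIMITER = "|"

def validate_row(line: str) -> bool:
    """Return True if a CSV line matches the table_kafka schema."""
    fields = line.strip().split(FIELD_DELIMITER)
    if len(fields) != 5:
        return False  # wrong number of columns
    try:
        int(fields[0])  # user_id BIGINT
        int(fields[1])  # item_id BIGINT
        int(fields[2])  # category_id BIGINT
        # fields[3] is behavior STRING -- any text is acceptable
        datetime.strptime(fields[4], "%Y-%m-%d %H:%M:%S")  # ts TIMESTAMP
    except ValueError:
        return False  # a field failed to parse into its declared type
    return True

print(validate_row("952483|310884|4580532|pv|2017-11-27 00:00:00"))  # True
print(validate_row("bad|row|with|wrong|types"))  # False
```

Running every line of your test file through such a check is much faster than debugging deserialization errors in the Flink task manager logs.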