Using the Hive catalog with Flink SQL in Flink on YARN mode
Problem: we currently develop with Flink CDC in the SQL command-line client, using the default catalog and database. Every table created there lives in memory and is valid only for the current session; once the connection is closed, all tables have to be recreated, which is a lot of repetitive work. The fix is to switch to a Hive catalog so that the tables needed for development are persisted in the Hive Metastore.
Solution steps
step 1
Add the JARs that match your versions to Flink's lib directory. For my setup, Flink 1.16 + Hive 2.3.9, these are:
hive-exec-2.3.9.jar
flink-connector-hive_2.12-1.16.0.jar
flink-sql-connector-hive-2.3.9_2.12-1.16.0.jar
mysql-connector-java-8.0.19.jar
Download the dependencies you need from the usual repositories:
https://repo.maven.apache.org/maven2/org/apache/flink/
https://mvnrepository.com/
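As a sketch, the JARs above can be fetched from Maven Central and dropped into Flink's lib directory. The `mvn_url` helper, the `fetch_flink_hive_jars` function, and the default Flink path are my own illustrative assumptions, not part of any tool; the URLs simply follow the standard Maven repository layout.

```shell
# Hypothetical helper: build a Maven Central download URL from
# groupId / artifactId / version, following the standard repo layout.
mvn_url() {
  local group="$1" artifact="$2" version="$3"
  # Dots in the groupId become slashes in the repository path.
  echo "https://repo.maven.apache.org/maven2/${group//.//}/${artifact}/${version}/${artifact}-${version}.jar"
}

# Hypothetical convenience function: download the four JARs listed above
# into Flink's lib directory (the default path is an assumption).
fetch_flink_hive_jars() {
  local lib="${FLINK_HOME:-/usr/local/service/flink}/lib"
  local coords
  for coords in \
    "org.apache.hive hive-exec 2.3.9" \
    "org.apache.flink flink-connector-hive_2.12 1.16.0" \
    "org.apache.flink flink-sql-connector-hive-2.3.9_2.12 1.16.0" \
    "mysql mysql-connector-java 8.0.19"
  do
    set -- $coords
    wget -P "$lib" "$(mvn_url "$1" "$2" "$3")"
  done
}
```

Calling `fetch_flink_hive_jars` then downloads everything in one go; restart the cluster afterwards so Flink picks up the new JARs.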
step 2
Create the Hive catalog in the SQL client:
CREATE CATALOG flink_hive
WITH (
'type' = 'hive',
'hive-conf-dir' = '/usr/local/service/hive/conf'
);
USE CATALOG flink_hive;
Create a new database in Hive to hold these tables, or just use the default database.
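For example, using a database named test (the same name step 3 below switches to; the name itself is this guide's choice):

```sql
-- Create (if needed) and switch to a database inside the Hive catalog;
-- its metadata is stored in the Hive Metastore, not in session memory.
CREATE DATABASE IF NOT EXISTS test;
USE test;
```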
SET 'execution.runtime-mode' = 'streaming';
SET 'sql-client.execution.result-mode' = 'tableau';
SET 'sql-client.execution.max-table-result.rows' = '10000';
SET 'table.exec.state.ttl' = '24h';
step 3
Once the configuration works, collect the common setup SQL into an initialization SQL file:
CREATE CATALOG flink_hive
WITH (
'type' = 'hive',
'hive-conf-dir' = '/usr/local/service/hive/conf'
);
USE CATALOG flink_hive;
USE test;
SET 'execution.runtime-mode' = 'streaming';
SET 'sql-client.execution.result-mode' = 'tableau';
SET 'sql-client.execution.max-table-result.rows' = '10000';
SET 'table.exec.state.ttl' = '24h';
Save the statements above into a SQL file, then start the client with sql-client.sh embedded -s xxx -i ./flink-hive.sql.
You can also define a shell alias for sql-client.sh embedded -s xxx -i ./flink-hive.sql to save typing.
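For instance, in ~/.bashrc; the alias name fsql and the session name my-session are illustrative placeholders (my-session stands in for the xxx above):

```shell
# Illustrative alias: launch the Flink SQL client with the init script.
# "my-session" is a placeholder for whatever session name (xxx) you use.
alias fsql='sql-client.sh embedded -s my-session -i ./flink-hive.sql'
```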
Done.