Using the Hive catalog with Flink SQL in Flink on YARN mode
Problem: we currently develop with Flink CDC in the SQL command-line client, using the default catalog and database. Every table created there lives in memory and is valid only for the current session; once the connection is closed, all tables have to be recreated, which is a lot of repetitive work. The fix is to switch to a Hive catalog so that the tables needed for development are persisted in the Hive Metastore.
Solution steps
step 1
Add the JARs that match your versions to Flink's lib directory. For my setup, Flink 1.16 + Hive 2.3.9, these are:
hive-exec-2.3.9.jar
flink-connector-hive_2.12-1.16.0.jar
flink-sql-connector-hive-2.3.9_2.12-1.16.0.jar
mysql-connector-java-8.0.19.jar
Download the dependencies you need from the usual repositories:
https://repo.maven.apache.org/maven2/org/apache/flink/
https://mvnrepository.com/
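As a sketch, the JARs above can be fetched from Maven Central and dropped into Flink's lib directory. The `mvn_url` helper, the `fetch_flink_hive_jars` function, and the default Flink path are my own illustrative assumptions, not part of any tool; the URLs simply follow the standard Maven repository layout.

```shell
# Hypothetical helper: build a Maven Central download URL from
# groupId / artifactId / version, following the standard repo layout.
mvn_url() {
  local group="$1" artifact="$2" version="$3"
  # Dots in the groupId become slashes in the repository path.
  echo "https://repo.maven.apache.org/maven2/${group//.//}/${artifact}/${version}/${artifact}-${version}.jar"
}

# Hypothetical convenience function: download the four JARs listed above
# into Flink's lib directory (the default path is an assumption).
fetch_flink_hive_jars() {
  local lib="${FLINK_HOME:-/usr/local/service/flink}/lib"
  local coords
  for coords in \
    "org.apache.hive hive-exec 2.3.9" \
    "org.apache.flink flink-connector-hive_2.12 1.16.0" \
    "org.apache.flink flink-sql-connector-hive-2.3.9_2.12 1.16.0" \
    "mysql mysql-connector-java 8.0.19"
  do
    set -- $coords
    wget -P "$lib" "$(mvn_url "$1" "$2" "$3")"
  done
}
```

Calling `fetch_flink_hive_jars` then downloads everything in one go; restart the cluster afterwards so Flink picks up the new JARs.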
step 2
Create the Hive catalog in the SQL client:
CREATE CATALOG flink_hive
WITH (
'type' = 'hive',
'hive-conf-dir' = '/usr/local/service/hive/conf'
);
USE CATALOG flink_hive;
Create a new database in Hive to hold these tables, or just use the default database.
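For example, using a database named test (the same name step 3 below switches to; the name itself is this guide's choice):

```sql
-- Create (if needed) and switch to a database inside the Hive catalog;
-- its metadata is stored in the Hive Metastore, not in session memory.
CREATE DATABASE IF NOT EXISTS test;
USE test;
```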
SET 'execution.runtime-mode' = 'streaming';
SET 'sql-client.execution.result-mode' = 'tableau';
SET 'sql-client.execution.max-table-result.rows' = '10000';
SET 'table.exec.state.ttl' = '24h';
step 3
Once the configuration works, collect the common setup SQL into an initialization SQL file:
CREATE CATALOG flink_hive
WITH (
'type' = 'hive',
'hive-conf-dir' = '/usr/local/service/hive/conf'
);
USE CATALOG flink_hive;
USE test;
SET 'execution.runtime-mode' = 'streaming';
SET 'sql-client.execution.result-mode' = 'tableau';
SET 'sql-client.execution.max-table-result.rows' = '10000';
SET 'table.exec.state.ttl' = '24h';
Save the statements above into a SQL file, then start the client with sql-client.sh embedded -s xxx -i ./flink-hive.sql.
You can also define a shell alias for sql-client.sh embedded -s xxx -i ./flink-hive.sql to save typing.
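For instance, in ~/.bashrc; the alias name fsql and the session name my-session are illustrative placeholders (my-session stands in for the xxx above):

```shell
# Illustrative alias: launch the Flink SQL client with the init script.
# "my-session" is a placeholder for whatever session name (xxx) you use.
alias fsql='sql-client.sh embedded -s my-session -i ./flink-hive.sql'
```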
Done.