Foreword
This post documents an attempt to integrate Flink 1.9.1 with Hive on a CDH 6.1.0 cluster. It covers connecting to Hive through Flink's bundled sql-client as well as a small Java demo, but because of version mismatches neither ultimately ran successfully; I am now considering other ways to integrate with Hive, so treat these steps as reference only.
Note that Flink only introduced Hive read/write integration in 1.9.x, and it is still a beta feature. Flink officially supports only two Hive versions, 2.3.4 and 1.2.1. On my CDH-6.1.0 cluster (Hadoop 3.0.0, Hive 2.1.1), both versions produced errors when configured. My recommendation is to run against a matching or at least close Hive version (the major version should match at minimum, otherwise you will hit errors such as methods not being found).
Configuring Flink-1.9.1 with Hive-2.3.4
First, edit flink-1.9.1/conf/sql-client-defaults.yaml and define a catalog with the Hive parameters; on CDH the Hive configuration directory is /etc/hive/conf.cloudera.hive.
[root@node01 lib]# vi /opt/flink-1.9.1/conf/sql-client-defaults.yaml
...
#==============================================================================
# Catalogs
#==============================================================================
# Define catalogs here.
#catalogs: [] # empty list
# A typical catalog definition looks like:
# - name: myhive
#   type: hive
#   hive-conf-dir: /opt/hive_conf/
#   default-database: ...
catalogs:
  - name: myhive
    type: hive
    property-version: 1
    hive-conf-dir: /etc/hive/conf.cloudera.hive
    hive-version: 2.3.4
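Before starting the client it is worth verifying that hive-conf-dir really points at a directory containing hive-site.xml, since the HiveCatalog reads the Metastore address from it. A quick sanity check (paths as configured above):
[root@node01 ~]# ls /etc/hive/conf.cloudera.hive/hive-site.xml
[root@node01 ~]# grep -A1 hive.metastore.uris /etc/hive/conf.cloudera.hive/hive-site.xml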
Start the sql-client with bin/sql-client.sh embedded; the first attempt fails:
[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...
Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
... 2 more
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.CatalogFactory' in
the classpath.
Reason: No context matches.
The following properties are requested:
hive-conf-dir=/etc/hive/conf.cloudera.hive
hive-version=2.3.4
property-version=1
type=hive
The following factories have been considered:
org.apache.flink.table.catalog.GenericInMemoryCatalogFactory
org.apache.flink.table.sources.CsvBatchTableSourceFactory
org.apache.flink.table.sources.CsvAppendTableSourceFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
org.apache.flink.table.planner.StreamPlannerFactory
org.apache.flink.table.executor.StreamExecutorFactory
org.apache.flink.table.planner.delegation.BlinkPlannerFactory
org.apache.flink.table.planner.delegation.BlinkExecutorFactory
at org.apache.flink.table.factories.TableFactoryService.filterByContext(TableFactoryService.java:283)
at org.apache.flink.table.factories.TableFactoryService.filter(TableFactoryService.java:191)
at org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:144)
at org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:114)
at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:258)
at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
at java.util.HashMap.forEach(HashMap.java:1289)
at org.apache.flink.table.client.gateway.local.ExecutionContext.<init>(ExecutionContext.java:135)
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
... 4 more
The fix is to add the Hive connector jars:
{flink-home}/flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.9.1.jar
{flink-home}/flink-connectors/flink-hadoop-compatibility/target/flink-hadoop-compatibility_2.11-1.9.1.jar
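Concretely, they go into {flink-home}/lib so the sql-client picks them up. A sketch of the copy step; the /target/ paths suggest these jars came from a source build of Flink 1.9.1, since the Hive connector was not bundled in the binary distribution:
# copy the two connector jars onto the sql-client classpath
cp {flink-home}/flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.9.1.jar /opt/flink-1.9.1/lib/
cp {flink-home}/flink-connectors/flink-hadoop-compatibility/target/flink-hadoop-compatibility_2.11-1.9.1.jar /opt/flink-1.9.1/lib/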
Running again produces a new error:
[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...
Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
... 2 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/common/util/HiveVersionInfo
at org.apache.flink.table.catalog.hive.client.HiveShimLoader.getHiveVersion(HiveShimLoader.java:58)
at org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory.createCatalog(HiveCatalogFactory.java:82)
at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:259)
at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
at java.util.HashMap.forEach(HashMap.java:1289)
at org.apache.flink.table.client.gateway.local.ExecutionContext.<init>(ExecutionContext.java:135)
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
... 4 more
Caused by: java.lang.ClassNotFoundException: org.apache.hive.common.util.HiveVersionInfo
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
The error shows that Hive's own jars are missing. The sql-client takes its jars directly from {flink-home}/lib, and since Flink supports only Hive 2.3.4 and 1.2.1 while my cluster runs Hive 2.1.1-cdh6.1.0, I picked the nearer version, 2.3.4, and downloaded its distribution: http://archive.apache.org/dist/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz (Hive-1.2.1: http://archive.apache.org/dist/hive/hive-1.2.1/)
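For reference, a minimal download-and-extract sequence (extracting to /opt is my own choice; {hive-home} below then means /opt/apache-hive-2.3.4-bin):
[root@node01 opt]# wget http://archive.apache.org/dist/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
[root@node01 opt]# tar -zxf apache-hive-2.3.4-bin.tar.gz -C /opt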
Copy the relevant jars from {hive-home}/lib:
{hive-home}/lib/hive-exec-2.3.4.jar
{hive-home}/lib/hive-common-2.3.4.jar
{hive-home}/lib/hive-metastore-2.3.4.jar
{hive-home}/lib/hive-shims-common-2.3.4.jar
{hive-home}/lib/antlr-runtime-3.5.2.jar
{hive-home}/lib/datanucleus-api-jdo-4.2.4.jar
{hive-home}/lib/datanucleus-core-4.1.17.jar
{hive-home}/lib/datanucleus-rdbms-4.1.19.jar
{hive-home}/lib/javax.jdo-3.2.0-m3.jar
{hive-home}/lib/libfb303-0.9.3.jar
{hive-home}/lib/commons-cli-1.2.jar
{hive-home}/lib/mysql-connector-java-5.1.34.jar
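As a shell-level shortcut for the list above (assuming the extraction path chosen earlier):
[root@node01 opt]# cd /opt/apache-hive-2.3.4-bin/lib
[root@node01 lib]# cp hive-exec-2.3.4.jar hive-common-2.3.4.jar hive-metastore-2.3.4.jar \
    hive-shims-common-2.3.4.jar antlr-runtime-3.5.2.jar datanucleus-api-jdo-4.2.4.jar \
    datanucleus-core-4.1.17.jar datanucleus-rdbms-4.1.19.jar javax.jdo-3.2.0-m3.jar \
    libfb303-0.9.3.jar commons-cli-1.2.jar mysql-connector-java-5.1.34.jar /opt/flink-1.9.1/lib/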
This time the error is:
[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
at org.apache.flink.table.client.cli.CliOptionsParser.<clinit>(CliOptionsParser.java:43)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:188)
The culprit is the commons-cli-1.2.jar copied above: Flink's CliOptionsParser calls Option.builder(String), which was only added in commons-cli 1.3. Replace it with commons-cli-1.3.1.jar and the client finally starts:
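One way to make the swap (pulling 1.3.1 from Maven Central is my addition; any copy of commons-cli 1.3+ should work):
[root@node01 flink-1.9.1]# rm lib/commons-cli-1.2.jar
[root@node01 flink-1.9.1]# wget -P lib https://repo1.maven.org/maven2/commons-cli/commons-cli/1.3.1/commons-cli-1.3.1.jar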
[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
No default environment specified.
Searching for '/opt/module/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/module/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...done.
(ASCII-art banner: Flink squirrel logo and "Flink SQL Client BETA")
Welcome! Enter 'HELP;' to list all available commands. 'QUIT;' to exit.
Flink SQL>
During configuration I referenced this blog post: https://blog.csdn.net/h335146502/article/details/100689010
The blogger's record of the pitfalls was invaluable and saved a lot of time; in general, the way to resolve each error is to locate the Hive jar that contains the class named in the error.
I thought the job was done at this point, but testing still turned up errors, essentially all caused by version differences. Following the official docs, I created a mytable table in Hive:
CREATE TABLE mytable(name string, value double);
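To have something to query, a few rows can be loaded on the Hive side first (the sample values are my own; any data will do):
[root@node01 ~]# hive -e "INSERT INTO mytable VALUES ('flink', 1.0), ('hive', 2.0);"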
The test session looks like this:
Flink SQL> show catalogs;
default_catalog
myhive
Flink SQL> use catalog myhive;
Flink SQL> show databases;
2020-01-08 15:14:48,019 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2020-01-08 15:14:48,020 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2020-01-08 15:14:48,020 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2020-01-08 15:14:48,020 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.