Integrating Flink 1.9.1 with Hive for Reading and Writing (Notes on a Failed Attempt on CDH 6.1.0)

Preface

This post records an attempt to get Flink 1.9.1 reading from and writing to Hive on CDH 6.1.0. It covers connecting to Hive through Flink's bundled sql-client as well as a small Java demo that connects to Hive. Because of version differences the attempt did not ultimately succeed, and I am now considering other ways to integrate with Hive, so take the steps below as reference only.

Keep in mind that Flink only gained Hive read/write integration in 1.9.x, and it is still a beta feature. Flink officially supports only Hive 2.3.4 and 1.2.1 for this integration. While integrating on a CDH 6.1.0 cluster (Hadoop 3.0.0, Hive 2.1.1) I ran into errors no matter whether I configured 2.3.4 or 1.2.1. My recommendation is still to use a matching or at least close Hive version (the major version should match at minimum, otherwise you hit errors such as methods not being found).

Configuring Flink 1.9.1 to Use Hive 2.3.4

First, edit flink-1.9.1/conf/sql-client-defaults.yaml and set the catalog parameters for Hive. On CDH the Hive conf directory is /etc/hive/conf.cloudera.hive.

[root@node01 lib]# vi /opt/flink-1.9.1/conf/sql-client-defaults.yaml
...
#==============================================================================
# Catalogs
#==============================================================================

# Define catalogs here.

#catalogs: [] # empty list
# A typical catalog definition looks like:
#  - name: myhive
#    type: hive
#    hive-conf-dir: /opt/hive_conf/
#    default-database: ...

catalogs:
   - name: myhive
     type: hive
     property-version: 1
     hive-conf-dir: /etc/hive/conf.cloudera.hive
     hive-version: 2.3.4

Start the sql-client with bin/sql-client.sh embedded; the first error appears:

[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...

Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
        at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
        at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
        ... 2 more
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.CatalogFactory' in
the classpath.

Reason: No context matches.

The following properties are requested:
hive-conf-dir=/etc/hive/conf.cloudera.hive
hive-version=2.3.4
property-version=1
type=hive

The following factories have been considered:
org.apache.flink.table.catalog.GenericInMemoryCatalogFactory
org.apache.flink.table.sources.CsvBatchTableSourceFactory
org.apache.flink.table.sources.CsvAppendTableSourceFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
org.apache.flink.table.planner.StreamPlannerFactory
org.apache.flink.table.executor.StreamExecutorFactory
org.apache.flink.table.planner.delegation.BlinkPlannerFactory
org.apache.flink.table.planner.delegation.BlinkExecutorFactory
        at org.apache.flink.table.factories.TableFactoryService.filterByContext(TableFactoryService.java:283)
        at org.apache.flink.table.factories.TableFactoryService.filter(TableFactoryService.java:191)
        at org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:144)
        at org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:114)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:258)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
        at java.util.HashMap.forEach(HashMap.java:1289)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.<init>(ExecutionContext.java:135)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
        ... 4 more

Add the following jars (see the copy sketch after the list):

{flink-home}/flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.9.1.jar
{flink-home}/flink-connectors/flink-hadoop-compatibility/target/flink-hadoop-compatibility_2.11-1.9.1.jar
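
A minimal sketch of dropping those two jars onto the sql-client's classpath, assuming FLINK_SRC points at a Flink 1.9.1 source build (where the target/ directories above come from) and FLINK_HOME at the installed distribution; both paths are placeholders:

# Placeholder paths: adjust to where Flink was built and where it is installed.
export FLINK_SRC=/opt/flink-1.9.1-src
export FLINK_HOME=/opt/flink-1.9.1

# The sql-client picks up extra jars from {flink-home}/lib.
cp $FLINK_SRC/flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.9.1.jar $FLINK_HOME/lib/
cp $FLINK_SRC/flink-connectors/flink-hadoop-compatibility/target/flink-hadoop-compatibility_2.11-1.9.1.jar $FLINK_HOME/lib/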

Running it again produces a new error:

[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...

Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
        at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
        at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
        ... 2 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/common/util/HiveVersionInfo
        at org.apache.flink.table.catalog.hive.client.HiveShimLoader.getHiveVersion(HiveShimLoader.java:58)
        at org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory.createCatalog(HiveCatalogFactory.java:82)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:259)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
        at java.util.HashMap.forEach(HashMap.java:1289)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.<init>(ExecutionContext.java:135)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
        ... 4 more
Caused by: java.lang.ClassNotFoundException: org.apache.hive.common.util.HiveVersionInfo
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 11 more

The error shows that Hive-related jars are missing. The sql-client manages jars simply by placing them in {flink-home}/lib, and the supported Hive versions are 2.3.4 and 1.2.1. My Hive version is Hive 2.1.1-cdh6.1.0, so I chose the nearest supported version, 2.3.4, and downloaded the Hive 2.3.4 release: http://archive.apache.org/dist/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz (Hive 1.2.1: http://archive.apache.org/dist/hive/hive-1.2.1/).
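
A quick sketch of fetching and unpacking that release (the /opt target directory is an assumption):

# Download the Hive 2.3.4 binary release and unpack it under /opt (placeholder location).
wget http://archive.apache.org/dist/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
tar -zxf apache-hive-2.3.4-bin.tar.gz -C /opt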

Copy the relevant jars from {hive-home}/lib (a copy sketch follows the list):

{hive-home}/lib/hive-exec-2.3.4.jar
{hive-home}/lib/hive-common-2.3.4.jar
{hive-home}/lib/hive-metastore-2.3.4.jar
{hive-home}/lib/hive-shims-common-2.3.4.jar
{hive-home}/lib/antlr-runtime-3.5.2.jar
{hive-home}/lib/datanucleus-api-jdo-4.2.4.jar
{hive-home}/lib/datanucleus-core-4.1.17.jar
{hive-home}/lib/datanucleus-rdbms-4.1.19.jar
{hive-home}/lib/javax.jdo-3.2.0-m3.jar
{hive-home}/lib/libfb303-0.9.3.jar
{hive-home}/lib/commons-cli-1.2.jar
{hive-home}/lib/mysql-connector-java-5.1.34.jar
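
A minimal copy loop for the jars above, assuming HIVE_HOME points at the unpacked Hive 2.3.4 release and FLINK_HOME at the Flink install (both placeholders; the mysql-connector-java jar may live elsewhere depending on your setup):

# Copy the Hive jars listed above into Flink's lib directory.
export HIVE_HOME=/opt/apache-hive-2.3.4-bin
export FLINK_HOME=/opt/flink-1.9.1

for jar in hive-exec-2.3.4.jar hive-common-2.3.4.jar hive-metastore-2.3.4.jar \
           hive-shims-common-2.3.4.jar antlr-runtime-3.5.2.jar datanucleus-api-jdo-4.2.4.jar \
           datanucleus-core-4.1.17.jar datanucleus-rdbms-4.1.19.jar javax.jdo-3.2.0-m3.jar \
           libfb303-0.9.3.jar commons-cli-1.2.jar mysql-connector-java-5.1.34.jar; do
  cp "$HIVE_HOME/lib/$jar" "$FLINK_HOME/lib/"
done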

This produces another error:

[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
        at org.apache.flink.table.client.cli.CliOptionsParser.<clinit>(CliOptionsParser.java:43)
        at org.apache.flink.table.client.SqlClient.main(SqlClient.java:188)

The fix is to replace the commons-cli-1.2.jar imported earlier with commons-cli-1.3.1.jar, as sketched below.
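
A sketch of the swap, assuming commons-cli-1.3.1.jar is fetched from Maven Central (the exact source of the jar is an assumption, any copy of 1.3.1 will do):

# Remove the older commons-cli that came with Hive and drop in 1.3.1 instead.
rm $FLINK_HOME/lib/commons-cli-1.2.jar
wget -P $FLINK_HOME/lib/ https://repo1.maven.org/maven2/commons-cli/commons-cli/1.3.1/commons-cli-1.3.1.jar

With the newer commons-cli in place, running the sql-client again succeeds: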

[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
No default environment specified.
Searching for '/opt/module/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/module/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...done.

                                   ▒▓██▓██▒
                               ▓████▒▒█▓▒▓███▓▒
                            ▓███▓░░        ▒▒▒▓██▒  ▒
                          ░██▒   ▒▒▓▓█▓▓▒░      ▒████
                          ██▒         ░▒▓███▒    ▒█▒█▒
                            ░▓█            ███   ▓░▒██
                              ▓█       ▒▒▒▒▒▓██▓░▒░▓▓█
                            █░ █   ▒▒░       ███▓▓█ ▒█▒▒▒
                            ████░   ▒▓█▓      ██▒▒▒ ▓███▒
                         ░▒█▓▓██       ▓█▒    ▓█▒▓██▓ ░█░
                   ▓░▒▓████▒ ██         ▒█    █▓░▒█▒░▒█▒
                  ███▓░██▓  ▓█           █   █▓ ▒▓█▓▓█▒
                ░██▓  ░█░            █  █▒ ▒█████▓▒ ██▓░▒
               ███░ ░ █░          ▓ ░█ █████▒░░    ░█░▓  ▓░
              ██▓█ ▒▒▓▒          ▓███████▓░       ▒█▒ ▒▓ ▓██▓
           ▒██▓ ▓█ █▓█       ░▒█████▓▓▒░         ██▒▒  █ ▒  ▓█▒
           ▓█▓  ▓█ ██▓ ░▓▓▓▓▓▓▓▒              ▒██▓           ░█▒
           ▓█    █ ▓███▓▒░              ░▓▓▓███▓          ░▒░ ▓█
           ██▓    ██▒    ░▒▓▓███▓▓▓▓▓██████▓▒            ▓███  █
          ▓███▒ ███   ░▓▓▒░░   ░▓████▓░                  ░▒▓▒  █▓
          █▓▒▒▓▓██  ░▒▒░░░▒▒▒▒▓██▓░                            █▓
          ██ ▓░▒█   ▓▓▓▓▒░░  ▒█▓       ▒▓▓██▓    ▓▒          ▒▒▓
          ▓█▓ ▓▒█  █▓░  ░▒▓▓██▒            ░▓█▒   ▒▒▒░▒▒▓█████▒
           ██░ ▓█▒█▒  ▒▓▓▒  ▓█                █░      ░░░░   ░█▒
           ▓█   ▒█▓   ░     █░                ▒█              █▓
            █▓   ██         █░                 ▓▓        ▒█▓▓▓▒█░
             █▓ ░▓██░       ▓▒                  ▓█▓▒░░░▒▓█░    ▒█
              ██   ▓█▓░      ▒                    ░▒█▒██▒      ▓▓
               ▓█▒   ▒█▓▒░                         ▒▒ █▒█▓▒▒░░▒██
                ░██▒    ▒▓▓▒                     ▓██▓▒█▒ ░▓▓▓▓▒█▓
                  ░▓██▒                          ▓░  ▒█▓█  ░░▒▒▒
                      ▒▓▓▓▓▓▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░▓▓  ▓░▒█░
          
    ______ _ _       _       _____  ____  _         _____ _ _            _  BETA   
   |  ____| (_)     | |     / ____|/ __ \| |       / ____| (_)          | |  
   | |__  | |_ _ __ | | __ | (___ | |  | | |      | |    | |_  ___ _ __ | |_ 
   |  __| | | | '_ \| |/ /  \___ \| |  | | |      | |    | | |/ _ \ '_ \| __|
   | |    | | | | | |   <   ____) | |__| | |____  | |____| | |  __/ | | | |_ 
   |_|    |_|_|_| |_|_|\_\ |_____/ \___\_\______|  \_____|_|_|\___|_| |_|\__|
          
        Welcome! Enter 'HELP;' to list all available commands. 'QUIT;' to exit.


Flink SQL> 

During the configuration I referred to this blog post: https://blog.csdn.net/h335146502/article/details/100689010

The author's notes on the pitfalls they hit are very valuable and saved me a lot of time; in practice you have to track down the matching Hive jar based on the class named in each error.

I thought that was the end of it, but errors still showed up during testing, basically all caused by the version gap. Following the official docs, create a mytable table in Hive:

CREATE TABLE mytable(name string, value double);

The test session looks like this:

Flink SQL> show catalogs;
default_catalog
myhive

Flink SQL> use catalog myhive;

Flink SQL> show databases;
2020-01-08 15:14:48,019 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2020-01-08 15:14:48,020 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2020-01-08 15:14:48,020 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2020-01-08 15:14:48,020 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.