cdh6.1.1的hive升级到CDH6.3.2的hive

1.在hive创建数据库的情况下,impala无法自动刷新元数据

1.1 发现问题
在CDH6.1版本下创建数据库, 如在hive中create database test_db; 再在impala中 show databases;

没有显示test_db,说明test_db并没有刷新到Impala的catalog中,通过查找Impala Catalog的role log,发现如下的异常日志:

Unexpected exception received while processing event
Java exception follows:
org.apache.impala.catalog.events.MetastoreNotificationException: EventId: 591869 EventType: CREATE_DATABASE Database object is null in the event. This could be a metastore configuration problem. Check if hive.metastore.notifications.add.thrift.objects is set to true in metastore configuration
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1108)
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1089)
    at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEventFactory.get(MetastoreEvents.java:168)
    at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEventFactory.getFilteredEvents(MetastoreEvents.java:205)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:601)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:513)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1106)
    ... 12 more

重要的部分在: CREATE_DATABASE Database object is null in the event. This could be a metastore configuration problem. Check if hive.metastore.notifications.add.thrift.objects is set to true in metastore configuration

但是我们确实是配置了这个配置,确没生效,我们在元数据库metastore的notification_log中查到这个event的message

{"server":"","servicePrincipal":"","db":"test_db","timestamp":1622247221,"location":"hdfs://nameservice1/user/hive/warehouse/test_db.db","ownerType":"USER","ownerName":"admin"}

由于参考资料[2]是在CDH 6.3.2环境下搭建的,我在测试环境搭了一台单节点的CDH6.3.2, 起来后,同样的在新的CDH6.3.2环境中配置自动刷新元数据功能,发现了创建database的message为:

{"server":"","servicePrincipal":"","db":"test_db","dbJson":"{\"1\":{\"str\":\"davie_test\"},\"3\":{\"str\":\"hdfs://nameservice1/user/hive/warehouse/test_db.db\"},\"6\":{\"str\":\"admin\"},\"7\":{\"i32\":1},\"9\":{\"i32\":1622248258}}","timestamp":1622248259,"location":"hdfs://nameservice1/user/hive/warehouse/davie_test.db","ownerType":"USER","ownerName":"admin"};

两者一对比,发现在CDH6.1.0的Hive中,message中缺少了dbJson这个字段

在IDEA中查找代码发现,确实是少了这个字段.

想要解锁impala元数据自动刷新功能,只能再升级一下Hive了。

2. Hive升级到 2.1.1-cdh6.3.2版本

2.1编译打包

具体的步骤是下载Cloudera Hive 代码,然后编译打包

git clone --single-branch --branch cdh6.3.2-release https://github.com/cloudera/hive.git hive
mvn clean package -DskipTests -Pdist

在编译的过程中,发现有些Cloudera的包下载不下来的,需要新添加mirror

<repository>
  <id>nexus-aliyun</id>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  <name>nexus-aliyun</name>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
<repository>
  <id>cloudera-repos</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  <name>CDH Releases Repository</name>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>

Maven下载依赖的过程中,发现Hbase的依赖javax.el一直都下不下来,报以下的错误

Could not find artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT

在pom.xml文件中添加javax.el的dependency并指定版本,重新再运行maven命令即可

<glassfish.el.version>3.0.1-b06</glassfish.el.version>
 
<dependency>
    <groupId>org.glassfish</groupId>
    <artifactId>javax.el</artifactId>
    <version>${glassfish.el.version}</version>
</dependency>

正常编译好后,可以在 hive/packaging/target目录下看到 apache-hive-2.1.1-cdh6.3.2-bin.tar.gz 文件

2.2 元数据备份
然后把这个文件拷贝到CDH集群,解压,对元数据进行升级,升级前进行元数据备份。

mysqldump -uroot  -ptest metastore > ./metastore.sql

2.3 元数据升级
备份完成后登录metastore元数据库,运行以下命令对CDH6.1.0的元数据进行升级

source $HIVE_6.3.2/scripts/metastore/upgrade/mysql/upgrade-2.1.1-cdh6.1.0-to-2.1.1-cdh6.2.0.mysql.sql

我们查看 upgrade-2.1.1-cdh6.1.0-to-2.1.1-cdh6.2.0.mysql.sql 这个脚本,发现只是对 DBS 表新增了一个CREATE_TIME字段,然后再更新了一些CDH_VERSION的SCHEMA_VERSION信息,没有重大的变更。

2.4 更新Hive lib目录

再在 /opt/cloudera/parcels/CDH/lib/hive/目录下新建 lib632目录

mkdir /opt/cloudera/parcels/CDH/lib/hive/lib632

将CDH6.3.2的Hive的lib下的文件拷贝到lib632目录下

cp -r /root/cdh-hive-cdh6.3.2-release/packaging/target/apache-hive-2.1.1-cdh6.3.2-bin/apache-hive-2.1.1-cdh6.3.2-bin/lib/* /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hive/lib632/

再修改hive脚本指定的lib目录

vim /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hive/bin/hive

在94行,将 HIVE_LIB=${HIVE_HOME}/lib 改成 HIVE_LIB=${HIVE_HOME}/lib632

完成之后,在CM上重启Hive相关的服务

2.5 Hive升级验证

对Hive的功能进行验证,包括hive sql执行,hive udf测试, hive相关的组件(hbase impala sqoop)测试等

2.6升级后的版本查看

3.问题解决:

升级后,计算引擎切换到spark的跑任务使用count或者其他的函数的时候报错处理Remote Spark Driver to HiveServer2 Connection] Received error message: io.netty.handler.codec.DecoderException: ​​org.apache.hive.com​​​.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 109, Size: 6
Serialization trace:

主要是因为spark引入的还是cdh6.1.1的hive的exec的包

只需要把这个cdh6.1.1的hive包exec在spark的hive文件夹中更新掉CDH6.3.2的即可 。

  • 12
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值