Integrating Hive 3.1.2 with Phoenix 5.0.0 and HBase 2.0.5

Component versions: Hive 3.1.2, HBase 2.0.5, Hadoop 3.1.3, phoenix-5.0.0-HBase-2.0-bin

The integration follows the official documentation: https://phoenix.apache.org/hive_storage_handler.html#Performance%20Tuning
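
For context, the integration itself only requires making the Phoenix storage-handler jar visible to Hive, either by dropping it into an aux-lib directory (as we did) or via hive.aux.jars.path. A minimal sketch, with an illustrative path:

<!-- hive-site.xml: register the Phoenix storage handler jar (path is illustrative) -->
<property>
    <name>hive.aux.jars.path</name>
    <value>file:///opt/hive/aux-lib/phoenix-5.0.0-HBase-2.0-hive.jar</value>
</property>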

Main issues:

Issue 1: after integrating Phoenix into HBase, the HBase logs show a recurring WARN error:

2022-04-08 09:29:14,120 WARN  [HBase-Metrics2-1] util.MBeans: Error creating MBean object name: Hadoop:service=HBase,name=RegionServer,sub=IPC
org.apache.hadoop.metrics2.MetricsException: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=HBase,name=RegionServer,sub=IPC already exists

Issue 2: after integrating Phoenix into Hive, SELECT queries still work, but MR-based jobs (COUNT, INSERT, etc.) fail on every table.

Symptoms of issue 2:

Running a COUNT or INSERT style MR job after the Hive-Phoenix integration fails with:

Application application_1649379694892_0003 failed 2 times (global limit =4; local limit is =2) due to AM Container for appattempt_1649379694892_0003_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2022-04-08 09:55:27.644]Port 9820 specified in URI hdfs://mycluster:9820/tmp/hadoop-yarn/staging/root/.staging/job_1649379694892_0003/job.splitmetainfo but host 'mycluster' is a logical (HA) namenode and does not use port information.
java.io.IOException: Port 9820 specified in URI hdfs://mycluster:9820/tmp/hadoop-yarn/staging/root/.staging/job_1649379694892_0003/job.splitmetainfo but host 'mycluster' is a logical (HA) namenode and does not use port information.
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:274)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:225)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:135)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:355)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:289)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:172)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:268)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:248)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:229)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://hd03:8088/cluster/app/application_1649379694892_0003 Then click on links to logs of each attempt.
. Failing the application.

Troubleshooting:

1. Removing the Phoenix dependency phoenix-5.0.0-HBase-2.0-hive.jar from hive/aux-lib makes the jobs succeed again, which pointed to a dependency version conflict. We downloaded the Phoenix source, switched to the 5.1 branch, and checked out the commit "Updating KEYS for rajeshbabu@apache.org", i.e. the phoenix-5.0.0-HBase-2.0 release.

2. The source pom.xml shows the default dependency versions: Hadoop 3.0.0, Hive 3.0.0, HBase 2.0.0:

<!-- Hadoop Versions -->
<hbase.version>2.0.0</hbase.version>
<hadoop.version>3.0.0</hadoop.version>

<!-- Dependency versions -->
<commons-cli.version>1.4</commons-cli.version>
<hive.version>3.0.0</hive.version>

3. Based on the error, we compared the createFailoverProxyProvider method of NameNodeProxiesClient in the Hadoop 3.0.0 source against the 3.1.3 source and found the problem:

if (checkPort && ((AbstractNNFailoverProxyProvider)providerNN).useLogicalURI()) {
    int port = nameNodeUri.getPort();
    if (port > 0 && port != 9820) {
        throw new IOException("Port " + port + " specified in URI " + nameNodeUri + " but host '" + nameNodeUri.getHost() + "' is a logical (HA) namenode and does not use port information.");
    }
}
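
For comparison, in Hadoop 3.1.x the same check is written against the constant HdfsClientConfigKeys.DFS_NAMENODE_RPC_PORT_DEFAULT, which is 8020 there (9820 was the short-lived 3.0.0 default). Paraphrased from the 3.1.3 source:

// org.apache.hadoop.hdfs.NameNodeProxiesClient, Hadoop 3.1.x (paraphrased):
if (checkPort && providerNN.useLogicalURI()) {
    int port = nameNodeUri.getPort();
    // DFS_NAMENODE_RPC_PORT_DEFAULT is 8020 in 3.1.x, so the 3.0.0 literal 9820 is gone
    if (port > 0 && port != HdfsClientConfigKeys.DFS_NAMENODE_RPC_PORT_DEFAULT) {
        throw new IOException("Port " + port + " specified in URI " + nameNodeUri
                + " but host '" + nameNodeUri.getHost()
                + "' is a logical (HA) namenode and does not use port information.");
    }
}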

Root cause:

phoenix-5.0.0-HBase-2.0 depends on Hadoop 3.0.0 by default, and this part of the Hadoop client code is compiled into the phoenix-hive jar.

Our cluster runs Hadoop 3.1.3, whose default NameNode RPC port is 8020, and hdfs-site.xml specifies 8020 as well:

<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>xx01:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>xx02:8020</value>
</property>

Question: if hdfs-site.xml specified port 9820 instead, would that avoid the error? Judging from the check above, which only throws when a port is present and differs from 9820, it probably would.

Fix:

To solve this properly, we changed the Hadoop version in pom.xml to 3.1.3 and recompiled…

Build reference: https://blog.csdn.net/tianbaochao/article/details/88741571

We built directly from IDEA with tests skipped.

If Maven cannot exclude the conflicting dependencies, manually delete the other Hadoop versions and keep only 3.1.3 while debugging and packaging.
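
The version bump itself amounts to three properties in pom.xml (a sketch of the values we ended up with):

<!-- pom.xml: versions changed to match our cluster -->
<hbase.version>2.0.5</hbase.version>
<hadoop.version>3.1.3</hadoop.version>
<hive.version>3.1.2</hive.version>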

Changing HBase from 2.0.0 to 2.0.5:

1. Build error 1:

The build fails on org.apache.hadoop.hbase.util.Base64, so we replaced that class everywhere with java.util.Base64:

Before: Base64.decode(encoded)
After:  Base64.getDecoder().decode(encoded)

Before: Base64.encodeBytes(endKey)
After:  Bytes.toString(Base64.getEncoder().encode(endKey))
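
A minimal, self-contained illustration of the substitution (class and variable names here are only for demonstration):

import java.util.Base64;

public class Base64Migration {
    public static void main(String[] args) {
        byte[] endKey = "region-end-key".getBytes();

        // was: org.apache.hadoop.hbase.util.Base64.encodeBytes(endKey)
        String encoded = Base64.getEncoder().encodeToString(endKey);

        // was: org.apache.hadoop.hbase.util.Base64.decode(encoded)
        byte[] decoded = Base64.getDecoder().decode(encoded);

        System.out.println(encoded + " -> " + new String(decoded));
    }
}

Note that getEncoder().encodeToString() is equivalent to the Bytes.toString(Base64.getEncoder().encode(...)) form used above, without pulling in HBase's Bytes utility.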

2. Build error 2:

The build reports a problem with CellComparatorImpl: the anonymous subclass used to override the two-argument compare method, which has since been declared final, so we override the three-argument variant instead while keeping the original default value. After the change:

public IndexMemStore() {
  this(new CellComparatorImpl(){
      @Override
      public int compare(Cell a, Cell b, boolean is) {
          return super.compare(a, b, true);
      }
  });
}

3. Build error 3:

PhoenixRpcScheduler is missing a method required by the RpcScheduler interface. Add the following delegating method to PhoenixRpcScheduler:

@Override
public int getMetaPriorityQueueLength() {
    return  delegate.getMetaPriorityQueueLength();
}

The source changes above fix issue 1. After packaging, put the phoenix-server jar into HBase's lib/ directory; this resolves the errors HBase throws when using Phoenix:

2022-04-08 09:29:08,955 WARN  [HBase-Metrics2-1] impl.MetricsSystemImpl: Caught exception in callback postStart
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:320)
        at com.sun.proxy.$Proxy8.postStart(Unknown Source)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:193)
        at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:109)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AbstractMethodError: org.apache.hadoop.hbase.ipc.RpcScheduler.getMetaPriorityQueueLength()I
        at org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getMetaPriorityQueueLength(MetricsHBaseServerWrapperImpl.java:74)
        at org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:156)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
        at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:98)
        at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:72)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:239)
        ... 15 more
        
        
2022-04-08 09:29:14,120 WARN  [HBase-Metrics2-1] util.MBeans: Error creating MBean object name: Hadoop:service=HBase,name=RegionServer,sub=IPC
org.apache.hadoop.metrics2.MetricsException: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=HBase,name=RegionServer,sub=IPC already exists!
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:135)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:110)
        at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:163)
        at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:95)
        at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:72)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:239)
        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:320)
        at com.sun.proxy.$Proxy8.postStart(Unknown Source)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:193)
        at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:109)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=HBase,name=RegionServer,sub=IPC already exists!
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:131)
        ... 22 more

Changing Hive from 3.0.0 to 3.1.2:

1. Build error 4:

The cause is that Hive 3.0.0 and 3.1.2 use different Timestamp and Date classes.

Source changes:

1. Update the imports:

import java.sql.Timestamp;
    ==> import org.apache.hadoop.hive.common.type.Timestamp;

import org.apache.hadoop.hive.serde2.io.TimestampWritable;
    ==> import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;

import java.sql.Date;
    ==> import org.apache.hadoop.hive.common.type.Date;

import org.apache.hadoop.hive.serde2.io.DateWritable;
    ==> import org.apache.hadoop.hive.serde2.io.DateWritableV2;

2. Modify PhoenixDateObjectInspector:

Add a sqlDateToHiveDate() method that converts java.sql.Date values to org.apache.hadoop.hive.common.type.Date:

public class PhoenixDateObjectInspector extends AbstractPhoenixObjectInspector<DateWritableV2>
        implements DateObjectInspector {

    public PhoenixDateObjectInspector() {
        super(TypeInfoFactory.dateTypeInfo);
    }

    @Override
    public Object copyObject(Object o) {
        return o == null ? null : sqlDateToHiveDate(o);
    }

    @Override
    public DateWritableV2 getPrimitiveWritableObject(Object o) {
        DateWritableV2 value = null;

        if (o != null) {
            try {
                value = new DateWritableV2(sqlDateToHiveDate(o));
            } catch (Exception e) {
                logExceptionMessage(o, "DATE");
                value = new DateWritableV2();
            }
        }

        return value;
    }

    @Override
    public Date getPrimitiveJavaObject(Object o) {
        return sqlDateToHiveDate(o);
    }

    public Date sqlDateToHiveDate(Object o) {
        if (o == null) {
            return null;
        }
        if (o instanceof java.sql.Date) {
            java.sql.Date var = (java.sql.Date) o;
            return Date.valueOf(var.toString());
        } else if (o instanceof java.lang.String) {
            return Date.valueOf((String) o);
        }
        return (Date) o;
    }
}
3. Modify PhoenixTimestampObjectInspector:

Add a sqlTimeToHiveTimestamp() method that converts java.sql.Timestamp and String values to org.apache.hadoop.hive.common.type.Timestamp:

public class PhoenixTimestampObjectInspector extends
        AbstractPhoenixObjectInspector<TimestampWritableV2>
        implements TimestampObjectInspector {

    public PhoenixTimestampObjectInspector() {
        super(TypeInfoFactory.timestampTypeInfo);
    }

    @Override
    public Timestamp getPrimitiveJavaObject(Object o) {
        return sqlTimeToHiveTimestamp(o);
    }

    @Override
    public Object copyObject(Object o) {
        return o == null ? null : sqlTimeToHiveTimestamp(o);
    }

    @Override
    public TimestampWritableV2 getPrimitiveWritableObject(Object o) {
        TimestampWritableV2 value = null;

        if (o != null) {
            try {
                value = new TimestampWritableV2(sqlTimeToHiveTimestamp(o));
            } catch (Exception e) {
                logExceptionMessage(o, "TIMESTAMP");
            }
        }

        return value;
    }

    public Timestamp sqlTimeToHiveTimestamp(Object o) {
        if (o == null) {
            return null;
        }
        if (o instanceof java.sql.Timestamp) {
            java.sql.Timestamp var = (java.sql.Timestamp) o;
            return Timestamp.valueOf(var.toString());
        } else if (o instanceof java.lang.String) {
            return Timestamp.valueOf((String) o);
        }
        return (Timestamp) o;
    }
}
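
A quick standalone sanity check of the conversion logic (the class name here is illustrative; in the real build this logic runs inside the inspectors above):

import org.apache.hadoop.hive.common.type.Date;
import org.apache.hadoop.hive.common.type.Timestamp;

public class ConversionCheck {
    public static void main(String[] args) {
        // java.sql types round-trip through their string form
        java.sql.Timestamp sqlTs = java.sql.Timestamp.valueOf("2022-04-08 09:29:14.12");
        Timestamp hiveTs = Timestamp.valueOf(sqlTs.toString());

        // plain strings parse directly, which is what lets Phoenix VARCHAR
        // columns be read as Hive date/timestamp (see the summary below)
        Date hiveDate = Date.valueOf("2022-04-08");

        System.out.println(hiveTs + " / " + hiveDate);
    }
}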

Without these changes, the jar builds fine against Hive 3.0.0, but reading Phoenix date or timestamp columns from Hive fails at runtime with:

Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.sql.Timestamp cannot be cast to org.apache.hadoop.hive.common.type.Timestamp (state=,code=0)

Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.sql.Date cannot be cast to org.apache.hadoop.hive.common.type.Date (state=,code=0)

Summary:

The source changes above:

1. Fix the date/timestamp type conversion between Hive and Phoenix.
2. Fix the errors when integrating Phoenix 5.0.0 with HBase 2.0.5.
3. As an extension, Phoenix string columns can now be read in Hive as date or timestamp, provided the data matches those formats.

The key conversion code is the sqlTimeToHiveTimestamp() and sqlDateToHiveDate() methods shown above.
