Hive 3.1.2 integrated with Phoenix 5.0.0 and HBase 2.0.5
Component versions: Hive 3.1.2, HBase 2.0.5, Hadoop 3.1.3, phoenix-5.0.0-HBase-2.0-bin
Integration followed the official documentation: https://phoenix.apache.org/hive_storage_handler.html#Performance%20Tuning
Main problems:
Problem 1: after integrating Phoenix into HBase, a WARN error appears in the HBase logs at regular intervals:
2022-04-08 09:29:14,120 WARN [HBase-Metrics2-1] util.MBeans: Error creating MBean object name: Hadoop:service=HBase,name=RegionServer,sub=IPC
org.apache.hadoop.metrics2.MetricsException: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=HBase,name=RegionServer,sub=IPC already exists
Problem 2: after integrating Phoenix into Hive, SELECT queries still work, but MR jobs (count, insert, and the like) fail on all tables.
Symptoms of problem 2:
Running count/insert-style MR jobs after the integration fails with:
Application application_1649379694892_0003 failed 2 times (global limit =4; local limit is =2) due to AM Container for appattempt_1649379694892_0003_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2022-04-08 09:55:27.644]Port 9820 specified in URI hdfs://mycluster:9820/tmp/hadoop-yarn/staging/root/.staging/job_1649379694892_0003/job.splitmetainfo but host 'mycluster' is a logical (HA) namenode and does not use port information.
java.io.IOException: Port 9820 specified in URI hdfs://mycluster:9820/tmp/hadoop-yarn/staging/root/.staging/job_1649379694892_0003/job.splitmetainfo but host 'mycluster' is a logical (HA) namenode and does not use port information.
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:274)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:225)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:135)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:355)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:289)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:172)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:268)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:248)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:229)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://hd03:8088/cluster/app/application_1649379694892_0003 Then click on links to logs of each attempt.
. Failing the application.
Troubleshooting:
1. Removing the Phoenix dependency phoenix-5.0.0-HBase-2.0-hive.jar from hive/aux-lib restores normal operation.
Suspecting a dependency-version conflict, we downloaded the Phoenix source, switched to the 5.1 branch, and checked out the commit "Updating KEYS for rajeshbabu@apache.org", i.e. the phoenix-5.0.0 / HBase 2.0.0 release.
2. The source pom.xml shows the default dependency versions: Hadoop 3.0.0, Hive 3.0.0, HBase 2.0.0:
<!-- Hadoop Versions -->
<hbase.version>2.0.0</hbase.version>
<hadoop.version>3.0.0</hadoop.version>
<!-- Dependency versions -->
<commons-cli.version>1.4</commons-cli.version>
<hive.version>3.0.0</hive.version>
3. Guided by the error, we compared the createFailoverProxyProvider method of the NameNodeProxiesClient class in the Hadoop 3.0.0 source against the 3.1.3 source and found the problem:
if (checkPort && ((AbstractNNFailoverProxyProvider) providerNN).useLogicalURI()) {
    int port = nameNodeUri.getPort();
    if (port > 0 && port != 9820) {
        throw new IOException("Port " + port + " specified in URI " + nameNodeUri
            + " but host '" + nameNodeUri.getHost()
            + "' is a logical (HA) namenode and does not use port information.");
    }
}
Root cause:
phoenix-5.0.0-HBase-2.0 depends on Hadoop 3.0.0 by default, and this Hadoop client code is compiled into the phoenix-hive jar.
Our cluster runs Hadoop 3.1.3, whose default NameNode RPC port is 8020, and hdfs-site.xml likewise specifies 8020:
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>xx01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>xx02:8020</value>
</property>
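The effect of this compiled-in check can be illustrated in isolation. The sketch below is not Hadoop code: compiledDefaultPort stands for the default NameNode RPC port baked into the client classes at build time (9820 in Hadoop 3.0.0; changed back to 8020 in 3.1.x by HDFS-12990), which is why the same staging URI is accepted by one version's classes and rejected by the other's:

```java
import java.io.IOException;
import java.net.URI;

public class LogicalUriPortCheck {

    // Mirrors the check quoted above from NameNodeProxiesClient: for a logical
    // (HA) nameservice, an explicit port is only tolerated when it equals the
    // default RPC port compiled into that Hadoop version's client classes.
    static void check(URI nameNodeUri, int compiledDefaultPort) throws IOException {
        int port = nameNodeUri.getPort();
        if (port > 0 && port != compiledDefaultPort) {
            throw new IOException("Port " + port + " specified in URI " + nameNodeUri
                    + " but host '" + nameNodeUri.getHost()
                    + "' is a logical (HA) namenode and does not use port information.");
        }
    }

    public static void main(String[] args) {
        URI staging = URI.create("hdfs://mycluster:9820");
        try {
            check(staging, 9820);  // Hadoop 3.0.0 classes (bundled in phoenix-hive): accepted
            System.out.println("accepted under the 3.0.0 default (9820)");
            check(staging, 8020);  // Hadoop 3.1.3 classes on the cluster: rejected
        } catch (IOException e) {
            System.out.println("rejected under the 3.1.x default (8020): " + e.getMessage());
        }
    }
}
```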
Open question: if hdfs-site.xml specified port 9820 instead, would that sidestep the error?
Fix:
To solve the problem at the root, we changed the Hadoop version in pom.xml to 3.1.3 and rebuilt…
Build reference: https://blog.csdn.net/tianbaochao/article/details/88741571
We built directly from IntelliJ IDEA with tests skipped.
If Maven cannot exclude the conflicting dependencies, manually remove the other Hadoop versions and keep only 3.1.3 while debugging and packaging.
Bumping HBase from 2.0.0 to 2.0.5:
1. Build error 1:
The build fails on references to org.apache.hadoop.hbase.util.Base64; we replaced that class everywhere with java.util.Base64:
Before: Base64.decode(encoded)
After: Base64.getDecoder().decode(encoded)
Before: Base64.encodeBytes(endKey)
After: Bytes.toString(Base64.getEncoder().encode(endKey))
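The replacement can be sanity-checked with the JDK alone. A minimal sketch (HBase's Bytes.toString is stood in for by new String, and the wrapper method names here are illustrative, not Phoenix's):

```java
import java.util.Base64;

public class Base64Migration {

    // was: org.apache.hadoop.hbase.util.Base64.decode(encoded)
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    // was: org.apache.hadoop.hbase.util.Base64.encodeBytes(endKey)
    // java.util.Base64 also offers encodeToString(byte[]) as a shorter form.
    static String encode(byte[] endKey) {
        return new String(Base64.getEncoder().encode(endKey));
    }

    public static void main(String[] args) {
        String enc = encode("row-key".getBytes());
        System.out.println(enc);
        System.out.println(new String(decode(enc)));
    }
}
```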
2. Build error 2:
The build complains about the anonymous CellComparatorImpl passed in the IndexMemStore constructor: the two-argument compare method it used to override is now final, so we override the three-argument compare instead, keeping the original default behavior. After the change:
public IndexMemStore() {
    this(new CellComparatorImpl() {
        @Override
        public int compare(Cell a, Cell b, boolean is) {
            return super.compare(a, b, true);
        }
    });
}
3. Build error 3:
PhoenixRpcScheduler is missing an implementation required by the RpcScheduler interface; add the following method to PhoenixRpcScheduler:
@Override
public int getMetaPriorityQueueLength() {
    return delegate.getMetaPriorityQueueLength();
}
The source changes above also resolve problem 1.
After packaging, place the phoenix-server jar into HBase's lib/ directory.
This fixes the following Phoenix-related HBase errors:
2022-04-08 09:29:08,955 WARN [HBase-Metrics2-1] impl.MetricsSystemImpl: Caught exception in callback postStart
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:320)
at com.sun.proxy.$Proxy8.postStart(Unknown Source)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:193)
at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:109)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AbstractMethodError: org.apache.hadoop.hbase.ipc.RpcScheduler.getMetaPriorityQueueLength()I
at org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getMetaPriorityQueueLength(MetricsHBaseServerWrapperImpl.java:74)
at org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:156)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:98)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:72)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:239)
... 15 more
2022-04-08 09:29:14,120 WARN [HBase-Metrics2-1] util.MBeans: Error creating MBean object name: Hadoop:service=HBase,name=RegionServer,sub=IPC
org.apache.hadoop.metrics2.MetricsException: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=HBase,name=RegionServer,sub=IPC already exists!
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:135)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:110)
at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:163)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:95)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:72)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:239)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:320)
at com.sun.proxy.$Proxy8.postStart(Unknown Source)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:193)
at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:109)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=HBase,name=RegionServer,sub=IPC already exists!
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:131)
... 22 more
Bumping Hive from 3.0.0 to 3.1.2:
1. Build error 4:
Hive 3.0.0 and 3.1.2 use different Timestamp and Date classes.
Source changes:
1. Import changes:
import java.sql.Timestamp;
==> import org.apache.hadoop.hive.common.type.Timestamp;
import org.apache.hadoop.hive.serde2.io.TimestampWritable;
==> import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
import java.sql.Date;
==> import org.apache.hadoop.hive.common.type.Date;
import org.apache.hadoop.hive.serde2.io.DateWritable;
==> import org.apache.hadoop.hive.serde2.io.DateWritableV2;
2. Modify the class PhoenixDateObjectInspector:
Add a sqlDateToHiveDate() method that converts java.sql.Date values to org.apache.hadoop.hive.common.type.Date:
public class PhoenixDateObjectInspector extends AbstractPhoenixObjectInspector<DateWritableV2>
        implements DateObjectInspector {

    public PhoenixDateObjectInspector() {
        super(TypeInfoFactory.dateTypeInfo);
    }

    @Override
    public Object copyObject(Object o) {
        return o == null ? null : sqlDateToHiveDate(o);
    }

    @Override
    public DateWritableV2 getPrimitiveWritableObject(Object o) {
        DateWritableV2 value = null;
        if (o != null) {
            try {
                value = new DateWritableV2(sqlDateToHiveDate(o));
            } catch (Exception e) {
                logExceptionMessage(o, "DATE");
                value = new DateWritableV2();
            }
        }
        return value;
    }

    @Override
    public Date getPrimitiveJavaObject(Object o) {
        return sqlDateToHiveDate(o);
    }

    public Date sqlDateToHiveDate(Object o) {
        if (o == null) {
            return null;
        }
        if (o instanceof java.sql.Date) {
            java.sql.Date var = (java.sql.Date) o;
            return Date.valueOf(var.toString());
        } else if (o instanceof java.lang.String) {
            return Date.valueOf((String) o);
        }
        return (Date) o;
    }
}
3. Modify the class PhoenixTimestampObjectInspector:
Add a sqlTimeToHiveTimestamp() method that converts java.sql.Timestamp and String values to org.apache.hadoop.hive.common.type.Timestamp:
public class PhoenixTimestampObjectInspector extends
        AbstractPhoenixObjectInspector<TimestampWritableV2>
        implements TimestampObjectInspector {

    public PhoenixTimestampObjectInspector() {
        super(TypeInfoFactory.timestampTypeInfo);
    }

    @Override
    public Timestamp getPrimitiveJavaObject(Object o) {
        return sqlTimeToHiveTimestamp(o);
    }

    @Override
    public Object copyObject(Object o) {
        return o == null ? null : sqlTimeToHiveTimestamp(o);
    }

    @Override
    public TimestampWritableV2 getPrimitiveWritableObject(Object o) {
        TimestampWritableV2 value = null;
        if (o != null) {
            try {
                value = new TimestampWritableV2(sqlTimeToHiveTimestamp(o));
            } catch (Exception e) {
                logExceptionMessage(o, "TIMESTAMP");
            }
        }
        return value;
    }

    public Timestamp sqlTimeToHiveTimestamp(Object o) {
        if (o == null) {
            return null;
        }
        if (o instanceof java.sql.Timestamp) {
            java.sql.Timestamp var = (java.sql.Timestamp) o;
            return Timestamp.valueOf(var.toString());
        } else if (o instanceof java.lang.String) {
            return Timestamp.valueOf((String) o);
        }
        return (Timestamp) o;
    }
}
Without these changes, the build succeeds against Hive 3.0.0, but querying Phoenix date/timestamp columns from Hive fails at runtime:
Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.sql.Timestamp cannot be cast to org.apache.hadoop.hive.common.type.Timestamp (state=,code=0)
Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.sql.Date cannot be cast to org.apache.hadoop.hive.common.type.Date (state=,code=0)
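Both helper methods convert through the value's canonical string form: toString() on the java.sql type, then valueOf() on the Hive type. That round trip can be exercised with JDK classes alone; in this sketch the Hive types (not on the classpath here) are replaced by the strings their valueOf methods would parse:

```java
import java.sql.Date;
import java.sql.Timestamp;

public class CanonicalStringForm {
    // sqlDateToHiveDate / sqlTimeToHiveTimestamp rely on these canonical
    // renderings, which Hive's Date.valueOf / Timestamp.valueOf parse back
    // into Hive's own types.
    public static void main(String[] args) {
        Date d = Date.valueOf("2022-04-08");
        Timestamp t = Timestamp.valueOf("2022-04-08 09:29:14.12");
        System.out.println(d);  // yyyy-MM-dd
        System.out.println(t);  // yyyy-MM-dd HH:mm:ss.f... (trailing zeros trimmed)
    }
}
```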
Summary:
The source changes above:
1. Fix the Date/Timestamp type conversion between Hive and Phoenix.
2. Fix the errors from integrating Phoenix 5.0.0 with HBase 2.0.5.
3. As a side benefit, Phoenix string columns can be read in Hive as date or timestamp, provided the data matches the expected format.
The core conversion code:
public Timestamp sqlTimeToHiveTimestamp(Object o) {
    if (o == null) {
        return null;
    }
    if (o instanceof java.sql.Timestamp) {
        java.sql.Timestamp var = (java.sql.Timestamp) o;
        return Timestamp.valueOf(var.toString());
    } else if (o instanceof java.lang.String) {
        return Timestamp.valueOf((String) o);
    }
    return (Timestamp) o;
}

public Date sqlDateToHiveDate(Object o) {
    if (o == null) {
        return null;
    }
    if (o instanceof java.sql.Date) {
        java.sql.Date var = (java.sql.Date) o;
        return Date.valueOf(var.toString());
    } else if (o instanceof java.lang.String) {
        return Date.valueOf((String) o);
    }
    return (Date) o;
}