Problem description:
I'm trying to migrate data from Hive to Greenplum, so I planned to use gphdfs to read the data files on HDFS directly. The query fails with the error below; I've searched for a long time without finding a solution, so I'm hoping someone here can point me in the right direction.
Environment:
Greenplum: 5.1.10
Hadoop: HDP 2.4
Java: 1.8.0_91
External table DDL:
CREATE EXTERNAL TABLE gplink.dim_certype_test (
    certype VARCHAR(32),
    certype_name VARCHAR(32),
    chosign VARCHAR(32)
)
LOCATION ('gphdfs://GXGSBigDataHA/apps/hive/warehouse/dma.db/dim_certype/000000_0')
FORMAT 'TEXT';
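Before digging into gphdfs itself, it's worth confirming that the target file is readable from HDFS at all. A quick sanity check, assuming the HDFS client is installed on the host where you run it:

    # Read the first few lines of the Hive warehouse file directly.
    hdfs dfs -cat /apps/hive/warehouse/dma.db/dim_certype/000000_0 | head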
Query error:
postgres=# select * from gplink.dim_certype_test;
ERROR: external table gphdfs protocol command ended with error. Error: A JNI error has occurred, please check your installation and try again (seg0 slice1 20.20.20.14:40000 pid=49535)
DETAIL:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/TaskAttemptContext
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMain
Command: execute:source $GPHOME/lib//hadoop/hadoop_env.sh;java $GP_JAVA_OPT -classpath $CLASSPATH com.emc.greenplum.gpdb.hdfsconnector.HDFSReader $GP_SEGMENT_ID $GP_SEGMENT_COUNT TEXT hdp-gnet-1.2.0.0 'gphdfs://GXGSB_0' '000000104300044000000104300044000000104300044' 'certype,certype_name,chosign,'
External table dim_certype_test, file gphdfs://GXGSBigDataHA/apps/hive/warehouse/dma.db/dim_certype/000000_0
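The Command: line in the DETAIL shows the exact shell command each segment executes, so the classpath problem can be reproduced by hand on a segment host, outside of Greenplum. A minimal probe sketch (javap simply reports "class not found" when the JAR is missing from the CLASSPATH):

    # On a segment host, as gpadmin: build the same environment gphdfs uses,
    source $GPHOME/lib/hadoop/hadoop_env.sh
    # then check whether the JVM can resolve the class from the stack trace.
    javap -classpath $CLASSPATH org.apache.hadoop.mapreduce.TaskAttemptContext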
Solution:
From the official documentation:
If needed, ensure that the CLASSPATH environment variable generated by the $GPHOME/lib/hadoop/hadoop_env.sh file on every Greenplum Database host contains the path to JAR files that contain Java classes that are required for gphdfs.
For example, if gphdfs returns a class not found exception, ensure the JAR file containing the class is on every Greenplum Database host and update the $GPHOME/lib/hadoop/hadoop_env.sh file so that the CLASSPATH environment variable created by the file contains the JAR file.
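Following that advice, the first check is whether the missing class is anywhere on the CLASSPATH that hadoop_env.sh generates, and which JAR actually ships it. A sketch, with two assumptions called out: /usr/hdp/current is the default HDP symlink layout and may differ on your cluster, and in Hadoop 2.x the class org.apache.hadoop.mapreduce.TaskAttemptContext is normally packaged in hadoop-mapreduce-client-core*.jar:

    # Show the CLASSPATH entries gphdfs will see.
    source $GPHOME/lib/hadoop/hadoop_env.sh
    echo $CLASSPATH | tr ':' '\n' | grep -i mapreduce

    # Locate the JAR that should contain the class (path assumes HDP defaults).
    find /usr/hdp/current -name 'hadoop-mapreduce-client-core*.jar' 2>/dev/null

    # Verify the class really lives in the JAR found above (example path).
    unzip -l /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core.jar | grep TaskAttemptContext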
The root cause is that the CLASSPATH built by hadoop_env.sh does not include the Java JARs. The way I handled it was to modify hadoop_env.sh so that the JARs under $JAVA_HOME are appended, by adding the following lines to hadoop_env.sh:
# Append every JAR under $JAVA_HOME/lib to the CLASSPATH.
if [ -d "$JAVA_HOME/lib" ]; then
    for f in $JAVA_HOME/lib/*.jar; do
        CLASSPATH=${CLASSPATH}:$f
    done
fi
# Do the same for the JRE's own JARs (rt.jar and friends live here).
if [ -d "$JAVA_HOME/jre/lib" ]; then
    for f in $JAVA_HOME/jre/lib/*.jar; do
        CLASSPATH=${CLASSPATH}:$f
    done
fi
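One caveat from the documentation quoted above: the CLASSPATH has to be correct on every Greenplum host, so the edited hadoop_env.sh must be pushed to all segment hosts before retrying. A minimal sketch using the standard gpscp/gpssh utilities, assuming a file named hostfile (a name made up here) that lists one segment host per line:

    # Copy the edited script to every segment host.
    gpscp -f hostfile $GPHOME/lib/hadoop/hadoop_env.sh =:$GPHOME/lib/hadoop/

    # Spot-check that the new CLASSPATH is generated on each host.
    gpssh -f hostfile 'source $GPHOME/lib/hadoop/hadoop_env.sh; echo $CLASSPATH | head -c 200'

After that, re-running the query against gplink.dim_certype_test should no longer hit the NoClassDefFoundError.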