Reading and writing HBase with Spark 2.2 on Cloudera (CDH)

Example
```python
host = 'bigdata-03,bigdata-05,bigdata-04'
conf = {
    "hbase.zookeeper.quorum": host,           # ZooKeeper quorum of the HBase cluster
    "hbase.mapreduce.inputtable": "student1"  # source table to scan
}
# Converters from the spark-examples jar that turn the HBase key/value
# classes into plain strings on the Python side
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)
result = rdd.collect()
for (k, v) in result:
    print(k, v)
```
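The title promises writing as well as reading. Below is a minimal sketch of the write path via `saveAsNewAPIHadoopDataset`, using the `StringToImmutableBytesWritableConverter` and `StringListToPutConverter` converters from the same spark-examples jar. The column family `info`, the qualifier `name`, and the sample rows are assumptions for illustration, not from the original post; adjust them to your table's schema. The cluster-dependent part is wrapped in a function so the record format can be shown on its own:

```python
def to_put_record(line):
    """Turn 'rowkey,cf,qualifier,value' into (rowkey, [rowkey, cf, qualifier, value]),
    the shape StringListToPutConverter expects for building an HBase Put."""
    fields = line.split(',')
    return (fields[0], fields)

def write_to_hbase(sc, host='bigdata-03,bigdata-05,bigdata-04', table='student1'):
    """Sketch: write a few rows to HBase from a live SparkContext."""
    conf = {
        "hbase.zookeeper.quorum": host,
        "hbase.mapred.outputtable": table,
        "mapreduce.outputformat.class":
            "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
        "mapreduce.job.output.key.class":
            "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "mapreduce.job.output.value.class":
            "org.apache.hadoop.io.Writable",
    }
    keyConv = ("org.apache.spark.examples.pythonconverters."
               "StringToImmutableBytesWritableConverter")
    valueConv = ("org.apache.spark.examples.pythonconverters."
                 "StringListToPutConverter")
    # Hypothetical sample rows: rowkey, column family, qualifier, value
    rawData = ['3,info,name,Rongcheng', '4,info,name,Guanhua']
    sc.parallelize(rawData) \
      .map(to_put_record) \
      .saveAsNewAPIHadoopDataset(conf=conf,
                                 keyConverter=keyConv,
                                 valueConverter=valueConv)
```

From a pyspark shell you would call `write_to_hbase(sc)`; it depends on the same jars as the read example, so the classpath fix described in this post applies to it too.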
However, the code above may fail in two ways:

- The spark-examples jar (which contains the Python converters) is not on Spark's classpath:

```
Caused by: java.lang.ClassNotFoundException: org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
```

- The HBase lib jars are not on Spark's classpath:

```
.... TableInputFormat not found
```
Solution - preparation

- Locate the spark-env.sh directory: /opt/cloudera/parcels/SPARK2/lib/spark2/conf
- Locate HBase's lib directory: /opt/cloudera/parcels/CDH/lib/hbase/lib
- Locate the Spark 1 lib directory (which contains spark-examples.jar): /opt/cloudera/parcels/CDH/lib/spark/lib
Solution - apply the fix

Edit spark-env.sh and append at the end:

```sh
HBASE_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/lib
SPARK_EXAMPLE_LIB=/opt/cloudera/parcels/CDH/lib/spark/lib
export SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:$HBASE_CLASSPATH/*:$SPARK_EXAMPLE_LIB/*
```

Then restart the Spark service so the new classpath takes effect.
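After restarting, it may help to confirm from a pyspark shell that the classes behind the earlier errors now resolve. One way (an assumption, not from the original post) is to load them through `sc._jvm`, the Py4J gateway to the driver JVM; note this is Spark-internal, not a public API:

```python
def check_hbase_classpath(sc):
    """Raise ClassNotFoundException via Py4J if the jars are still missing."""
    for cls in [
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.spark.examples.pythonconverters."
        "ImmutableBytesWritableToStringConverter",
    ]:
        # Class.forName throws if the class is not on the driver's classpath
        sc._jvm.java.lang.Class.forName(cls)
```

If `check_hbase_classpath(sc)` returns without raising, the read and write examples above should no longer hit ClassNotFoundException.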