Symptom: reading from HBase fails with a series of missing-dependency errors, triggered at:
val conf = HBaseConfiguration.create()
or
val table = new HTable(conf, tableName)
I tried copying and rewriting the HBaseConfiguration source; after that, create() worked, but new HTable still rebuilds the conf internally from the original class, so I gave up on patching the source. The concrete fixes, problem by problem, are below.
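For context, a minimal self-contained version of the failing read path (the table name and rowkey are placeholders, not this project's actual values):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Get, HTable}
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()                    // blows up here if hbase-common etc. are missing
val table = new HTable(conf, "my_table")                  // "my_table" is a placeholder
val result = table.get(new Get(Bytes.toBytes("row1")))    // "row1" is a placeholder rowkey
println(result)
table.close()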
Problem 1: when packaging the dependencies into the jar, some jars may fail to download (this doesn't happen without maven-assembly-plugin). Opening the URL from the error message, https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/2.3.0-cdh5.0.0/hadoop-common-2.3.0-cdh5.0.0.pom, shows exactly what the error reported:
404 - Retrieval of /org/apache/hadoop/hadoop-common/2.3.0-cdh5.0.0/hadoop-common-2.3.0-cdh5.0.0.pom from M2Repository(id=snapshots) is forbidden by repository policy SNAPSHOT.
Configuring these repositories in the pom fixed it:
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
        <id>alimaven</id>
        <name>aliyun maven</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </repository>
</repositories>
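With the Cloudera repository configured, the CDH artifacts resolve. For reference, a sketch of the dependency coordinates this setup assumes (versions match the jars used later in this post; adjust to your cluster):
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.3.0-cdh5.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>0.96.1.1-cdh5.0.0</version>
    </dependency>
</dependencies>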
Problem 2: Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (null), this version is 0.92.1
Debugging: print where each config file is actually loaded from (HBaseConfiguration2 is the copied and renamed HBaseConfiguration mentioned above):
val hconf = HBaseConfiguration2.create()
// show which jar or directory each resource is served from
println(hconf.getResource("hbase-site.xml"))
println(hconf.getResource("hbase-default.xml"))
Cause: hbase-site.xml was being loaded from our project's conf directory, while hbase-default.xml came from the hbase-common-0.98.xx-cdh5.0.0.jar dependency, so the two disagreed on versions.
Placing hbase-common-0.96.1.1-cdh5.0.0.jar into the lib directory under spark_home on the server resolved it.
Reference:
http://hbase.apache.org/book.html#hbase_default_configurations
Description of hbase.defaults.for.version.skip:
Set to true to skip the 'hbase.defaults.for.version' check. Setting this to true can be useful in contexts other than the other side of a maven generation; i.e. running in an IDE. You'll want to set this boolean to true to avoid seeing the RuntimeException complaint: "hbase-default.xml file seems to be for and old version of HBase (${hbase.version}), this version is X.X.X-SNAPSHOT"
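Per that description, an alternative to swapping jars is to skip the version check. Note that the check runs inside HBaseConfiguration.create() while hbase-default.xml and hbase-site.xml are being loaded, so setting the flag programmatically after create() is too late; put it in your project's hbase-site.xml instead, roughly like this:
<property>
    <name>hbase.defaults.for.version.skip</name>
    <!-- workaround only: a genuinely mismatched hbase jar can still fail at runtime -->
    <value>true</value>
</property>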
Next problem:
Caused by: java.lang.ClassNotFoundException: com.google.common.base.Preconditions
Fixed by adding hbase-client-0.96.1.1-cdh5.0.0.jar and guava-11.0.2.jar.
Next problem:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/protobuf/generated/MasterProtos$MasterService$BlockingInterface
at java.lang.Class.forName0(Native Method)
...
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingInterface
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
...
Fixed by also adding hbase-protocol-0.96.1.1-cdh5.0.0.jar.
Next problem:
Caused by: java.lang.ClassNotFoundException: org.cloudera.htrace.Trace
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
...
Fixed by also adding htrace-core-2.01.jar; after that, reading from HBase works!
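With those jars in place, reads go through. For reference, a minimal read sketch using Spark's newAPIHadoopRDD with TableInputFormat (the table name is a placeholder; this is the standard route, not necessarily this project's exact code):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("hbase-read"))
val hconf = HBaseConfiguration.create()
hconf.set(TableInputFormat.INPUT_TABLE, "my_table")       // placeholder table name
// each record is a (rowkey, Result) pair; count() forces the full scan
val rdd = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
println(rdd.count())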
Writing to HBase then hit one more problem:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableOutputFormat
Adding hbase-server-0.96.1.1-cdh5.0.0.jar fixed it, and writes work. Reads and writes are both done at this point!
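And a matching minimal write sketch via TableOutputFormat and saveAsNewAPIHadoopDataset, reusing sc from the read sketch above (table, column family, and qualifier are placeholders; Put.add is the 0.96-era API, renamed addColumn in later HBase versions):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

val hconf = HBaseConfiguration.create()
hconf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")     // placeholder table name
val job = Job.getInstance(hconf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
job.setOutputKeyClass(classOf[ImmutableBytesWritable])
job.setOutputValueClass(classOf[Put])

val puts = sc.parallelize(Seq("row1" -> "v1")).map { case (rk, v) =>
  val put = new Put(Bytes.toBytes(rk))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(v)) // placeholder cf/col
  (new ImmutableBytesWritable(Bytes.toBytes(rk)), put)
}
puts.saveAsNewAPIHadoopDataset(job.getConfiguration)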
Notes:
Mind the versions;
If you manage dependencies with Maven 2, IDEA's dependency panel on the right shows lots of red squiggly lines; switching to Maven 3 in Settings fixes it;
Specifying the dependency path with --jars $libPath$strjar may not take effect; putting the jars under spark_home/lib does (which is a bit ugly);
How do you find which jars are needed? The program runs fine directly on my Win7 machine, so there I write a test singleton object and import the classes from the errors:
import com.google.common.base.Preconditions
import com.google.common.collect.ListMultimap
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos._
import org.apache.hadoop.hbase.protobuf._
import org.apache.hadoop.hbase.protobuf.generated.MasterProtos.MasterService.BlockingInterface
import org.cloudera.htrace.Trace
Ctrl+click each import to jump into the jar that provides the class, then upload that jar to the server.