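When debugging MapReduce jobs or working with HBase tables, we usually have to package the local code into a JAR and upload it to the Hadoop cluster to run it. That is tedious, and it makes debugging awkward. The HBase project ships an API for running such code locally, MiniHBaseCluster, which simulates a cluster inside the local JVM. For submitting jobs and stepping through them in a debugger, it behaves much like a real Hadoop cluster.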
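To use the mini cluster from a project, the HBase test utility classes need to be on the classpath. The fragment below is a minimal sketch, assuming HBase 1.x packaging, where `HBaseTestingUtility` (the usual entry point for starting a MiniHBaseCluster) is pulled in through the `hbase-testing-util` artifact. The `test` scope is an assumption, and `${hbase-version}` refers to the property defined in the pom of this post.

```xml
<!-- Assumption: HBase 1.x, where HBaseTestingUtility and its test-jar
     dependencies are made available by the hbase-testing-util artifact. -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-testing-util</artifactId>
    <!-- Keep this in line with the HBase version used by hbase-server. -->
    <version>${hbase-version}</version>
    <!-- Assumption: the mini cluster is only started from test code.
         Remove the scope if it is started from a main() method instead. -->
    <scope>test</scope>
</dependency>
```

If the project already gets these classes some other way (for example through the `tests` classifier of `hbase-server`), this extra dependency is not needed.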
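I needed to generate HFiles from the data in a Hive table and then bulk load them into HBase. Code pieced together from material found online kept failing with one error after another, and I only got it working after hitting and working around a long list of pitfalls. Without further ado, here is the code.
pom file: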
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop-version>2.6.0</hadoop-version>
    <hive-version>1.0.1</hive-version>
    <hbase-version>1.1.3</hbase-version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-serde</artifactId>
        <version>${hive-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive.hcatalog</groupId>
        <artifactId>hive-hcatalog-core</artifactId>
        <version>${hive-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>${hbase-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.6.0<