In the article 《Spark读取Hbase中的数据》 I described how to read data from HBase in Spark; this article covers the opposite direction. Spark provides two built-in methods for writing an RDD to HBase: (1) saveAsHadoopDataset and (2) saveAsNewAPIHadoopDataset. Their official descriptions are as follows:
saveAsHadoopDataset: Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for that storage system. The JobConf should set an OutputFormat and any output paths required (e.g. a table name to write to) in the same way as it would be configured for a Hadoop MapReduce job.
saveAsNewAPIHadoopDataset: Output the RDD to any Hadoop-supported storage system with new Hadoop API, using a Hadoop Configuration object for that storage system. The Conf should set an OutputFormat and any output paths required (e.g. a table name to write to) in the same way as it would be configured for a Hadoop MapReduce job.
As the descriptions show, these two APIs target the old mapred API and the new mapreduce API respectively; this article provides example code for both versions. Before writing any code, add the following dependencies to the pom.xml file:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>0.9.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.98.2-hadoop2</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.98.2-hadoop2</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-common</artifactId>
  <version>0.98.2-hadoop2</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <version>0.98.2-hadoop2</version>
</dependency>
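Before the full examples, here is a minimal sketch of the essential difference between the two write paths: the old (mapred) API is configured through a JobConf, while the new (mapreduce) API is configured through a Configuration, typically prepared via a Job. The table name "iteblog" and the Hadoop 2-style Job.getInstance call are illustrative assumptions, not from the original; fully qualified names are used to avoid the clash between the two TableOutputFormat classes.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapreduce.Job

// Old (mapred) path: saveAsHadoopDataset takes a JobConf.
val jobConf = new JobConf(HBaseConfiguration.create())
jobConf.setOutputFormat(classOf[org.apache.hadoop.hbase.mapred.TableOutputFormat])
jobConf.set(org.apache.hadoop.hbase.mapred.TableOutputFormat.OUTPUT_TABLE, "iteblog")
// rdd.saveAsHadoopDataset(jobConf)

// New (mapreduce) path: saveAsNewAPIHadoopDataset takes a Configuration.
val job = Job.getInstance(HBaseConfiguration.create())
job.setOutputFormatClass(classOf[org.apache.hadoop.hbase.mapreduce.TableOutputFormat[ImmutableBytesWritable]])
job.getConfiguration.set(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.OUTPUT_TABLE, "iteblog")
// rdd.saveAsNewAPIHadoopDataset(job.getConfiguration)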
saveAsHadoopDataset
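The original listing (package com.iteblog.bigdata.hbase, beginning with an import of org.apache.hadoop.hbase.{HConstants, ...}) is cut off after the first import, so what follows is a reconstruction sketch rather than the author's verbatim code. The table name "iteblog", column family "cf", qualifier "count", and the localhost ZooKeeper settings are placeholder assumptions.

package com.iteblog.bigdata.hbase

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, HConstants}
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object SaveAsHadoopDatasetExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SaveAsHadoopDataset"))

    // HBase connection settings; the quorum and port values are placeholders.
    val conf = HBaseConfiguration.create()
    conf.set(HConstants.ZOOKEEPER_QUORUM, "localhost")
    conf.set(HConstants.ZOOKEEPER_CLIENT_PORT, "2181")

    // The old (mapred) API is driven by a JobConf: set the OutputFormat
    // and the target table on it, exactly as for a MapReduce job.
    val jobConf = new JobConf(conf)
    jobConf.setOutputFormat(classOf[TableOutputFormat])
    jobConf.set(TableOutputFormat.OUTPUT_TABLE, "iteblog")

    // Turn each record into a (rowkey, Put) pair; TableOutputFormat
    // writes one HBase row per pair.
    val puts = sc.parallelize(Seq(("1", 10), ("2", 20), ("3", 30))).map {
      case (key, value) =>
        val put = new Put(Bytes.toBytes(key))
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(value.toString))
        (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
    }

    puts.saveAsHadoopDataset(jobConf)
    sc.stop()
  }
}

Create the target table first (e.g. create 'iteblog', 'cf' in the HBase shell), then run the program with spark-submit; the saveAsNewAPIHadoopDataset variant differs only in the configuration setup sketched above.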