1 Configure the IDE Environment
(1) Install IntelliJ IDEA: http://www.jetbrains.com/idea/download/
(2) Install the Scala plugin
2 Create the HBaseTest Project
Create the HBaseTest project and add the dependency libraries:
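If the project is managed with sbt instead of IDEA's module settings, the dependencies might be declared as follows. This is only a sketch: the HBase version comes from the jar names used later in the code, while the Spark and Scala versions are assumptions and must be matched to your cluster.

```scala
// build.sbt — hypothetical sketch; spark-core and scalaVersion are assumptions
name := "HBaseTest"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.1.0" % "provided",
  "org.apache.hbase" %  "hbase-client" % "0.98.6-hadoop2",
  "org.apache.hbase" %  "hbase-common" % "0.98.6-hadoop2",
  "org.apache.hbase" %  "hbase-server" % "0.98.6-hadoop2"
)
```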
3 Code
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.CellUtil
import scala.collection.JavaConversions._
import org.apache.spark._

object HBaseTest {

  // Serializable wrapper for a cell, so results can be collected to the driver.
  case class MyRow(row: String, value: String) {
    override def toString(): String = row + ": " + value
  }

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HBaseTest").
      setMaster("spark://CentOS-01:7077")

    // Jars shipped to the executors: the HBase client libraries plus this project's jar.
    val libPath = "E:\\opensrc\\hbase-0.98.6-hadoop2\\lib\\"
    val jars = Array(
      libPath + "guava-12.0.1.jar",
      libPath + "hbase-client-0.98.6-hadoop2.jar",
      libPath + "hbase-common-0.98.6-hadoop2.jar",
      libPath + "hbase-protocol-0.98.6-hadoop2.jar",
      libPath + "hbase-server-0.98.6-hadoop2.jar",
      libPath + "htrace-core-2.04.jar",
      "E:\\scala_workspace\\spark\\HBaseTest\\out\\artifacts\\hbasetest\\hbasetest.jar"
    )
    sparkConf.setJars(jars)
    val sc = new SparkContext(sparkConf)

    // HBase connection settings and the table to scan.
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("hbase.zookeeper.quorum", "CentOS-01")
    conf.set("hbase.master", "CentOS-01:60000")
    conf.set(TableInputFormat.INPUT_TABLE, "test")

    // Read the table as an RDD of (row key, Result) pairs.
    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])
    hBaseRDD.cache()
    println(hBaseRDD.count())
    //hBaseRDD.take(1)
    //val result = hBaseRDD.map(kv => Bytes.toString(kv._2.getRow))
    //println("result = " + result.count)
    //result.collect.foreach(println)

    // Convert each Result's cells into serializable MyRow objects
    // before collecting them back to the driver.
    val result = hBaseRDD.map(_._2.listCells())
    val cells = result.flatMap(l => {
      l.map(cell => {
        MyRow(Bytes.toString(CellUtil.cloneRow(cell)),
          Bytes.toString(CellUtil.cloneValue(cell)))
      })
    })
    cells.cache()
    println("cells size = " + cells.count)
    cells.collect().foreach(println)

    sc.stop()
  }
}
4 Packaging Configuration
(1) File -> Project Structure...
(2) In the dialog that opens, select Artifacts.
Click "+", choose JAR, and then choose "Empty" or "From modules with dependencies...".
(3) Close the window and select Build -> Build Artifacts...
In the popup window, choose Build or Rebuild.
5 Test Data
6 Run Results
7 Caveats
Because ImmutableBytesWritable and Result are not serializable, actions that return these objects to the driver (such as collect or take) on an RDD containing them will throw a class-not-serializable exception. Convert the objects to a serializable form first, as the code in section 3 does by turning each Cell into a MyRow object.
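The underlying rule can be demonstrated without a cluster: Spark ships objects through Java serialization, which a case class like MyRow supports by default, while a plain class (standing in here for Result/ImmutableBytesWritable) does not. The RawRecord class below is a hypothetical stand-in, not an HBase type.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable type such as HBase's Result.
class RawRecord(val row: String, val value: String)

// Case classes extend Serializable, so they can cross the executor/driver boundary.
case class MyRow(row: String, value: String)

object SerializationDemo {
  // Returns true if obj survives Java serialization, false otherwise.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val raw = new RawRecord("row1", "v1")
    println(canSerialize(raw))                        // false
    println(canSerialize(MyRow(raw.row, raw.value)))  // true
  }
}
```

This is why the job maps each Cell into MyRow before calling collect.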
Notes:
(1) The listCells method of HBase's Result returns the cells of the latest version.
(2) Because listCells returns a java.util.List, applying Scala collection operations to it requires the import:
import scala.collection.JavaConversions._
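The effect of that conversion can be seen on any java.util.List. The sketch below uses the explicit JavaConverters/.asScala form, which behaves equivalently to the implicit JavaConversions used in the code above and also compiles on newer Scala versions where JavaConversions was removed; the list is a stand-in for the one returned by Result.listCells.

```scala
import java.util.Arrays
import scala.collection.JavaConverters._

object JavaListDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for the java.util.List returned by Result.listCells()
    val javaList: java.util.List[String] = Arrays.asList("a", "b", "c")
    // .asScala wraps the Java list so Scala's map/flatMap/foreach work on it
    val upper = javaList.asScala.map(_.toUpperCase)
    println(upper.mkString(","))  // prints A,B,C
  }
}
```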