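1. Overview
The runtime environment matters, so for reference, this test was run on: hadoop-2.6.5, spark-1.6.2, hbase-1.2.4, zookeeper-3.4.6, jdk-1.8. Without further ado, here is the requirement: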
Andy column=baseINFO:age,value=21
Andy column=baseINFO:gender,value=0
Andy column=baseINFO:telphone_number,value=110110110
Tom column=baseINFO:age,value=18
Tom column=baseINFO:gender,value=1
Tom column=baseINFO:telphone_number,value=120120120
Given the table above, use Spark to group the cells by row key so that the result looks like this:
[Andy,(21,0,110110110)]
[Tom,(18,1,120120120)]
The requirement is simple. The main goal is to get familiar with how such a Spark program runs end to end.
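To make the target shape concrete, here is a minimal plain-Java sketch of the grouping step, with no Spark or HBase dependency. It is not the author's Spark code. The `Cell` type and the `groupByRow`/`format` methods are hypothetical names used only for illustration. The sketch collects the cells of each row key and orders each row's values by column name. That ordering is why the tuple reads (age, gender, telphone_number): HBase keeps the cells of a row sorted by family and then qualifier in byte order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupCellsByRow {

    // Hypothetical type for this sketch (not an HBase class):
    // one cell as shown in the scan output above.
    public static final class Cell {
        final String row;
        final String column; // "family:qualifier", e.g. "baseINFO:age"
        final String value;

        public Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Groups cells by row key. Within a row, values are ordered by column name,
    // which mirrors the sorted qualifier order HBase uses for ASCII names like these.
    public static Map<String, List<String>> groupByRow(List<Cell> cells) {
        Map<String, TreeMap<String, String>> byRow = new TreeMap<>();
        for (Cell c : cells) {
            byRow.computeIfAbsent(c.row, k -> new TreeMap<>()).put(c.column, c.value);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<String, String>> e : byRow.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue().values()));
        }
        return result;
    }

    // Formats one group in the "[Andy,(21,0,110110110)]" style shown above.
    public static String format(String row, List<String> values) {
        return "[" + row + ",(" + String.join(",", values) + ")]";
    }

    public static void main(String[] args) {
        // Input deliberately interleaved to show that grouping does not depend on order.
        List<Cell> cells = Arrays.asList(
                new Cell("Tom", "baseINFO:gender", "1"),
                new Cell("Andy", "baseINFO:age", "21"),
                new Cell("Tom", "baseINFO:telphone_number", "120120120"),
                new Cell("Andy", "baseINFO:telphone_number", "110110110"),
                new Cell("Tom", "baseINFO:age", "18"),
                new Cell("Andy", "baseINFO:gender", "0"));
        groupByRow(cells).forEach((row, values) -> System.out.println(format(row, values)));
    }
}
```

Running `main` prints `[Andy,(21,0,110110110)]` followed by `[Tom,(18,1,120120120)]`. In the Spark job itself, note that when HBase is read through `TableInputFormat`, each `Result` record already holds all the cells of one row. A simple map over the records may therefore be enough, and an explicit group-by step may not be needed.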
2. Code
package com.union.bigdata.spark.hbase;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytes