This post introduces two ways to access HBase from PySpark: one reads HBase into an RDD via newAPIHadoopRDD and then converts it to a DataFrame; the other creates an external table over HBase in Hive and reads it through Spark SQL.
一、Reading via newAPIHadoopRDD
# Connect Spark to HBase and read the table as an RDD
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("yarn").appName("hbase_test").getOrCreate()
sc = spark.sparkContext
# HBase connection settings: ZooKeeper quorum, table name, and optional row-key scan range
hbaseconf = {"hbase.zookeeper.quorum": "10.18.105.15", "hbase.mapreduce.inputtable": "table_name",
             "hbase.mapreduce.scan.row.start": "***", "hbase.mapreduce.scan.row.stop": "***"}
# Converter from the Spark examples jar: turns the HBase row key (ImmutableBytesWritable) into a string
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
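To show how these pieces fit together, here is a minimal sketch of the rest of the read path, modeled on the hbase_inputformat.py example bundled with Spark: the HBaseResultToStringConverter value converter, the per-cell JSON layout it emits, and the assumption that every row carries the same qualifiers come from that example and are for illustration only.

# Minimal sketch (see note above): value converter from the Spark examples jar
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

import json
from pyspark.sql import Row

# Read the HBase table as an RDD of (row key, newline-separated JSON cells)
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv, valueConverter=valueConv, conf=hbaseconf)

# Each cell is a JSON document with fields such as "qualifier" and "value";
# collapse the cells of a row into one Row object (assumes consistent qualifiers per row)
rows = (hbase_rdd
        .mapValues(lambda v: {c["qualifier"]: c["value"] for c in map(json.loads, v.split("\n"))})
        .map(lambda kv: Row(rowkey=kv[0], **kv[1])))

df = spark.createDataFrame(rows)
df.show()

Note that the pythonconverters classes live in the Spark examples jar, so that jar (together with the HBase client jars) has to be made available to the job, for example via the --jars option of spark-submit.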