Operating Phoenix from Spark

何星平

Article link: https://blog.csdn.net/m0_45031497/article/details/103879025

Spark, Part 8: Integrating with Phoenix

Spark SQL can interact with HBase in several ways, for example over JDBC, but in practice HBase is usually accessed through Phoenix. To do that, add phoenix-core-4.10.0-HBase-1.2.jar and phoenix-spark-4.10.0-HBase-1.2.jar to the project.

Java example:

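    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class PhoenixSparkDemo {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("heihei")
                    .master("local[*]")
                    .getOrCreate();
            // Read the Phoenix table test1 through the phoenix-spark data source
            Dataset<Row> df = spark.read().format("org.apache.phoenix.spark")
                    .option("zkUrl", "192.168.56.11:2181")
                    .option("table", "test1")
                    .load();
            // Keep rows where name NOT LIKE 'hig%' AND password LIKE '%0%'
            df = df.filter("name not like 'hig%'").filter("password like '%0%'");
            // Write to test2; phoenix-spark supports only SaveMode.Overwrite (rows are upserted)
            df.write().format("org.apache.phoenix.spark")
                    .mode(SaveMode.Overwrite)
                    .option("zkUrl", "192.168.56.11:2181")
                    .option("table", "test2")
                    .save();
            spark.stop();
        }
    }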

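The example above reads the Phoenix table test1 (stored in HBase) and writes the rows that satisfy where name not like 'hig%' and password like '%0%' into the table test2. You do not need to worry about running out of memory with this approach: Spark splits the data into partitions automatically and processes it partition by partition, rather than loading the whole table into a single process. Note that test2 must already exist in Phoenix, and its column names must match the DataFrame's column names.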
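Spark parses each filter string as a SQL expression, so the two chained calls amount to name NOT LIKE 'hig%' AND password LIKE '%0%'. To make explicit which rows survive, here is a plain-Java sketch (no Spark or Phoenix involved; the class name LikeFilterSketch and the sample rows are hypothetical) of the same check for these two specific patterns: 'hig%' means "starts with hig", '%0%' means "contains 0", and, as in SQL, a NULL column makes the predicate NULL, so the row is dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LikeFilterSketch {

    // Mirrors: name NOT LIKE 'hig%' AND password LIKE '%0%'
    // A null column makes the SQL predicate NULL, so the row is not kept.
    static boolean keep(String name, String password) {
        if (name == null || password == null) {
            return false;
        }
        boolean nameOk = !name.startsWith("hig");     // NOT LIKE 'hig%'
        boolean passwordOk = password.contains("0");  // LIKE '%0%'
        return nameOk && passwordOk;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: {name, password}
        List<String[]> rows = Arrays.asList(
                new String[]{"higgs", "1000"},
                new String[]{"alice", "a0b"},
                new String[]{"bob", "123"},
                new String[]{null, "000"});
        List<String> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (keep(row[0], row[1])) {
                kept.add(row[0]);
            }
        }
        System.out.println(kept); // [alice]
    }
}
```

Only "alice" passes: "higgs" is excluded by the name pattern, "bob" has no 0 in its password, and the row with a null name is dropped.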

Scala example:

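    import org.apache.hadoop.conf.Configuration
    import org.apache.phoenix.spark._
    import org.apache.spark.sql.SparkSession

    object PhoenixSparkScalaDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("phoenix-test").getOrCreate()
        // Read method 1: the generic data source API
        var df = spark.read.format("org.apache.phoenix.spark").option("table", "test1").option("zkUrl", "192.168.56.11:2181").load()
        df = df.filter("name not like 'hig%'")
          .filter("password like '%0%'")
        df.show()
        val configuration = new Configuration()
        configuration.set("hbase.zookeeper.quorum", "192.168.56.11:2181")
        // Read method 2: phoenixTableAsDataFrame (requires import org.apache.phoenix.spark._)
        // Note: this replaces df with the unfiltered, projected columns of test1
        df = spark.sqlContext.phoenixTableAsDataFrame("test1", Array("ID", "INFO.NAME", "INFO.PASSWORD"), conf = configuration)
        df.show()
        // Write method 1: the generic data source API
        df.write
          .format("org.apache.phoenix.spark")
          .mode("overwrite")
          .option("table", "test2")
          .option("zkUrl", "192.168.56.11:2181")
          .save()
        // Write method 2: saveToPhoenix (requires import org.apache.phoenix.spark._)
        df.saveToPhoenix(Map("table" -> "test2", "zkUrl" -> "192.168.56.11:2181"))
        spark.stop()
      }
    }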

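phoenixTableAsDataFrame() is a method of org.apache.phoenix.spark.SparkSqlContextFunctions, and saveToPhoenix() is a method of org.apache.phoenix.spark.DataFrameFunctions; both classes are in phoenix-spark-4.10.0-HBase-1.2.jar. They are attached to SQLContext and DataFrame through implicit conversions, so you must add import org.apache.phoenix.spark._ to use them. Otherwise the IDE cannot resolve the calls, the code does not compile, and the IDE will not add the import for you automatically.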
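The import works because Scala's implicit conversions let the compiler wrap a DataFrame in a "...Functions" class behind the scenes. Java has no implicit conversions, so from Java that wrapping would have to be written out by hand, which is why the data-source form format("org.apache.phoenix.spark") used in the Java example is the more convenient route there. The sketch below illustrates the underlying "wrapper class adds methods" pattern in plain Java; Table, TableFunctions and saveTo are hypothetical stand-ins, not Phoenix or Spark classes.

```java
import java.util.Arrays;
import java.util.List;

public class WrapperSketch {

    // Hypothetical stand-in for a DataFrame: it has no save method of its own.
    static final class Table {
        final List<String> rows;
        Table(List<String> rows) { this.rows = rows; }
    }

    // Hypothetical stand-in for a wrapper such as DataFrameFunctions:
    // it holds a Table and adds an extra method on top of it.
    static final class TableFunctions {
        private final Table table;
        TableFunctions(Table table) { this.table = table; }

        String saveTo(String target) {
            return "saved " + table.rows.size() + " rows to " + target;
        }
    }

    public static void main(String[] args) {
        Table t = new Table(Arrays.asList("r1", "r2"));
        // In Scala, t.saveTo("test2") would compile only if an implicit
        // conversion Table => TableFunctions had been imported into scope.
        // In Java the wrapping must be written out explicitly:
        String result = new TableFunctions(t).saveTo("test2");
        System.out.println(result); // saved 2 rows to test2
    }
}
```

With the Scala import in place, the compiler inserts the equivalent of the new TableFunctions(t) step for you, which is why the methods appear to exist directly on the DataFrame.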