Spark, Part 8: Integrating with Phoenix


weixin_33774615


Original article: http://www.cnblogs.com/koushr/p/9844449.html


Spark SQL can talk to HBase in several ways, for example over JDBC, but in practice HBase is usually accessed through Phoenix. To do so, add phoenix-core-4.10.0-HBase-1.2.jar and phoenix-spark-4.10.0-HBase-1.2.jar to the project.
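If the project is built with Maven (an assumption; the article only names the jar files), the two jars can be pulled in with a dependency fragment along these lines. The coordinates below are inferred from the jar names, so check them against your repository before relying on them.

```xml
<!-- Sketch only: goes inside the <dependencies> element of pom.xml -->
<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-core</artifactId>
    <version>4.10.0-HBase-1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-spark</artifactId>
    <version>4.10.0-HBase-1.2</version>
</dependency>
```

The version suffix (HBase-1.2) should match the HBase version of the cluster.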

Java example:

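    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class PhoenixSparkExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("heihei")
                    .master("local[*]")
                    .getOrCreate();

            // Load the Phoenix table test1 through the phoenix-spark data source.
            Dataset<Row> df = spark.read().format("org.apache.phoenix.spark")
                    .option("zkUrl", "192.168.56.11:2181")
                    .option("table", "test1")
                    .load();

            // Keep rows whose name does not start with "hig" and whose password contains "0".
            df = df.filter("name not like 'hig%'").filter("password like '%0%'");

            // Write the filtered rows to the Phoenix table test2.
            df.write().format("org.apache.phoenix.spark")
                    .mode(SaveMode.Overwrite)
                    .option("zkUrl", "192.168.56.11:2181")
                    .option("table", "test2")
                    .save();
        }
    }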

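The example above reads the table test1 from HBase, keeps the rows that satisfy where name not like 'hig%' and password like '%0%', and writes them to the table test2 in HBase. Written this way, there is no need to worry about running out of memory: the data is partitioned automatically rather than being loaded all at once.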
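To make the two filter conditions concrete, here is a minimal plain-Java sketch of what the LIKE patterns select. It does not use Spark or Phoenix; the class name, the keep helper, and the sample rows are made up for illustration, and it assumes non-null values (in SQL, a null column would simply not match).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LikeFilterSketch {
    // Plain-Java equivalent of: name NOT LIKE 'hig%' AND password LIKE '%0%'
    // 'hig%' means "starts with hig"; '%0%' means "contains 0".
    public static boolean keep(String name, String password) {
        return !name.startsWith("hig") && password.contains("0");
    }

    public static void main(String[] args) {
        // Hypothetical (name, password) rows, invented for this sketch.
        List<String[]> rows = Arrays.asList(
                new String[]{"higgins", "a0b"},   // dropped: name starts with "hig"
                new String[]{"alice", "p0ss"},    // kept: passes both conditions
                new String[]{"bob", "secret"});   // dropped: password has no "0"

        List<String> kept = rows.stream()
                .filter(r -> keep(r[0], r[1]))
                .map(r -> r[0])
                .collect(Collectors.toList());

        System.out.println(kept); // prints [alice]
    }
}
```

Chaining two filter() calls on the DataFrame, as in the example, has the same effect as combining both conditions with AND.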

Scala example:

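  import org.apache.hadoop.conf.Configuration
  import org.apache.spark.sql.SparkSession
  // Required for phoenixTableAsDataFrame() and saveToPhoenix()
  import org.apache.phoenix.spark._

  object PhoenixSparkExample {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().master("local[*]").appName("phoenix-test").getOrCreate()

      // Read, approach 1: DataFrameReader with the phoenix-spark data source
      var df = spark.read.format("org.apache.phoenix.spark")
        .option("table", "test1")
        .option("zkUrl", "192.168.56.11:2181")
        .load()
      df = df.filter("name not like 'hig%'")
        .filter("password like '%0%'")
      df.show()

      val configuration = new Configuration()
      configuration.set("hbase.zookeeper.quorum", "192.168.56.11:2181")
      // Read, approach 2: phoenixTableAsDataFrame() with an explicit column list
      df = spark.sqlContext.phoenixTableAsDataFrame("test1", Array("ID", "INFO.NAME", "INFO.PASSWORD"), conf = configuration)
      df.show()

      // Write, approach 1: DataFrameWriter with the phoenix-spark data source
      df.write
        .format("org.apache.phoenix.spark")
        .mode("overwrite")
        .option("table", "test2")
        .option("zkUrl", "192.168.56.11:2181")
        .save()

      // Write, approach 2: saveToPhoenix()
      df.saveToPhoenix(Map("table" -> "test2", "zkUrl" -> "192.168.56.11:2181"))
    }
  }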

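phoenixTableAsDataFrame() is a method of org.apache.phoenix.spark.SparkSqlContextFunctions, and saveToPhoenix() is a method of org.apache.phoenix.spark.DataFrameFunctions; both classes ship in phoenix-spark-4.10.0-HBase-1.2.jar. To use either method you must import org.apache.phoenix.spark._, because they are attached to SQLContext and DataFrame through implicit conversions defined in that package. Without the import, the IDE cannot resolve the calls and will not add the import automatically.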

