Spark: A Summary of Structured Streaming Sinks


Tested component versions:

Spark: 2.4.0


Spark 2.4 currently supports the following sinks: File, Kafka, Foreach, ForeachBatch, Console, and Memory; connectors such as elasticsearch-hadoop add further sinks on top of these.

ForeachBatchSink is only available in Spark 2.4 and later.

ElasticSearch sink implementation:

val esOptions = Map(
  "es.write.operation" -> "upsert",   // upsert so records with the same id are overwritten
  "es.mapping.id"      -> "id")       // use the "id" column as the Elasticsearch document _id

name.writeStream
  .options(esOptions)
  .format("org.elasticsearch.spark.sql")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start("test/m_retail")             // target index/type
  .awaitTermination()

ForeachSink implementation (writing to Phoenix as an example):

import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

name.writeStream.outputMode("append").foreach(new ForeachWriter[Row] {

  // Called once per partition per trigger; returning true means the partition will be processed.
  def open(partitionId: Long, version: Long): Boolean = {
    true
  }

  // Called once per row: wrap the single row in a DataFrame and save it through the
  // Phoenix Spark connector. Building a DataFrame per record is expensive; a
  // connection-based ForeachWriter pattern is sketched after this example.
  def process(value: Row): Unit = {
    object phoenixSchema {
      val column1 = StructField("ID", StringType)
      val column2 = StructField("NAME", StringType)
      val structType = StructType(Array(column1, column2))
    }
    val spark = SparkSession.builder.getOrCreate()
    println(value.toString())
    spark.createDataFrame(
        spark.sparkContext.parallelize(Seq(value))
          .map(x => Row(x.apply(0).toString, x.apply(1).toString)),
        phoenixSchema.structType)
      .write.format("org.apache.phoenix.spark")
      .mode("overwrite")
      .option("table", "test1")
      .option("zkUrl", "BigData-Dev-1:2181")
      .save()
  }

  // Called at the end of each partition, whether or not an error occurred.
  def close(errorOrNull: Throwable): Unit = {
    println("close")
  }
}).option("checkpointLocation", "hdfs://zt01/tmp/kafka").start().awaitTermination()
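
Because open/process/close are invoked per partition and per trigger, a more common ForeachWriter pattern is to open one database connection in open(), write each row in process(), and release it in close(). A sketch of that pattern against Phoenix's JDBC interface, assuming the Phoenix JDBC driver is on the classpath (the UPSERT statement reuses the table and columns from the example above):

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

val phoenixWriter = new ForeachWriter[Row] {
  var conn: Connection = _
  var stmt: PreparedStatement = _

  def open(partitionId: Long, version: Long): Boolean = {
    // One connection per partition per trigger; "jdbc:phoenix:<zookeeper quorum>" is the standard URL form.
    conn = DriverManager.getConnection("jdbc:phoenix:BigData-Dev-1:2181")
    stmt = conn.prepareStatement("UPSERT INTO test1 (ID, NAME) VALUES (?, ?)")
    true
  }

  def process(value: Row): Unit = {
    stmt.setString(1, value.getString(0))
    stmt.setString(2, value.getString(1))
    stmt.executeUpdate()
  }

  def close(errorOrNull: Throwable): Unit = {
    if (conn != null) {
      conn.commit()   // Phoenix buffers upserts until the connection is committed
      conn.close()
    }
  }
}

name.writeStream.outputMode("append").foreach(phoenixWriter)
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka").start().awaitTermination()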

Kafka sink implementation:

import org.apache.spark.sql.streaming.Trigger

val query = df.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "BigData-Dev-5:9092,BigData-Dev-4:9092,BigData-Dev-3:9092,BigData-Dev-2:9092")
  .option("topic", "my_first_topic")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .trigger(Trigger.ProcessingTime(300))   // fire a micro-batch roughly every 300 ms
  .start()

Console sink implementation:

name.writeStream.outputMode("append")
  .format("console")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start().awaitTermination()

ForeachBatchSink implementation:

name.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.write.format("org.apache.phoenix.spark")
    .mode("overwrite")
    .option("table", "NT_SALE_ORDER_REPLICATION")
    .option("zkUrl", "BigData-Dev-1:2181")
    .save()
}.option("checkpointLocation", "hdfs://zt01/tmp/kafka").start().awaitTermination()
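
foreachBatch also makes it easy to reuse one micro-batch for several destinations. A sketch, assuming we want every batch written to Phoenix and also printed to stdout (persisting the batch avoids recomputing it for the second write):

name.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.persist()
  batchDF.write.format("org.apache.phoenix.spark")
    .mode("overwrite")
    .option("table", "NT_SALE_ORDER_REPLICATION")
    .option("zkUrl", "BigData-Dev-1:2181")
    .save()
  batchDF.show(20, false)   // second destination: print the batch
  batchDF.unpersist()
}.option("checkpointLocation", "hdfs://zt01/tmp/kafka").start().awaitTermination()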

More sinks will be added here as they are tested.
