Tested component versions:
spark: 2.4.0
Spark 2.4 currently supports the following sinks: FileSink, KafkaSink, ForeachSink, ConsoleSink, MemorySink, and ForeachBatchSink.
ForeachBatchSink is only available in Spark 2.4 and later.
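All of the sink examples below write out a streaming DataFrame referred to as `name` (or `df`). As a point of reference, here is a minimal sketch of how such a DataFrame might be built from a Kafka source; the broker address, topic, and column names are placeholders, not taken from the original:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch (assumed setup): build the streaming DataFrame that the
// sink examples below refer to as `name` / `df`.
val spark = SparkSession.builder
  .appName("sink-examples")
  .getOrCreate()

// Broker address and topic are placeholders.
val name = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "BigData-Dev-5:9092")
  .option("subscribe", "my_first_topic")
  .load()
  .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS name")
```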
ElasticSearchSink implementation:
val esOptions = Map(
  "es.write.operation" -> "upsert",  // upsert by document id
  "es.mapping.id"      -> "id"       // use the "id" column as the ES _id
)

name.writeStream
  .options(esOptions)
  .format("org.elasticsearch.spark.sql")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start("test/m_retail")  // index/type to write to
  .awaitTermination()
ForeachSink implementation (writing to Phoenix as an example):
import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

name.writeStream.outputMode("append").foreach(new ForeachWriter[Row] {
  // Target schema for the Phoenix table
  object phoenixSchema {
    val column1 = StructField("ID", StringType)
    val column2 = StructField("NAME", StringType)
    val structType = StructType(Array(column1, column2))
  }

  def open(partitionId: Long, version: Long): Boolean = true

  def process(value: Row): Unit = {
    // Note: building a DataFrame and launching a write job per row is
    // heavyweight; for batch-style writes prefer ForeachBatchSink (below).
    val spark = SparkSession.builder.getOrCreate()
    println(value.toString())
    spark.createDataFrame(
        spark.sparkContext.parallelize(Seq(value))
          .map(x => Row(x.apply(0).toString, x.apply(1).toString)),
        phoenixSchema.structType)
      .write.format("org.apache.phoenix.spark")
      .mode("overwrite")
      .option("table", "test1")
      .option("zkUrl", "BigData-Dev-1:2181")
      .save()
  }

  def close(errorOrNull: Throwable): Unit = println("close")
}).option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start()
  .awaitTermination()
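The open/process/close lifecycle of ForeachWriter is designed so that open acquires one connection per partition and process reuses it per row, rather than starting a Spark job per row as above. As an illustration only, here is a hedged sketch using Phoenix's JDBC driver directly; the JDBC URL, table, and column names are assumptions, not from the original:

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Sketch of the intended pattern: one JDBC connection per partition,
// one prepared-statement execution per row. URL/table/columns are placeholders.
name.writeStream.outputMode("append").foreach(new ForeachWriter[Row] {
  var conn: Connection = _
  var stmt: PreparedStatement = _

  def open(partitionId: Long, version: Long): Boolean = {
    conn = DriverManager.getConnection("jdbc:phoenix:BigData-Dev-1:2181")
    stmt = conn.prepareStatement("UPSERT INTO test1 (ID, NAME) VALUES (?, ?)")
    true
  }

  def process(value: Row): Unit = {
    stmt.setString(1, value.getString(0))
    stmt.setString(2, value.getString(1))
    stmt.executeUpdate()
  }

  def close(errorOrNull: Throwable): Unit = {
    if (conn != null) { conn.commit(); conn.close() }
  }
}).option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start()
```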
KafkaSink implementation:
import org.apache.spark.sql.streaming.Trigger

val query = df.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "BigData-Dev-5:9092,BigData-Dev-4:9092,BigData-Dev-3:9092,BigData-Dev-2:9092")
  .option("topic", "my_first_topic")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .trigger(Trigger.ProcessingTime(300))  // fire every 300 ms
  .start()
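The Kafka sink expects the output rows to contain a string or binary `value` column (optionally `key` and `topic` as well). If the source DataFrame has arbitrary columns, serialize them first; a sketch, assuming a source DataFrame `name` with an `id` column (placeholder names):

```scala
// Pack all columns into a JSON "value" and use "id" as the Kafka key.
val df = name.selectExpr(
  "CAST(id AS STRING) AS key",
  "to_json(struct(*)) AS value")
```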
ConsoleSink implementation:
name.writeStream
  .outputMode("append")
  .format("console")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start()
  .awaitTermination()
ForeachBatchSink implementation:
name.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.write.format("org.apache.phoenix.spark")
    .mode("overwrite")
    .option("table", "NT_SALE_ORDER_REPLICATION")
    .option("zkUrl", "BigData-Dev-1:2181")
    .save()
}.option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start()
  .awaitTermination()
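One caveat with foreachBatch: after a failure, the same micro-batch may be replayed, so the write should be idempotent. The `batchId` parameter can be used for this; a hedged sketch where the output path is a placeholder, using a batch-keyed directory so a replay simply overwrites its own output:

```scala
import org.apache.spark.sql.DataFrame

// Sketch: keying the output by batchId makes a replayed batch overwrite
// its own directory instead of duplicating data. Path is a placeholder.
name.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.write
    .mode("overwrite")
    .parquet(s"hdfs://zt01/tmp/out/batch=$batchId")
}.option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start()
```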
More sinks to be added!