Spark Structured Streaming: Handling Watermarking and Late Data

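## What the watermark does
In short, the data drives the watermark: it trails the largest event time seen so far by the configured delay. Once the watermark has passed the end of a window, that window is computed for the last time and then dropped, and data that arrives for it afterwards is too late to be counted.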
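The rule above can be sketched in plain Scala without Spark. This is only a simplified model under stated assumptions: the names `WatermarkSketch`, `Window`, `watermark` and `isExpired` are hypothetical, and treating a window as expired once its end is at or before the watermark is a simplification of what the engine does per micro-batch.

```scala
// Simplified, Spark-free model of watermark bookkeeping.
// All names here are hypothetical and exist only for illustration.
object WatermarkSketch {
  // A window covering [startMs, endMs) in event time.
  final case class Window(startMs: Long, endMs: Long)

  // Watermark = largest event time seen so far minus the allowed delay.
  def watermark(maxEventTimeMs: Long, delayMs: Long): Long =
    maxEventTimeMs - delayMs

  // Assumption: a window is finished once the watermark reaches its end.
  def isExpired(w: Window, watermarkMs: Long): Boolean =
    w.endMs <= watermarkMs

  def main(args: Array[String]): Unit = {
    val delayMs = 10000L
    val window = Window(0L, 10000L)
    val eventTimes = Seq(1000L, 9000L, 15000L, 21000L)

    var maxSeen = Long.MinValue
    for (t <- eventTimes) {
      // Only newer event times move the watermark forward.
      maxSeen = math.max(maxSeen, t)
      val wm = watermark(maxSeen, delayMs)
      println(s"event=$t watermark=$wm expired=${isExpired(window, wm)}")
    }
  }
}
```

With these inputs the window [0, 10000) stays open until the event at 21000 arrives, because only then does the watermark (11000) pass the window end.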
(The code highlighter here does not support Scala, unfortunately.)
A sample program:
```
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession

object HandlingWatermarkSss{
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder().master("local[*]").appName("Handling watermark sss").getOrCreate()
    sparkSession.sparkContext.setLogLevel("ERROR")
    val dsr = sparkSession
      .readStream
      .format("socket")
      .option("host", "Spark")
      .option("port", 4444)
      .load

    import sparkSession.implicits._
    val dataFrame = dsr
      .as[String]
      .map(t => {
        // Each input line is expected to look like "word,epochMillis"
        val strings = t.split(",")
        val word = strings(0)
        val timestamp = strings(1)
        (word, new Timestamp(timestamp.toLong))
      })
      .toDF("word", "timestamp")

    import org.apache.spark.sql.functions._
    dataFrame
      .withWatermark("timestamp","10 seconds")// watermark = largest event time seen so far minus 10 seconds
      .groupBy(window($"timestamp","10 seconds","5 seconds"),$"word")// group by sliding event-time window and word
      .count
      //.printSchema()// print the schema
      .map(t=>(t.getStruct(0).getTimestamp(0),t.getStruct(0).getTimestamp(1),t.getString(1),t.getLong(2)))
      .withColumnRenamed("_1","start time")
      .withColumnRenamed("_2","end time")
      .withColumnRenamed("_3","word")
      .withColumnRenamed("_4","count")
      .writeStream
      .format("console")
      .outputMode("update")
      .start
      .awaitTermination
  }
}

```
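The program above groups by a 10-second window that slides every 5 seconds, so each event lands in two overlapping windows. The following Spark-free sketch shows which windows a given timestamp belongs to. It assumes windows are aligned to epoch 0, and `SlidingWindowSketch` and `windowsFor` are hypothetical names, not part of Spark.

```scala
// Spark-free sketch of sliding-window assignment.
// Assumption: window starts are multiples of the slide, counted from epoch 0.
object SlidingWindowSketch {
  // Returns every [start, end) window that contains eventMs, earliest first.
  def windowsFor(eventMs: Long, sizeMs: Long, slideMs: Long): Seq[(Long, Long)] = {
    // Latest window start that is not after the event.
    val lastStart = eventMs - java.lang.Math.floorMod(eventMs, slideMs)
    Iterator
      .iterate(lastStart)(_ - slideMs)                 // walk back one slide at a time
      .takeWhile(start => start + sizeMs > eventMs)    // keep windows still covering the event
      .map(start => (start, start + sizeMs))
      .toList
      .reverse
  }

  def main(args: Array[String]): Unit = {
    // An event at 12 s with a 10 s window and a 5 s slide.
    windowsFor(12000L, 10000L, 5000L).foreach { case (start, end) =>
      println(s"window [$start, $end)")
    }
  }
}
```

For an event at 12000 ms this yields the windows [5000, 15000) and [10000, 20000), which is why one input line can update two rows in the console output.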
