Spark SQL functions.scala 源码解析(三)Window functions (基于 Spark 3.3.0)

前言

本文隶属于专栏《1000个问题搞定大数据技术体系》,该专栏为笔者原创,引用请注明来源,不足和错误之处请在评论区帮忙指出,谢谢!

本专栏目录结构和参考文献请见1000个问题搞定大数据技术体系

目录

Spark SQL functions.scala 源码解析(一)Sort functions (基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(二)Aggregate functions(基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(三)Window functions (基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(四)Non-aggregate functions (基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(五)Math Functions (基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(六)Misc functions (基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(七)String functions (基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(八)DateTime functions (基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(九)Collection functions (基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(十)Partition transform functions(基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(十一)Scala UDF functions(基于 Spark 3.3.0)

Spark SQL functions.scala 源码解析(十二)Java UDF functions(基于 Spark 3.3.0)

Spark SQL 内置函数

Spark SQL 内置函数(六)Window Functions(基于 Spark 3.2.0)

正文

cume_dist

  /**
   * 窗口函数:返回窗口分区内值的累积分布,即低于当前行的行的分数
   *
   * {{{
   *   N = 分区内的总行数
   *   cumeDist(x) = 在当前行(包括当前行)的值的数目 x / N
   * }}}
   *
   * @group window_funcs
   * @since 1.6.0
   */
  def cume_dist(): Column = withExpr { new CumeDist }

用法

========== df.select(cume_dist().over(w)) ==========
+--------------------------------------------------------------------------------------------------------------------+
|cume_dist() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------------+
|                                                                                                                 1.0|
|                                                                                                                 1.0|
|                                                                                                                 1.0|
|                                                                                                                 1.0|
+--------------------------------------------------------------------------------------------------------------------+

dense_rank

  /**
   * 窗口函数:返回窗口分区内行的排名,没有任何间隙。
   * rank 和dense_rank 的区别在于denseRank 在有平局时不会在排序顺序上留下间隙。 
   * 也就是说,如果您使用dense_rank 对比赛进行排名,并且有三个人并列第二,您会说三个人都排在第二位,下一
   * 个人排在第三位。 
   * Rank 会给我连续的数字,使排在第三位的人(在并列之后)登记为第五位。
   * 这相当于 SQL 中的 DENSE_RANK 函数
   *
   * @group window_funcs
   * @since 1.6.0
   */
  def dense_rank(): Column = withExpr { new DenseRank }

用法


========== df.select(dense_rank().over(w)) ==========
+--------------------------------------------------------------------------------------------------------------------+
|DENSE_RANK() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   1|
|                                                                                                                   1|
|                                                                                                                   1|
|                                                                                                                   1|
+--------------------------------------------------------------------------------------------------------------------+

lag/lead


  /**
   * 窗口函数:返回当前行之前偏移行的值,如果当前行之前的偏移行少于偏移行,则返回 null 值。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的前一行。
   * 这相当于 SQL 中的 LAG 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def lag(e: Column, offset: Int): Column = lag(e, offset, null)

  /**
   * 窗口函数:返回当前行之前偏移行的值,如果当前行之前的偏移行少于偏移行,则返回 null 值。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的前一行。
   * 这相当于 SQL 中的 LAG 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def lag(columnName: String, offset: Int): Column = lag(columnName, offset, null)

  /**
   * 窗口函数:返回当前行之前偏移行的值,如果当前行之前有少于偏移行,则返回defaultValue 。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的前一行。
   * 这相当于 SQL 中的 LAG 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def lag(columnName: String, offset: Int, defaultValue: Any): Column = {
    lag(Column(columnName), offset, defaultValue)
  }

  /**
   * 窗口函数:返回当前行之前偏移行的值,如果当前行之前有少于偏移行,则返回defaultValue 。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的前一行。
   * 这相当于 SQL 中的 LAG 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def lag(e: Column, offset: Int, defaultValue: Any): Column = {
    lag(e, offset, defaultValue, false)
  }

  /**
   * 窗口函数:返回当前行之前偏移行的值,如果当前行之前有少于偏移行,则返回defaultValue 。 
   * ignoreNulls确定行的空值是包含在计算中还是从计算中消除。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的前一行。
   * 这相当于 SQL 中的 LAG 函数。
   *
   * @group window_funcs
   * @since 3.2.0
   */
  def lag(e: Column, offset: Int, defaultValue: Any, ignoreNulls: Boolean): Column = withExpr {
    Lag(e.expr, Literal(offset), Literal(defaultValue), ignoreNulls)
  }

  /**
   * 窗口函数:返回当前行之后偏移行的值,如果当前行之后少于偏移行,则返回null 。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的下一行。
   * 这相当于 SQL 中的 LEAD 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def lead(columnName: String, offset: Int): Column = { lead(columnName, offset, null) }

  /**
   * 窗口函数:返回当前行之后偏移行的值,如果当前行之后少于偏移行,则返回null 。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的下一行。
   * 这相当于 SQL 中的 LEAD 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def lead(e: Column, offset: Int): Column = { lead(e, offset, null) }

  /**
   * 窗口函数:返回当前行后偏移行的值,如果当前行后少于偏移行,则返回defaultValue 。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的下一行。
   * 这相当于 SQL 中的 LEAD 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def lead(columnName: String, offset: Int, defaultValue: Any): Column = {
    lead(Column(columnName), offset, defaultValue)
  }

  /**
   * 窗口函数:返回当前行后偏移行的值,如果当前行后少于偏移行,则返回defaultValue 。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的下一行。
   * 这相当于 SQL 中的 LEAD 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def lead(e: Column, offset: Int, defaultValue: Any): Column = {
    lead(e, offset, defaultValue, false)
  }

  /**
   * 窗口函数:返回当前行后偏移行的值,如果当前行后少于偏移行,则返回defaultValue 。 
   * ignoreNulls确定行的空值是包含在计算中还是从计算中消除。 
   * ignoreNulls的默认值为 false。 
   * 例如,偏移量为 1 将返回窗口分区中任何给定点的下一行。
   * 这相当于 SQL 中的 LEAD 函数。
   *
   * @group window_funcs
   * @since 3.2.0
   */
  def lead(e: Column, offset: Int, defaultValue: Any, ignoreNulls: Boolean): Column = withExpr {
    Lead(e.expr, Literal(offset), Literal(defaultValue), ignoreNulls)
  }

用法

========== df.select(lag($"key", 2).over(w)) ==========
+-------------------------------------------------------------------------------------------------------------------+
|lag(key, 2, NULL) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                               null|
|                                                                                                               null|
|                                                                                                               null|
|                                                                                                               null|
+-------------------------------------------------------------------------------------------------------------------+

========== df.select(lag("key", 2).over(w)) ==========
+-------------------------------------------------------------------------------------------------------------------+
|lag(key, 2, NULL) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                               null|
|                                                                                                               null|
|                                                                                                               null|
|                                                                                                               null|
+-------------------------------------------------------------------------------------------------------------------+

========== df.select(lag($"key", 2, "0").over(w)) ==========
+----------------------------------------------------------------------------------------------------------------+
|lag(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+----------------------------------------------------------------------------------------------------------------+
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
+----------------------------------------------------------------------------------------------------------------+

========== df.select(lag("key", 2, "0").over(w)) ==========
+----------------------------------------------------------------------------------------------------------------+
|lag(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+----------------------------------------------------------------------------------------------------------------+
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
+----------------------------------------------------------------------------------------------------------------+

========== df.select(lag($"key", 2, "0", true).over(w)) ==========
+----------------------------------------------------------------------------------------------------------------+
|lag(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+----------------------------------------------------------------------------------------------------------------+
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
+----------------------------------------------------------------------------------------------------------------+

========== df.select(lead($"key", 2).over(w)) ==========
+------------------------------------------------------------------------------------------------------------------+
|lead(key, 2, NULL) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+------------------------------------------------------------------------------------------------------------------+
|                                                                                                              null|
|                                                                                                              null|
|                                                                                                              null|
|                                                                                                              null|
+------------------------------------------------------------------------------------------------------------------+

========== df.select(lead("key", 2).over(w)) ==========
+------------------------------------------------------------------------------------------------------------------+
|lead(key, 2, NULL) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+------------------------------------------------------------------------------------------------------------------+
|                                                                                                              null|
|                                                                                                              null|
|                                                                                                              null|
|                                                                                                              null|
+------------------------------------------------------------------------------------------------------------------+

========== df.select(lead($"key", 2, "0").over(w)) ==========
+---------------------------------------------------------------------------------------------------------------+
|lead(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+---------------------------------------------------------------------------------------------------------------+
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
+---------------------------------------------------------------------------------------------------------------+

========== df.select(lead("key", 2, "0").over(w)) ==========
+---------------------------------------------------------------------------------------------------------------+
|lead(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+---------------------------------------------------------------------------------------------------------------+
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
+---------------------------------------------------------------------------------------------------------------+

========== df.select(lead($"key", 2, "0", true).over(w)) ==========
+---------------------------------------------------------------------------------------------------------------+
|lead(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+---------------------------------------------------------------------------------------------------------------+
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
+---------------------------------------------------------------------------------------------------------------+

nth_value

  /**
   * 窗口函数:返回值是窗口的偏移第 offset 行(从1开始计数),如果窗口的大小小于偏移行,则返回null 。
   * 当 ignoreNulls 设置为 true 时,它​​将返回它看到的偏移量非空值。 
   * 如果所有值都为 null,则返回 null 值。
   * 这相当于 SQL 中的 nth_value 函数。
   *
   * @group window_funcs
   * @since 3.1.0
   */
  def nth_value(e: Column, offset: Int, ignoreNulls: Boolean): Column = withExpr {
    NthValue(e.expr, Literal(offset), ignoreNulls)
  }

  /**
   * 窗口函数:返回值是窗口的偏移第 offset 行(从1开始计数),如果窗口的大小小于偏移行,则返回null 。
   * 这相当于 SQL 中的 nth_value 函数。
   *
   * @group window_funcs
   * @since 3.1.0
   */
  def nth_value(e: Column, offset: Int): Column = withExpr {
    NthValue(e.expr, Literal(offset), false)
  }

用法

========== df.select(nth_value($"key", 2, true).over(w)) ==========
+---------------------------------------------------------------------------------------------------------------------------------------+
|nth_value(key, 2) ignore nulls OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+---------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                      1|
|                                                                                                                                      1|
|                                                                                                                                      2|
|                                                                                                                                      2|
+---------------------------------------------------------------------------------------------------------------------------------------+

========== df.select(nth_value($"key", 2).over(w)) ==========
+--------------------------------------------------------------------------------------------------------------------------+
|nth_value(key, 2) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                         1|
|                                                                                                                         1|
|                                                                                                                         2|
|                                                                                                                         2|
+--------------------------------------------------------------------------------------------------------------------------+

ntile

  /**
   * 窗口函数:返回有序窗口分区中的 ntile group id(从 1 到n包括在内)。 
   * 例如,如果n为 4,则行的第一季度将获得值 1,第二季度将获得 2,第三季度将获得 3,最后一个季度将获得
   * 4。
   * 这相当于 SQL 中的 NTILE 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def ntile(n: Int): Column = withExpr { new NTile(Literal(n)) }

用法

========== df.select(ntile(2).over(w)). ==========
+----------------------------------------------------------------------------------------------------------------+
|ntile(2) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+----------------------------------------------------------------------------------------------------------------+
|                                                                                                               1|
|                                                                                                               2|
|                                                                                                               1|
|                                                                                                               2|
+----------------------------------------------------------------------------------------------------------------+

percent_rank

  /**
   * 窗口函数:返回窗口分区内行的相对排名(即百分位数)。
   * 这是通过以下方式计算的:
   * {{{
   *   (分区中行的排名 - 1) / (分区中的行数 - 1)
   * }}}
   * 这相当于 SQL 中的 PERCENT_RANK 函数。
   *
   * @group window_funcs
   * @since 1.6.0
   */
  def percent_rank(): Column = withExpr { new PercentRank }

用法

========== df.select(percent_rank().over(w)) ==========
+----------------------------------------------------------------------------------------------------------------------+
|PERCENT_RANK() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+----------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   0.0|
|                                                                                                                   0.0|
|                                                                                                                   0.0|
|                                                                                                                   0.0|
+----------------------------------------------------------------------------------------------------------------------+

rank

  /**
   * 窗口函数:返回窗口分区内行的排名。
   * rank 和dense_rank 之间的区别在于dense_rank 在有平局时不会在排名序列中留下间隙。 
   * 也就是说,如果您使用dense_rank 对比赛进行排名,并且有三个人并列第二,您会说三个人都排在第二位,下
   * 个人排在第三位。 
   * Rank 会给我连续的数字,使排在第三位的人(在并列之后)登记为第五位。
   * 这相当于 SQL 中的 RANK 函数。
   *
   * @group window_funcs
   * @since 1.4.0
   */
  def rank(): Column = withExpr { new Rank }

用法

========== df.select(rank().over(w)) ==========
+--------------------------------------------------------------------------------------------------------------+
|RANK() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------+
|                                                                                                             1|
|                                                                                                             1|
|                                                                                                             1|
|                                                                                                             1|
+--------------------------------------------------------------------------------------------------------------+

row_number

  /**
   * 窗口函数:返回窗口分区内从 1 开始的序列号。
   *
   * @group window_funcs
   * @since 1.6.0
   */
  def row_number(): Column = withExpr { RowNumber() }

用法

========== df.select(row_number().over(w)) ==========
+--------------------------------------------------------------------------------------------------------------------+
|row_number() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   1|
|                                                                                                                   2|
|                                                                                                                   1|
|                                                                                                                   2|
+--------------------------------------------------------------------------------------------------------------------+

实践

代码

package com.shockang.study.spark.sql.functions

import com.shockang.study.spark.util.Utils.formatPrint
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

/**
 *
 * @author Shockang
 */
object WindowFunctionsExample {

  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.OFF)
    val spark = SparkSession.builder().appName("AggregateFunctionsExample").master("local[*]").getOrCreate()

    import spark.implicits._

    val df = Seq((1, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value").cache()
    val w = Window.partitionBy("value").orderBy("key")

    // cume_dist
    formatPrint("""df.select(cume_dist().over(w))""")
    df.select(cume_dist().over(w)).show()

    // dense_rank
    formatPrint("""df.select(dense_rank().over(w))""")
    df.select(dense_rank().over(w)).show()

    // lag/lead
    formatPrint("""df.select(lag($"key", 2).over(w))""")
    df.select(lag($"key", 2).over(w)).show()

    formatPrint("""df.select(lag("key", 2).over(w))""")
    df.select(lag("key", 2).over(w)).show()

    formatPrint("""df.select(lag($"key", 2, "0").over(w))""")
    df.select(lag($"key", 2, "0").over(w)).show()

    formatPrint("""df.select(lag("key", 2, "0").over(w))""")
    df.select(lag("key", 2, "0").over(w)).show()

    formatPrint("""df.select(lag($"key", 2, "0", true).over(w))""")
    df.select(lag($"key", 2, "0", true).over(w)).show()

    formatPrint("""df.select(lead($"key", 2).over(w))""")
    df.select(lead($"key", 2).over(w)).show()

    formatPrint("""df.select(lead("key", 2).over(w))""")
    df.select(lead("key", 2).over(w)).show()

    formatPrint("""df.select(lead($"key", 2, "0").over(w))""")
    df.select(lead($"key", 2, "0").over(w)).show()

    formatPrint("""df.select(lead("key", 2, "0").over(w))""")
    df.select(lead("key", 2, "0").over(w)).show()

    formatPrint("""df.select(lead($"key", 2, "0", true).over(w))""")
    df.select(lead($"key", 2, "0", true).over(w)).show()

    // nth_value
    formatPrint("""df.select(nth_value($"key", 2, true).over(w))""")
    df.select(nth_value($"key", 2, true).over(w)).show()

    formatPrint("""df.select(nth_value($"key", 2).over(w))""")
    df.select(nth_value($"key", 2).over(w)).show()

    // ntile
    formatPrint("""df.select(ntile(2).over(w)).""")
    df.select(ntile(2).over(w)).show()

    // percent_rank
    formatPrint("""df.select(percent_rank().over(w))""")
    df.select(percent_rank().over(w)).show()

    // rank
    formatPrint("""df.select(rank().over(w))""")
    df.select(rank().over(w)).show()

    // row_number
    formatPrint("""df.select(row_number().over(w))""")
    df.select(row_number().over(w)).show()
  }
}

输出

========== df.select(cume_dist().over(w)) ==========
+--------------------------------------------------------------------------------------------------------------------+
|cume_dist() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------------+
|                                                                                                                 1.0|
|                                                                                                                 1.0|
|                                                                                                                 1.0|
|                                                                                                                 1.0|
+--------------------------------------------------------------------------------------------------------------------+

========== df.select(dense_rank().over(w)) ==========
+--------------------------------------------------------------------------------------------------------------------+
|DENSE_RANK() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   1|
|                                                                                                                   1|
|                                                                                                                   1|
|                                                                                                                   1|
+--------------------------------------------------------------------------------------------------------------------+

========== df.select(lag($"key", 2).over(w)) ==========
+-------------------------------------------------------------------------------------------------------------------+
|lag(key, 2, NULL) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                               null|
|                                                                                                               null|
|                                                                                                               null|
|                                                                                                               null|
+-------------------------------------------------------------------------------------------------------------------+

========== df.select(lag("key", 2).over(w)) ==========
+-------------------------------------------------------------------------------------------------------------------+
|lag(key, 2, NULL) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                               null|
|                                                                                                               null|
|                                                                                                               null|
|                                                                                                               null|
+-------------------------------------------------------------------------------------------------------------------+

========== df.select(lag($"key", 2, "0").over(w)) ==========
+----------------------------------------------------------------------------------------------------------------+
|lag(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+----------------------------------------------------------------------------------------------------------------+
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
+----------------------------------------------------------------------------------------------------------------+

========== df.select(lag("key", 2, "0").over(w)) ==========
+----------------------------------------------------------------------------------------------------------------+
|lag(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+----------------------------------------------------------------------------------------------------------------+
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
+----------------------------------------------------------------------------------------------------------------+

========== df.select(lag($"key", 2, "0", true).over(w)) ==========
+----------------------------------------------------------------------------------------------------------------+
|lag(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN -2 FOLLOWING AND -2 FOLLOWING)|
+----------------------------------------------------------------------------------------------------------------+
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
|                                                                                                               0|
+----------------------------------------------------------------------------------------------------------------+

========== df.select(lead($"key", 2).over(w)) ==========
+------------------------------------------------------------------------------------------------------------------+
|lead(key, 2, NULL) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+------------------------------------------------------------------------------------------------------------------+
|                                                                                                              null|
|                                                                                                              null|
|                                                                                                              null|
|                                                                                                              null|
+------------------------------------------------------------------------------------------------------------------+

========== df.select(lead("key", 2).over(w)) ==========
+------------------------------------------------------------------------------------------------------------------+
|lead(key, 2, NULL) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+------------------------------------------------------------------------------------------------------------------+
|                                                                                                              null|
|                                                                                                              null|
|                                                                                                              null|
|                                                                                                              null|
+------------------------------------------------------------------------------------------------------------------+

========== df.select(lead($"key", 2, "0").over(w)) ==========
+---------------------------------------------------------------------------------------------------------------+
|lead(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+---------------------------------------------------------------------------------------------------------------+
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
+---------------------------------------------------------------------------------------------------------------+

========== df.select(lead("key", 2, "0").over(w)) ==========
+---------------------------------------------------------------------------------------------------------------+
|lead(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+---------------------------------------------------------------------------------------------------------------+
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
+---------------------------------------------------------------------------------------------------------------+

========== df.select(lead($"key", 2, "0", true).over(w)) ==========
+---------------------------------------------------------------------------------------------------------------+
|lead(key, 2, 0) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING)|
+---------------------------------------------------------------------------------------------------------------+
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
|                                                                                                              0|
+---------------------------------------------------------------------------------------------------------------+

========== df.select(nth_value($"key", 2, true).over(w)) ==========
+---------------------------------------------------------------------------------------------------------------------------------------+
|nth_value(key, 2) ignore nulls OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+---------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                      1|
|                                                                                                                                      1|
|                                                                                                                                      2|
|                                                                                                                                      2|
+---------------------------------------------------------------------------------------------------------------------------------------+

========== df.select(nth_value($"key", 2).over(w)) ==========
+--------------------------------------------------------------------------------------------------------------------------+
|nth_value(key, 2) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                         1|
|                                                                                                                         1|
|                                                                                                                         2|
|                                                                                                                         2|
+--------------------------------------------------------------------------------------------------------------------------+

========== df.select(ntile(2).over(w)). ==========
+----------------------------------------------------------------------------------------------------------------+
|ntile(2) OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+----------------------------------------------------------------------------------------------------------------+
|                                                                                                               1|
|                                                                                                               2|
|                                                                                                               1|
|                                                                                                               2|
+----------------------------------------------------------------------------------------------------------------+

========== df.select(percent_rank().over(w)) ==========
+----------------------------------------------------------------------------------------------------------------------+
|PERCENT_RANK() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+----------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   0.0|
|                                                                                                                   0.0|
|                                                                                                                   0.0|
|                                                                                                                   0.0|
+----------------------------------------------------------------------------------------------------------------------+

========== df.select(rank().over(w)) ==========
+--------------------------------------------------------------------------------------------------------------+
|RANK() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------+
|                                                                                                             1|
|                                                                                                             1|
|                                                                                                             1|
|                                                                                                             1|
+--------------------------------------------------------------------------------------------------------------+

========== df.select(row_number().over(w)) ==========
+--------------------------------------------------------------------------------------------------------------------+
|row_number() OVER (PARTITION BY value ORDER BY key ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+--------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   1|
|                                                                                                                   2|
|                                                                                                                   1|
|                                                                                                                   2|
+--------------------------------------------------------------------------------------------------------------------+
  • 3
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 3
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值