Spark: Taking N Rows from Each Group

Using the groupBy interface (collecting all of a group's rows into memory at once) can cause OOM. Instead, assign each row a random row_number within its group via a window function, then keep only the first N rows per group:

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rand, row_number}

// Number rows in random order within each group, then keep at most 100 per group.
val windowFun = Window.partitionBy("groupby_column").orderBy(rand())
val resultDF = dataDF
  .withColumn("rank", row_number().over(windowFun))
  .filter("rank <= 100")
  .map((row: Row) => {
    //...
  })
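Conceptually, `row_number().over(Window.partitionBy(...).orderBy(rand()))` followed by `filter("rank <= N")` means: within each group, shuffle the rows and keep at most N of them. A plain-Scala sketch of that semantics (hypothetical sample data, no Spark required; note that unlike the window-function version, this collects each whole group into memory, which is exactly what we avoid on the cluster):

```scala
import scala.util.Random

object GroupSampleSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical (group, value) rows for illustration.
    val data = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5))
    val n = 2

    // For each group: shuffle its rows, then keep at most n of them --
    // the same result the window-function query produces per partition.
    val sampled = data
      .groupBy(_._1)
      .map { case (group, rows) => group -> Random.shuffle(rows).take(n) }

    sampled.foreach { case (group, rows) =>
      println(s"$group -> ${rows.map(_._2)}")
    }
  }
}
```

The Spark version achieves the same outcome without ever materializing a whole group: rows within a partition are numbered in random order and the filter discards everything past N.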