Preface
This article is part of the column 《1000个问题搞定大数据技术体系》 ("1000 Questions to Master the Big Data Technology Stack"), which is original work by the author. Please credit the source when quoting, and feel free to point out any mistakes or omissions in the comments. Thank you!
For the column's table of contents and references, see 1000个问题搞定大数据技术体系.
Table of Contents
Spark SQL functions.scala Source Code Analysis (Part 1): Sort functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 2): Aggregate functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 3): Window functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 4): Non-aggregate functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 5): Math Functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 6): Misc functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 7): String functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 8): DateTime functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 9): Collection functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 10): Partition transform functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 11): Scala UDF functions (based on Spark 3.3.0)
Spark SQL functions.scala Source Code Analysis (Part 12): Java UDF functions (based on Spark 3.3.0)
Main Text
asc
  /**
   * Returns a sort expression based on ascending order of the column.
   *
   * {{{
   *   df.sort(asc("dept"), desc("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 1.3.0
   */
  def asc(columnName: String): Column = Column(columnName).asc

  /**
   * Returns a sort expression based on ascending order of the column,
   * and null values return before non-null values.
   *
   * {{{
   *   df.sort(asc_nulls_first("dept"), desc("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 2.1.0
   */
  def asc_nulls_first(columnName: String): Column = Column(columnName).asc_nulls_first

  /**
   * Returns a sort expression based on ascending order of the column,
   * and null values appear after non-null values.
   *
   * {{{
   *   df.sort(asc_nulls_last("dept"), desc("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 2.1.0
   */
  def asc_nulls_last(columnName: String): Column = Column(columnName).asc_nulls_last
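As the implementations above show, each of these functions is a thin wrapper that delegates to the corresponding sort method on Column. A minimal sketch of the equivalent forms, assuming df is the example DataFrame built in the Practice section below:

import org.apache.spark.sql.functions.{asc, col}

df.sort(asc("dept"))      // the functions.scala wrapper
df.sort(col("dept").asc)  // the Column method it delegates to
df.sort(df("dept").asc)   // equivalent, resolving the column against df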
 
Usage
========== [ df.sort(asc("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
========== [ df.sort(asc_nulls_first("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
========== [ df.sort(asc_nulls_last("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
|  David|null|  35|
+-------+----+----+
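Note that the first two outputs are identical: an ascending sort places null values first by default, so asc("dept") behaves the same as asc_nulls_first("dept") on this data. Only asc_nulls_last moves David's null dept to the end.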
 
desc
  /**
   * Returns a sort expression based on the descending order of the column.
   *
   * {{{
   *   df.sort(asc("dept"), desc("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 1.3.0
   */
  def desc(columnName: String): Column = Column(columnName).desc

  /**
   * Returns a sort expression based on the descending order of the column,
   * and null values appear before non-null values.
   *
   * {{{
   *   df.sort(asc("dept"), desc_nulls_first("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 2.1.0
   */
  def desc_nulls_first(columnName: String): Column = Column(columnName).desc_nulls_first

  /**
   * Returns a sort expression based on the descending order of the column,
   * and null values appear after non-null values.
   *
   * {{{
   *   df.sort(asc("dept"), desc_nulls_last("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 2.1.0
   */
  def desc_nulls_last(columnName: String): Column = Column(columnName).desc_nulls_last
 
Usage
========== [ df.sort(asc("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
========== [ df.sort(asc("dept"), desc_nulls_first("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Joan|  PD|null|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
+-------+----+----+
========== [ df.sort(asc("dept"), desc_nulls_last("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
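Symmetrically, the first and third outputs match: a descending sort places null values last by default, so desc("age") is equivalent to desc_nulls_last("age") here. The same null ordering can also be spelled out in SQL. A minimal sketch, assuming df is registered as a temp view (the view name employees is an assumption):

df.createOrReplaceTempView("employees")
spark.sql("SELECT name, dept, age FROM employees ORDER BY dept ASC, age DESC NULLS FIRST").show()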
 
Practice
Data
employees.csv
Alan,PD,30
Bob,HR,28
Bruce,PD,24
Charles,ED,34
David,,35
Joan,PD,
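Note that David's dept and Joan's age are left empty; Spark's CSV reader parses empty fields as null by default. Also, since the example code supplies no schema, every column is read as a string (the two-digit ages happen to sort the same lexicographically as they would numerically). A minimal sketch of reading the same file with an explicit schema instead; the schema itself is an assumption, not part of the original example:

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("dept", StringType),
  StructField("age", IntegerType)
))
val typedDf = spark.read.schema(schema).csv(DATA_PATH)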
 
Code
package com.shockang.study.spark.sql.functions

import com.shockang.study.spark.util.Utils.formatPrint
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

/**
 *
 * @author Shockang
 */
object SortFunctionsExample {
  val DATA_PATH = "/Users/shockang/code/spark-examples/data/simple/read/employees.csv"

  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.OFF)
    val spark = SparkSession.builder().appName("SortFunctionsExample").master("local[*]").getOrCreate()
    // No schema is given, so all columns are read as strings; empty CSV fields become null.
    val df = spark.read.csv(DATA_PATH).toDF("name", "dept", "age").cache()

    // asc
    formatPrint("""df.sort(asc("dept"), desc("age"))""")
    df.sort(asc("dept"), desc("age")).show()
    formatPrint("""df.sort(asc_nulls_first("dept"), desc("age"))""")
    df.sort(asc_nulls_first("dept"), desc("age")).show()
    formatPrint("""df.sort(asc_nulls_last("dept"), desc("age"))""")
    df.sort(asc_nulls_last("dept"), desc("age")).show()

    // desc
    formatPrint("""df.sort(asc("dept"), desc("age"))""")
    df.sort(asc("dept"), desc("age")).show()
    formatPrint("""df.sort(asc("dept"), desc_nulls_first("age"))""")
    df.sort(asc("dept"), desc_nulls_first("age")).show()
    formatPrint("""df.sort(asc("dept"), desc_nulls_last("age"))""")
    df.sort(asc("dept"), desc_nulls_last("age")).show()

    spark.stop()
  }
}
 
Output
========== [ df.sort(asc("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
========== [ df.sort(asc_nulls_first("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
========== [ df.sort(asc_nulls_last("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
|  David|null|  35|
+-------+----+----+
========== [ df.sort(asc("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
========== [ df.sort(asc("dept"), desc_nulls_first("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Joan|  PD|null|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
+-------+----+----+
========== [ df.sort(asc("dept"), desc_nulls_last("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
                