【spark内置函数】常用函数

一、说明

  • spark框架源码functions.scala文件包含了大量的内置函数,尤其在agg函数中会广泛应用。
  • 使用之前要导入import org.apache.spark.sql.functions._ 
  • 这里只列举了一些常用的内置函数,其他的要参考官网。

二、代码示例

package com.dt.spark.Test

import com.dt.spark.Test.AggTest.Student
import org.apache.spark.sql.{DataFrame, SparkSession}

/**
  * 测试spark SQL的内置函数
  */
object FuncTest {

  case class Student(classId: Int, name: String, gender: String, age: Int, birthday: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("FuncTest").getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext
    sc.setLogLevel("WARN")

    val stuDF: DataFrame = Seq(
      Student(1001, "zhangsan", "F", 20, "1995-12-11 12:12:13"),
      Student(1002, "lisi", "M", 16, "2000-01-14 10:10:57"),
      Student(1003, "wangwu", "M", 21, "1994-05-13 01:12:00"),
      Student(1004, "zhaoliu", "F", 21, "1998-02-03 03:02:48"),
      Student(1004, "zhouqi", "M", 22, "1996-06-04 05:28:19"),
      Student(1001, "qianba", "M", 19, "1997-10-14 08:09:47"),
      Student(1003, "liuliu", "F", 23, "2004-09-14 16:25:30")
    ).toDF()

    import org.apache.spark.sql.functions._
    /**
      * 主要日期函数,支持 date/timestamp/string
      */
    stuDF.withColumn("b01", date_format($"birthday", "yyyy-MM-dd"))
      .withColumn("date_add", date_add(col("birthday"), 3))
      .withColumn("date_sub", date_sub('birthday, 3))
      .withColumn("datediff", datediff('date_add, 'date_sub))
      .withColumn("hour", hour(col("birthday"))) //提取小时
      .withColumn("month", month(col("birthday"))) //提取月
      .withColumn("quarter", quarter(col("birthday"))) //提取季度
      .show()

    /**
      * 字符串函数:
      * initcap(e: Column): 单词首字母大写
      * lower(e: Column): 转小写
      * upper(e: Column): 转大写
      * instr(str: Column, substring: String): substring在str中第一次出现的位置
      * length(e: Column): 字符串长度
      */
    stuDF.withColumn("initcap", initcap($"name"))
      .withColumn("lower", lower('gender))
      .withColumn("upper", upper(col("name")))
      .withColumn("instr", instr(column("classId"), "00"))
      .withColumn("length", length(column("name")))
      .show()

    /**
      * 其他非聚合函数:
      * struct(cols: Column*):多列组合成新的struct column
      * when(condition: Column, value: Any):当condition为true返回value
      * lit(literal: Any):将字面量(literal)创建一个Column
      * isnan(e: Column):如果列为非数字值得特殊值NaN,则返回true
      * isnull(e: Column):如果为空,则返回true
      * array(cols: Column*)多列合并为array,cols必须为同类型
      */
    stuDF.withColumn("struct", struct(col("name"), $"gender"))
      .withColumn("array", array(col("name"), $"gender"))
      .withColumn("when", when($"gender" === "F", 1).
        when(col("gender") === "M", 0)
        .otherwise("2"))
      .withColumn("isnan", isnan($"gender"))
      .withColumn("isnan1", isnan($"age"))
      .withColumn("isnull", isnull($"gender"))
      .withColumn("lit", lit("00"))
      .show()

    /**
      * 聚合函数:
      * avg:平均值
      * collect_list:聚合指定字段的值到list
      * count:计数
      * mean:平均值
      */
    stuDF.agg(avg($"age"), mean($"age"), collect_list($"name"), count($"gender"))
      .show()
  }
}

三、运行结果

+-------+--------+------+---+-------------------+----------+----------+----------+--------+----+-----+-------+
|classId|    name|gender|age|           birthday|       b01|  date_add|  date_sub|datediff|hour|month|quarter|
+-------+--------+------+---+-------------------+----------+----------+----------+--------+----+-----+-------+
|   1001|zhangsan|     F| 20|1995-12-11 12:12:13|1995-12-11|1995-12-14|1995-12-08|       6|  12|   12|      4|
|   1002|    lisi|     M| 16|2000-01-14 10:10:57|2000-01-14|2000-01-17|2000-01-11|       6|  10|    1|      1|
|   1003|  wangwu|     M| 21|1994-05-13 01:12:00|1994-05-13|1994-05-16|1994-05-10|       6|   1|    5|      2|
|   1004| zhaoliu|     F| 21|1998-02-03 03:02:48|1998-02-03|1998-02-06|1998-01-31|       6|   3|    2|      1|
|   1004|  zhouqi|     M| 22|1996-06-04 05:28:19|1996-06-04|1996-06-07|1996-06-01|       6|   5|    6|      2|
|   1001|  qianba|     M| 19|1997-10-14 08:09:47|1997-10-14|1997-10-17|1997-10-11|       6|   8|   10|      4|
|   1003|  liuliu|     F| 23|2004-09-14 16:25:30|2004-09-14|2004-09-17|2004-09-11|       6|  16|    9|      3|
+-------+--------+------+---+-------------------+----------+----------+----------+--------+----+-----+-------+

+-------+--------+------+---+-------------------+--------+-----+--------+-----+------+
|classId|    name|gender|age|           birthday| initcap|lower|   upper|instr|length|
+-------+--------+------+---+-------------------+--------+-----+--------+-----+------+
|   1001|zhangsan|     F| 20|1995-12-11 12:12:13|Zhangsan|    f|ZHANGSAN|    2|     8|
|   1002|    lisi|     M| 16|2000-01-14 10:10:57|    Lisi|    m|    LISI|    2|     4|
|   1003|  wangwu|     M| 21|1994-05-13 01:12:00|  Wangwu|    m|  WANGWU|    2|     6|
|   1004| zhaoliu|     F| 21|1998-02-03 03:02:48| Zhaoliu|    f| ZHAOLIU|    2|     7|
|   1004|  zhouqi|     M| 22|1996-06-04 05:28:19|  Zhouqi|    m|  ZHOUQI|    2|     6|
|   1001|  qianba|     M| 19|1997-10-14 08:09:47|  Qianba|    m|  QIANBA|    2|     6|
|   1003|  liuliu|     F| 23|2004-09-14 16:25:30|  Liuliu|    f|  LIULIU|    2|     6|
+-------+--------+------+---+-------------------+--------+-----+--------+-----+------+

+-------+--------+------+---+-------------------+------------+-------------+----+-----+------+------+---+
|classId|    name|gender|age|           birthday|      struct|        array|when|isnan|isnan1|isnull|lit|
+-------+--------+------+---+-------------------+------------+-------------+----+-----+------+------+---+
|   1001|zhangsan|     F| 20|1995-12-11 12:12:13|[zhangsan,F]|[zhangsan, F]|   1|false| false| false| 00|
|   1002|    lisi|     M| 16|2000-01-14 10:10:57|    [lisi,M]|    [lisi, M]|   0|false| false| false| 00|
|   1003|  wangwu|     M| 21|1994-05-13 01:12:00|  [wangwu,M]|  [wangwu, M]|   0|false| false| false| 00|
|   1004| zhaoliu|     F| 21|1998-02-03 03:02:48| [zhaoliu,F]| [zhaoliu, F]|   1|false| false| false| 00|
|   1004|  zhouqi|     M| 22|1996-06-04 05:28:19|  [zhouqi,M]|  [zhouqi, M]|   0|false| false| false| 00|
|   1001|  qianba|     M| 19|1997-10-14 08:09:47|  [qianba,M]|  [qianba, M]|   0|false| false| false| 00|
|   1003|  liuliu|     F| 23|2004-09-14 16:25:30|  [liuliu,F]|  [liuliu, F]|   1|false| false| false| 00|
+-------+--------+------+---+-------------------+------------+-------------+----+-----+------+------+---+

+------------------+------------------+--------------------+-------------+
|          avg(age)|          avg(age)|  collect_list(name)|count(gender)|
+------------------+------------------+--------------------+-------------+
|20.285714285714285|20.285714285714285|[zhangsan, lisi, ...|            7|
+------------------+------------------+--------------------+-------------+
  • 1
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

郝少

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值