SparkSQL 字符串函数

SparkSQL 字符串函数

1.样例类的定义

case class Student(name:String,subject:String,score:String)

2.测试数据展示

val spark: SparkSession = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
//导入隐式包
import spark.implicits._

val df= spark.sparkContext.textFile("file:///E:\\MyEclipse\\java\\myfactory\\data\\data.txt").map(line =>{
   val arr: Array[String] = line.split(",")
   Student(arr(0),arr(1),arr(2))
   }).toDF()
    
 df.show(false)
 spark.close()
+--------+-------+-----+
|name    |subject|score|
+--------+-------+-----+
|zhangsan|语文     |90   |
|zhangsan|数学     |85   |
|zhangsan|英语     |85   |
|zhangsan|物理     |95   |
|lisi    |语文     |90   |
|lisi    |化学     |100  |
|wangwu  |物理     |95   |
|wangwu  |化学     |95   |
+--------+-------+-----+

3.collect_set和collect_list

将同组的的字段合并成集合,collect_list(不去重),collect_set(去重)
val df1: DataFrame = df.groupBy("name").agg(collect_list("subject").alias("subjects"),collect_set($"score").alias("scores"))
  .select("name", "subjects","scores")
    
df1.show(false)
+--------+----------------+------------+
|name    |subjects        |scores      |
+--------+----------------+------------+
|wangwu  |[物理, 化学]        |[95]        |
|zhangsan|[语文, 数学, 英语, 物理]|[95, 85, 90]|
|lisi    |[语文, 化学]        |[100, 90]   |
+--------+----------------+------------+

4.concat_ws

作用1:将字段值为集合的值,按照指定符号进行拼接(集合转字符串)
val df2: DataFrame = df1.withColumn("all_subjects", concat_ws("-", $"subjects"))
  .withColumn("all_scores",concat_ws("-",$"scores"))

df2.show(false)
+--------+----------------+------------+------------+----------+
|name    |subjects        |scores      |all_subjects|all_scores|
+--------+----------------+------------+------------+----------+
|wangwu  |[物理, 化学]        |[95]        |物理-化学       |95        |
|zhangsan|[语文, 数学, 英语, 物理]|[95, 85, 90]|语文-数学-英语-物理 |95-85-90  |
|lisi    |[语文, 化学]        |[100, 90]   |语文-化学       |100-90    |
+--------+----------------+------------+------------+----------+
作用2:将字段按照指定的字符进行拼接(字符串拼接)
val test: DataFrame = df.withColumn("test", concat_ws("-", $"subject", $"score"))

test.show(false)
+--------+-------+-----+------+
|    name|subject|score|  test|
+--------+-------+-----+------+
|zhangsan|     语文|   90| 语文-90|
|zhangsan|     数学|   85| 数学-85|
|zhangsan|     英语|   85| 英语-85|
|zhangsan|     物理|   95| 物理-95|
|    lisi|     语文|   90| 语文-90|
|    lisi|     化学|  100|化学-100|
|  wangwu|     物理|   95| 物理-95|
|  wangwu|     化学|   95| 化学-95|
+--------+-------+-----+------+

5.concat

将多个字段进行拼接
val df3: DataFrame = df.withColumn("info", concat($"name",$"subject", $"score"))

df3.show(false)
+--------+-------+-----+------------+
|name    |subject|score|info        |
+--------+-------+-----+------------+
|zhangsan|语文     |90   |zhangsan语文90|
|zhangsan|数学     |85   |zhangsan数学85|
|zhangsan|英语     |85   |zhangsan英语85|
|zhangsan|物理     |95   |zhangsan物理95|
|lisi    |语文     |90   |lisi语文90    |
|lisi    |化学     |100  |lisi化学100   |
|wangwu  |物理     |95   |wangwu物理95  |
|wangwu  |化学     |95   |wangwu化学95  |
+--------+-------+-----+------------+
©️2020 CSDN 皮肤主题: 创作都市 设计师:CSDN官方博客 返回首页