1. Filter operator: filter (filter is equivalent to the where operator)
DF.col("id") is equivalent to $"id": both reference the column id (the $ form produces a ColumnName).
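A minimal sketch comparing the column-reference forms. It assumes a local SparkSession and a small hypothetical DataFrame df built with toDF. It also includes functions.col, a third form not mentioned in the note. All three work the same in a filter like this one. One real difference is that df.col("age") is resolved against that specific DataFrame, which mostly matters when a DataFrame is joined with itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session; inside spark-shell, getOrCreate() returns the already-running session.
val spark = SparkSession.builder().master("local[*]").appName("ColumnRefs").getOrCreate()
import spark.implicits._ // provides $"..." and Seq(...).toDF

// Hypothetical data shaped like the peopleDF used in the transcripts.
val df = Seq(("zhangsan", 22, "chengdu"), ("wangwu", 33, "beijing"), ("lisi", 28, "shanghai"))
  .toDF("name", "age", "address")

// Three ways to reference the same column.
val viaDfCol  = df.col("age") // Column resolved against this particular DataFrame
val viaDollar = $"age"        // ColumnName, via spark.implicits._
val viaFunc   = col("age")    // Column, via org.apache.spark.sql.functions

// Each form gives the same filter result: two people are older than 25.
val counts = Seq(viaDfCol, viaDollar, viaFunc).map(c => df.filter(c > 25).count())
println(counts) // List(2, 2, 2)
```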
DF.filter("name=''") keeps the rows whose name is the empty string.
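A small sketch, assuming a local SparkSession and hypothetical rows that include both an empty name and a NULL name. It shows the SQL-string predicate next to its Column equivalent. It also shows that "name = ''" does not match NULL, so NULL names need a separate isNull test.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("EmptyNameFilter").getOrCreate()
import spark.implicits._

// Hypothetical rows: one normal name, one empty string, one NULL.
val df = Seq(("zhangsan", 22), ("", 40), (null: String, 50)).toDF("name", "age")

// SQL-expression form, as in the note.
val emptyViaSql = df.filter("name = ''").count()
// The same predicate written with the Column API.
val emptyViaCol = df.filter($"name" === "").count()
// NULL never equals '', so NULL names are only caught by an explicit isNull test.
val nullNames = df.filter($"name".isNull).count()

println((emptyViaSql, emptyViaCol, nullNames)) // (1,1,1)
```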
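DF.filter($"age" > 21).show() keeps the rows where age is greater than 21. The $"..." syntax requires import spark.implicits._; without that import the $ expression fails to compile. (spark-shell performs this import automatically, so it only has to be written out in your own applications.)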
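In a standalone application the import has to be written out, and it can only appear once a SparkSession value is in scope, because implicits is a member of the session instance. Below is a minimal sketch under that assumption. The object AgeFilterApp and its helper namesOlderThan are hypothetical names, and the data is made up to match the transcripts.

```scala
import org.apache.spark.sql.SparkSession

object AgeFilterApp {
  // Hypothetical helper: names of the people older than minAge.
  def namesOlderThan(spark: SparkSession, minAge: Int): Seq[String] = {
    // implicits belongs to the SparkSession *instance*, so this import
    // only works after a session value (here the parameter) is in scope.
    import spark.implicits._
    val df = Seq(("zhangsan", 22), ("wangwu", 33), ("lisi", 28)).toDF("name", "age")
    df.filter($"age" > minAge).select($"name").as[String].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("AgeFilterApp").getOrCreate()
    try {
      println(namesOlderThan(spark, 21)) // everyone in the sample is older than 21
    } finally {
      spark.stop()
    }
  }
}
```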
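DF.filter($"age" === 21) keeps the rows where age equals 21. Equality must be written as ===. With == the expression compares the Column object itself and returns a Scala Boolean, which filter does not accept, so it fails to compile. The corresponding not-equal operator is =!=. This is equivalent to DF.filter("age=21").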
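A short sketch of === and =!= next to their SQL-string counterparts, assuming a local SparkSession and hypothetical sample data. The != operator inside the SQL string is standard Spark SQL syntax. The == form is left in a comment because it would not compile.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("EqualityFilter").getOrCreate()
import spark.implicits._

val df = Seq(("zhangsan", 22), ("wangwu", 33), ("lisi", 28)).toDF("name", "age")

// === builds an equality Column; it matches the SQL string "age = 22".
val eqCol = df.filter($"age" === 22).select($"name").as[String].collect().toList
val eqSql = df.filter("age = 22").select($"name").as[String].collect().toList

// =!= is the Column inequality; "age != 22" is the SQL-string counterpart.
val neqCol = df.filter($"age" =!= 22).count()
val neqSql = df.filter("age != 22").count()

// df.filter($"age" == 22) does not compile: == compares the Column object
// itself with 22 and yields a plain Scala Boolean, which filter does not accept.

println((eqCol, eqSql, neqCol, neqSql)) // (List(zhangsan),List(zhangsan),2,2)
```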
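DF.filter("substring(name,0,1) = 'M'").show displays the rows whose name starts with M. substring is one of Spark's built-in functions; the DataFrame API versions are defined in functions.scala, which contains many such functions. substr is an alias, so this is equivalent to DF.filter("substr(name,0,1) = 'M'").show. Positions are 1-based. Spark treats a start position of 0 the same as 1, which is why 0 works here.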
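The same prefix test can be written several ways. A minimal sketch follows, assuming a local SparkSession and hypothetical names, since peopleDF has no name starting with M. It uses the 1-based start position and compares the SQL-string forms with functions.substring and Column.startsWith.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.substring

val spark = SparkSession.builder().master("local[*]").appName("StartsWithFilter").getOrCreate()
import spark.implicits._

// Hypothetical names; two of them start with "M".
val df = Seq("Mike", "Mary", "lisi").toDF("name")

// SQL-string forms: substr is an alias of substring, and positions are 1-based.
val viaSubstring = df.filter("substring(name, 1, 1) = 'M'").count()
val viaSubstr    = df.filter("substr(name, 1, 1) = 'M'").count()
// The same test through functions.substring(str: Column, pos: Int, len: Int).
val viaFunction  = df.filter(substring($"name", 1, 1) === "M").count()
// A more direct alternative for prefix checks.
val viaStartsWith = df.filter($"name".startsWith("M")).count()

val results = Seq(viaSubstring, viaSubstr, viaFunction, viaStartsWith)
println(results) // List(2, 2, 2, 2)
```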
scala> peopleDF.printSchema
root
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
|-- address: string (nullable = true)
scala> peopleDF.show
+--------+---+--------+
|    name|age| address|
+--------+---+--------+
|zhangsan| 22| chengdu|
|  wangwu| 33| beijing|
|    lisi| 28|shanghai|
+--------+---+--------+
scala> peopleDF.filter($"name" === "wangwu").show
+------+---+-------+
|  name|age|address|
+------+---+-------+
|wangwu| 33|beijing|
+------+---+-------+
scala> peopleDF.filter($"name" =!= "wangwu").show
+--------+---+--------+
|    name|age| address|
+--------+---+--------+
|zhangsan| 22| chengdu|
|    lisi| 28|shanghai|
+--------+---+--------+
scala> peopleDF.filter("age > 30").show
+------+---+-------+
|  name|age|address|
+------+---+-------+
|wangwu| 33|beijing|
+------+---+-------+
scala> peopleDF.filter($"age" > 30).show
+------+---+-------+
|  name|age|address|
+------+---+-------+
|wangwu| 33|beijing|
+------+---+-------+
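The note does not show how peopleDF was created. Below is a minimal sketch of one way to build an equivalent DataFrame, assuming a local SparkSession. An explicit schema is used so that printSchema reports every column as nullable, as in the transcript. The sketch then checks the claim that where is interchangeable with filter, for both the Column form and the SQL-string form.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("FilterVsWhere").getOrCreate()
import spark.implicits._

// Explicit schema matching the printSchema output above (all columns nullable);
// toDF on Scala tuples would instead mark the Int column as nullable = false.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true),
  StructField("address", StringType, nullable = true)))
val rows = Seq(Row("zhangsan", 22, "chengdu"), Row("wangwu", 33, "beijing"), Row("lisi", 28, "shanghai"))
val peopleDF = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

// where is an alias of filter: both accept a Column or a SQL string.
val viaFilter   = peopleDF.filter($"age" > 30).collect().toList
val viaWhere    = peopleDF.where($"age" > 30).collect().toList
val viaWhereSql = peopleDF.where("age > 30").collect().toList

peopleDF.printSchema()
println(viaWhere.map(_.getString(0))) // List(wangwu)
```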
2. Sort operator: sort (sort is equivalent to orderBy)
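DF.sort(DF.col("id").desc).show sorts DF by the id field in descending order. Calling .desc or .asc on a column is how the direction is specified. Several fields can also be passed to sort by more than one column.
Equivalently: DF.sort($"id".desc).show
DF.sort is equivalent to DF.orderBy.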
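A minimal sketch of sorting on more than one field, assuming a local SparkSession. It adds a hypothetical extra person, "zhaoliu", so that two rows share the same age and the second sort key has something to break.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("MultiColumnSort").getOrCreate()
import spark.implicits._

// Hypothetical data: "zhaoliu" is added so that two people share age 28.
val df = Seq(("zhangsan", 22), ("wangwu", 33), ("lisi", 28), ("zhaoliu", 28)).toDF("name", "age")

// Primary key: age descending; tie-breaker: name ascending.
val bySort = df.sort($"age".desc, $"name".asc).select($"name").as[String].collect().toList
// orderBy takes the same arguments (here written with df.col) and gives the same order.
val byOrderBy = df.orderBy(df.col("age").desc, df.col("name").asc)
  .select($"name").as[String].collect().toList

println(bySort) // List(wangwu, lisi, zhaoliu, zhangsan)
```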
scala> peopleDF.sort($"age").show
+--------+---+--------+
|    name|age| address|
+--------+---+--------+
|zhangsan| 22| chengdu|
|    lisi| 28|shanghai|
|  wangwu| 33| beijing|
+--------+---+--------+
scala> peopleDF.sort($"age".desc).show
+--------+---+--------+
|    name|age| address|
+--------+---+--------+
|  wangwu| 33| beijing|
|    lisi| 28|shanghai|
|zhangsan| 22| chengdu|
+--------+---+--------+
scala> peopleDF.sort($"age".asc).show
+--------+---+--------+
|    name|age| address|
+--------+---+--------+
|zhangsan| 22| chengdu|
|    lisi| 28|shanghai|
|  wangwu| 33| beijing|
+--------+---+--------+
scala> peopleDF.orderBy($"age".asc).show
+--------+---+--------+
|    name|age| address|
+--------+---+--------+
|zhangsan| 22| chengdu|
|    lisi| 28|shanghai|
|  wangwu| 33| beijing|
+--------+---+--------+
scala> peopleDF.orderBy($"age".desc).show
+--------+---+--------+
|    name|age| address|
+--------+---+--------+
|  wangwu| 33| beijing|
|    lisi| 28|shanghai|
|zhangsan| 22| chengdu|
+--------+---+--------+
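As the transcripts show, sorting without .asc or .desc is ascending. sort and orderBy also accept plain column names, and org.apache.spark.sql.functions provides asc and desc helpers that take a column name. A minimal sketch of those variants follows, assuming a local SparkSession and hypothetical data. The helper names is made up for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{asc, desc}

val spark = SparkSession.builder().master("local[*]").appName("SortByName").getOrCreate()
import spark.implicits._

val df = Seq(("zhangsan", 22), ("wangwu", 33), ("lisi", 28)).toDF("name", "age")

// Hypothetical helper for this sketch: collect the name column in result order.
def names(sorted: DataFrame): List[String] =
  sorted.select($"name").as[String].collect().toList

val ascDefault = names(df.orderBy("age"))       // plain column name: ascending by default
val ascFn      = names(df.sort(asc("age")))     // functions.asc, same as $"age".asc
val descFn     = names(df.orderBy(desc("age"))) // functions.desc, same as $"age".desc

println(descFn) // List(wangwu, lisi, zhangsan)
```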