Spark的Dataset操作(二)-过滤的filter和where

scala> val df = spark.createDataset(Seq(
  ("aaa",1,2),("bbb",3,4),("ccc",3,5),("bbb",4, 6))   ).toDF("key1","key2","key3")
df: org.apache.spark.sql.DataFrame = [key1: string, key2: int ... 1 more field]

scala> df.show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| aaa|   1|   2|
| bbb|   3|   4|
| ccc|   3|   5|
| bbb|   4|   6|
+----+----+----+

    scala> val df = spark.createDataset(Seq(  

("aaa",1,2),("bbb",3,4),("ccc",3,5),("bbb",4, 6))   ).toDF("key1","key2","key3")

df: org.apache.spark.sql.DataFrame = [key1: string, key2: int ... 1 more field] scala> df.show
+----+----+----+

|key1|key2|key3|
+----+----+----+

| aaa| 1| 2|

| bbb| 3| 4|

| ccc| 3| 5|

| bbb|   4|   6|
+----+----+----+

filter函数

从Spark官网的文档中看到,filter函数有下面几种形式:

def filter(func: (T) ⇒ Boolean): Dataset[T]
def filter(conditionExpr: String): Dataset[T]
def filter(condition: Column): Dataset[T]

def filter(func: (T) ⇒ Boolean): Dataset[T]
def filter(conditionExpr: String): Dataset[T]
def filter(condition: Column): Dataset[T]

所以,以下几种写法都是可以的:

scala> df.filter($"key1">"aaa").show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| bbb|   3|   4|
| ccc|   3|   5|
| bbb|   4|   6|
+----+----+----+


scala> df.filter($"key1"==="aaa").show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| aaa|   1|   2|
+----+----+----+

scala> df.filter("key1='aaa'").show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| aaa|   1|   2|
+----+----+----+

scala> df.filter("key2=1").show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| aaa|   1|   2|
+----+----+----+

scala> df.filter($"key2"===3).show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| bbb|   3|   4|
| ccc|   3|   5|
+----+----+----+

scala> df.filter($"key2"===$"key3"-1).show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| aaa|   1|   2|
| bbb|   3|   4|
+----+----+----+

其中, ===是在Column类中定义的函数,对应的不等于是=!=。
$”列名”这个是语法糖,返回Column对象
where函数

scala> df.where("key1 = 'bbb'").show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| bbb|   3|   4|
| bbb|   4|   6|
+----+----+----+


scala> df.where($"key2"=!= 3).show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| aaa|   1|   2|
| bbb|   4|   6|
+----+----+----+


scala> df.where($"key3">col("key2")).show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| aaa|   1|   2|
| bbb|   3|   4|
| ccc|   3|   5|
| bbb|   4|   6|
+----+----+----+


scala> df.where($"key3">col("key2")+1).show
+----+----+----+
|key1|key2|key3|
+----+----+----+
| ccc|   3|   5|
| bbb|   4|   6|
+----+----+----+
 

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 2
    评论
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值