DataSet API in Detail, Part 8
distinct
def distinct(firstField: String, otherFields: String*): DataSet[T]
def distinct(fields: Int*): DataSet[T]
def distinct(): DataSet[T]
def distinct[K](fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]
Creates a new DataSet containing the distinct elements of this DataSet.
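In other words, it removes duplicate elements from the DataSet. Called without arguments it compares whole elements; the other overloads compare only the given tuple positions, the given field names, or the key returned by a function.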
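distinct example 1: deduplicating single-value elements
Program:
//1. Create a DataSet whose elements are of type String
val input: DataSet[String] = benv.fromElements("lisi","zhangsan", "lisi","wangwu")
//2. Remove duplicate elements
val result = input.distinct()
//3. Show the result
result.collect
Result: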
res52: Seq[String] = Buffer(lisi, wangwu, zhangsan)
[Figure: execution of the job in the web UI]
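The snippets in this article are typed into the Flink Scala shell, where `benv` is a predefined batch `ExecutionEnvironment`. Outside the shell, the same example could look roughly like the sketch below. It assumes that the `flink-scala` and `flink-clients` dependencies of a Flink release that still ships the DataSet API are on the classpath; the object name `DistinctStrings` is made up for illustration.

```scala
import org.apache.flink.api.scala._ // ExecutionEnvironment, DataSet and the implicit TypeInformation

// Hypothetical object name; any name works.
object DistinctStrings {
  // Runs the job and returns the deduplicated elements, sorted for a stable order.
  def run(): Seq[String] = {
    // In the Scala shell this environment is already available as `benv`.
    val env = ExecutionEnvironment.getExecutionEnvironment
    val input: DataSet[String] = env.fromElements("lisi", "zhangsan", "lisi", "wangwu")
    // collect() triggers execution and brings the result back to the client.
    input.distinct().collect().sorted
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

The result is sorted on the client because the order in which `collect` returns elements is best not relied on.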
distinct example 2: deduplicating multi-field elements; when no fields are specified, all fields are compared
Program:
//1. Create a DataSet[(Int, String, Double)]
val input: DataSet[(Int, String, Double)] = benv.fromElements(
(2,"zhagnsan",1654.5),(3,"lisi",2347.8),(2,"zhagnsan",1654.5),
(4,"wangwu",1478.9),(5,"zhaoliu",987.3),(2,"zhagnsan",1654.0))
//2. Remove duplicate elements
val output = input.distinct()
//3. Show the result
output.collect
Result:
res53: Seq[(Int, String, Double)] = Buffer(
(2,zhagnsan,1654.0),
(2,zhagnsan,1654.5),
(3,lisi,2347.8),
(4,wangwu,1478.9),
(5,zhaoliu,987.3))
distinct example 3: deduplicating multi-field elements on specified fields
Program:
//1. Create a DataSet[(Int, String, Double)]
val input: DataSet[(Int, String, Double)] = benv.fromElements(
(2,"zhagnsan",1654.5),(3,"lisi",2347.8),(2,"zhagnsan",1654.5),
(4,"wangwu",1478.9),(5,"zhaoliu",987.3),(2,"zhagnsan",1654.0))
//2. Remove duplicates, comparing only fields 0 and 1
val output = input.distinct(0,1)
//3. Show the result
output.collect
Result:
res54: Seq[(Int, String, Double)] = Buffer(
(2,zhagnsan,1654.5),
(3,lisi,2347.8),
(4,wangwu,1478.9),
(5,zhaoliu,987.3))
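In the result above, the key `(2, zhagnsan)` occurs three times in the input and a single record survives. The API description does not say which of the duplicates that is, so it is safer not to rely on it. When a particular record is wanted, one alternative is to group on the same fields and reduce each group explicitly. The sketch below keeps the record with the largest third field; that rule, the alias `Rec` and the object name `KeepLargest` are assumptions made for illustration, and the code again assumes a standalone program with the Flink Scala DataSet API on the classpath.

```scala
import org.apache.flink.api.scala._

object KeepLargest {
  type Rec = (Int, String, Double) // illustrative alias for the tuple type

  def run(): Seq[Rec] = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val input: DataSet[Rec] = env.fromElements(
      (2, "zhagnsan", 1654.5), (3, "lisi", 2347.8), (2, "zhagnsan", 1654.5),
      (4, "wangwu", 1478.9), (5, "zhaoliu", 987.3), (2, "zhagnsan", 1654.0))

    input
      .groupBy(0, 1)                                          // same key fields as distinct(0,1)
      .reduce((a: Rec, b: Rec) => if (a._3 >= b._3) a else b) // keep the larger third field
      .collect()
      .sortBy(_._1)                                           // stable order on the client
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

The parameter types of the reduce function are written out so that the call does not depend on how the Scala version in use resolves the overloaded `reduce`.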
distinct example 4: deduplicating case class elements on specified fields
Program:
//1. Define the case class Student
case class Student(name : String, age : Int)
//2. Create a DataSet[Student]
val input: DataSet[Student] = benv.fromElements(
Student("zhangsan",24),Student("zhangsan",24),Student("zhangsan",25),
Student("lisi",24),Student("wangwu",24),Student("lisi",25))
//3. Remove elements with a duplicate age
val age_r = input.distinct("age")
age_r.collect
//4. Remove elements with a duplicate name
val name_r = input.distinct("name")
name_r.collect
//5. Remove elements where both name and age are duplicated (fields named explicitly)
val all_r = input.distinct("age","name")
all_r.collect
//6. Same as 5: with no fields given, the whole element is compared
val all = input.distinct()
all.collect
//7. Same as 5: the wildcard "_" selects all fields
val all0 = input.distinct("_")
all0.collect
Results:
Scala-Flink> age_r.collect
res38: Seq[Student] = Buffer(Student(zhangsan,24), Student(zhangsan,25))
Scala-Flink> name_r.collect
res39: Seq[Student] = Buffer(Student(lisi,24),Student(wangwu,24),Student(zhangsan,24))
Scala-Flink> all_r.collect
res40: Seq[Student] = Buffer(Student(lisi,24), Student(lisi,25), Student(wangwu,24),
Student(zhangsan,24), Student(zhangsan,25))
Scala-Flink> all.collect
res41: Seq[Student] = Buffer(Student(lisi,24), Student(lisi,25), Student(wangwu,24),
Student(zhangsan,24), Student(zhangsan,25))
Scala-Flink> all0.collect
res47: Seq[Student] = Buffer(Student(lisi,24), Student(lisi,25), Student(wangwu,24),
Student(zhangsan,24), Student(zhangsan,25))
[Figure: execution of the job in the web UI]
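String field names such as `"age"` are resolved at run time, so a misspelled name is not caught by the compiler. The function-based overload from the signature list at the top can express the same keys through ordinary accessors. A minimal sketch, assuming a standalone program with the Flink Scala DataSet API on the classpath; `DistinctByAccessor` is a made-up name, and only the result sizes are returned because the choice of surviving element is not relied on here.

```scala
import org.apache.flink.api.scala._

case class Student(name: String, age: Int)

object DistinctByAccessor {
  // Returns the number of elements left after each kind of deduplication.
  def run(): (Int, Int, Int) = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val input: DataSet[Student] = env.fromElements(
      Student("zhangsan", 24), Student("zhangsan", 24), Student("zhangsan", 25),
      Student("lisi", 24), Student("wangwu", 24), Student("lisi", 25))

    val byAge  = input.distinct { s => s.age }           // same key as distinct("age")
    val byName = input.distinct { s => s.name }          // same key as distinct("name")
    val byBoth = input.distinct { s => (s.name, s.age) } // same key as distinct("age","name")

    // Each collect() runs its own job.
    (byAge.collect().size, byName.collect().size, byBoth.collect().size)
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```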
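distinct example 5: deduplicating by an expression
Program:
//1. Create a DataSet[Int]
val input: DataSet[Int] = benv.fromElements(3,-3,4,-4,6,-5,7)
//2. Deduplicate by an expression; here elements are compared by their absolute value
val output = input.distinct {x => Math.abs(x)}
//3. Show the result
output.collect
Result: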
res55: Seq[Int] = Buffer(3, 4, -5, 6, 7)
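The function passed to `distinct` acts as a key extractor: two elements count as duplicates when their keys are equal, and one element per key is kept. The plain Scala sketch below models that idea on an ordinary `Seq`, without Flink. The helper `distinctByKey` is hypothetical; it keeps the first element seen for each key, which is a choice made for this sketch and not something the Flink result should be assumed to follow.

```scala
import scala.collection.mutable

object KeyDedupModel {
  // Hypothetical helper: keeps the first element seen for each key.
  def distinctByKey[T, K](xs: Seq[T])(key: T => K): Seq[T] = {
    val seen = mutable.HashSet.empty[K]
    // HashSet.add returns true only when the key was not present yet.
    xs.filter(x => seen.add(key(x)))
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(3, -3, 4, -4, 6, -5, 7)
    println(distinctByKey(input)(x => math.abs(x))) // List(3, 4, 6, -5, 7)
  }
}
```

The model returns the same five elements as the Flink run above, though in input order: 3 and -3 share the key 3, 4 and -4 share the key 4, and 6, -5 and 7 each have a key of their own.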