DataSet API in Detail (Part 6)
ReduceGroup
def reduceGroup[R](fun: (Iterator[T]) ⇒ R)(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]
def reduceGroup[R](fun: (Iterator[T], Collector[R]) ⇒ Unit)(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]
def reduceGroup[R](reducer: GroupReduceFunction[T, R])(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]
Creates a new DataSet by passing all elements in this DataSet to the group reduce function.
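This function is similar to reduce, but the function you pass receives a whole group at once (as an Iterator) instead of one element at a time.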
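To make the contrast with reduce concrete, here is a minimal standalone sketch. It assumes a Flink version that still ships the Scala DataSet API on the classpath. The object name `ReduceGroupSketch` and the helper names `oldestPerName` and `countPerName` are made up for this illustration. The lambda parameter types are written out so that the intended overload is selected unambiguously.

```scala
import org.apache.flink.api.scala._

object ReduceGroupSketch {
  // reduce: the function is called with two elements of a group at a time
  // and must return a value of the same type as its inputs.
  def oldestPerName(input: DataSet[(Int, String)]): DataSet[(Int, String)] =
    input.groupBy(1).reduce { (a: (Int, String), b: (Int, String)) =>
      if (a._1 >= b._1) a else b
    }

  // reduceGroup (Iterator[T] => R overload): the function is called once
  // per group and may return a different type R.
  def countPerName(input: DataSet[(Int, String)]): DataSet[(String, Int)] =
    input.groupBy(1).reduceGroup { (in: Iterator[(Int, String)]) =>
      val group = in.toList // an Iterator can only be traversed once
      (group.head._2, group.size)
    }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val input = env.fromElements(
      (20, "zhangsan"), (22, "zhangsan"), (22, "lisi"), (20, "zhangsan"))
    oldestPerName(input).print()
    countPerName(input).print()
  }
}
```

With this sample input, `countPerName` should yield one record per name, (zhangsan,3) and (lisi,1). The order in which groups are emitted is not something to rely on.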
ReduceGroup example 1: operating on tuples
Program:
//1. Define a DataSet[(Int, String)]
val input: DataSet[(Int, String)] = benv.fromElements(
(20,"zhangsan"),(22,"zhangsan"),
(22,"lisi"),(20,"zhangsan"))
//2. Group by the String field, then apply reduceGroup to each group
val output = input.groupBy(1).reduceGroup {
//Deduplicate identical elements within the group by converting to a Set
(in, out: Collector[(Int, String)]) =>
in.toSet foreach (out.collect)
}
//3. Show the result
output.collect
Result:
res14: Seq[(Int, String)] = Buffer((22,lisi), (20,zhangsan), (22,zhangsan))
[Figure: the job as displayed in the web UI]
ReduceGroup example 2: operating on a case class
//1. Define the case class
case class Student(age: Int, name: String)
//2. Create a DataSet[Student]
val input: DataSet[Student] = benv.fromElements(
Student(20,"zhangsan"),
Student(22,"zhangsan"),
Student(22,"lisi"),
Student(20,"zhangsan"))
//3. Group by age, then apply reduceGroup to each group
val output = input.groupBy(_.age).reduceGroup {
//Deduplicate identical elements within the group by converting to a Set
(in, out: Collector[Student]) =>
in.toSet foreach (out.collect)
}
//4. Show the result
output.collect
Result:
res16: Seq[Student] = Buffer(Student(20,zhangsan), Student(22,zhangsan), Student(22,lisi))
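Both examples use the (Iterator[T], Collector[R]) ⇒ Unit overload. The third signature listed at the top takes a GroupReduceFunction instead, and the sketch below shows the same per-group deduplication written that way. It is an illustration only: the class name `DistinctStudents` and the object name `GroupReduceFunctionSketch` are hypothetical, and it assumes a standalone program with the Flink Scala DataSet API on the classpath rather than the Scala shell.

```scala
import java.lang.{Iterable => JIterable}

import scala.collection.JavaConverters._

import org.apache.flink.api.common.functions.GroupReduceFunction
import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

case class Student(age: Int, name: String)

// Emits each distinct Student of a group exactly once.
class DistinctStudents extends GroupReduceFunction[Student, Student] {
  override def reduce(values: JIterable[Student], out: Collector[Student]): Unit = {
    // Materialize the group as a Set; this holds the whole group in memory.
    val distinct: Set[Student] = values.asScala.toSet
    distinct.foreach(s => out.collect(s))
  }
}

object GroupReduceFunctionSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val input: DataSet[Student] = env.fromElements(
      Student(20, "zhangsan"), Student(22, "zhangsan"),
      Student(22, "lisi"), Student(20, "zhangsan"))
    // Group by age via a key selector function, then reduce each group.
    val output = input.groupBy(_.age).reduceGroup(new DistinctStudents)
    output.print()
  }
}
```

A named function class like this is easier to reuse and unit test than an inline lambda. The lambda form in the examples above is shorter for one-off use.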