【云星数据---Apache Flink实战系列(精品版)】:Apache Flink批处理API详解与编程实战007--DateSet实用API详解007

183 篇文章 0 订阅
86 篇文章 57 订阅

DateSet的API详解七

sortGroup

Adds a secondary sort key to this GroupedDataSet. This will only have an effect if you use one
of the group-at-a-time, i.e. reduceGroup.

执行程序:

//1.创建 DataSet[(Int, String)]
val input: DataSet[(Int, String)] = benv.fromElements(
(20,"zhangsan"),
(22,"zhangsan"),
(22,"lisi"),
(22,"lisi"),
(22,"lisi"),
(18,"zhangsan"),
(18,"zhangsan"))

//2.用int分组,用int对分组进行排序
val sortdata = input.groupBy(0).sortGroup(0, Order.ASCENDING)

//3.对排序好的分组进行reduceGroup
val outputdata =sortdata.reduceGroup {
      //将相同的元素用set去重
      (in, out: Collector[(Int, String)]) =>
        in.toSet foreach (out.collect)
}
//4.显示结果
outputdata.collect

执行结果:

res25: Seq[(Int, String)] = Buffer((18,zhangsan), (20,zhangsan), (22,zhangsan), (22,lisi))

web ui中的执行效果:
这里写图片描述

minBy

def minBy(fields: Int*): DataSet[T]

Applies a special case of a reduce transformation minBy on a grouped DataSet. 

在分组后的数据中,获取每组最小的元素。

执行程序:

//1.定义case class
case class Student(age: Int, name: String,height:Double)

//2.创建DataSet[Student]
val input: DataSet[Student] = benv.fromElements(
Student(16,"zhangasn",194.5),
Student(17,"zhangasn",184.5),
Student(18,"zhangasn",174.5),
Student(16,"lisi",194.5),
Student(17,"lisi",184.5),
Student(18,"lisi",174.5))

//3.以name进行分组,获取age最小的元素
val output0: DataSet[Student] = input.groupBy(_.name).minBy(0)
output0.collect

//4.以name进行分组,获取height和age最小的元素
val output1: DataSet[Student] = input.groupBy(_.name).minBy(2,0)
output1.collect

执行结果:

Scala-Flink> output0.collect
res73: Seq[Student] = Buffer(Student(16,lisi,194.5), Student(16,zhangasn,194.5))

Scala-Flink> output1.collect
res74: Seq[Student] = Buffer(Student(18,lisi,174.5), Student(18,zhangasn,174.5))

web ui中的执行效果:
这里写图片描述

maxBy

def maxBy(fields: Int*): DataSet[T]
def max(field: Int): AggregateDataSet[T]

Applies a special case of a reduce transformation maxBy on a grouped DataSet 

在分组后的数据中,获取每组最大的元素。

执行程序:

//1.定义case class
case class Student(age: Int, name: String,height:Double)

//2.创建DataSet[Student]
val input: DataSet[Student] = benv.fromElements(
Student(16,"zhangasn",194.5),
Student(17,"zhangasn",184.5),
Student(18,"zhangasn",174.5),
Student(16,"lisi",194.5),
Student(17,"lisi",184.5),
Student(18,"lisi",174.5))

//3.以name进行分组,获取age最大的元素
val output0: DataSet[Student] = input.groupBy(_.name).maxBy(0)
output0.collect

//4.以name进行分组,获取height和age最大的元素
val output1: DataSet[Student] = input.groupBy(_.name).maxBy(2,0)
output1.collect

执行结果:

Scala-Flink> output0.collect
res75: Seq[Student] = Buffer(Student(18,lisi,174.5), Student(18,zhangasn,174.5))

Scala-Flink> output1.collect
res76: Seq[Student] = Buffer(Student(16,lisi,194.5), Student(16,zhangasn,194.5))

web ui中的执行效果:
这里写图片描述

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值