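// combineByKey can implement both distinct and groupBy; the note after the code explains why.
import org.apache.spark.rdd.RDD

val rdd2: RDD[(String, Set[String])] = rdd1
  .map(x => (x.phone_no + x.wifi, x.lat + split + x.lng))
  .partitionBy(wifi_part)
  .combineByKey(
    (it: String) => Set(it),                                    // createCombiner: first value seen for a key
    (curS: Set[String], it: String) => curS + it,               // mergeValue: add a value to the key's set
    (curS1: Set[String], curS2: Set[String]) => curS1 ++ curS2  // mergeCombiners: union two sets for the same key
  )
// Note: partitionBy first moves all records with the same key into the same partition;
// combineByKey then aggregates them per key, and using a Set as the combiner removes duplicates.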
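To see why a Set-valued combiner gives both grouping and de-duplication, the three functions can be run on plain Scala collections. The following is only a sketch of the idea, not Spark's implementation: `combineByKeyLocal`, `CombineByKeySketch`, and the sample data are hypothetical names made up for illustration, and each inner `Seq` stands in for one partition.

```scala
object CombineByKeySketch {
  // Hypothetical local stand-in for combineByKey (not Spark's code):
  // each inner Seq plays the role of one partition.
  def combineByKeyLocal[K, V, C](
      partitions: Seq[Seq[(K, V)]],
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C
  ): Map[K, C] = {
    // Step 1: inside each partition, build one combiner per key.
    val perPartition: Seq[Map[K, C]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.get(k) match {
          case Some(c) => acc.updated(k, mergeValue(c, v))
          case None    => acc.updated(k, createCombiner(v))
        }
      }
    }
    // Step 2: merge the per-partition combiners that share a key.
    perPartition.foldLeft(Map.empty[K, C]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, c)) =>
        a.get(k) match {
          case Some(prev) => a.updated(k, mergeCombiners(prev, c))
          case None       => a.updated(k, c)
        }
      }
    }
  }

  // Made-up sample data: two "partitions" with duplicate values for the same key.
  val sample: Seq[Seq[(String, String)]] = Seq(
    Seq(("k1", "a"), ("k1", "a"), ("k2", "b")),
    Seq(("k1", "c"), ("k2", "b"))
  )

  // The same three functions as in the snippet above.
  val result: Map[String, Set[String]] =
    combineByKeyLocal[String, String, Set[String]](
      sample,
      (it: String) => Set(it),
      (curS: Set[String], it: String) => curS + it,
      (curS1: Set[String], curS2: Set[String]) => curS1 ++ curS2
    )
}
```

With this sample, `CombineByKeySketch.result` is `Map("k1" -> Set("a", "c"), "k2" -> Set("b"))`. Collecting values under their key is the groupBy part, and the Set silently drops the repeated `"a"` and `"b"`, which is the distinct part.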
Implementing distinct and groupBy with combineByKey