groupByKey
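Does not aggregate on the map side; all values are shuffled and aggregated on the reduce side.
Accepts either a partitioner or a number of partitions.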
val list: Seq[(String, Int)] = List(("spark", 6), ("spark", 3), ("flink", 7), ("hadoop", 2), ("hadoop", 8), ("spark", 2), ("flume", 9))
List((flink,CompactBuffer(7)), (spark,CompactBuffer(6, 3, 2)), (hadoop,CompactBuffer(2, 8)), (flume,CompactBuffer(9)))
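The overloads mentioned above could be exercised roughly as follows. This is a minimal sketch, assuming a local Spark installation on the classpath; the object name GroupByKeySketch, the local[2] master, and the partition counts 2, 3 and 4 are illustrative choices, not part of the original notes.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical wrapper object, used only for this illustration.
object GroupByKeySketch {
  // Groups the sample data by key and returns it as a plain Map with sorted values.
  def run(): Map[String, List[Int]] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("groupByKey-sketch")
    val sc = new SparkContext(conf)
    try {
      val list: Seq[(String, Int)] = List(("spark", 6), ("spark", 3), ("flink", 7),
        ("hadoop", 2), ("hadoop", 8), ("spark", 2), ("flume", 9))
      val rdd = sc.parallelize(list, 2)

      // Overload 1: pass a number of partitions.
      val byCount = rdd.groupByKey(4)
      // Overload 2: pass an explicit partitioner.
      val byPartitioner = rdd.groupByKey(new HashPartitioner(3))

      println(s"byCount partitions: ${byCount.getNumPartitions}")
      println(s"byPartitioner partitions: ${byPartitioner.getNumPartitions}")

      // The order of values inside a group is not guaranteed, so sort before comparing.
      byPartitioner.mapValues(_.toList.sorted).collect().toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Because nothing is combined before the shuffle, every (key, value) pair crosses the network; when only a sum or count per key is needed, reduceByKey usually moves less data.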
groupBy
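Each grouped value is the whole original element, so the key is stored a second time inside the values.
Internally it calls groupByKey:
val list2 = list.map(t => (t._1, t))
and then calls groupByKey on list2.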
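The equivalence described above can be checked with a small sketch. It assumes a local Spark installation; GroupBySketch, the local[2] master, and the helper normalize are hypothetical names introduced only for this illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object, used only for this illustration.
object GroupBySketch {
  type Grouped = Map[String, List[(String, Int)]]

  // Returns the result of groupBy and of the manual map + groupByKey rewrite.
  def run(): (Grouped, Grouped) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("groupBy-sketch")
    val sc = new SparkContext(conf)
    try {
      val list: Seq[(String, Int)] = List(("spark", 6), ("spark", 3), ("flink", 7),
        ("hadoop", 2), ("hadoop", 8), ("spark", 2), ("flume", 9))
      val rdd = sc.parallelize(list, 2)

      // groupBy keeps the whole element as the value, so the key appears twice.
      val viaGroupBy = rdd.groupBy(_._1)

      // Manual rewrite: key each element by its first field, then groupByKey.
      val list2 = rdd.map(t => (t._1, t))
      val viaGroupByKey = list2.groupByKey()

      // Sort each group so the two results can be compared deterministically.
      def normalize(r: RDD[(String, Iterable[(String, Int)])]): Grouped =
        r.mapValues(_.toList.sorted).collect().toMap

      (normalize(viaGroupBy), normalize(viaGroupByKey))
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (a, b) = run()
    println(a)
    println(a == b)
  }
}
```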
distinct
Can be implemented with reduceByKey.
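One way to do this, sketched under the assumption of a local Spark installation: pair every element with a throwaway value, keep one pair per key with reduceByKey, then drop the value. The object name DistinctSketch, the sample words, and the placeholder value 1 are illustrative choices.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object, used only for this illustration.
object DistinctSketch {
  // Returns (result of the reduceByKey rewrite, result of the built-in distinct), both sorted.
  def run(): (List[String], List[String]) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("distinct-sketch")
    val sc = new SparkContext(conf)
    try {
      val words = sc.parallelize(Seq("spark", "flink", "spark", "hadoop", "flink"), 2)

      val viaReduceByKey = words
        .map(x => (x, 1))            // the value is a placeholder and is never used
        .reduceByKey((a, _) => a)    // collapse all pairs sharing a key into one
        .map(_._1)                   // keep only the key

      val builtIn = words.distinct()

      (viaReduceByKey.collect().toList.sorted, builtIn.collect().toList.sorted)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Since reduceByKey combines on the map side, duplicates within a partition are dropped before the shuffle.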