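Conclusion:
reduceByKey >= distinct >= groupBy (performance, best first)
1. In Spark Core, reduceByKey combines values on the map side before the shuffle, so less data is transferred over the network:
reduceByKey > groupByKey
2. In Spark SQL, group by is equivalent to reduceByKey after optimization;
distinct is most likely executed as a group by.
So: this ordering still needs to be verified against the source code.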
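To make point 1 concrete, here is a minimal sketch in plain Python (not Spark code) that counts how many records would cross the shuffle under each strategy. The function names and the sample partitions are hypothetical, and the model assumes the only difference is that the reduceByKey-style path pre-combines values per key inside each partition, while the groupByKey-style path ships every record unchanged.

```python
def shuffle_records_group_by_key(partitions):
    """groupByKey-style model: every (key, value) record is shuffled as-is."""
    return sum(len(part) for part in partitions)


def shuffle_records_reduce_by_key(partitions, combine):
    """reduceByKey-style model: values are combined per key inside each
    partition first, so at most one record per distinct key leaves it."""
    shuffled = 0
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = combine(local[key], value) if key in local else value
        shuffled += len(local)
    return shuffled


# Hypothetical input: two map-side partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("a", 1), ("b", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

group_count = shuffle_records_group_by_key(partitions)
reduce_count = shuffle_records_reduce_by_key(partitions, lambda x, y: x + y)
print(group_count, reduce_count)  # prints "8 5"
```

The gap widens as keys repeat more often within a partition. When every key is unique per partition, the two counts are equal, which matches the ">=" in the ordering above.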