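countApproxDistinctByKey(double): for each key, this function returns the approximate number of distinct values associated with that key. The double argument (named relativeSD in the Spark API) is the relative accuracy of the estimate: smaller values give a more accurate count but create counters that need more memory.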
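Demo:
val a = sc.parallelize(List("wang","li","cao","zou"),2)
val b = sc.parallelize(a.takeSample(true,1000,0)) // draw 1000 random samples (with replacement, seed 0)
val c = sc.parallelize(1 to b.count.toInt) // one unique integer per sample
val d = b.zip(c) // pair each sampled name (key) with its unique integer (value)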
Test 1:
d.countApproxDistinctByKey(0.1).collect // approximate number of distinct values per key, relative accuracy 0.1
Output:
Array[(String, Long)] = Array((cao,286), (li,253), (zou,280), (wang,193))
Test 2:
d.countApproxDistinctByKey(0.2).collect // approximate number of distinct values per key, relative accuracy 0.2
Output:
Array[(String, Long)] = Array((cao,291), (li,308), (zou,214), (wang,220))
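To judge how far these estimates are from the truth, it helps to compute the exact distinct count per key next to them. Below is a minimal sketch for spark-shell, assuming sc is the SparkContext the shell provides. The names names, sample, pairs, exact, approx01, approx02 and relErr are made up for this illustration, and the data is built with zipWithIndex instead of the zip used in the demo. Because every value is unique, the exact distinct count for a key is simply the number of times that key was sampled.

```scala
// Assumes spark-shell, where `sc` (SparkContext) already exists.
// Same kind of data as the demo: 1000 sampled names, each paired with a unique integer.
val names  = sc.parallelize(List("wang", "li", "cao", "zou"), 2)
val sample = names.takeSample(withReplacement = true, num = 1000, seed = 0L)
val pairs  = sc.parallelize(sample.zipWithIndex.toSeq) // RDD[(String, Int)]

// Exact number of distinct values per key (shuffles every distinct pair).
val exact = pairs.distinct().countByKey()

// Approximate number of distinct values per key at two accuracy settings.
val approx01 = pairs.countApproxDistinctByKey(0.1).collectAsMap()
val approx02 = pairs.countApproxDistinctByKey(0.2).collectAsMap()

// Relative error of an estimate against the exact count (hypothetical helper).
def relErr(estimate: Long, truth: Long): Double =
  math.abs(estimate - truth).toDouble / truth

// Print the exact count, both estimates, and their relative errors for each key.
exact.toSeq.sortBy(_._1).foreach { case (key, n) =>
  val e1 = approx01(key)
  val e2 = approx02(key)
  println(f"$key%-5s exact=$n%4d  0.1 -> $e1%4d (err ${relErr(e1, n)}%.3f)  0.2 -> $e2%4d (err ${relErr(e2, n)}%.3f)")
}
```

The exact version has to shuffle all distinct (key, value) pairs, while countApproxDistinctByKey only keeps a small fixed-size counter per key, which is the reason to accept an approximate answer on large data sets.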