python spark: finding the max, min, and mean

Latest recommended article published 2020-12-08 09:13:37

djph26741



Tags: big data, python

Original link: http://www.cnblogs.com/bonelee/p/7154042.html


// Assumes sc is a JavaSparkContext and testData is a List<Double>
JavaDoubleRDD rdd = sc.parallelizeDoubles(testData);

Now we’ll calculate the mean of our dataset.

LOGGER.info("Mean: " + rdd.mean());

Similar methods exist for other statistical operations, such as max and standard deviation.
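For reference, these are the standard definitions behind the mean, variance, and standard deviation. In Spark's Java API, variance() and stdev() return the population versions (dividing by n), while sampleVariance() and sampleStdev() divide by n - 1. The dataset {1, 2, 3, 4} below is a made-up example for illustration, not data from the original article.

```latex
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)^2 \\
x &= \{1, 2, 3, 4\}:\quad \mu = \tfrac{10}{4} = 2.5,\quad
\sigma^2 = \tfrac{2.25 + 0.25 + 0.25 + 2.25}{4} = 1.25,\quad
\sigma \approx 1.118,\quad s^2 = \tfrac{5}{3} \approx 1.667
\end{aligned}
```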

Each time one of these methods is invoked, Spark launches a job that processes the entire RDD. If several statistics are needed, the data is scanned again for each one, which is very inefficient. To solve this, Spark provides the StatCounter class: calling rdd.stats() makes a single pass over the data and returns the results of all the basic statistical operations at once.

StatCounter statCounter = rdd.stats();

The results can then be read from the StatCounter as follows:

LOGGER.info("Count: " + statCounter.count());
LOGGER.info("Min: " + statCounter.min());
LOGGER.info("Max: " + statCounter.max());
LOGGER.info("Sum: " + statCounter.sum());
LOGGER.info("Mean: " + statCounter.mean());
LOGGER.info("Variance: " + statCounter.variance());
LOGGER.info("Stdev: " + statCounter.stdev());
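A single pass is enough because all of these statistics can be derived from a few running quantities. The plain-Java sketch below (no Spark dependency) illustrates the idea: it keeps the count, running mean, sum of squared deviations (M2), min, and max, updates them per element with Welford's algorithm, and merges partial summaries such as those computed on separate partitions. The class RunningStats and its methods are hypothetical names chosen for illustration, not Spark's API; Spark's StatCounter keeps similar running quantities, but its exact implementation may differ.

```java
import java.util.List;

/**
 * Hypothetical one-pass summary (illustration only, not Spark's StatCounter).
 * Tracks count, running mean, M2 (sum of squared deviations), min and max.
 */
class RunningStats {
    private long n = 0;
    private double mu = 0.0;  // running mean
    private double m2 = 0.0;  // sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Folds one value into the summary using Welford's update. */
    RunningStats add(double x) {
        n++;
        double delta = x - mu;
        mu += delta / n;
        m2 += delta * (x - mu);
        min = Math.min(min, x);
        max = Math.max(max, x);
        return this;
    }

    /** Combines another partial summary into this one (mutates and returns this). */
    RunningStats merge(RunningStats other) {
        if (other.n == 0) return this;
        if (n == 0) {
            n = other.n; mu = other.mu; m2 = other.m2; min = other.min; max = other.max;
            return this;
        }
        double delta = other.mu - mu;
        long total = n + other.n;
        mu += delta * other.n / total;                       // weighted mean of both parts
        m2 += other.m2 + delta * delta * n * other.n / total; // parallel variance formula
        n = total;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        return this;
    }

    long count() { return n; }
    double mean() { return mu; }
    double sum() { return mu * n; }
    double min() { return min; }
    double max() { return max; }
    double variance() { return n == 0 ? Double.NaN : m2 / n; }               // population
    double sampleVariance() { return n <= 1 ? Double.NaN : m2 / (n - 1); }   // sample
    double stdev() { return Math.sqrt(variance()); }

    /** Builds a summary from a list in one traversal. */
    static RunningStats of(List<Double> xs) {
        RunningStats s = new RunningStats();
        for (double x : xs) s.add(x);
        return s;
    }

    public static void main(String[] args) {
        // Two "partitions" summarized independently, then merged.
        RunningStats left = RunningStats.of(List.of(1.0, 2.0));
        RunningStats right = RunningStats.of(List.of(3.0, 4.0));
        RunningStats all = left.merge(right);
        System.out.println(all.count() + " " + all.mean() + " " + all.variance());
    }
}
```

Running main prints `4 2.5 1.25`. Because merge can combine summaries built independently, each partition can be summarized locally and the partial results combined, which is the general pattern that lets one job yield every statistic at once.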

Excerpted from: http://www.sparkexpert.com/tag/rdd/

Reposted from: https://www.cnblogs.com/bonelee/p/7154042.html
