java spark computing the average, Spark DataFrame: computing the row-wise mean (or any aggregate operation)

weixin_39640767

I have a Spark DataFrame loaded in memory, and I want to take the mean (or any other aggregate operation) across the columns of each row. How would I do that? (In numpy, this is known as applying an operation over axis=1.)
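For reference, here is the numpy distinction the question relies on: axis=0 collapses the rows and yields one mean per column, while axis=1 collapses the columns and yields one mean per row. The array is just the example data shown further down with the id column dropped; the variable names are illustrative.

```python
import numpy as np

# US / UK / Can values from the example DataFrame, without the id column
data = np.array([
    [50, 0, 0],
    [0, 100, 0],
    [0, 0, 125],
    [75, 0, 0],
])

col_means = data.mean(axis=0)  # one mean per column (down the rows)
row_means = data.mean(axis=1)  # one mean per row (across the columns)

print(col_means)
print(row_means)
```

The question is asking for the Spark equivalent of row_means.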

If one were calculating the mean of the DataFrame down the rows (axis=0), then this is already built in:

```python
from pyspark.sql import functions as F

# Built-in aggregate: collapses the rows (axis=0), giving one value per column
F.mean(...)
```

But is there a way to do this programmatically across the entries of each row, i.e. over the columns? For example, given the DataFrame below:

```
+--+--+---+---+
|id|US| UK|Can|
+--+--+---+---+
| 1|50|  0|  0|
| 1| 0|100|  0|
| 1| 0|  0|125|
| 2|75|  0|  0|
+--+--+---+---+
```

Omitting id, the row-wise means would be:

```
+------+
|  mean|
+------+
| 16.66|
| 33.33|
| 41.67|
| 25.00|
+------+
```
