How to sum a variable by group in R?

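The answer provided by rcs works and is simple. However, if you are handling larger datasets and need a performance boost, there is a faster alternative using data.table: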

library(data.table)

data = data.table(Category=c("First","First","First","Second","Third", "Third", "Second"),
                  Frequency=c(10,15,5,2,14,20,3))

data[, sum(Frequency), by = Category]
#    Category V1
# 1:    First 30
# 2:   Second  5
# 3:    Third 34

system.time(data[, sum(Frequency), by = Category] )
#    user  system elapsed
#   0.008   0.001   0.009

Let's compare that with the same operation on a data.frame, using the aggregate() approach from the answer above:

# Stored under a separate name so the data.table `data` is not overwritten
df = data.frame(Category=c("First","First","First","Second","Third", "Third", "Second"),
                Frequency=c(10,15,5,2,14,20,3))

system.time(aggregate(df$Frequency, by=list(Category=df$Category), FUN=sum))
#    user  system elapsed
#   0.008   0.000   0.015

If you want to keep the column name in the result (instead of the default V1), this is the syntax:

# `data` must be the data.table here, not a data.frame
data[,list(Frequency=sum(Frequency)),by=Category]
#    Category Frequency
# 1:    First        30
# 2:   Second         5
# 3:    Third        34
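
As a small extension of the same syntax, the sketch below uses `.()`, which data.table provides as a shorthand for `list()`, and adds a per-group row count with `.N`. The object name `dt` and the column names `Total` and `Rows` are chosen here only for illustration.

```r
library(data.table)

# Same toy data as in the first example
dt <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# .() is an alias for list(); .N is the number of rows in each group
res <- dt[, .(Total = sum(Frequency), Rows = .N), by = Category]
print(res)
```

Groups come back in order of first appearance because `by` is used; `keyby = Category` would sort them by the grouping column instead.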

With larger datasets the difference becomes more pronounced, as the code below shows:

# 300,000 rows; rnorm() is given the full length so no recycling is needed
big_dt = data.table(Category=rep(c("First", "Second", "Third"), 100000),
                    Frequency=rnorm(300000))

system.time( big_dt[,sum(Frequency),by=Category] )
#    user  system elapsed
#   0.055   0.004   0.059

big_df = data.frame(Category=rep(c("First", "Second", "Third"), 100000),
                    Frequency=rnorm(300000))

system.time( aggregate(big_df$Frequency, by=list(Category=big_df$Category), FUN=sum) )
#    user  system elapsed
#   0.287   0.010   0.296
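
A timing comparison is only meaningful if both approaches compute the same thing, so a quick sanity check can be useful. The following is a minimal sketch, not part of the original answer: it builds both structures from the same vectors (the names `category`, `frequency`, `dt_sums` and `df_sums` are illustrative) and compares the group sums.

```r
library(data.table)

set.seed(1)  # fixed seed so both structures hold identical values
n <- 300000
category  <- rep(c("First", "Second", "Third"), n / 3)
frequency <- rnorm(n)

# keyby sorts the result by Category, matching aggregate()'s ordering
dt_sums <- data.table(Category = category, Frequency = frequency)[
  , .(Frequency = sum(Frequency)), keyby = Category]

df_sums <- aggregate(Frequency ~ Category,
                     data = data.frame(Category = category, Frequency = frequency),
                     FUN = sum)

# Compare with a numeric tolerance, since summation order may differ slightly
same <- all.equal(dt_sums$Frequency, df_sums$Frequency)
print(same)
```

`all.equal()` is used rather than `identical()` because the two implementations may accumulate floating-point values in a different order.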

To aggregate several columns at once, you can combine lapply and .SD as follows:

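# Run on the small data.table from the first example;
# .SD holds every column except the grouping column
data[, lapply(.SD, sum), by = Category]
#    Category Frequency
# 1:    First        30
# 2:   Second         5
# 3:    Third        34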
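
With only one value column the benefit of `.SD` is hard to see, so here is a hedged sketch with a second numeric column. The `Weight` column and its values are invented for illustration and do not appear in the original data; `.SDcols` is used to state explicitly which columns should be summed.

```r
library(data.table)

# Hypothetical data: the Weight column is made up for this example
dt <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Weight    = c(1, 2, 3, 4, 5, 6, 7)
)

# Sum every column named in .SDcols within each Category
sums <- dt[, lapply(.SD, sum), by = Category, .SDcols = c("Frequency", "Weight")]
print(sums)
```

Listing the columns in `.SDcols` also protects against accidentally calling `sum()` on a non-numeric column if the table gains more fields later.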
