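The aggregate() function: computing descriptive statistics by group
The syntax is as follows:
## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE)
Here, by specifies the grouping variables and must be a list.
## S3 method for class 'formula'
aggregate(formula, data, FUN, ..., subset, na.action = na.omit)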
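As a minimal sketch of the data.frame method, the block below builds a small data frame and passes by as a named list. The name scores and its values are made up for illustration and are not the data used in this post. Naming the list element gives the grouping column that name in the result instead of the default Group.1.

```r
# Hypothetical data, for illustration only
scores <- data.frame(
  id = c("A", "A", "B", "B", "C", "C"),
  x1 = c(1, 3, 2, 4, 5, 7),
  x2 = c(10, 20, 30, 50, 60, 80)
)

# `by` must be a list; naming its element names the grouping column
res <- aggregate(scores[, c("x1", "x2")], by = list(id = scores$id), FUN = mean)
print(res)

# Passing a bare vector instead of a list raises an error,
# which is captured here as a message string
err <- tryCatch(
  aggregate(scores[, c("x1", "x2")], by = scores$id, FUN = mean),
  error = function(e) conditionMessage(e)
)
print(err)
```

Wrapping the grouping vector in list() is all that is needed. Several grouping vectors can go in the same list.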
Compare the following two calls:
> aggregate(a[,-1],list(a$id),mean)
  Group.1         x1         x2          x3        x4
1       A -0.6610644  0.5104887 -0.48953077 0.1933413
2       B -0.2995482 -0.6844227 -1.00001083 0.4507054
3       C -0.9831631  0.2302809  0.07486895 0.4530656
> aggregate(a,by=list(a$id),mean)
  Group.1 id         x1         x2          x3        x4
1       A NA -0.6610644  0.5104887 -0.48953077 0.1933413
2       B NA -0.2995482 -0.6844227 -1.00001083 0.4507054
3       C NA -0.9831631  0.2302809  0.07486895 0.4530656
Warning messages:
1: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
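The second call produces warnings because the data passed to FUN includes a non-numeric column (id). It is better to follow the first call and leave the id column out of the computation.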
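One way to avoid the warning without hard-coding column positions is to keep only the numeric columns before aggregating. The sketch below uses a placeholder data frame, a_demo. Its name and values are assumptions for illustration, not the data frame a used above.

```r
# Placeholder data with one non-numeric column (id) and two numeric columns
a_demo <- data.frame(
  id = c("A", "A", "B", "B"),
  x1 = c(1, 2, 3, 4),
  x2 = c(0.5, 1.5, 2.5, 3.5)
)

# Logical vector marking which columns are numeric
num_cols <- sapply(a_demo, is.numeric)

# Aggregate only the numeric columns, grouped by id
res <- aggregate(a_demo[, num_cols, drop = FALSE],
                 by = list(id = a_demo$id),
                 FUN = mean)
print(res)
```

drop = FALSE keeps the selection a data frame even when only one numeric column remains.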
The same computation using the formula interface:
aggregate(cbind(x1,x2,x3,x4)~id,data=a,mean)
  id         x1         x2          x3        x4
1  A -0.6610644  0.5104887 -0.48953077 0.1933413
2  B -0.2995482 -0.6844227 -1.00001083 0.4507054
3  C -0.9831631  0.2302809  0.07486895 0.4530656
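In the formula, the left-hand side lists the variables to analyze and the right-hand side lists the grouping variables. To specify several grouping variables, join them with +, as in id1 + id2.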
aggregate(weight ~ feed, data = chickwts, mean)
# The left-hand side of the formula is the numeric variable to analyze; the right-hand side is the grouping variable
       feed   weight
1    casein 323.5833
2 horsebean 160.2000
3   linseed 218.7500
4  meatmeal 276.9091
5   soybean 246.4286
6 sunflower 328.9167
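The id1 + id2 form for several grouping variables can be sketched with data sets that ship with base R. The choice of warpbreaks and iris here is only for illustration. The second call shows the dot shorthand, where . on the left-hand side stands for every column not named on the right.

```r
# warpbreaks (datasets package) has columns: breaks, wool, tension
# Two grouping variables on the right-hand side, joined with +
res <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = mean)
print(res)

# A dot on the left-hand side means "all other columns"
res_all <- aggregate(. ~ Species, data = iris, FUN = mean)
print(res_all)
```

The first result has one row per combination of wool and tension that occurs in the data. The second has one row per species and one column of means per measurement.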
An example from the aggregate help page:
## example with character variables and NAs
testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
                     v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99) )
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)
by2 <- c("wet", "dry", 99, 95, NA, "damp", 95, 99, "red", 99, NA, NA)
aggregate(x = testDF, by = list(by1, by2), FUN = "mean")
  Group.1 Group.2 v1 v2
1       1      95  5 55
2       2      95  7 77
3       1      99  5 55
4       2      99 NA NA
5     big    damp  3 33
6    blue     dry  3 33
7     red     red  4 44
8     red     wet  1 11
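In the result above, rows whose grouping variable is NA were dropped automatically.
To keep NA as a grouping level, convert the grouping variables to factors with exclude = "" so that NA is not excluded from the levels: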
fby1 <- factor(by1, exclude = "")
fby2 <- factor(by2, exclude = "")
aggregate(x = testDF, by = list(fby1, fby2), FUN = "mean")
   Group.1 Group.2  v1   v2
1        1      95 5.0 55.0
2        2      95 7.0 77.0
3        1      99 5.0 55.0
4        2      99  NA   NA
5      big    damp 3.0 33.0
6     blue     dry 3.0 33.0
7      red     red 4.0 44.0
8      red     wet 1.0 11.0
9       12    <NA> 9.0 99.0
10    <NA>    <NA> 7.5 82.5
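As a smaller illustration of the same idea, the sketch below compares the default behaviour with a grouping factor that keeps NA as a level. The vectors g and v are made up for this example. It uses addNA(), another base R way to add NA as a factor level, in place of factor(..., exclude = "").

```r
# Made-up grouping vector with missing values, and values to average
g <- c("a", "b", NA, "a", NA)
v <- data.frame(val = c(1, 2, 3, 4, 5))

# Default: rows whose group is NA are dropped from the result
res_drop <- aggregate(v, by = list(g = g), FUN = mean)
print(res_drop)

# addNA() makes NA an explicit factor level, so it forms its own group
res_keep <- aggregate(v, by = list(g = addNA(factor(g))), FUN = mean)
print(res_keep)
```

The second result has one extra row, holding the mean of the values whose group was missing.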