Some ways to speed up R code
Use vectorized operations
Avoid for loops
Taking the row mean as an example, here are three approaches, ordered from fastest to slowest:
# vector
data$mean <- rowMeans(data[, c(1, 2)])
# apply
data$mean <- apply(data[, c(1, 2)], 1, mean)
# loop
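The loop variant is left blank above. A minimal sketch of what it might look like, next to the other two so the results can be compared. The data frame `data` and its columns `x` and `y` are made up for illustration, not taken from the original.

```r
# Hypothetical example data; the column names x and y are assumptions.
data <- data.frame(x = c(1, 2, 3), y = c(3, 4, 5))

# vector: a single call computes every row mean
mean_vec <- rowMeans(data[, c(1, 2)])

# apply: calls mean() once per row
mean_apply <- apply(data[, c(1, 2)], 1, mean)

# loop: preallocate the result, then fill it in row by row
mean_loop <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  mean_loop[i] <- mean(unlist(data[i, c(1, 2)]))
}

# All three produce the same values; only the speed differs.
stopifnot(isTRUE(all.equal(unname(mean_vec), mean_loop)))
stopifnot(isTRUE(all.equal(unname(mean_apply), mean_loop)))
```

Wrapping each variant in system.time() on a larger data frame is one way to check the ordering on your own machine.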
If you do need a loop, consider using set() from data.table
library(data.table)
# The same 100000 x 100 data as a matrix, a data.frame, and a data.table
M = matrix(1,nrow=100000,ncol=100)
DF = as.data.frame(M)
DT = as.data.table(M)
system.time(for (i in 1:1000) DF[i,1L] <- i) # 591.000s
system.time(for (i in 1:1000) DT[i,V1:=i]) # 1.158s
system.time(for (i in 1:1000) M[i,1L] <- i) # 0.016s
system.time(for (i in 1:1000) set(DT,i,1L,i)) # 0.027s
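To make the positional arguments in `set(DT,i,1L,i)` easier to read, here is a small sketch with the arguments named: `i` is the row, `j` the column, and `value` what gets written. The table and its contents are made up for illustration, and the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical small table; the values are assumptions for illustration.
DT <- data.table(V1 = c(0, 0, 0), V2 = c("a", "b", "c"))

# set() updates DT by reference, so no copy is made and no reassignment is needed
for (i in seq_len(nrow(DT))) {
  set(DT, i = i, j = 1L, value = i * 10)
}

print(DT$V1)  # 10 20 30
```

The timings above suggest that, inside a loop, set() avoids most of the per-call overhead of `DT[i, V1 := i]`.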
Prefer which() for selecting rows
system.time({
want = which(rowSums(df) > 4)
output = rep("less than 4", times = nrow(df))
output[want] = "greater than 4"
})
P.S. Don't forget which().
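The which() snippet above assumes a data frame `df` already exists. A self-contained sketch of the same pattern, with a made-up `df` whose values are only for illustration:

```r
# Hypothetical data frame; the values are assumptions for illustration.
df <- data.frame(a = c(1, 3, 5), b = c(1, 2, 0))

# which() returns the integer positions of the rows whose sum exceeds 4
want <- which(rowSums(df) > 4)  # rows 2 and 3

# Fill in the default label first, then overwrite only the selected rows
output <- rep("less than 4", times = nrow(df))
output[want] <- "greater than 4"

print(output)
```

The point of the pattern is that the labels are assigned in two vectorized steps rather than by testing each row in a loop.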