R Programming - Loop Functions

Writing for and while loops is useful when programming, but not particularly convenient when working interactively on the command line. R provides several functions that implement looping to make life easier.
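
As a rough illustration of the difference, the sketch below computes the mean of each list element twice: once with an explicit for loop and once with lapply. The names values, loop_means and apply_means are made up for this example.

```r
# Hypothetical data: a named list of numeric vectors
values <- list(a = 1:5, b = c(2, 4, 6), c = 10)

# Explicit loop: pre-allocate a list, then fill it element by element
loop_means <- vector("list", length(values))
names(loop_means) <- names(values)
for (i in seq_along(values)) {
  loop_means[[i]] <- mean(values[[i]])
}

# Loop function: a single call that does the same work
apply_means <- lapply(values, mean)

# Compare the two results
identical(loop_means, apply_means)
```

The loop needs the bookkeeping (pre-allocation, names, index) that lapply handles internally.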

1. lapply & sapply

lapply: Loop over a list and evaluate a function on each element
sapply: Same as lapply but try to simplify the result

lapply

lapply takes three arguments: (1) a list X; (2) a function (or the name of a function) FUN; (3) other arguments via its ... argument. If X is not a list, it will be coerced to a list using as.list.
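
The coercion can be seen with inputs that are not plain lists. This is a minimal sketch; squares, df and col_means are illustrative names, not part of the original notes.

```r
# An atomic vector is coerced to a list, one element per value
squares <- lapply(1:3, function(i) i^2)
is.list(squares)

# A data frame is a list of columns, so lapply visits each column in turn
df <- data.frame(u = 1:3, v = c(2.5, 3.5, 4.5))
col_means <- lapply(df, mean)
str(col_means)
```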

> lapply
function (X, FUN, ...) 
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X)) 
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
<bytecode: 0x7faa481311a0>
<environment: namespace:base>

lapply always returns a list, regardless of the class of the input

> x <- list(a = 1:5, b = rnorm(10))
> lapply(x,mean)
$a
[1] 3

$b
[1] 0.06460763

> x <- list(a = 1:4, b = rnorm(10), c=rnorm(20,1), d=rnorm(100,5))
> lapply(x, mean)
$a
[1] 2.5

$b
[1] 0.08151065

$c
[1] 0.8754845

$d
[1] 4.895723

> x <- 1:4
> lapply(x, runif)
[[1]]
[1] 0.5393189

[[2]]
[1] 0.7646683 0.9085368

[[3]]
[1] 0.2349948 0.2697491 0.9805410

[[4]]
[1] 0.3462296 0.4752699 0.6940489 0.6286985

> x <- 1:4
> lapply(x, runif, min=0, max=10)
[[1]]
[1] 7.896477

[[2]]
[1] 3.310554 7.511894

[[3]]
[1] 2.697588 5.187521 6.146860

[[4]]
[1] 8.548282 3.073971 9.337631 3.123481
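
In the call above, min and max are not consumed by lapply itself; they travel through ... to every call of runif. One way to check this is to compare it with an explicit wrapper function, fixing the seed (42 is an arbitrary choice) so that both versions draw the same numbers.

```r
x <- 1:4

set.seed(42)  # arbitrary seed, only so the two runs are comparable
via_dots <- lapply(x, runif, min = 0, max = 10)

set.seed(42)
via_wrapper <- lapply(x, function(n) runif(n, min = 0, max = 10))

# Both forms make the same sequence of runif() calls
identical(via_dots, via_wrapper)
```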

lapply and friends make heavy use of anonymous functions

> x <- list(a=matrix(1:4,2,2), b=matrix(1:6,3,2))
> x
$a
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$b
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

An anonymous function for extracting the first column of each matrix

> lapply(x, function(elt) elt[,1])
$a
[1] 1 2

$b
[1] 1 2 3
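
An anonymous function is just a function without a name, so the same call can be written with a named helper. The sketch below uses a hypothetical helper first_col, and a second anonymous function that picks the first row instead.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

# Named helper (hypothetical), equivalent to the anonymous function above
first_col <- function(m) m[, 1]
cols <- lapply(x, first_col)

# Anonymous function that extracts the first row instead
rows <- lapply(x, function(m) m[1, ])

cols
rows
```

The anonymous form is convenient when the helper is only needed once.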

sapply

sapply will try to simplify the result of lapply if possible

  • if the result is a list where every element is length 1, then a vector is returned
  • if the result is a list where every element is a vector of the same length (>1), a matrix is returned
  • if it can’t figure things out, a list is returned
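
The three cases can be sketched with small made-up lists (x and y below are illustrative, not from the original notes):

```r
x <- list(a = 1:4, b = 5:8, c = 9:12)

# Every result has length 1: a named numeric vector comes back
means <- sapply(x, mean)

# Every result has the same length (2): a 2 x 3 matrix comes back
ranges <- sapply(x, range)

# Results have different lengths: sapply falls back to a list
y <- list(a = 1:2, b = 1:5)
ragged <- sapply(y, function(v) v[v > 1])

means
ranges
ragged
```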

2. apply

Apply a function over the margins of an array

apply is used to evaluate a function (often an anonymous one) over the margins of an array

  • It is most often used to apply a function to the rows or columns of a matrix
  • It can be used with general arrays, e.g. taking the average of an array of matrices
  • It is not really faster than writing a loop, but it works in one line!

> str(apply)
function (X, MARGIN, FUN, ..., simplify = TRUE) 

  • X is an array
  • MARGIN is an integer vector indicating which margins should be “retained”
  • FUN is a function to be applied
  • ... is for other arguments to be passed to FUN

> x <- matrix(rnorm(200), 20, 10)
> apply(x, 2, mean)
 [1] -0.21539902  0.25480669  0.29069982  0.17461701
 [5]  0.37034020  0.12646704 -0.32566278 -0.22870461
 [9] -0.09823548 -0.25911445
 
> apply(x, 1, sum)
 [1] -7.0689350  8.6579044  5.1903690  3.2852374
 [5]  1.1214267  5.5725971  1.3352220 -1.2947558
 [9]  1.9802187 -0.6288018  4.0929522 -3.6821994
[13]  3.1798243 -6.3139959  3.2065088  1.1054827
[17] -3.2508571 -4.7663893 -7.3932119 -2.5323088

col/row sums and means

For sums and means of matrix dimensions, we have some shortcuts

  • rowSums = apply(x, 1, sum)
  • rowMeans = apply(x, 1, mean)
  • colSums = apply(x, 2, sum)
  • colMeans = apply(x, 2, mean)

The shortcut functions are much faster, but you won’t notice unless you’re using a large matrix
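
A quick check that the shortcuts agree with their apply counterparts, using all.equal rather than identical because the two routes may differ in floating-point rounding. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- matrix(rnorm(200), 20, 10)

all.equal(rowSums(x),  apply(x, 1, sum))
all.equal(rowMeans(x), apply(x, 1, mean))
all.equal(colSums(x),  apply(x, 2, sum))
all.equal(colMeans(x), apply(x, 2, mean))
```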

> x <- matrix(rnorm(200), 20, 10)
> apply(x, 1, quantile, probs = c(0.25, 0.75))
          [,1]       [,2]         [,3]       [,4]
25% -1.4089414 -1.0943368 -0.363817672 -0.8012552
75%  0.1710051  0.4501446  0.007953445  0.5578421
          [,5]       [,6]       [,7]       [,8]
25% -0.3777239 -0.3780244 -0.2093595 -0.2881016
75%  0.7154293  1.1488982  0.6749194  0.2354858
         [,9]      [,10]      [,11]      [,12]
25% -1.082317 0.03608219 -0.7720221 -1.2136858
75%  0.748666 1.17685990  0.3569027  0.7026888
         [,13]      [,14]      [,15]      [,16]
25% -0.4269014 -0.1967092 -0.6222264 -0.9687748
75%  1.2411431  1.1737596  0.4463872  0.3376294
         [,17]      [,18]      [,19]      [,20]
25% -0.7954193 -1.1253970 -0.4542702 -0.4136788
75%  0.9285497  0.4474869  0.9182585  0.8779093
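
Note the shape of the result above: each call to quantile returns a vector of length 2, and apply stores each such vector as a column, so 20 rows of input give a 2 x 20 matrix. If one row per input row is preferred, the result can be transposed. A small sketch (q and q_by_row are illustrative names):

```r
set.seed(1)  # arbitrary seed
x <- matrix(rnorm(200), 20, 10)

q <- apply(x, 1, quantile, probs = c(0.25, 0.75))
dim(q)            # rows are the two quantiles, columns are the rows of x

q_by_row <- t(q)  # one row per row of x
head(q_by_row, 3)
```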

Average matrix in an array

> a <- array(rnorm(2 * 2 * 10), c(2,2,10))
> apply(a, c(1,2), mean)
            [,1]       [,2]
[1,]  0.43270737 -0.5288165
[2,] -0.04156793  0.1450441
> rowMeans(a, dims=2)
            [,1]       [,2]
[1,]  0.43270737 -0.5288165
[2,] -0.04156793  0.1450441
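
With MARGIN = c(1,2), the first two dimensions are kept and the function collapses the third. The sketch below checks this against an explicit loop that sums the ten 2 x 2 slices; avg_apply and avg_manual are names chosen for the example.

```r
set.seed(1)  # arbitrary seed
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))

# Keep margins 1 and 2, so mean() runs across the third dimension
avg_apply <- apply(a, c(1, 2), mean)

# Same average computed by adding up the slices explicitly
avg_manual <- matrix(0, 2, 2)
for (k in seq_len(dim(a)[3])) {
  avg_manual <- avg_manual + a[, , k]
}
avg_manual <- avg_manual / dim(a)[3]

all.equal(avg_apply, avg_manual)
```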

3. mapply

Multivariate version of lapply

mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments

> str(mapply)
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, 
    USE.NAMES = TRUE)  
  • FUN is a function to apply
  • ... contains the arguments to apply over
  • MoreArgs is a list of other arguments to FUN
  • SIMPLIFY indicates whether the result should be simplified

The following is tedious to type:

> list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))

Instead we can do:

> mapply(rep, 1:4, 4:1)
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

Vectorizing a Function

> noise <- function(n, mean, sd) {
+   rnorm(n, mean, sd)
+ }
> noise(5, 1, 2)
[1]  1.196562 -1.215140  3.092332  3.702626  5.585344
> noise(1:5, 1:5, 2)
[1] 1.852460 5.124322 3.231025 2.935376 4.500935
> mapply(noise, 1:5, 1:5, 2)
[[1]]
[1] 4.008276

[[2]]
[1] 3.300472 2.697185

[[3]]
[1] 0.9169633 2.3585491 2.6048531

[[4]]
[1] 2.0001993 5.1287517 0.3977513 3.4983892

[[5]]
[1] 6.236692 5.622495 6.975805 6.385192 5.001197

Instant Vectorization

> mapply(noise, 1:5, 1:5, 2)
[[1]]
[1] 0.09917791

[[2]]
[1] 2.544472 2.226350

[[3]]
[1] 1.798817 1.783018 4.084838

[[4]]
[1] 2.463081 2.472774 6.615392 9.716881

[[5]]
[1] 6.534139 4.407679 5.400077 4.202349 3.646371

Which is the same as

> list(noise(1,1,2), noise(2,2,2), noise(3,3,2), noise(4,4,2), noise(5,5,2))
[[1]]
[1] 4.234073

[[2]]
[1] 4.641468 4.531963

[[3]]
[1] 1.039511 4.629856 2.239910

[[4]]
[1]  3.4594122 -0.8233385  1.1815806  4.7479968

[[5]]
[1] 4.727629 6.513389 2.653866 3.381691 6.811396
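
The direct call noise(1:5, 1:5, 2) shown earlier returns only five numbers because rnorm, given a vector for n, generates length(n) values rather than looping over n. mapply makes one call per pair of arguments instead. Base R also provides Vectorize, which builds such a wrapper around mapply; vnoise below is a name chosen for this sketch.

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

# Direct call: a vector n means "generate length(n) values"
length(noise(1:5, 1:5, 2))

# mapply: one call per (n, mean) pair, so element i has length i
res <- mapply(noise, 1:5, 1:5, 2)
lengths(res)

# Vectorize() returns a function that calls mapply internally
vnoise <- Vectorize(noise)
lengths(vnoise(1:5, 1:5, 2))
```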

4. tapply

Apply a function over subsets of a vector
tapply is used to apply a function over subsets of a vector. I don’t know why it’s called tapply

> str(tapply)
function (X, INDEX, FUN = NULL, ..., default = NA, 
    simplify = TRUE) 

 - X is a vector
 - INDEX is a factor or a list of factors (or else they are coerced to factors)
 - FUN is a function to be applied
 - ... contains other arguments to be passed to FUN
 - simplify indicates whether the result should be simplified
 
Take group means

> x <- c(rnorm(10), runif(10), rnorm(10,1))
> f <- gl(3,10)
> f
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
[26] 3 3 3 3 3
Levels: 1 2 3
> tapply(x, f, mean)
        1         2         3 
0.1242822 0.4960855 1.6125871 

Take group means without simplification

> tapply(x, f, mean, simplify = FALSE)
$`1`
[1] 0.1242822

$`2`
[1] 0.4960855

$`3`
[1] 1.612587

Find group ranges

> tapply(x, f, range)
$`1`
[1] -1.537179  2.028251

$`2`
[1] 0.01933642 0.95697515

$`3`
[1] 0.2878385 2.5830202
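
Extra arguments reach FUN through ..., which is handy when a group contains missing values. A minimal sketch with made-up data (the labels "low" and "high" are arbitrary):

```r
# Hypothetical measurements with one missing value
x <- c(1, 2, 3, 10, 20, NA)
f <- factor(c("low", "low", "low", "high", "high", "high"))

with_na    <- tapply(x, f, mean)                # the "high" group is NA
without_na <- tapply(x, f, mean, na.rm = TRUE)  # na.rm is forwarded to mean()

with_na
without_na
```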

5. split

split takes a vector or other object and splits it into groups determined by a factor or a list of factors

> str(split)
function (x, f, drop = FALSE, ...)  
  • x is a vector (or list) or data frame
  • f is a factor (or coerced to one) or a list of factors
  • drop indicates whether empty factor levels should be dropped
> x <- c(rnorm(10), runif(10), rnorm(10, 1))
> f <- gl(3, 10)
> split(x, f)
$`1`
 [1]  0.5330910  0.2794371  0.5029999  1.5984695
 [5] -1.0672447  0.3206135  0.5849916  0.3912841
 [9] -1.6406344 -1.2607067

$`2`
 [1] 0.058477467 0.004412661 0.955095140 0.696776107
 [5] 0.918116786 0.578598479 0.406501306 0.733634812
 [9] 0.048131931 0.527895519

$`3`
 [1]  0.01931473 -1.31360908  2.39626004  0.73020077
 [5] -0.25639517 -0.20425572  2.16285391 -0.11501219
 [9] -0.71055291 -0.43085714

A common idiom is split followed by an lapply

> lapply(split(x, f), mean)
$`1`
[1] 0.02423008

$`2`
[1] 0.492764

$`3`
[1] 0.2277947
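
For a simple summary like the mean, split followed by sapply gives the same numbers as tapply. The sketch below checks this on freshly generated data with an arbitrary seed, so the values differ from the output above.

```r
set.seed(1)  # arbitrary seed
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

by_split  <- sapply(split(x, f), mean)  # named numeric vector
by_tapply <- tapply(x, f, mean)         # 1-d array with the same names

all.equal(unname(by_split), as.vector(by_tapply))
```

The split form is more general, since the pieces can be passed to any function, not only one that summarizes a vector.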

Splitting a Data Frame

> library(datasets)
> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
> s <- split(airquality, airquality$Month)
> lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
$`5`
   Ozone  Solar.R     Wind 
      NA       NA 11.62258 

$`6`
    Ozone   Solar.R      Wind 
       NA 190.16667  10.26667 

$`7`
     Ozone    Solar.R       Wind 
        NA 216.483871   8.941935 

$`8`
   Ozone  Solar.R     Wind 
      NA       NA 8.793548 

$`9`
   Ozone  Solar.R     Wind 
      NA 167.4333  10.1800 
> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
               5         6          7        8
Ozone         NA        NA         NA       NA
Solar.R       NA 190.16667 216.483871       NA
Wind    11.62258  10.26667   8.941935 8.793548
               9
Ozone         NA
Solar.R 167.4333
Wind     10.1800
> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
                5         6          7          8
Ozone    23.61538  29.44444  59.115385  59.961538
Solar.R 181.29630 190.16667 216.483871 171.857143
Wind     11.62258  10.26667   8.941935   8.793548
                9
Ozone    31.44828
Solar.R 167.43333
Wind     10.18000
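
The same computation can be arranged with one row per month by transposing the sapply result. This is only a sketch; monthly is an illustrative name.

```r
library(datasets)

s <- split(airquality, airquality$Month)
names(s)  # one data frame per month, named after the month number

monthly <- sapply(s, function(d) {
  colMeans(d[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)
})

# Transpose so that each month is a row and each variable a column
t(monthly)
```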

Splitting on more than one level

> x <- rnorm(10)
> f1 <- gl(2,5)
> f2 <- gl(5,2)
> f1
 [1] 1 1 1 1 1 2 2 2 2 2
Levels: 1 2
> f2
 [1] 1 1 2 2 3 3 4 4 5 5
Levels: 1 2 3 4 5
> interaction(f1, f2)
 [1] 1.1 1.1 1.2 1.2 1.3 2.3 2.4 2.4 2.5 2.5
Levels: 1.1 2.1 1.2 2.2 1.3 2.3 1.4 2.4 1.5 2.5

Interactions can create empty levels

> str(split(x, list(f1,f2)))
List of 10
 $ 1.1: num [1:2] -0.3 -1.55
 $ 2.1: num(0) 
 $ 1.2: num [1:2] -0.5569 0.0888
 $ 2.2: num(0) 
 $ 1.3: num 0.426
 $ 2.3: num -0.329
 $ 1.4: num(0) 
 $ 2.4: num [1:2] -1.1 2.15
 $ 1.5: num(0) 
 $ 2.5: num [1:2] 0.807 -1.418

Empty levels can be dropped

> str(split(x, list(f1, f2), drop = TRUE))
List of 6
 $ 1.1: num [1:2] -0.3 -1.55
 $ 1.2: num [1:2] -0.5569 0.0888
 $ 1.3: num 0.426
 $ 2.3: num -0.329
 $ 2.4: num [1:2] -1.1 2.15
 $ 2.5: num [1:2] 0.807 -1.418
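
The effect of drop can be checked on deterministic data, which makes the grouping easier to follow than with random numbers. Here x is simply 1:10, an assumption made for readability.

```r
f1 <- gl(2, 5)
f2 <- gl(5, 2)
x <- 1:10  # simple values so each group's contents are easy to read

all_groups  <- split(x, list(f1, f2))
kept_groups <- split(x, list(f1, f2), drop = TRUE)

length(all_groups)   # every combination of levels, including empty ones
length(kept_groups)  # only the combinations that actually occur
names(kept_groups)
```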