R Programming - Loop Functions

skyCeleste.x

Article link: https://blog.csdn.net/jeonghin/article/details/124817577

Loop Functions

1. lapply & sapply
- lapply
- sapply
2. apply
- col/row sums and means
3. mapply
- Vectorizing a Function
- Instant Vectorization
4. tapply
5. split
- Splitting a Data Frame
- Splitting on more than one level

Writing for and while loops is useful when programming, but it is awkward when working interactively on the command line. R provides several functions that implement looping in a compact form to make this easier.

1. lapply & sapply

lapply: Loop over a list and evaluate a function on each element
sapply: Same as lapply but try to simplify the result

lapply

lapply takes three arguments: (1) a list X; (2) a function (or the name of a function) FUN; (3) other arguments, which are passed on to FUN via the … argument. If X is not a list, it will be coerced to a list using as.list.

> lapply
function (X, FUN, ...) 
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X)) 
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
<bytecode: 0x7faa481311a0>
<environment: namespace:base>

lapply always returns a list, regardless of the class of the input

> x <- list(a = 1:5, b = rnorm(10))
> lapply(x,mean)
$a
[1] 3

$b
[1] 0.06460763

> x <- list(a = 1:4, b = rnorm(10), c=rnorm(20,1), d=rnorm(100,5))
> lapply(x, mean)
$a
[1] 2.5

$b
[1] 0.08151065

$c
[1] 0.8754845

$d
[1] 4.895723

> x <- 1:4
> lapply(x, runif)
[[1]]
[1] 0.5393189

[[2]]
[1] 0.7646683 0.9085368

[[3]]
[1] 0.2349948 0.2697491 0.9805410

[[4]]
[1] 0.3462296 0.4752699 0.6940489 0.6286985

> x <- 1:4
> lapply(x, runif, min=0, max=10)
[[1]]
[1] 7.896477

[[2]]
[1] 3.310554 7.511894

[[3]]
[1] 2.697588 5.187521 6.146860

[[4]]
[1] 8.548282 3.073971 9.337631 3.123481

lapply and friends make heavy use of anonymous functions

> x <- list(a=matrix(1:4,2,2), b=matrix(1:6,3,2))
> x
$a
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$b
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

An anonymous function for extracting the first column of each matrix

> lapply(x, function(elt) elt[,1])
$a
[1] 1 2

$b
[1] 1 2 3
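
As a minimal sketch of the points above (the values are made up for illustration and are not from the original examples), the following shows that lapply returns a list even when given an atomic vector, and that extra arguments such as na.rm are forwarded to FUN through the … argument:

```r
# lapply on an atomic vector: the result is still a list
squares <- lapply(1:3, function(i) i^2)
is.list(squares)

# Arguments given after FUN are passed on to FUN for every element
x <- list(a = c(1, NA, 3), b = c(4, 5, NA))
means <- lapply(x, mean, na.rm = TRUE)
means$a   # mean of 1 and 3
means$b   # mean of 4 and 5
```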

sapply

sapply will try to simplify the result of lapply if possible

if the result is a list where every element is length 1, then a vector is returned
if the result is a list where every element is a vector of the same length (>1), a matrix is returned
if it can’t figure things out, a list is returned
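
The three simplification cases listed above can be sketched as follows. The list `x` and the variable names are illustrative assumptions, not part of the original post:

```r
x <- list(a = 1:4, b = c(2, 4, 6), c = 10)

# Every result has length 1, so sapply returns a named numeric vector
means <- sapply(x, mean)

# Every result has the same length (2), so sapply returns a 2 x 3 matrix
ranges <- sapply(x, range)

# Results have different lengths, so sapply falls back to a list
sequences <- sapply(1:3, function(n) seq_len(n))
```

If the output type must be predictable, calling lapply directly avoids the simplification step altogether.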

2. apply

Apply a function over the margins of an array

apply is used to evaluate a function (often an anonymous one) over the margins of an array

It is most often used to apply a function to the rows or columns of a matrix
It can be used with general arrays, e.g. taking the average of an array of matrices
It is not really faster than writing a loop, but it works in one line!

> str(apply)
function (X, MARGIN, FUN, ..., simplify = TRUE)

X is an array
MARGIN is an integer vector indicating which margins would be “retained”
FUN is a function to be applied
… is for other arguments to be passed to FUN

> x <- matrix(rnorm(200), 20, 10)
> apply(x, 2, mean)
 [1] -0.21539902  0.25480669  0.29069982  0.17461701
 [5]  0.37034020  0.12646704 -0.32566278 -0.22870461
 [9] -0.09823548 -0.25911445
 
> apply(x, 1, sum)
 [1] -7.0689350  8.6579044  5.1903690  3.2852374
 [5]  1.1214267  5.5725971  1.3352220 -1.2947558
 [9]  1.9802187 -0.6288018  4.0929522 -3.6821994
[13]  3.1798243 -6.3139959  3.2065088  1.1054827
[17] -3.2508571 -4.7663893 -7.3932119 -2.5323088
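
Because the examples above use random data, a small deterministic sketch may make the meaning of MARGIN easier to check by hand. The matrix `m` is an assumption chosen for illustration:

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column

# MARGIN = 1 retains the rows: one result per row
row_totals <- apply(m, 1, sum)

# MARGIN = 2 retains the columns: one result per column
col_totals <- apply(m, 2, sum)
```

Here `row_totals` has length 2 and `col_totals` has length 3, matching the retained dimension in each call.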

col/row sums and means

For sums and means of matrix dimensions, we have some shortcuts

rowSums = apply(x, 1, sum)
rowMeans = apply(x, 1, mean)
colSums = apply(x, 2, sum)
colMeans = apply(x, 2, mean)

The shortcut functions are much faster, but you won’t notice unless you’re using a large matrix
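
A quick way to convince yourself that the shortcuts and the apply calls agree is to compare them with all.equal. This is only a sketch; the seed is an arbitrary choice to make the run repeatable:

```r
set.seed(42)
x <- matrix(rnorm(200), 20, 10)

# all.equal compares with a small numerical tolerance
same_row_sums  <- isTRUE(all.equal(rowSums(x),  apply(x, 1, sum)))
same_row_means <- isTRUE(all.equal(rowMeans(x), apply(x, 1, mean)))
same_col_sums  <- isTRUE(all.equal(colSums(x),  apply(x, 2, sum)))
same_col_means <- isTRUE(all.equal(colMeans(x), apply(x, 2, mean)))
```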

> x <- matrix(rnorm(200), 20, 10)
> apply(x, 1, quantile, probs = c(0.25, 0.75))
          [,1]       [,2]         [,3]       [,4]
25% -1.4089414 -1.0943368 -0.363817672 -0.8012552
75%  0.1710051  0.4501446  0.007953445  0.5578421
          [,5]       [,6]       [,7]       [,8]
25% -0.3777239 -0.3780244 -0.2093595 -0.2881016
75%  0.7154293  1.1488982  0.6749194  0.2354858
         [,9]      [,10]      [,11]      [,12]
25% -1.082317 0.03608219 -0.7720221 -1.2136858
75%  0.748666 1.17685990  0.3569027  0.7026888
         [,13]      [,14]      [,15]      [,16]
25% -0.4269014 -0.1967092 -0.6222264 -0.9687748
75%  1.2411431  1.1737596  0.4463872  0.3376294
         [,17]      [,18]      [,19]      [,20]
25% -0.7954193 -1.1253970 -0.4542702 -0.4136788
75%  0.9285497  0.4474869  0.9182585  0.8779093
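
Note the shape of the result above: the function was applied over 20 rows, and the output has 20 columns. When FUN returns a vector of length k, apply stores each result as a column. A small sketch with an illustrative matrix `m` (an assumption, not from the original post):

```r
m <- matrix(1:6, nrow = 3)      # 3 rows, 2 columns

# range returns 2 values per row, so the result is 2 x 3:
# one column per row of m
r <- apply(m, 1, range)
dim(r)

# Transpose to get one row of output per row of input
r_by_row <- t(r)
dim(r_by_row)
```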

Average matrix in an array

> a <- array(rnorm(2 * 2 * 10), c(2,2,10))
> apply(a, c(1,2), mean)
            [,1]       [,2]
[1,]  0.43270737 -0.5288165
[2,] -0.04156793  0.1450441
> rowMeans(a, dims=2)
            [,1]       [,2]
[1,]  0.43270737 -0.5288165
[2,] -0.04156793  0.1450441
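
The same comparison can be checked by hand on a tiny array. In this sketch the array `a` holds the integers 1 to 8 (an illustrative assumption), so each cell of the average matrix is the mean of two known numbers:

```r
a <- array(1:8, c(2, 2, 2))   # two 2 x 2 matrices stacked along the third dimension

# Keep dimensions 1 and 2, average over dimension 3
avg_apply <- apply(a, c(1, 2), mean)

# rowMeans with dims = 2 treats the first two dimensions as "rows"
avg_short <- rowMeans(a, dims = 2)

avg_apply[1, 1]   # mean of a[1, 1, 1] = 1 and a[1, 1, 2] = 5
```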

3. mapply

Multivariate version of lapply

mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments

> str(mapply)
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, 
    USE.NAMES = TRUE)

FUN is a function to apply
… contains arguments to apply over
MoreArgs is a list of other arguments to FUN
SIMPLIFY indicates whether the result should be simplified

The following is tedious to type
list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))
Instead we can do

> mapply(rep, 1:4, 4:1)
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4
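
The MoreArgs and SIMPLIFY arguments described above are not exercised by the rep example, so here is a hedged sketch of both. The helper `scaled_sum` is hypothetical and exists only for this illustration:

```r
# Hypothetical helper: adds two numbers and multiplies by a fixed scale
scaled_sum <- function(x, y, scale) (x + y) * scale

# x and y are iterated over in parallel; scale is held fixed via MoreArgs.
# Every result has length 1, so SIMPLIFY = TRUE gives a plain vector.
simplified <- mapply(scaled_sum, 1:3, 4:6, MoreArgs = list(scale = 10))

# With SIMPLIFY = FALSE the result is always a list
as_list <- mapply(scaled_sum, 1:3, 4:6, MoreArgs = list(scale = 10),
                  SIMPLIFY = FALSE)

# The rep example from above returns a list because the lengths differ
reps <- mapply(rep, 1:4, 4:1)
lengths(reps)
```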

Vectorizing a Function

> noise <- function(n, mean, sd) {
+   rnorm(n, mean, sd)
+ }
> noise(5, 1, 2)
[1]  1.196562 -1.215140  3.092332  3.702626  5.585344
> noise(1:5, 1:5, 2)
[1] 1.852460 5.124322 3.231025 2.935376 4.500935

> mapply(noise, 1:5, 1:5, 2)
[[1]]
[1] 4.008276

[[2]]
[1] 3.300472 2.697185

[[3]]
[1] 0.9169633 2.3585491 2.6048531

[[4]]
[1] 2.0001993 5.1287517 0.3977513 3.4983892

[[5]]
[1] 6.236692 5.622495 6.975805 6.385192 5.001197

Instant Vectorization

> mapply(noise, 1:5, 1:5, 2)
[[1]]
[1] 0.09917791

[[2]]
[1] 2.544472 2.226350

[[3]]
[1] 1.798817 1.783018 4.084838

[[4]]
[1] 2.463081 2.472774 6.615392 9.716881

[[5]]
[1] 6.534139 4.407679 5.400077 4.202349 3.646371

Which is the same as

> list(noise(1,1,2), noise(2,2,2), noise(3,3,2), noise(4,4,2), noise(5,5,2))
[[1]]
[1] 4.234073

[[2]]
[1] 4.641468 4.531963

[[3]]
[1] 1.039511 4.629856 2.239910

[[4]]
[1]  3.4594122 -0.8233385  1.1815806  4.7479968

[[5]]
[1] 4.727629 6.513389 2.653866 3.381691 6.811396
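
The difference between the direct call and the mapply call can be seen from the lengths of the results alone. In this sketch (the seed is an arbitrary assumption), rnorm treats a vector-valued n as "draw length(n) values", so the direct call yields 5 numbers, while mapply calls noise once per position:

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

set.seed(1)

# Direct call: n = 1:5 has length 5, so 5 values are drawn (means 1 to 5)
direct <- noise(1:5, 1:5, 2)
length(direct)

# mapply call: noise(1, 1, 2), noise(2, 2, 2), ..., noise(5, 5, 2)
via_mapply <- mapply(noise, 1:5, 1:5, 2)
lengths(via_mapply)
```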

4. tapply

Apply a function over subsets of a vector
tapply is used to apply a function over subsets of a vector. I don’t know why it’s called tapply

> str(tapply)
function (X, INDEX, FUN = NULL, ..., default = NA, 
    simplify = TRUE) 

X is a vector
INDEX is a factor or a list of factors (or else they are coerced to factors)
FUN is a function to be applied
… contains other arguments to be passed to FUN
simplify indicates whether the result should be simplified

Take group means

> x <- c(rnorm(10), runif(10), rnorm(10,1))
> f <- gl(3,10)
> f
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
[26] 3 3 3 3 3
Levels: 1 2 3
> tapply(x, f, mean)
        1         2         3 
0.1242822 0.4960855 1.6125871

Take group means without simplification

> tapply(x, f, mean, simplify = FALSE)
$`1`
[1] 0.1242822

$`2`
[1] 0.4960855

$`3`
[1] 1.612587

Find group ranges

> tapply(x, f, range)
$`1`
[1] -1.537179  2.028251

$`2`
[1] 0.01933642 0.95697515

$`3`
[1] 0.2878385 2.5830202
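
# A hedged sketch, not part of the original post: INDEX may also be a list
# of factors, in which case tapply returns one cell per combination of levels.
# The names `values`, `sex`, and `grp` are made up for illustration.
values <- 1:6
sex <- factor(c("f", "f", "m", "m", "f", "m"))
grp <- factor(c("a", "b", "a", "b", "a", "b"))

# Rows are the levels of sex, columns are the levels of grp
totals <- tapply(values, list(sex, grp), sum)
totals["f", "a"]   # values 1 and 5 fall in this cell
totals["m", "b"]   # values 4 and 6 fall in this cell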

5. split

split takes a vector or other object and splits it into groups determined by a factor or a list of factors

> str(split)
function (x, f, drop = FALSE, ...)

x is a vector (or list) or data frame
f is a factor (or coerced to one) or a list of factors
drop indicates whether empty factor levels should be dropped

> x <- c(rnorm(10), runif(10), rnorm(10, 1))
> f <- gl(3, 10)
> split(x, f)
$`1`
 [1]  0.5330910  0.2794371  0.5029999  1.5984695
 [5] -1.0672447  0.3206135  0.5849916  0.3912841
 [9] -1.6406344 -1.2607067

$`2`
 [1] 0.058477467 0.004412661 0.955095140 0.696776107
 [5] 0.918116786 0.578598479 0.406501306 0.733634812
 [9] 0.048131931 0.527895519

$`3`
 [1]  0.01931473 -1.31360908  2.39626004  0.73020077
 [5] -0.25639517 -0.20425572  2.16285391 -0.11501219
 [9] -0.71055291 -0.43085714

A common idiom is split followed by an lapply

> lapply(split(x, f), mean)
$`1`
[1] 0.02423008

$`2`
[1] 0.492764

$`3`
[1] 0.2277947
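
# A hedged sketch, not part of the original post: for a simple summary such
# as the mean, split followed by sapply gives the same numbers as tapply.
# The data below are made up so the group means can be checked by hand.
vals <- c(1, 2, 3, 10, 20, 30)
grp <- gl(2, 3)                          # levels 1 1 1 2 2 2

by_split  <- sapply(split(vals, grp), mean)   # named vector: "1" and "2"
by_tapply <- tapply(vals, grp, mean)          # 1-d array with the same names

by_split[["1"]]   # mean of 1, 2, 3
by_split[["2"]]   # mean of 10, 20, 30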

Splitting a Data Frame

> library(datasets)
> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

> s <- split(airquality, airquality$Month)
> lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
$`5`
   Ozone  Solar.R     Wind 
      NA       NA 11.62258 

$`6`
    Ozone   Solar.R      Wind 
       NA 190.16667  10.26667 

$`7`
     Ozone    Solar.R       Wind 
        NA 216.483871   8.941935 

$`8`
   Ozone  Solar.R     Wind 
      NA       NA 8.793548 

$`9`
   Ozone  Solar.R     Wind 
      NA 167.4333  10.1800

> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
               5         6          7        8
Ozone         NA        NA         NA       NA
Solar.R       NA 190.16667 216.483871       NA
Wind    11.62258  10.26667   8.941935 8.793548
               9
Ozone         NA
Solar.R 167.4333
Wind     10.1800
> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
                5         6          7          8
Ozone    23.61538  29.44444  59.115385  59.961538
Solar.R 181.29630 190.16667 216.483871 171.857143
Wind     11.62258  10.26667   8.941935   8.793548
                9
Ozone    31.44828
Solar.R 167.43333
Wind     10.18000
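
# A hedged sketch, not part of the original post: the same split / sapply /
# colMeans pattern on a tiny hypothetical data frame, small enough to verify
# by hand. Without na.rm = TRUE, any column containing NA has mean NA.
df <- data.frame(group = c("a", "a", "b", "b"),
                 v1 = c(1, NA, 3, 5),
                 v2 = c(10, 20, 30, NA))
pieces <- split(df, df$group)

with_na <- sapply(pieces, function(d) colMeans(d[, c("v1", "v2")]))
no_na   <- sapply(pieces, function(d) colMeans(d[, c("v1", "v2")], na.rm = TRUE))

with_na["v1", "a"]   # NA, because v1 has a missing value in group a
no_na["v1", "a"]     # mean of the one remaining value, 1
no_na["v2", "a"]     # mean of 10 and 20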

Splitting on more than one level

> x <- rnorm(10)
> f1 <- gl(2,5)
> f2 <- gl(5,2)
> f1
 [1] 1 1 1 1 1 2 2 2 2 2
Levels: 1 2
> f2
 [1] 1 1 2 2 3 3 4 4 5 5
Levels: 1 2 3 4 5
> interaction(f1, f2)
 [1] 1.1 1.1 1.2 1.2 1.3 2.3 2.4 2.4 2.5 2.5
Levels: 1.1 2.1 1.2 2.2 1.3 2.3 1.4 2.4 1.5 2.5

Interactions can create empty levels

> str(split(x, list(f1,f2)))
List of 10
 $ 1.1: num [1:2] -0.3 -1.55
 $ 2.1: num(0) 
 $ 1.2: num [1:2] -0.5569 0.0888
 $ 2.2: num(0) 
 $ 1.3: num 0.426
 $ 2.3: num -0.329
 $ 1.4: num(0) 
 $ 2.4: num [1:2] -1.1 2.15
 $ 1.5: num(0) 
 $ 2.5: num [1:2] 0.807 -1.418

Empty levels can be dropped

> str(split(x, list(f1, f2), drop = TRUE))
List of 6
 $ 1.1: num [1:2] -0.3 -1.55
 $ 1.2: num [1:2] -0.5569 0.0888
 $ 1.3: num 0.426
 $ 2.3: num -0.329
 $ 2.4: num [1:2] -1.1 2.15
 $ 2.5: num [1:2] 0.807 -1.418
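
# A hedged sketch, not part of the original post: the same two factors as
# above, but with x replaced by 1:10 (an assumption for illustration) so the
# group contents are predictable and the empty groups can be counted.
vals <- 1:10
f1 <- gl(2, 5)
f2 <- gl(5, 2)

all_groups <- split(vals, list(f1, f2))               # 2 * 5 = 10 groups
kept       <- split(vals, list(f1, f2), drop = TRUE)  # empty groups removed

sum(lengths(all_groups) == 0)   # number of empty combinations
names(kept)                     # only combinations that occur in the data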
