[R] 2015.11.17 Practice: should sampling be with replacement


1. Commonly used functions
1.1 options()
Invoking options() with no arguments returns a list with the current values of all options. Note that not all options are set initially. To access the value of a single option, use, e.g., getOption("width") rather than options("width"), which returns a list of length one. getOption(x) returns the current value set for option x, or NULL if the option is unset.

When options() is used to set one or more options, it returns (invisibly) a list with the previous values of the options that were changed.

> getOption('width')
[1] 63
> options('width')
$width
[1] 63
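The set-and-restore behaviour described above can be sketched as follows. This is a minimal illustration, not part of the original notes: the value digits = 4 and the option name no.such.option.xyz are arbitrary choices made only for the example.

```r
# Setting an option returns the previous value(s) invisibly, as a list.
old <- options(digits = 4)
new_digits <- getOption("digits")         # 4 while the change is active

# getOption() gives the bare value; options() gives a list of length one.
is_list <- is.list(options("digits"))     # TRUE

# Hypothetical option name that is not set, so NULL comes back.
unset <- getOption("no.such.option.xyz")

# Restore the saved settings by passing the list back to options().
options(old)
restored <- getOption("digits")
```

Saving the return value and passing it back to options() is a common way to make a temporary change without disturbing the rest of the session.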

1.2 apply()
apply(X, MARGIN, FUN, ...)

X: an array, including a matrix.

MARGIN: a vector giving the subscripts over which the function will be applied. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names.

x <- cbind(x1 = 3, x2 = c(4:1, 2:5)) # unlike x <- c(x1 = 3, x2 = c(4:1, 2:5)), which builds a named vector rather than a matrix
dimnames(x)[[1]] <- letters[1:8]
# Note the [[1]]: dimnames(x) returns a list, where [[1]] holds the row names and [[2]] the column names.
# letters[1:8] is a b c d e f g h, because letters is a built-in vector containing the 26 lowercase letters.
apply(x, 2, mean, trim = .2)

# Notice that there are eight rows, fewer than ten, so trim = .2 drops floor(8 * 0.2) = 1 value from each end of every column before averaging.

apply(x, 2, sort)

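# Sort the columns of a matrix. Note that the second column is sorted, but the order of the other columns does not change along with it: each column is sorted independently.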
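To make the role of MARGIN concrete, here is a small sketch that reuses the matrix from the example above. The comparison with rowSums() and colMeans() is an addition for illustration; those helpers are not mentioned in the original notes.

```r
# Same matrix as in the example: 8 rows, columns x1 and x2.
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
dimnames(x)[[1]] <- letters[1:8]

row_sums  <- apply(x, 1, sum)    # MARGIN = 1: one result per row
col_means <- apply(x, 2, mean)   # MARGIN = 2: one result per column

# For these simple summaries the dedicated helpers give the same values.
same_rows <- all.equal(row_sums, rowSums(x))
same_cols <- all.equal(col_means, colMeans(x))

# Using c() in place of cbind() builds a named vector of length 9, not a matrix.
v <- c(x1 = 3, x2 = c(4:1, 2:5))
has_dim <- !is.null(dim(v))      # FALSE
```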

2. Commonly Used Distributions
To find out which distributions are available, you can search with the commands
help.search("distribution") and help("Distributions")

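Each distribution has four functions, named by a one-letter prefix in front of the distribution's root name (for example, norm):
"d"	returns the height of the probability density function
"p"	returns the cumulative distribution function
"q"	returns the inverse cumulative distribution function (quantiles)
"r"	returns randomly generated numbers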
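How the four prefixes relate to each other can be checked numerically. The sketch below uses the normal distribution; the point x = 1.5 and the seed are arbitrary values chosen for the illustration.

```r
x <- 1.5
p <- pnorm(x)        # cumulative probability P[X <= 1.5]
x_back <- qnorm(p)   # the "q" function inverts the "p" function, recovering 1.5

# The "p" function is the integral of the "d" function from -Inf up to x.
area <- integrate(dnorm, lower = -Inf, upper = x)$value

set.seed(1)          # arbitrary seed, only for reproducibility
draws <- rnorm(5)    # the "r" function draws random values
```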

2.1 Normal Distribution
2.1.1
dnorm(x, mean = 0, sd = 1, log = FALSE) # density function; returns the value of the density at x (the log argument is described in 2.1.5)
> dnorm(0)
[1] 0.3989423
> dnorm(0)*sqrt(2*pi)
[1] 1
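The value dnorm(0) * sqrt(2 * pi) = 1 follows from the normal density formula, exp(-((x - mean) / sd)^2 / 2) / (sd * sqrt(2 * pi)). As a sketch, the formula can be written out by hand and compared with dnorm; the evaluation points and the values mu = 10, sigma = 2 are arbitrary.

```r
# Standard normal density written out by hand.
x <- c(-1, 0, 2)
by_hand <- exp(-x^2 / 2) / sqrt(2 * pi)
builtin <- dnorm(x)

# General case with mean mu and standard deviation sigma (arbitrary values).
mu <- 10
sigma <- 2
by_hand2 <- exp(-((12 - mu) / sigma)^2 / 2) / (sigma * sqrt(2 * pi))
builtin2 <- dnorm(12, mean = mu, sd = sigma)
```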

2.1.2
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) # distribution function (CDF)
q is the given number; mean and sd determine the distribution. By default pnorm returns the probability that a normally distributed random number is less than or equal to q.
If you wish to find the probability that a number is larger than the given number, you can use the lower.tail option:
> pnorm(0,lower.tail=FALSE)
[1] 0.5
> pnorm(1,lower.tail=FALSE)
[1] 0.1586553
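A few further uses of pnorm, sketched under the same conventions. The interval and the parameters mean = 100, sd = 10 are made-up values for illustration.

```r
upper      <- pnorm(1, lower.tail = FALSE)   # P[X > 1]
complement <- 1 - pnorm(1)                   # same quantity via the lower tail

# Probability of falling within one standard deviation of the mean.
within_1sd <- pnorm(1) - pnorm(-1)           # about 0.6827

# Non-standard normal: P[X <= 110] when mean = 100 and sd = 10.
p_scaled <- pnorm(110, mean = 100, sd = 10)  # equals pnorm(1)
```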
2.1.3
The next function is qnorm, which is the inverse of pnorm. You give it a probability, and it returns the number whose cumulative probability matches it.

qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
> qnorm(0.5)
[1] 0
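The inverse relationship can be verified directly. In this sketch the probabilities 0.975, 0.3 and 0.025 and the parameters mean = 100, sd = 10 are arbitrary illustrative choices.

```r
z975       <- qnorm(0.975)                        # about 1.96
round_trip <- pnorm(qnorm(0.3))                   # back to 0.3
upper_q    <- qnorm(0.025, lower.tail = FALSE)    # same point as qnorm(0.975)
q_scaled   <- qnorm(0.5, mean = 100, sd = 10)     # the median of N(100, 10^2) is 100
```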
2.1.4

rnorm(n, mean = 0, sd = 1) # generates n random numbers from N(0, 1)
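A small sketch of rnorm in use. The seed, the sample size and the parameters mean = 5, sd = 2 are assumptions made only for this example; set.seed() is used so that the draws can be reproduced.

```r
set.seed(42)                          # arbitrary seed
x <- rnorm(10000, mean = 5, sd = 2)

m <- mean(x)                          # should be close to 5
s <- sd(x)                            # should be close to 2

# Resetting the same seed reproduces exactly the same draws.
set.seed(42)
y <- rnorm(10000, mean = 5, sd = 2)
same_draws <- identical(x, y)
```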
2.1.5 Arguments

log, log.p: logical; if TRUE, probabilities p are given as log(p).

lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x].
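The log and log.p arguments are mostly useful far out in the tails, where ordinary probabilities underflow to zero. The following sketch uses arbitrary evaluation points (3 and -40) to show the effect.

```r
# log = TRUE returns the log of the density.
d_log  <- dnorm(3, log = TRUE)
same_d <- all.equal(d_log, log(dnorm(3)))

# Far in the tail the plain probability underflows to 0,
# while its logarithm is still a finite number.
p_plain <- pnorm(-40)                  # 0 because of underflow
p_log   <- pnorm(-40, log.p = TRUE)    # about -804.6

# qnorm also accepts a probability given on the log scale.
q_back <- qnorm(log(0.975), log.p = TRUE)
```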

2.2 The t Distribution
help(TDist)

The corresponding functions are dt, pt, qt, and rt.
> x <- seq(-20,20,by=.5)
> y <- dt(x,df=10)
> plot(x,y)
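Beyond plotting the density, pt and qt work like their normal counterparts. As a sketch, the comparison below shows that the t distribution has heavier tails than the normal for small df and approaches it as df grows. The test statistic 2.5 and the choice df = 1e6 are arbitrary.

```r
q_t10   <- qt(0.975, df = 10)     # about 2.228
q_norm  <- qnorm(0.975)           # about 1.960
q_t_big <- qt(0.975, df = 1e6)    # very close to the normal value

# Two-sided tail probability for an observed t value of 2.5 with 10 df.
p_two_sided <- 2 * pt(-2.5, df = 10)
```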

2.3 The Binomial Distribution
help(Binomial)
> x <- seq(0,50,by=1)
> y <- dbinom(x,50,0.2)
> plot(x,y)
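For a discrete distribution such as the binomial, dbinom gives probabilities rather than density heights, which can be checked by summing them. This sketch reuses size = 50 and prob = 0.2 from the example above; the cutoff 10 is an arbitrary choice.

```r
n <- 50
p <- 0.2
x <- 0:n
pmf <- dbinom(x, size = n, prob = p)

total  <- sum(pmf)        # the probabilities sum to 1
mean_x <- sum(x * pmf)    # expected value, n * p = 10

# pbinom is the running sum of dbinom.
cdf_10 <- pbinom(10, size = n, prob = p)          # P[X <= 10]
same   <- all.equal(cdf_10, sum(pmf[x <= 10]))
```

When plotting, plot(x, pmf, type = "h") draws vertical bars, which may suit a discrete distribution better than points.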

2.4 The Chi-Squared Distribution
help(Chisquare)
> x <- seq(-20,20,by=.5)
> y <- dchisq(x,df=10)
> plot(x,y)
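The chi-squared distribution only has mass on non-negative values, so the left half of the plot above is flat at zero. A short sketch of this and of the related p and q functions, keeping df = 10 from the example (the 0.95 level is an arbitrary choice):

```r
d_neg <- dchisq(-5, df = 10)                       # 0: no mass below zero

# The density integrates to 1 over [0, Inf).
area <- integrate(dchisq, lower = 0, upper = Inf, df = 10)$value

crit    <- qchisq(0.95, df = 10)                   # about 18.307
p_upper <- pchisq(crit, df = 10, lower.tail = FALSE)   # 0.05
```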

3. Random samples and permutations
3.1 Random samples

sample(x, size, replace = FALSE, prob = NULL)

replace: Should sampling be with replacement?

prob: A vector of probability weights for obtaining the elements of the vector being sampled.
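The effect of replace and prob can be sketched as follows. The seed, the vector 1:10, the sample sizes and the 0.8/0.2 weights are all arbitrary values chosen for the illustration.

```r
set.seed(123)                                      # arbitrary seed

perm <- sample(1:10)                               # random permutation of 1..10
sub  <- sample(1:10, size = 3)                     # 3 distinct values, no replacement
boot <- sample(1:10, size = 20, replace = TRUE)    # size > length(x) needs replace = TRUE

# Weighted sampling: "H" is drawn with probability 0.8.
coin    <- sample(c("H", "T"), size = 1000, replace = TRUE, prob = c(0.8, 0.2))
share_h <- mean(coin == "H")                       # should be near 0.8

# Without replacement, asking for more items than exist is an error.
too_many <- tryCatch(sample(1:10, size = 20), error = function(e) "error")
```

So the answer to "should sampling be with replacement?" depends on the task: permutations and subsets use replace = FALSE, while repeated draws such as coin flips or bootstrap resamples need replace = TRUE.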

3.2 Combinations and permutations
In this section, we need to calculate factorials. There are several ways:
(1) factorial(n)
(2) gamma(n+1)
(3) prod(1:n)

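> system.time(replicate(1000000,factorial(170)))
   user  system elapsed
   2.33    0.00    2.33
> system.time(replicate(1000000,prod(1:170)))
   user  system elapsed
   2.38    0.02    2.39
> system.time(replicate(1000000,gamma(171)))
   user  system elapsed
   1.42    0.03    1.45

In this run, gamma(171) was clearly the fastest, while factorial(170) and prod(1:170) took about the same time.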
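The three factorial methods agree, and they lead directly to counts of permutations and combinations. The sketch below uses n = 10 and k = 3 as arbitrary example values; choose(), combn() and lfactorial() are base R functions added here for illustration and do not appear in the original notes.

```r
n <- 10
k <- 3

# The three ways of computing n! give the same value (3628800).
f1 <- factorial(n)
f2 <- gamma(n + 1)
f3 <- prod(1:n)

n_comb <- choose(n, k)                        # combinations: 120
n_perm <- factorial(n) / factorial(n - k)     # ordered selections: 720

# combn() lists the combinations themselves, one per column.
pairs <- combn(4, 2)                          # 2 x 6 matrix

# 170! still fits in a double, but 171! overflows to Inf.
# lfactorial() works on the log scale and stays finite.
fits     <- is.finite(factorial(170))
overflow <- is.infinite(suppressWarnings(factorial(171)))
log_big  <- lfactorial(171)
```

This also suggests why the timings above use 170: it is the largest argument for which factorial() returns a finite value.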

References
http://www.cyclismo.org/tutorial/R/probability.html

https://cran.r-project.org/manuals.html