Introduction to Statistics in R: 03-More Distributions and the Central Limit Theorem

More Distributions and the Central Limit Theorem

The normal distribution

What is the normal distribution?

Symmetrical

Total area under the curve = 1

The curve never reaches 0 (its tails extend forever)

Described by its mean and standard deviation, for example:

  • Mean: 20

  • Standard deviation: 3

Standard normal distribution:

  • Mean: 0

  • Standard deviation: 1
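These properties can be checked numerically in base R. The following is a small sketch using `dnorm()` and `integrate()` with the example parameters above (mean 20, SD 3). The ±10 SD integration window is an arbitrary choice, because the area beyond it is negligible.

```r
# Example normal distribution: mean 20, sd 3
mu <- 20
sigma <- 3

# Symmetry: the density is equal at the same distance on either side of the mean
left  <- dnorm(mu - 4, mean = mu, sd = sigma)
right <- dnorm(mu + 4, mean = mu, sd = sigma)

# Area = 1: integrate the density over mean +/- 10 SD
area <- integrate(dnorm, lower = mu - 10 * sigma, upper = mu + 10 * sigma,
                  mean = mu, sd = sigma)$value

# The curve never reaches 0: even 10 SD from the mean the density is tiny but positive
far_density <- dnorm(mu + 10 * sigma, mean = mu, sd = sigma)

# Standard normal (mean 0, sd 1): half of the area lies below 0
half <- pnorm(0)
```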

Areas under the normal distribution

About 68% of the area falls within 1 standard deviation of the mean

About 95% falls within 2 standard deviations

About 99.7% falls within 3 standard deviations
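The following sketch recovers these percentages with `pnorm()`. For any normal distribution, the area within k standard deviations of the mean equals the standard-normal area between -k and k. The rule is approximate: the exact 95% interval is mean ± 1.96 SD.

```r
# Area within k standard deviations of the mean, k = 1, 2, 3
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
round(within_k, 4)
# 0.6827 0.9545 0.9973

# The exact multiplier for a central 95% area
qnorm(0.975)
# 1.959964
```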

Lots of histograms look normal

Example: women's heights from NHANES

Mean: 161 cm, standard deviation: 7 cm

Approximating data with the normal distribution

What percent of women are shorter than 154 cm?

pnorm(154, mean = 161, sd = 7)  # 0.1586553

About 16% of women in the survey are shorter than 154 cm (according to the normal approximation)

What percent of women are taller than 154 cm?

pnorm(154, mean = 161, sd = 7, lower.tail = FALSE) # 0.8413447

What percent of women are 154-157cm?

pnorm(157, mean = 161, sd = 7) - pnorm(154, mean = 161, sd = 7) # 0.1252

What height are 90% of women shorter than?

qnorm(0.9, mean = 161, sd = 7) # 169.9709

What height are 90% of women taller than?

qnorm(0.9, mean = 161, sd = 7, lower.tail = FALSE) # 152.03
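`qnorm()` is the inverse of `pnorm()`. The following sketch illustrates this relationship with the height parameters above. The "taller than" cutoff from `lower.tail = FALSE` is simply the 10th percentile, which mirrors the 90th percentile around the mean.

```r
# Feeding a quantile back into pnorm() returns the original probability
h90 <- qnorm(0.9, mean = 161, sd = 7)
p_back <- pnorm(h90, mean = 161, sd = 7)      # 0.9

# "90% are taller than" is the same as the 10th percentile
h10 <- qnorm(0.1, mean = 161, sd = 7)
h10_upper <- qnorm(0.9, mean = 161, sd = 7, lower.tail = FALSE)

# The two cutoffs sit the same distance from the mean
c(161 - h10, h90 - 161)
```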

Generating random numbers

# Generate 10 random heights
rnorm(10, mean = 161, sd = 7)
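`rnorm()` returns different numbers on every call. Setting a seed makes the draws reproducible. The following sketch draws a larger sample and compares it with the model. The seed 42 and the sample size of 10,000 are arbitrary choices for illustration.

```r
# Fix the seed so the "random" heights are the same on every run
set.seed(42)
heights <- rnorm(10000, mean = 161, sd = 7)

# With many draws, the sample statistics land close to the parameters
mean(heights)
sd(heights)

# The share below 154 cm is close to pnorm(154, 161, 7), about 0.159
mean(heights < 154)
```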

The central limit theorem

Rolling the die 5 times

library(dplyr)  # provides the %>% pipe used below
die <- c(1, 2, 3, 4, 5, 6)
# Roll 5 times
sample_of_5 <- sample(die, 5, replace = TRUE)
sample_of_5
# 1 3 4 1 1
mean(sample_of_5)
# 2.0

# Roll 5 times and take the mean
sample(die, 5, replace = TRUE) %>% mean()
# 4.4
sample(die, 5, replace = TRUE) %>% mean()
# 3.8

Rolling the die 5 times, repeated 10 times

Repeat 10 times:

  • Roll 5 times

  • Take the mean

sample_means <- replicate(10, sample(die, 5, replace = TRUE) %>% mean())
sample_means
# 3.8, 4.0, 3.8, 3.6, 3.2, 4.8, 2.6, 3.0, 2.6, 2.0

Sampling distributions

Sampling distribution of the sample mean

100 sample means

replicate(100, sample(die, 5, replace=TRUE) %>% mean())
# 2.8 3.2 1.8 4.6 4.0 2.8 4.4 2.4 3.4 2.8 4.2 3.4...

1000 sample means

sample_means <- replicate(1000, sample(die, 5, replace=TRUE) %>% mean())

Central limit theorem

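The sampling distribution of a statistic such as the sample mean becomes closer to the normal distribution as the size of each sample increases. Taking more samples (100 vs. 1000 sample means above) does not change that distribution. It only gives a more detailed picture of it.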

  • Samples should be random and independent
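The following sketch shows the effect of sample size. It builds 1000 sample means from samples of 5 and of 50 die rolls, then compares their spread with the standard error σ/√n. The sample sizes and seed are arbitrary choices for illustration.

```r
# Population: one roll of a fair die
die <- c(1, 2, 3, 4, 5, 6)
mu <- mean(die)                          # 3.5
sigma <- sqrt(mean((die - mu)^2))        # population SD, about 1.708

set.seed(123)
# 1000 sample means for two different sample sizes
means_5  <- replicate(1000, mean(sample(die, 5,  replace = TRUE)))
means_50 <- replicate(1000, mean(sample(die, 50, replace = TRUE)))

# Both center on mu; the spread shrinks like sigma / sqrt(n)
c(mean(means_5),  sd(means_5),  sigma / sqrt(5))
c(mean(means_50), sd(means_50), sigma / sqrt(50))
```

Plotting `hist(means_50)` next to `hist(means_5)` shows the larger samples giving a narrower, more bell-shaped histogram.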

Standard deviation and the CLT

replicate(1000, sample(die, 5, replace = TRUE) %>% sd())
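The following sketch examines where the sampling distribution of the spread centers. `var()` uses the n - 1 denominator, so its sample values average out to the population variance 35/12. `sd()` is the square root of that and tends to average slightly below the population SD. The seed and the 10,000 repetitions are arbitrary choices.

```r
die <- c(1, 2, 3, 4, 5, 6)
pop_var <- mean((die - mean(die))^2)     # 35/12, about 2.917

set.seed(1)
sample_vars <- replicate(10000, var(sample(die, 5, replace = TRUE)))
sample_sds  <- sqrt(sample_vars)

# Centers on the population variance
mean(sample_vars)
# Averages a little below the population SD (square root is concave)
mean(sample_sds)
sqrt(pop_var)
```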

Proportions and the CLT

sales_team <- c("Amir", "Brian", "Claire", "Damian")
sample(sales_team, 10, replace=TRUE)
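A proportion is the mean of a 0/1 variable, so the CLT applies to it as well. The following sketch builds the sampling distribution of the share of "Claire" for samples of 10 and of 100 draws and compares its spread with √(p(1 - p)/n). The seed and the second sample size are assumptions for illustration.

```r
sales_team <- c("Amir", "Brian", "Claire", "Damian")
p <- 1 / 4                                # true chance of drawing "Claire"

set.seed(7)
# Proportion of "Claire" in each of 1000 samples
props_10  <- replicate(1000, mean(sample(sales_team, 10,  replace = TRUE) == "Claire"))
props_100 <- replicate(1000, mean(sample(sales_team, 100, replace = TRUE) == "Claire"))

# Both center on p; the spread follows sqrt(p * (1 - p) / n)
c(mean(props_10),  sd(props_10),  sqrt(p * (1 - p) / 10))
c(mean(props_100), sd(props_100), sqrt(p * (1 - p) / 100))
```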

Sampling distribution of proportion

Mean of sampling distribution

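# Estimate expected value of die
mean(sample_means)
# 3.48
# Estimate proportion of "Claire"s
# sample_props: proportion of "Claire" in each of 1000 samples of size 10
sample_props <- replicate(1000, mean(sample(sales_team, 10, replace = TRUE) == "Claire"))
mean(sample_props)
# 0.26

  • Estimate characteristics of an unknown underlying distribution

  • More easily estimate characteristics of large populations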

The Poisson distribution

Poisson processes

  • Events appear to happen at a certain rate, but completely at random

  • Examples

    • Number of animals adopted from an animal shelter per week

    • Number of people arriving at a restaurant per hour

    • Number of earthquakes in California per year

Poisson distribution

  • Probability of some number of events occurring over a fixed period of time

  • Examples

    • Probability of >= 5 animals adopted from an animal shelter per week

    • Probability of 12 people arriving at a restaurant per hour

    • Probability of < 20 earthquakes in California per year

Lambda(λ)

  • λ = average number of events per time interval

    • Average number of adoptions per week = 8

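Lambda is the distribution's mean, and the peak sits at (or just below) λ. For a whole-number λ such as 8, the values 7 and 8 tie for the peak.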
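The following is a quick numerical check with `dpois()` for λ = 8. Summing k · P(X = k) over a wide range of k gives the mean. The two most likely values are equally probable. The cutoff at k = 100 is an assumption that covers essentially all of the probability mass.

```r
lambda <- 8
k <- 0:100
probs <- dpois(k, lambda = lambda)

# Mean of the distribution: sum of k * P(X = k), equal to lambda
sum(k * probs)

# The two values next to lambda tie for the highest probability
c(dpois(7, lambda = lambda), dpois(8, lambda = lambda))
```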

Probability of a single value

If the average number of adoptions per week is 8, what is P(# adoptions in a week = 5)?

dpois(5, lambda = 8)
# 0.09160366

Probability of less than or equal to

If the average number of adoptions per week is 8, what is P(# adoptions in a week <= 5)?

ppois(5, lambda = 8)
# 0.1912361
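`ppois()` is cumulative. The following sketch shows that it equals the sum of the single-value probabilities from `dpois()` for 0 through 5 adoptions.

```r
# P(X <= 5) built up from P(X = 0), ..., P(X = 5)
by_hand <- sum(dpois(0:5, lambda = 8))
by_ppois <- ppois(5, lambda = 8)
c(by_hand, by_ppois)
```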

Probability of greater than

If the average number of adoptions per week is 8, what is P(# adoptions in a week > 5)?

ppois(5, lambda = 8, lower.tail = FALSE)
# 0.8087639

If the average number of adoptions per week is 10, what is P(# adoptions in a week > 5)?

ppois(5, lambda = 10, lower.tail = FALSE)
# 0.932914
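Because the Poisson distribution is discrete, "greater than" and "at least" differ. The following sketch computes P(# adoptions ≥ 5) for λ = 8, which is P(X > 4). The result exceeds P(X > 5) by exactly P(X = 5).

```r
# "At least 5" means "more than 4"
at_least_5 <- ppois(4, lambda = 8, lower.tail = FALSE)
# Same value via the complement of P(X <= 4)
complement <- 1 - ppois(4, lambda = 8)
# "More than 5" excludes the value 5 itself
more_than_5 <- ppois(5, lambda = 8, lower.tail = FALSE)
c(at_least_5, complement, more_than_5)
```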

Sampling from a Poisson distribution

rpois(10, lambda = 8)
# 13 6 11 7 10 8 7 3 7 6

The CLT still applies!
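The following sketch illustrates the CLT for Poisson data. It uses 1000 sample means, each from 30 draws of a Poisson(8) variable. Because a Poisson variable has variance λ, the means should center on 8 with spread close to √(λ/n). The sample size and seed are assumptions for illustration.

```r
set.seed(99)
lambda <- 8
n <- 30
# 1000 sample means, each from n Poisson(lambda) draws
pois_means <- replicate(1000, mean(rpois(n, lambda = lambda)))

mean(pois_means)          # close to lambda = 8
sd(pois_means)            # close to sqrt(lambda / n), about 0.516
```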

More probability distributions

Exponential distribution

  • Probability of time between Poisson events

  • Examples

    • Probability of > 1 day between adoptions

    • Probability of < 10 minutes between restaurant arrivals

    • Probability of 6-8 months between earthquakes

  • Also uses lambda (rate)

  • Continuous (time)

Customer service requests

  • On average, one customer service ticket is created every 2 minutes

    • λ = 0.5 customer service tickets created each minute

Lambda in exponential distribution

How long until a new request is created?

P(wait < 1 min) =

pexp(1, rate = 0.5)
# 0.3934693

P(wait > 4 min) =

pexp(4, rate = 0.5, lower.tail = FALSE)
# 0.1353353

P(1 min < wait < 4 min) =

pexp(4, rate = 0.5) - pexp(1, rate = 0.5)
# 0.4711954
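The exponential CDF has a closed form, P(wait < t) = 1 - e^(-λt). The following sketch checks `pexp()` against that formula for the two waits above. It also reproduces the "between" probability as e^(-λ·1) - e^(-λ·4).

```r
rate <- 0.5
waits <- c(1, 4)

# Closed-form CDF versus pexp()
by_formula <- 1 - exp(-rate * waits)
by_pexp <- pexp(waits, rate = rate)

# P(1 min < wait < 4 min) from the formula
between <- exp(-rate * 1) - exp(-rate * 4)
between   # about 0.4711954
```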

Expected value of exponential distribution

In terms of rate (Poisson):

  • λ = 0.5 requests per minute

In terms of time (exponential):

  • 1/λ = 2 minutes per request, i.e. the expected wait between requests is 2 minutes
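The following sketch confirms the expected value 1/λ in two ways. The first integrates x · f(x) with `integrate()`. The second averages simulated waits from `rexp()`. The seed and the simulation size are arbitrary.

```r
rate <- 0.5

# Expected value: integral of x * f(x) over [0, Inf)
expected_wait <- integrate(function(x) x * dexp(x, rate = rate),
                           lower = 0, upper = Inf)$value
expected_wait   # 2, i.e. 1 / rate

# Simulation gives roughly the same answer
set.seed(2)
simulated_waits <- rexp(100000, rate = rate)
mean(simulated_waits)
```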

(Student's) t-distribution

  • Similar shape to the normal distribution

Degrees of freedom

  • Has a parameter, degrees of freedom (df), which affects the thickness of the tails

    • Lower df = thicker tails, higher standard deviation

    • Higher df = closer to normal distribution
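The following sketch makes "thicker tails" concrete. It compares P(T < -2) for a few degrees of freedom with the normal value `pnorm(-2)`. The df values are arbitrary choices. For df > 2, the t-distribution's SD is √(df/(df - 2)), which falls toward 1 as df grows.

```r
# Lower tail beyond -2 for increasing degrees of freedom
dfs <- c(2, 10, 100)
t_tails <- pt(-2, df = dfs)
normal_tail <- pnorm(-2)
round(c(t_tails, normal_tail), 4)

# Standard deviation of the t-distribution (defined for df > 2)
df_sd <- c(3, 10, 100)
sqrt(df_sd / (df_sd - 2))
```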

Log-normal distribution

  • Variable whose logarithm is normally distributed

  • Examples:

    • Length of chess games

    • Adult blood pressure

    • Number of hospitalizations in the 2003 SARS outbreak
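The following sketch illustrates the definition in base R. The log-normal CDF equals the normal CDF on the log scale. Simulated values are positive and right-skewed, and they look normal after taking logs. The parameters `meanlog = 0`, `sdlog = 0.5` and the seed are arbitrary choices for illustration.

```r
meanlog <- 0
sdlog <- 0.5

# P(X < 2) for a log-normal X equals P(log X < log 2) for a normal variable
p_lnorm <- plnorm(2, meanlog = meanlog, sdlog = sdlog)
p_via_norm <- pnorm(log(2), mean = meanlog, sd = sdlog)

# Simulated values: strictly positive and right-skewed (mean above median)
set.seed(5)
x <- rlnorm(10000, meanlog = meanlog, sdlog = sdlog)
c(min(x) > 0, mean(x) > median(x))

# On the log scale they look normal with mean meanlog and sd sdlog
c(mean(log(x)), sd(log(x)))
```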
