MODULE 11.1: CENTRAL LIMIT THEOREM AND STANDARD ERROR
sampling error of the mean = sample mean − population mean = $\bar{x} - \mu$.
sampling distribution
stratified sampling
time-series and cross-sectional data.
Time-series and cross-sectional data can be pooled in the same data set. Longitudinal data are observations over time of multiple characteristics of the same entity, such as unemployment, inflation, and GDP growth rates for a country over 10 years. Panel data contain observations over time of the same characteristic for multiple entities, such as debt/equity ratios for 20 companies over the most recent 24 quarters. Panel and longitudinal data are typically presented in table or spreadsheet form.
central limit theorem
- If the sample size n is sufficiently large (n ≥ 30), the sampling distribution of the sample means will be approximately normal.
- The mean of the population, µ, and the mean of the distribution of all possible sample means are equal.
- The variance of the distribution of sample means is $\sigma^2 / n$, the population variance divided by the sample size.
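The three properties above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 and variance 1 (chosen because it is clearly non-normal), and the function name `simulate_sample_means`, the seed, and the number of samples are illustrative choices, not part of the source material.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Assumed population: exponential with rate 1 (mean 1, variance 1, right-skewed).
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

POP_MEAN = 1.0   # mean of the assumed population
POP_VAR = 1.0    # variance of the assumed population
n = 30           # sample size at the usual "sufficiently large" threshold

means = simulate_sample_means(n, num_samples=20_000)
mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

print(f"mean of sample means:     {mean_of_means:.4f} (population mean = {POP_MEAN})")
print(f"variance of sample means: {var_of_means:.4f} (sigma^2 / n = {POP_VAR / n:.4f})")
print(f"standard error (sigma / sqrt(n)): {(POP_VAR / n) ** 0.5:.4f}")
```

With these settings, the mean of the simulated sample means should land close to the population mean and their variance close to $\sigma^2 / n$, even though the underlying population is skewed.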
MODULE 11.2: CONFIDENCE INTERVALS AND T-DISTRIBUTION
Point estimates are single (sample) values used to estimate population parameters.
Student’s t-distribution
- It is symmetrical.
- It is defined by a single parameter, the degrees of freedom (df), where the degrees of freedom are equal to the number of sample observations minus 1, n − 1, for sample means.
- It has more probability in the tails (“fatter tails”) than the normal distribution.
- As the degrees of freedom (the sample size) get larger, the shape of the t-distribution more closely approaches a standard normal distribution.
The degrees of freedom for tests based on sample means are n − 1 because, given the mean, only n − 1 observations can be unique.
Practically speaking, the greater the degrees of freedom, the greater the percentage of observations near the center of the distribution and the lower the percentage of observations in the tails, which are thinner as degrees of freedom increase. This means that confidence intervals for a random variable that follows a t-distribution must be wider (narrower) when the degrees of freedom are less (more) for a given significance level.
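The effect of degrees of freedom on interval width can be seen by comparing two-tailed reliability factors for a 5% significance level. The sketch below uses only the Python standard library, so the t-distribution quantile is approximated numerically (Simpson's rule plus bisection). The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, and a statistics package or a printed t-table would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x], plus 0.5 by symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Value of t that leaves alpha/2 in the upper tail, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
factors = {df: t_critical(alpha, df) for df in (5, 10, 30, 100)}
z_factor = NormalDist().inv_cdf(1 - alpha / 2)

for df, t_value in factors.items():
    print(f"df = {df:>3}: t reliability factor = {t_value:.3f}")
print(f"standard normal: z reliability factor = {z_factor:.3f}")
```

The reliability factor shrinks toward the z-value as the degrees of freedom increase, which is why intervals built from small samples are wider.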
Confidence interval estimates result in a range of values within which the actual value of a parameter will lie with probability 1 − α. Here, alpha, α, is called the level of significance for the confidence interval, and the probability 1 − α is referred to as the degree of confidence.
Confidence intervals are usually constructed as: point estimate ± (reliability factor × standard error)
If the population has a normal distribution with a known variance, a confidence interval for the population mean can be calculated as:
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where:
$\bar{x}$ = point estimate of the population mean (sample mean).
$z_{\alpha/2}$ = reliability factor, a standard normal random variable for which the probability in the right-hand tail of the distribution is α/2. In other words, this is the z-score that leaves α/2 of probability in the upper tail.
$\frac{\sigma}{\sqrt{n}}$ = the standard error of the sample mean, where σ is the known standard deviation of the population and n is the sample size.
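A minimal sketch of this known-variance interval follows, using the standard normal quantile from Python's `statistics.NormalDist`. The function name `z_confidence_interval` and the input values (sample mean 80, σ = 15, n = 36) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Interval for the population mean: normal population, known sigma."""
    # Reliability factor: z-score leaving alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the sample mean.
    standard_error = sigma / n ** 0.5
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: standard error = 15 / sqrt(36) = 2.5
low, high = z_confidence_interval(sample_mean=80.0, sigma=15.0, n=36)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

Passing a smaller `alpha` (for example 0.01) raises the reliability factor and widens the interval.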
Confidence Intervals for the Population Mean: Normal With Unknown Variance
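If the distribution of the population is normal with unknown variance, we can use the t-distribution to construct a confidence interval:
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the t reliability factor with n − 1 degrees of freedom that leaves α/2 of probability in the upper tail, and s is the sample standard deviation.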
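As a rough sketch of the unknown-variance case, assuming the usual form (sample mean ± t reliability factor × s/√n, with n − 1 degrees of freedom), the code below builds a 95% interval from a small made-up sample. The data, the function name `t_confidence_interval`, and the choice to pass the reliability factor in by hand are all illustrative assumptions. The factor 2.365 is the standard t-table value for 7 degrees of freedom with 2.5% in the upper tail.

```python
import statistics

def t_confidence_interval(data, t_factor):
    """Interval for the population mean when sigma is unknown.

    t_factor is the two-tailed reliability factor for len(data) - 1
    degrees of freedom, looked up by the caller (e.g., from a t-table).
    """
    n = len(data)
    sample_mean = statistics.fmean(data)
    # Sample standard deviation (n - 1 in the denominator) replaces sigma.
    s = statistics.stdev(data)
    standard_error = s / n ** 0.5
    margin = t_factor * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample of 8 observations, so df = 8 - 1 = 7.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
df = len(observations) - 1
T_FACTOR_DF7_95 = 2.365  # t-table value: df = 7, 2.5% in the upper tail

low, high = t_confidence_interval(observations, T_FACTOR_DF7_95)
print(f"df = {df}, 95% confidence interval: {low:.3f} to {high:.3f}")
```

Using the z-value 1.960 in place of 2.365 would understate the width of the interval for a sample this small.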