time-series data vs cross-sectional data:
- time-series data: observations taken over a period of time at specific, equally spaced time intervals
- cross-sectional data: multiple observations taken at a single point in time
longitudinal data vs panel data:
- longitudinal data: several features of one entity observed over a time series
- panel data: one feature of several entities observed over a time series
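The four layouts can be compared side by side with plain dictionaries. This is only an illustrative sketch: the entity names (`A`, `B`, `C`), the feature names (`eps`, `revenue`), and every number are made up for the example.

```python
# Hypothetical illustrative values; entities, features, and numbers are invented.

# Time series: one entity, one feature, several equally spaced dates.
time_series = {"2021": 1.0, "2022": 1.2, "2023": 1.5}

# Cross-sectional: several entities, one feature, a single point in time.
cross_section = {"A": 1.5, "B": 0.9, "C": 2.1}

# Longitudinal: one entity, several features, tracked over time.
longitudinal = {
    "2022": {"eps": 1.2, "revenue": 50.0},
    "2023": {"eps": 1.5, "revenue": 58.0},
}

# Panel: several entities, one feature, tracked over time.
panel = {
    "A": {"2022": 1.2, "2023": 1.5},
    "B": {"2022": 0.8, "2023": 0.9},
}
```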
central limit theorem
For a population with mean $\mu$ and variance $\sigma^2$, the mean $\bar{x}$ of samples of size $n$ drawn from the population is approximately normally distributed with mean $\mu$ and variance $\frac{\sigma^2}{n}$ once the sample size becomes large (commonly taken as $n\ge30$).
standard error of the sample mean = $\frac{\sigma}{\sqrt{n}}$
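A small simulation can make the central limit theorem and the standard error formula concrete. The sketch below assumes an exponential population with rate 1 (so $\mu=1$ and $\sigma=1$), chosen only because it is clearly non-normal. The function name `simulate_sample_means`, the seed, and the number of trials are illustrative choices, not part of the notes.

```python
import random
import statistics

def simulate_sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from an Exp(1) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

mu, sigma = 1.0, 1.0   # mean and standard deviation of Exp(1)
n = 30                 # sample size
means = simulate_sample_means(n, trials=20_000)

# The sample means should centre on mu ...
print("mean of sample means :", statistics.fmean(means))
# ... and their spread should be close to sigma / sqrt(n).
print("std of sample means  :", statistics.pstdev(means))
print("theoretical std error:", sigma / n ** 0.5)
```

With these settings the empirical spread of the sample means should land near $\frac{1}{\sqrt{30}}\approx0.183$, even though the underlying population is skewed.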
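degrees of freedom in a sample of size n: n-1
A sample of size n has n-1 degrees of freedom because, once a statistic such as the mean is fixed, at most n-1 observations can take values freely; the remaining observation is determined by those n-1 values. A common example: if three numbers a, b, c are known to have a mean of 4, at most two of them can be chosen freely, and the last one is determined by those two (12 minus the sum of the other two).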
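The three-number example can be checked directly. In this minimal sketch the helper name `last_value` is hypothetical; it returns the one value that is forced once the mean and the other n-1 values are fixed.

```python
def last_value(free_values, mean, n):
    """Return the single value determined by a fixed mean
    after n-1 values have been chosen freely."""
    if len(free_values) != n - 1:
        raise ValueError("exactly n-1 values can be chosen freely")
    # The total must equal mean * n, so the last value is what is left over.
    return mean * n - sum(free_values)

print(last_value([3, 5], mean=4, n=3))    # 12 - (3 + 5) = 4
print(last_value([10, -1], mean=4, n=3))  # 12 - (10 - 1) = 3
```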
Usage of z-statistic or t-statistic
null hypothesis vs alternative hypothesis
- null hypothesis: $H_0$, the hypothesis we want to reject
- alternative hypothesis: $H_a$, the hypothesis we want to conclude
$H_0$ can take the form $\mu=\mu_0$, $\mu\ge\mu_0$, or $\mu\le\mu_0$; it always contains the equality sign.
one-tailed test vs two-tailed test
- one-tailed test: a one-directional condition such as $x>0$
- two-tailed test: a two-directional condition such as $x\ne0$
For a z-distributed test statistic:
- reject $H_0$ when the z-statistic falls outside the range bounded by the critical z-value(s)
- fail to reject $H_0$ when the z-statistic falls within that range
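The decision rule above can be sketched in a few lines. This assumes a test of the population mean with known $\sigma$, so the z-statistic is $\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$. The function names, the `tail` labels, and the sample numbers are illustrative, not taken from the notes.

```python
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesized mean) / standard error."""
    return (sample_mean - mu0) / (sigma / n ** 0.5)

def reject_null(z, alpha=0.05, tail="two"):
    """Return True when z falls outside the range set by the critical z-value(s)."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":          # H_a: mu != mu0
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":        # H_a: mu > mu0
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "left":         # H_a: mu < mu0
        return z < std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Hypothetical numbers: sample mean 52, H_0: mu = 50, sigma = 6, n = 36.
z = z_statistic(52, 50, 6, 36)
print(z)                                        # 2.0
print(reject_null(z, alpha=0.05, tail="two"))   # True  (|2.0| > 1.96)
print(reject_null(z, alpha=0.01, tail="two"))   # False (|2.0| < 2.576)
```

The same z-statistic leads to different conclusions at different significance levels, because the critical values that bound the "fail to reject" range move outward as `alpha` shrinks.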