Contents
- Lecture 1 Intro
- Lecture 2 Estimation
- Lecture 3 Confidence intervals and hypothesis testing
- Lecture 4 Exponential Family
- Lecture 5 Normal Distribution
- Lecture 6 Missing Data
- Lecture 7 Markov Chains
- Lecture 8 Time Series
- Lecture 9 Linear Regression Models
- Lecture 10 Generalized Linear Models
- Lecture 11 Survival Data
- Lecture 12 Nonparametric Regression
- Lecture 13 Generalized Additive Models
Lecture 1 Intro
- LLN (Law of large numbers)
- CLT (Central Limit Theorem)
- CMT (Continuous Mapping Theorem)
- ST (Slutsky’s Theorem)
- Delta Method
Lecture 2 Estimation
Estimation:
- unbiased
- consistent
- Accuracy of an estimator: $MSE(\hat{\theta}) = Var(\hat{\theta}) + bias(\hat{\theta}, \theta)^2 = Var(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2$ (see the simulation sketch after this list)
- Relative efficiency
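As a quick check of the MSE decomposition, here is a minimal simulation sketch (not from the lecture; the normal data and the divide-by-$n$ variance estimator are assumptions chosen for illustration) verifying numerically that $MSE \approx Var + bias^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 20, 4.0, 100_000

# Biased (divide-by-n) variance estimator applied to N(0, sigma^2) samples
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = np.mean((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)

mse = np.mean((est - sigma2) ** 2)   # E[(theta_hat - theta)^2]
var = np.var(est)                    # Var(theta_hat)
bias = np.mean(est) - sigma2         # E[theta_hat] - theta

print(mse, var + bias ** 2)          # the two numbers should nearly agree
```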
Two estimation methods
Method of Moments
- Theorem
Maximum Likelihood Estimator
- Fisher information
- Theorem (consistent, unbiased)
Optimality in estimation
- Cramer-Rao lower bound
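Since the notes only list the topics, here is a small hedged sketch (the exponential model, sample size, and rate are assumptions made for illustration) of an MLE and a comparison of its empirical variance with the Cramer-Rao lower bound $\lambda^2/n$:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 500, 20_000

# MLE of the exponential rate is 1 / sample mean
samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
mle = 1.0 / samples.mean(axis=1)

crlb = lam ** 2 / n        # Cramer-Rao lower bound (Fisher information is n / lambda^2)
print(np.var(mle), crlb)   # empirical variance is close to (slightly above) the bound
```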
Lecture 3 Confidence intervals and hypothesis testing
Lecture 4 Exponential Family
Lecture 5 Normal Distribution
Lecture 6 Missing Data
Lecture 7 Markov Chains
Lecture 8 Time Series
Measure of dependence
(Auto) Covariance function:
$\gamma(s, t) = cov(Y_t, Y_s) = E[(Y_t - \mu_t)(Y_s - \mu_s)]$
(Auto) Correlation function:
$\rho(s, t) = cor(Y_t, Y_s) = \frac{\gamma(s, t)}{\sqrt{\gamma(s, s)\,\gamma(t, t)}}$
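A minimal numpy sketch (not from the notes; the helper name sample_acf and the lag cutoff are assumptions for illustration) of the sample versions $\hat\gamma(h)$ and $\hat\rho(h)$ of these quantities:

```python
import numpy as np

def sample_acf(y, max_lag=20):
    """Sample autocovariance and autocorrelation up to max_lag."""
    y = np.asarray(y, dtype=float)
    n, mu = len(y), y.mean()
    gamma = np.array([np.sum((y[h:] - mu) * (y[:n - h] - mu)) / n
                      for h in range(max_lag + 1)])
    return gamma, gamma / gamma[0]

rng = np.random.default_rng(0)
gamma_hat, rho_hat = sample_acf(rng.normal(size=1000))
print(rho_hat[:5])   # for white noise, rho_hat(h) is near 0 for h >= 1
```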
Stationarity
Definition: for any finite set of time points $t_1, \dots, t_k$, the joint distribution of $(Y_{t_1+s}, \dots, Y_{t_k+s})$ does not depend on the shift $s$.
Relaxed version: second-order (weak) stationarity
- $E(Y_t) = \mu$ for all $t$
- $cov(Y_s, Y_{s+t})$ does not depend on $s$
- $\rho_t = cor(Y_0, Y_t)$
White Noise
Definition: A stochastic process $\{Y_t\}$ is called white noise if its elements are uncorrelated, with mean $E(Y_t) = 0$ and $Var(Y_t) = \sigma^2$.
- $\rho_t = 0$ for $t \neq 0$
Autoregressive models
$AR(1)$:
$Y_t - \mu = \alpha (Y_{t-1} - \mu) + \epsilon_t$
The white noise $\epsilon_t$ is independent of $\dots, Y_{t-2}, Y_{t-1}$. It is also called the innovation, as it adds something new to the process. Without innovations, $Y_t$ would just be a scaled version of $Y_{t-1}$.
- $Var(Y_t) = \alpha^2 Var(Y_{t-1}) + \sigma^2 \Rightarrow \gamma_0 = \alpha^2 \gamma_0 + \sigma^2$, i.e. $\gamma_0 = \sigma^2 / (1 - \alpha^2)$
- $\{Y_t\}$ is stationary $\iff |\alpha| < 1$
- AR(1) is a Markov process.
- The only nonzero partial autocorrelation is $\rho'_1 = \alpha$.
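A short simulation sketch of an AR(1) process (the parameter values are assumptions for illustration), checking the stationary variance $\gamma_0 = \sigma^2/(1-\alpha^2)$ and the lag-1 correlation $\rho_1 = \alpha$:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma, mu, n = 0.7, 1.0, 5.0, 200_000

y = np.empty(n)
y[0] = mu + rng.normal(0, sigma / np.sqrt(1 - alpha ** 2))  # start in the stationary distribution
for t in range(1, n):
    y[t] = mu + alpha * (y[t - 1] - mu) + rng.normal(0, sigma)

print(np.var(y), sigma ** 2 / (1 - alpha ** 2))   # gamma_0 vs sigma^2 / (1 - alpha^2)
print(np.corrcoef(y[:-1], y[1:])[0, 1])           # lag-1 autocorrelation, approximately alpha
```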
$AR(p)$:
$Y_t - \mu = \sum_{j=1}^p \alpha_j (Y_{t-j} - \mu) + \epsilon_t$
Moving average models
$MA(q)$:
$Y_t - \mu = \sum_{j=1}^q \beta_j \epsilon_{t-j} + \epsilon_t$
- $E(Y_t) = \mu$ and $Var(Y_t) = \sigma^2 (1 + \beta_1^2 + \dots + \beta_q^2)$ for all $t$.
- This process is stationary and such that $\rho_t = 0$ for $t > q$.
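A minimal sketch of the autocorrelation cutoff (the MA(2) coefficients are assumed for illustration): the sample autocorrelation should be near zero beyond lag $q = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma, mu, n = np.array([0.6, 0.3]), 1.0, 0.0, 200_000

# MA(2): Y_t = mu + eps_t + beta_1 * eps_{t-1} + beta_2 * eps_{t-2}
eps = rng.normal(0, sigma, size=n + 2)
y = mu + eps[2:] + beta[0] * eps[1:-1] + beta[1] * eps[:-2]

# sample autocorrelations at lags 1..4: the first two are nonzero, the rest are near 0
for h in range(1, 5):
    print(h, np.corrcoef(y[:-h], y[h:])[0, 1])
```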
ARMA models
$ARMA(p, q)$:
$Y_t - \mu = \sum_{j=1}^p \alpha_j (Y_{t-j} - \mu) + \sum_{j=1}^q \beta_j \epsilon_{t-j} + \epsilon_t$
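A hedged sketch of simulating and fitting an ARMA(1, 1) model; it assumes the statsmodels package is available and uses its ARIMA class with $d = 0$ as the ARMA fit (the parameter values are made up for the example):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
alpha, beta1, sigma, n = 0.6, 0.4, 1.0, 5000

# Simulate ARMA(1,1) with mu = 0: Y_t = alpha * Y_{t-1} + beta1 * eps_{t-1} + eps_t
eps = rng.normal(0, sigma, size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = alpha * y[t - 1] + beta1 * eps[t - 1] + eps[t]

fit = ARIMA(y, order=(1, 0, 1)).fit()   # ARMA(p, q) corresponds to ARIMA(p, 0, q)
print(fit.params)                       # AR and MA estimates should be close to alpha and beta1
```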
Lecture 9 Linear Regression Models
Statistical model
$Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_d X_{id} + \epsilon_i, \quad i = 1, 2, \dots, n$
- The noise terms $\epsilon_i$ are i.i.d. with $E(\epsilon_i) = 0$, $Var(\epsilon_i) = \sigma^2$, and independent of $X_{ij}$ for $j = 1, \dots, d$.
- Provided $n \geq p$ and $X$ has full rank, i.e. $rank(X) = 1 + d = p$, we have a closed-form solution:
$\hat{\beta} = (X^TX)^{-1}X^TY$ (see the numpy sketch after this list)
- $\hat{\beta}$ is unbiased, since
  $\hat{\beta} = (X^TX)^{-1}X^T(X\beta + \epsilon) = \beta + (X^TX)^{-1} X^T \epsilon$,
  so $E(\hat{\beta}) = \beta$.
- Statistical properties:
  - Consistency: as $n \to \infty$, $\hat{\beta} \to \beta$.
  - Asymptotic normality: $\sqrt{n}(\hat{\beta} - \beta) \to N(0, \sigma^2 Q^{-1})$, where $Q = E(XX^T)$.
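A minimal numpy sketch of the closed-form solution $\hat\beta = (X^TX)^{-1}X^TY$ (not from the notes; the design matrix and coefficient values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
beta_true = np.array([1.0, 2.0, -1.0, 0.5])   # intercept alpha plus d slopes

# n x p design with an intercept column, p = 1 + d
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])
y = X @ beta_true + rng.normal(0, 1.0, size=n)

# (X^T X)^{-1} X^T Y, solved without forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta_true
```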
Normal Linear model
Assume that the errors $\epsilon_i$ in the model are i.i.d. $N(0, \sigma^2)$.
- $Y_i \mid X_i = x_i \sim N(x_i^T\beta, \sigma^2)$
- $\hat{\beta}$ is also the MLE of $\beta$
- $E(\hat{\beta}) = \beta$
- $Var(\hat{\beta}) = E[(\beta + (X^TX)^{-1} X^T \epsilon - \beta)(\beta + (X^TX)^{-1} X^T \epsilon - \beta)^T] = \sigma^2 (X^TX)^{-1}$
- $\hat{\sigma}^2 = \frac{1}{n-p} \|Y - X\hat{\beta}\|^2$
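Continuing the hedged OLS sketch above (the helper name ols_with_se is made up for illustration), the variance estimate $\hat\sigma^2 = \|Y - X\hat\beta\|^2/(n-p)$ and the estimated covariance $\hat\sigma^2 (X^TX)^{-1}$ of $\hat\beta$ can be computed as:

```python
import numpy as np

def ols_with_se(X, y):
    """Closed-form OLS with the residual variance estimate and standard errors."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)             # ||Y - X beta_hat||^2 / (n - p)
    cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated Var(beta_hat)
    return beta_hat, sigma2_hat, np.sqrt(np.diag(cov_beta))
```

For example, with the X and y from the previous sketch, `beta_hat, sigma2_hat, se = ols_with_se(X, y)` gives $\hat\sigma^2$ close to the true noise variance.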