Q1: Consider the linear model
$$y_i=\beta_0+\beta_1 x_i+\epsilon_i,\qquad \epsilon_i\stackrel{iid}{\sim} N(0,\sigma^2),\qquad i=1,\dots,n.$$
- Derive the maximum likelihood estimators (MLE) for $\beta_0,\beta_1$. Are they consistent with the least squares estimators (LSE)?
- Derive the MLE for $\sigma^2$ and examine its unbiasedness.
- A very slippery point is whether to treat the $x_i$ as fixed numbers or as random variables. In class, we treated the predictors $x_i$ as fixed numbers for the sake of convenience. Now suppose that the predictors $x_i$ are iid random variables (independent of $\epsilon_i$) with density $f_X(x;\theta)$ for some parameter $\theta$. Write down the likelihood function for all of our data $(x_i,y_i),\ i=1,\dots,n$. Derive the MLE for $\beta_0,\beta_1$ and see whether the MLE changes in the setting of random predictors.
Solution: Note that the $y_i\sim N(\beta_0+\beta_1x_i,\sigma^2)$ are independent, so the likelihood function is
$$L(\beta_0,\beta_1,\sigma^2)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(y_i-\beta_0-\beta_1x_i)^2}{2\sigma^2}}=(2\pi\sigma^2)^{-n/2}e^{-\frac{Q(\beta_0,\beta_1)}{2\sigma^2}},$$
where $Q(\beta_0,\beta_1)=\sum_{i=1}^n(y_i-\beta_0-\beta_1x_i)^2$. For any given $\sigma^2$, maximizing $L(\beta_0,\beta_1,\sigma^2)$ requires minimizing $Q(\beta_0,\beta_1)$, which is exactly the least squares criterion. Therefore the MLEs coincide with the LSEs.
That is,
$$\hat\beta_1=\frac{\ell_{xy}}{\ell_{xx}}=\frac{\sum_{i=1}^n(y_i-\bar{y})(x_i-\bar{x})}{\sum_{i=1}^n(x_i-\bar{x})^2},\qquad \hat\beta_0=\bar{y}-\hat\beta_1\bar{x}.$$
Next we choose $\sigma^2$ to maximize
$$L(\hat\beta_0,\hat\beta_1,\sigma^2)=(2\pi\sigma^2)^{-n/2}e^{-\frac{Q(\hat\beta_0,\hat\beta_1)}{2\sigma^2}}.$$
Setting $\frac{\partial}{\partial\sigma^2}\log L=-\frac{n}{2\sigma^2}+\frac{Q(\hat\beta_0,\hat\beta_1)}{2\sigma^4}=0$ gives
$$\hat\sigma_{MLE}^2=\frac{Q(\hat\beta_0,\hat\beta_1)}{n}=\frac{S_e^2}{n}.$$
We have already shown that $E[S_e^2]=(n-2)\sigma^2$, so $E[\hat\sigma_{MLE}^2]=\frac{n-2}{n}\sigma^2$; hence $\hat\sigma_{MLE}^2$ is not an unbiased estimator of $\sigma^2$.
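Both conclusions are easy to check numerically. Below is a minimal Monte Carlo sketch (with hypothetical parameter values, assuming `numpy` is available) that compares the closed-form $\hat\beta_0,\hat\beta_1$ with a generic least-squares fit and estimates $E[\hat\sigma^2_{MLE}]$ by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma2 = 50, 1.0, 2.0, 4.0   # hypothetical true values
x = np.linspace(0, 10, n)                     # fixed design points

def fit(y):
    # closed-form LSE/MLE from the derivation above
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

y = beta0 + beta1 * x + rng.normal(0.0, np.sqrt(sigma2), n)
print(fit(y))                # matches the generic least-squares fit below
print(np.polyfit(x, y, 1))   # returns (slope, intercept)

# Monte Carlo estimate of E[sigma^2_MLE]
reps = 20000
mle = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b0, b1 = fit(y)
    mle[r] = np.sum((y - b0 - b1 * x) ** 2) / n   # S_e^2 / n
print(mle.mean(), (n - 2) / n * sigma2)           # both ~ 3.84, showing the bias
```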
If the $x_i$ are random variables with density $f_X(x;\theta)$, then the likelihood function of $(x_i,y_i)$ is
$$\begin{aligned} L(\beta_0,\beta_1,\sigma^2,\theta)&=\prod_{i=1}^nf_X(x_i;\theta)f(y_i\mid x_i)\\ &=\prod_{i=1}^n\Big[f_X(x_i;\theta)\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(y_i-\beta_0-\beta_1x_i)^2}{2\sigma^2}}\Big]\\ &=(2\pi\sigma^2)^{-n/2}e^{-\frac{Q(\beta_0,\beta_1)}{2\sigma^2}}\prod_{i=1}^nf_X(x_i;\theta). \end{aligned}$$
For fixed $\sigma^2$ and $\theta$, maximizing $L(\beta_0,\beta_1,\sigma^2,\theta)$ again amounts to minimizing $Q(\beta_0,\beta_1)$, since the factor $\prod_{i=1}^nf_X(x_i;\theta)$ does not involve $\beta_0,\beta_1$. Hence the MLE does not change.
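The factorization can also be verified numerically: maximizing the joint likelihood over all parameters returns, up to optimizer tolerance, the same $\hat\beta_0,\hat\beta_1$ as the fixed-design formulas. A sketch, assuming `scipy` is available and taking $f_X$ to be a hypothetical $N(\mu,\tau^2)$ density:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(5.0, 2.0, n)                  # random predictors, f_X = N(mu, tau^2)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.5, n)

def neg_loglik(p):
    b0, b1, log_s, mu, log_t = p             # log-parametrize scales to keep them positive
    # joint log-likelihood: the x-part involves only (mu, tau), the y|x-part only (b0, b1, sigma)
    return -(norm.logpdf(x, mu, np.exp(log_t)).sum()
             + norm.logpdf(y, b0 + b1 * x, np.exp(log_s)).sum())

start = [0.0, 1.0, np.log(y.std()), x.mean(), np.log(x.std())]
res = minimize(neg_loglik, start, method="Nelder-Mead",
               options={"maxiter": 10000, "xatol": 1e-8, "fatol": 1e-8})

# fixed-design closed form for comparison
b1_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_ls = y.mean() - b1_ls * x.mean()
print(res.x[:2], (b0_ls, b1_ls))   # agree up to optimizer tolerance
```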
Q2: Consider the linear model without intercept
$$y_i = \beta x_i+\epsilon_i,\qquad i=1,\dots,n,$$
where the $\epsilon_i$ are independent with $E[\epsilon_i]=0$ and $Var[\epsilon_i]=\sigma^2$.
- Write down the least squares estimator $\hat\beta$ for $\beta$, and derive an unbiased estimator for $\sigma^2$.
- For fixed $x_0$, let $\hat{y}_0=\hat\beta x_0$. Work out $Var[\hat{y}_0]$.
Solution: Let $Q(\beta)=\sum_{i=1}^n(y_i-\beta x_i)^2$. The minimizer $\hat{\beta}$ satisfies
$$\frac{\partial Q}{\partial\beta}=-2\sum_{i=1}^n(y_i-\beta x_i)x_i=0,$$
which gives the least squares estimator:
$$\hat{\beta}=\frac{\sum_{i=1}^nx_iy_i}{\sum_{i=1}^nx_i^2}.$$
Write $T=\sum_{i=1}^n x_iy_i$ and note that $\hat\beta\sum_{i=1}^n x_i^2=T$, so that $E[T]=\beta\sum_{i=1}^n x_i^2$ and, by independence, $Var[T]=\sigma^2\sum_{i=1}^n x_i^2$. Then
$$E[Q(\hat\beta)]=E\Big[\sum_{i=1}^ny_i^2+\hat\beta^2\sum_{i=1}^nx_i^2-2\hat\beta\sum_{i=1}^nx_iy_i\Big]=\sum_{i=1}^n\{Var[y_i]+(E[y_i])^2\}-\frac{E[T^2]}{\sum_{i=1}^nx_i^2},$$
and since $E[T^2]=Var[T]+(E[T])^2=\sigma^2\sum_{i=1}^nx_i^2+\beta^2\big(\sum_{i=1}^nx_i^2\big)^2$,
$$E[Q(\hat\beta)]=\sum_{i=1}^n(\sigma^2+\beta^2x_i^2)-\sigma^2-\beta^2\sum_{i=1}^nx_i^2=(n-1)\sigma^2.$$
Hence
$$\hat\sigma^2=\frac{Q(\hat\beta)}{n-1}$$
is an unbiased estimator of $\sigma^2$.
For the second part, since the $y_i$ are independent,
$$Var[\hat{y}_0]=x_0^2\,Var[\hat\beta]=x_0^2\cdot\frac{\sum_{i=1}^nx_i^2\,Var[y_i]}{\big(\sum_{i=1}^nx_i^2\big)^2}=\frac{\sigma^2x_0^2}{\sum_{i=1}^nx_i^2}.$$
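As a sanity check, here is a short Monte Carlo sketch (with hypothetical parameter values, assuming `numpy` is available) confirming that $Q(\hat\beta)/(n-1)$ averages to $\sigma^2$ and that the empirical variance of $\hat{y}_0$ matches $\sigma^2x_0^2/\sum_{i=1}^nx_i^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, sigma2, x0 = 30, 1.5, 2.0, 4.0   # hypothetical values
x = rng.uniform(1, 5, n)                  # fixed design, reused across replications

reps = 40000
s2_hat = np.empty(reps)
y0_hat = np.empty(reps)
for r in range(reps):
    # N(0, sigma^2) errors are just a convenient choice; the result needs only mean 0 and variance sigma^2
    y = beta * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b = np.sum(x * y) / np.sum(x ** 2)
    s2_hat[r] = np.sum((y - b * x) ** 2) / (n - 1)
    y0_hat[r] = b * x0

print(s2_hat.mean(), sigma2)                            # unbiasedness: both ~ 2.0
print(y0_hat.var(), sigma2 * x0 ** 2 / np.sum(x ** 2))  # Var[y0_hat] formula
```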