Model
$$y_i=\beta_0+\beta_1x_{1i}+\ldots+\beta_kx_{ki}+\epsilon_i$$
Derivation from the sample perspective
Suppose the sample is $\{y_1,\ldots,y_n\}$, and let $\beta=(\beta_0,\beta_1,\ldots,\beta_k)^\prime$, $y=(y_1,\ldots,y_n)^\prime$, $x_i=(1,x_{1i},\ldots,x_{ki})^\prime$, and $x=(x_1,\ldots,x_n)$, so that $x$ is a $(k+1)\times n$ matrix whose $i$-th column is $x_i$. Then:
$$
\begin{aligned}
\hat{\beta}&=\mathop{\mathrm{argmin}}\limits_{\beta}\sum\limits_{i=1}^n\dfrac{1}{2}(y_i-x_i^\prime\beta)^2 \\
&=\mathop{\mathrm{argmin}}\limits_{\beta}\dfrac{1}{2}\|y-x^\prime\beta\|^2 \\
&=\mathop{\mathrm{argmin}}\limits_{\beta}\dfrac{1}{2}(y-x^\prime\beta)^\prime(y-x^\prime\beta) \\
&=\mathop{\mathrm{argmin}}\limits_{\beta}\dfrac{1}{2}(y^\prime y-y^\prime x^\prime\beta-\beta^\prime xy+\beta^\prime xx^\prime\beta)
\end{aligned}
$$
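As a quick sanity check, the three ways of writing the objective (sum over observations, squared norm, expanded quadratic form) can be compared numerically. This is a minimal sketch using NumPy on simulated data; the sizes `n` and `k`, the random seed, and the candidate vector `beta` are arbitrary choices made for illustration and are not from the derivation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3  # illustrative sample size and number of regressors

# x is (k+1) x n: column i is x_i = (1, x_1i, ..., x_ki)'
x = np.vstack([np.ones(n), rng.normal(size=(k, n))])
y = rng.normal(size=n)
beta = rng.normal(size=k + 1)  # arbitrary candidate coefficient vector

# Form 1: sum over observations of (1/2) (y_i - x_i' beta)^2
loss_sum = sum(0.5 * (y[i] - x[:, i] @ beta) ** 2 for i in range(n))

# Form 2: (1/2) ||y - x' beta||^2
resid = y - x.T @ beta
loss_norm = 0.5 * np.linalg.norm(resid) ** 2

# Form 3: (1/2) (y'y - y'x'beta - beta'xy + beta'xx'beta)
loss_expanded = 0.5 * (
    y @ y - y @ x.T @ beta - beta @ x @ y + beta @ x @ x.T @ beta
)

# All three values should agree up to floating-point error
print(loss_sum, loss_norm, loss_expanded)
```

Note that `x` is stored with observations as columns to match the notation above, so the usual "design matrix" is `x.T`.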
Differentiating $\dfrac{1}{2}(y^\prime y-y^\prime x^\prime\beta-\beta^\prime xy+\beta^\prime xx^\prime\beta)$ with respect to $\beta$ and setting the result to zero gives:
$$
\begin{aligned}
\dfrac{\partial}{\partial\beta}\dfrac{1}{2}(y^\prime y-y^\prime x^\prime\beta-\beta^\prime xy+\beta^\prime xx^\prime\beta)
&=\dfrac{1}{2}(-xy-xy+xx^\prime\beta+xx^\prime\beta) \\
&=xx^\prime\beta-xy=0
\end{aligned}
$$
Hence, provided $xx^\prime$ is invertible, we obtain $\hat{\beta}=(xx^\prime)^{-1}xy$.
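The closed-form solution can be checked numerically. The sketch below simulates data from a linear model, computes $(xx^\prime)^{-1}xy$, verifies that the gradient $xx^\prime\beta-xy$ vanishes at that point, and compares the result with `np.linalg.lstsq`. The coefficients in `beta_true`, the noise scale, and the sample size are hypothetical values chosen only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical coefficients for the simulation

# x is (k+1) x n, with a row of ones for the intercept
x = np.vstack([np.ones(n), rng.normal(size=(k, n))])
y = x.T @ beta_true + rng.normal(scale=0.1, size=n)

# Closed form: solve (x x') beta = x y instead of inverting x x' explicitly
beta_hat = np.linalg.solve(x @ x.T, x @ y)

# First-order condition: x x' beta - x y should be (numerically) zero at beta_hat
grad = x @ x.T @ beta_hat - x @ y

# Cross-check against NumPy's least-squares solver (its design matrix is x')
beta_lstsq, *_ = np.linalg.lstsq(x.T, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq), np.allclose(grad, 0.0, atol=1e-8))
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a numerical-stability choice; both correspond to the same formula when $xx^\prime$ is invertible.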
Derivation from the theoretical (population) perspective
$$
\begin{aligned}
\hat{\beta}&=\mathop{\mathrm{argmin}}\limits_{\beta}MSE \\
&=\mathop{\mathrm{argmin}}\limits_{\beta}E(y_i-x_i^\prime\beta)^2
\end{aligned}
$$
Differentiating $E(y_i-x_i^\prime\beta)^2$ with respect to $\beta$ and setting the result to zero gives:
$$
\begin{aligned}
\dfrac{\partial}{\partial\beta}E(y_i-x_i^\prime\beta)^2
&=2E\left[(y_i-x_i^\prime\beta)(-x_i)\right] \\
&=2Ex_ix_i^\prime\beta-2Ex_iy_i=0
\end{aligned}
$$
Hence, provided $Ex_ix_i^\prime$ is invertible: $\hat{\beta}=(Ex_ix_i^\prime)^{-1}Ex_iy_i$
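Solving the first-order condition $Ex_ix_i^\prime\beta=Ex_iy_i$ requires the inverse of $Ex_ix_i^\prime$. Replacing each expectation with its sample average gives $\left(\frac{1}{n}\sum_i x_ix_i^\prime\right)^{-1}\frac{1}{n}\sum_i x_iy_i$, where the factors $\frac{1}{n}$ cancel, so the sample analogue coincides with $(xx^\prime)^{-1}xy$. The sketch below illustrates this on simulated data; `beta_true`, the sample size, and the seed are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100_000, 2
beta_true = np.array([0.5, -1.0, 2.0])  # hypothetical population coefficients

# x is (k+1) x n: column i is x_i = (1, x_1i, ..., x_ki)'
x = np.vstack([np.ones(n), rng.normal(size=(k, n))])
y = x.T @ beta_true + rng.normal(size=n)

# Sample analogues of E[x_i x_i'] and E[x_i y_i]
Exx = (x @ x.T) / n
Exy = (x @ y) / n

# Moment-based estimate: (E x_i x_i')^{-1} E x_i y_i with averages in place of E
beta_moment = np.linalg.solve(Exx, Exy)

# Sample least-squares estimate: (x x')^{-1} x y
beta_sample = np.linalg.solve(x @ x.T, x @ y)

print(beta_moment)
print(np.allclose(beta_moment, beta_sample))
```

With a large simulated sample, `beta_moment` should also land close to `beta_true`, which is the sense in which the sample formula estimates the population one.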