Machine Learning: Derivation of Linear Regression
Suppose the dataset is $D=\lbrace (x_1, y_1),(x_2,y_2),\cdots,(x_i,y_i),\cdots,(x_N, y_N)\rbrace$, where $x_i\in \mathbb{R}^p$, $y_i \in \mathbb{R}$, $i=1,2,\cdots,N$.
$$X=\begin{pmatrix}x_1 & x_2&\cdots&x_N \end{pmatrix}^T=\begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T\end{pmatrix}=\begin{pmatrix}x_{11} &x_{12}&\cdots&x_{1p} \\ x_{21} &x_{22}&\cdots&x_{2p} \\ \vdots & \vdots & \vdots & \vdots \\ x_{N1} &x_{N2}&\cdots&x_{Np} \end{pmatrix}_{N\times p}$$

$$w = \begin{pmatrix} w_1&w_2 & \cdots & w_p\end{pmatrix}^T$$
$$Y=\begin{pmatrix} y_1&y_2&\cdots&y_N\end{pmatrix}^T$$
Least Squares Estimation
$$\begin{aligned} L(w)&=\sum_{i=1}^N\|w^Tx_i-y_i\|^2\\ &= \sum_{i=1}^N \left( w^Tx_i-y_i\right)^2 \\ &=\begin{pmatrix} w^Tx_1-y_1&w^Tx_2-y_2 &\cdots&w^Tx_N-y_N\end{pmatrix} \begin{pmatrix} w^Tx_1-y_1 \\ w^Tx_2-y_2 \\ \vdots \\ w^Tx_N-y_N\end{pmatrix} \\ &=\left(w^T\begin{pmatrix}x_1&x_2&\cdots&x_N \end{pmatrix} - \begin{pmatrix} y_1&y_2&\cdots&y_N\end{pmatrix} \right) \left(w^T\begin{pmatrix}x_1&x_2&\cdots&x_N \end{pmatrix} - \begin{pmatrix} y_1&y_2&\cdots&y_N\end{pmatrix} \right)^T\\ &=\left(w^TX^T-Y^T\right)\left(Xw-Y\right) \\ &=w^TX^TXw -w^TX^TY-Y^TXw+Y^TY \\ &=w^TX^TXw-2w^TX^TY+Y^TY \end{aligned}$$

The last step uses the fact that $w^TX^TY$ and $Y^TXw$ are scalars and transposes of each other, so they are equal.
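The equivalence between the sum-of-squares form and the final matrix form can be checked numerically. The sketch below uses a small synthetic dataset (the sizes and random values are illustrative, not from the text):

```python
import numpy as np

# Hypothetical small dataset: N=5 samples, p=2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # N x p design matrix
Y = rng.normal(size=5)        # targets y_1, ..., y_N
w = rng.normal(size=2)        # candidate weight vector

# L(w) as a sum of squared residuals, term by term.
loss_sum = sum((w @ X[i] - Y[i]) ** 2 for i in range(len(Y)))

# L(w) in matrix form: w^T X^T X w - 2 w^T X^T Y + Y^T Y.
loss_matrix = w @ X.T @ X @ w - 2 * (w @ X.T @ Y) + Y @ Y

assert np.isclose(loss_sum, loss_matrix)
```

Both expressions evaluate to the same number for any choice of $w$, confirming the expansion.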
Taking the derivative of $L(w)$ with respect to $w$:
$$\begin{aligned} \frac{\partial{L(w)}}{\partial{w}}&=2X^TXw-2X^TY \\ &=2(X^TXw-X^TY) \end{aligned}$$
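The analytic gradient $2(X^TXw-X^TY)$ can be verified against a finite-difference approximation. A minimal sketch, again with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))   # N=6 samples, p=3 features (illustrative)
Y = rng.normal(size=6)
w = rng.normal(size=3)

def loss(w):
    """L(w) = ||Xw - Y||^2 as a scalar."""
    r = X @ w - Y
    return float(r @ r)

# Analytic gradient from the derivation: 2(X^T X w - X^T Y).
grad_analytic = 2 * (X.T @ X @ w - X.T @ Y)

# Central finite-difference approximation of each partial derivative.
eps = 1e-6
grad_numeric = np.array([
    (loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

assert np.allclose(grad_analytic, grad_numeric, atol=1e-4)
```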
Setting the derivative to zero:
$$2(X^TXw-X^TY) = 0$$
Solving for $w$ (assuming $X^TX$ is invertible):
$$w = (X^TX)^{-1}X^TY$$
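The closed-form solution can be sketched directly in NumPy. On noiseless synthetic data (targets generated from a known weight vector, chosen here for illustration), it recovers the true weights exactly. Note that in practice one solves the normal equations $X^TXw = X^TY$ rather than forming the inverse explicitly, which is cheaper and more numerically stable:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))        # N=20 samples, p=3 features
w_true = np.array([1.0, -2.0, 0.5]) # hypothetical ground-truth weights
Y = X @ w_true                      # noiseless targets for illustration

# Closed-form least squares: w = (X^T X)^{-1} X^T Y,
# computed by solving the normal equations X^T X w = X^T Y.
w_hat = np.linalg.solve(X.T @ X, X.T @ Y)

assert np.allclose(w_hat, w_true)
```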