Linear Regression
1. Gradient descent
$$h_{\theta}(x) = \theta^T x = \sum_{i=0}^{n}\theta_i x_i$$
Cost function
$$J(\theta)=\frac{1}{2m}\sum_{j=1}^{m}\left(h_{\theta}(x)^{(j)}-y^{(j)}\right)^2$$
Here $m$ is the number of samples, and $h_{\theta}(x)^{(j)}$ denotes the prediction for the $j$-th sample, that is, $h_{\theta}$ evaluated at $x^{(j)}$.
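As a minimal sketch, the cost can be evaluated with NumPy as shown here. The function name `cost` and the toy data are illustrative assumptions, not part of the original post; samples are stored as columns of `X`, with a first row of ones standing for $x_0 = 1$.

```python
import numpy as np

def cost(theta, X, y):
    """Cost J(theta) = 1/(2m) * sum of squared residuals.

    X has shape (n+1, m): one column per sample, first row all ones (x_0 = 1).
    theta has shape (n+1, 1) and y has shape (m, 1).
    """
    m = X.shape[1]
    residuals = X.T @ theta - y      # h_theta(x)^(j) - y^(j) for every sample j
    return float((residuals ** 2).sum() / (2 * m))

# Hypothetical toy data lying exactly on the line y = 1 + 2x
X_toy = np.array([[1.0, 1.0, 1.0],
                  [0.0, 1.0, 2.0]])
y_toy = np.array([[1.0], [3.0], [5.0]])

cost_exact = cost(np.array([[1.0], [2.0]]), X_toy, y_toy)  # parameters of the true line
cost_zero = cost(np.zeros((2, 1)), X_toy, y_toy)           # all-zero parameters
print(cost_exact, cost_zero)
```

The cost is zero only when every sample is predicted exactly; any other parameter vector gives a positive value.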
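Solving by gradient descent
Differentiating $J(\theta)$ with respect to $\theta_i$ and stepping against the gradient with learning rate $\alpha$ gives the update rule, applied to all $\theta_i$ simultaneously:
$$\theta_i = \theta_i - \alpha\frac{1}{m}\sum_{j=1}^{m}\left(h_\theta(x)^{(j)}-y^{(j)}\right)x_i^{(j)}$$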
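The update rule can be sketched directly as two nested loops, one over the parameters $i$ and one over the samples $j$. This is only an illustration of the formula: the function name `gradient_step`, the toy data, and the iteration count are assumptions made for the example.

```python
import numpy as np

def gradient_step(theta, xs, ys, alpha):
    """One gradient descent step written with explicit sums over samples.

    theta: 1-D array of parameters
    xs:    list of feature vectors x^(j), each including x_0 = 1
    ys:    list of targets y^(j)
    """
    m = len(xs)
    new_theta = theta.copy()
    for i in range(len(theta)):
        grad_i = 0.0
        for x_j, y_j in zip(xs, ys):
            h_j = float(np.dot(theta, x_j))   # prediction uses the OLD theta
            grad_i += (h_j - y_j) * x_j[i]
        new_theta[i] = theta[i] - alpha * grad_i / m
    return new_theta

# Hypothetical toy data lying exactly on y = 1 + 2x
xs = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([1.0, 2.0])]
ys = [1.0, 3.0, 5.0]

first_step = gradient_step(np.zeros(2), xs, ys, alpha=0.1)

theta_loop = np.zeros(2)
for _ in range(2000):
    theta_loop = gradient_step(theta_loop, xs, ys, alpha=0.1)
print(first_step, theta_loop)
```

Every component is computed from the old `theta` before any component is overwritten, which is what "simultaneous update" means in practice.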
Vectorization
Take $i = 0, 1$ (two parameters) and $m = 3$ (three samples):
$$\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix} = \begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\alpha\frac{1}{m}\sum_{j=1}^{m}b^{(j)}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(j)},\quad b^{(j)}=h_\theta(x)^{(j)}-y^{(j)}$$
$$\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix} = \begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\alpha\frac{1}{m}\left(b^{(1)}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(1)}+b^{(2)}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(2)}+b^{(3)}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(3)}\right),\quad b^{(j)}=h_\theta(x)^{(j)}-y^{(j)}$$
$$\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix} = \begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\alpha\frac{1}{m}\begin{bmatrix}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(1)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(2)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(3)}\end{bmatrix}\begin{bmatrix}b^{(1)}\\b^{(2)}\\b^{(3)}\end{bmatrix},\quad b^{(j)}=h_\theta(x)^{(j)}-y^{(j)}$$
where
$$\begin{aligned}
\begin{bmatrix}b^{(1)}\\b^{(2)}\\b^{(3)}\end{bmatrix}
&=\begin{bmatrix}h_\theta(x)^{(1)}\\h_\theta(x)^{(2)}\\h_\theta(x)^{(3)}\end{bmatrix}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}\\
&=\begin{bmatrix}\begin{bmatrix}x_0&x_1\end{bmatrix}^{(1)}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(2)}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(3)}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}\end{bmatrix}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}\\
&=\begin{bmatrix}\begin{bmatrix}x_0&x_1\end{bmatrix}^{(1)}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(2)}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(3)}\end{bmatrix}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}
\end{aligned}$$
Therefore:
$$\color{red}{\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}} = \color{red}{\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}}-\alpha\frac{1}{m}\color{blue}{\begin{bmatrix}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(1)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(2)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(3)}\end{bmatrix}}\left(\color{blue}{\begin{bmatrix}\begin{bmatrix}x_0&x_1\end{bmatrix}^{(1)}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(2)}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(3)}\end{bmatrix}}\color{red}{\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}\right)$$
$$\pmb{X}=\begin{bmatrix}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(1)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(2)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(3)}\end{bmatrix},\quad \pmb{\theta}=\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix},\quad \pmb{y}=\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}$$
Final update formula, with each column of $\pmb{X}$ holding one sample:
$$\pmb{\theta}=\pmb{\theta}-\alpha\frac{1}{m}\pmb{X}(\pmb{X}^T\pmb{\theta}-\pmb{y})$$
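As a quick sanity check, the sketch below compares one step of the sample-by-sample sum with one step of the matrix form. All names (`X_s`, `y_s`, `theta_s`) and the random data are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 2, 3                                   # 2 parameters, 3 samples
X_s = np.vstack([np.ones((1, m)), rng.normal(size=(d - 1, m))])  # shape (d, m)
y_s = rng.normal(size=(m, 1))
theta_s = rng.normal(size=(d, 1))
alpha = 0.1

# Sum form: accumulate b^(j) * x^(j) over the samples
grad_sum = np.zeros((d, 1))
for j in range(m):
    x_j = X_s[:, [j]]                         # column vector of sample j, shape (d, 1)
    b_j = (theta_s.T @ x_j - y_s[j]).item()   # b^(j) = h_theta(x)^(j) - y^(j)
    grad_sum += b_j * x_j
theta_sum = theta_s - alpha / m * grad_sum

# Matrix form: theta - alpha/m * X (X^T theta - y)
theta_vec = theta_s - alpha / m * X_s @ (X_s.T @ theta_s - y_s)

print(np.allclose(theta_sum, theta_vec))
```

Both forms compute the same step; the matrix form simply lets NumPy carry out the sum over samples inside the matrix product.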
import numpy as np
import matplotlib.pyplot as plt

# Draw 30 strongly correlated 2-D samples; row 0 holds x, row 1 holds y
mean = (1.5, 1.5)
cov = [[1, 0.95], [0.95, 1]]
XY = np.random.multivariate_normal(mean, cov, 30).T
m = XY.shape[1]  # number of samples

# Plot the data
plt.scatter(XY[0,:], XY[1,:], c='b', s=10, edgecolor='none')
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.tick_params(axis='both', labelsize=14)
plt.axis([-1, 4, -1, 4])
plt.show()

# Hypothesis: h = theta^T * X = theta0 * X0 + theta1 * X1
# To obtain a line y = kx + b, set X0 = 1
X = np.concatenate((np.ones((1, m)), XY[0, :][np.newaxis, :]), axis=0)  # shape (2, m)
Y = XY[1, :][np.newaxis, :].T                                           # shape (m, 1)
theta = np.random.random((2, 1)) - 0.5  # random initial parameters in [-0.5, 0.5)
alpha = 0.1   # learning rate
epochs = 200  # number of iterations
for i in range(epochs):
    # Vectorized update: theta = theta - alpha/m * X (X^T theta - y)
    theta = theta - alpha * 1/m * X.dot(np.dot(X.T, theta) - Y)

# Regression line
p_x = np.array([-1, 4])
p_y = theta[0, 0] + theta[1, 0] * p_x

# Plot the data together with the fitted line
plt.plot(XY[0,:], XY[1,:], 'x', c='b')
plt.plot(p_x, p_y, c='r')
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.tick_params(axis='both', labelsize=14)
plt.axis([-1, 4, -1, 4])
plt.show()
2. Closed-form solution of linear regression
Linear model, with $n$ samples of dimension $d$ (each row of $\pmb{X}^T$ is one sample):
$$\pmb{X}^T=\begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1d}\\x_{21}&x_{22}&\cdots&x_{2d}\\\vdots&\vdots&\ddots&\vdots\\x_{n1}&x_{n2}&\cdots&x_{nd}\end{bmatrix},\qquad \pmb{y}=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}$$
In general the rank of the coefficient matrix differs from the rank of the augmented matrix, $R(\pmb{X}^T)\neq R(\pmb{X}^T\mid\pmb{y})$, so $\pmb{X}^T\pmb{\theta}=\pmb{y}$ has no exact solution. The problem to solve is therefore to make $\pmb{X}^T\pmb{\theta}$ as close to $\pmb{y}$ as possible: $\pmb{X}^T\pmb{\theta}\rightarrow\pmb{y}$.
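The rank argument can be checked numerically. The sketch below uses a small made-up data set (three samples, a constant feature plus one input, targets that do not lie on a line) and `np.linalg.matrix_rank`; the variable names are assumptions for the illustration.

```python
import numpy as np

# Hypothetical data: this array plays the role of X^T, shape (n, d) = (3, 2)
Xt = np.array([[1.0, 0.0],
               [1.0, 1.0],
               [1.0, 2.0]])
y_noisy = np.array([[1.0], [3.0], [4.0]])   # not collinear with the inputs

rank_coeff = np.linalg.matrix_rank(Xt)                         # R(X^T)
rank_aug = np.linalg.matrix_rank(np.hstack([Xt, y_noisy]))     # R(X^T | y)
print(rank_coeff, rank_aug)
```

When the two ranks differ, no $\pmb{\theta}$ reproduces every target exactly, which is why a cost function is minimized instead.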
Introduce the cost function $J=\lVert\pmb{X}^T\pmb{\theta}-\pmb{y}\rVert_2^2$ and require $\frac{\partial J}{\partial\pmb{\theta}}=0$ rather than $J=0$.
$$\begin{aligned}
J&=\lVert\pmb{X}^T\pmb{\theta}-\pmb{y}\rVert_2^2=(\pmb{X}^T\pmb{\theta}-\pmb{y})^T(\pmb{X}^T\pmb{\theta}-\pmb{y})=(\pmb{\theta}^T\pmb{X}-\pmb{y}^T)(\pmb{X}^T\pmb{\theta}-\pmb{y})\\
&=\pmb{\theta}^T\pmb{X}\pmb{X}^T\pmb{\theta}-\pmb{\theta}^T\pmb{X}\pmb{y}-\pmb{y}^T\pmb{X}^T\pmb{\theta}+\pmb{y}^T\pmb{y}
\end{aligned}$$
To see how the term $\pmb{\theta}^T\pmb{X}\pmb{y}$ is differentiated, suppose the samples have dimension 2 and there are 3 of them:
$$\begin{aligned}
\pmb{\theta}^T\pmb{X}\pmb{y}&=\begin{bmatrix}\theta_1&\theta_2\end{bmatrix}\begin{bmatrix}x_{11}&x_{21}&x_{31}\\x_{12}&x_{22}&x_{32}\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}\\
&=\begin{bmatrix}\theta_1x_{11}+\theta_2x_{12}&\theta_1x_{21}+\theta_2x_{22}&\theta_1x_{31}+\theta_2x_{32}\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}\\
&=(\theta_1x_{11}+\theta_2x_{12})y_1+(\theta_1x_{21}+\theta_2x_{22})y_2+(\theta_1x_{31}+\theta_2x_{32})y_3
\end{aligned}$$
$$\frac{\partial\,\pmb{\theta}^T\pmb{X}\pmb{y}}{\partial\pmb{\theta}}=\begin{bmatrix}\dfrac{\partial\,\pmb{\theta}^T\pmb{X}\pmb{y}}{\partial\theta_1}\\[2ex]\dfrac{\partial\,\pmb{\theta}^T\pmb{X}\pmb{y}}{\partial\theta_2}\end{bmatrix}=\begin{bmatrix}x_{11}y_1+x_{21}y_2+x_{31}y_3\\x_{12}y_1+x_{22}y_2+x_{32}y_3\end{bmatrix}=\begin{bmatrix}x_{11}&x_{21}&x_{31}\\x_{12}&x_{22}&x_{32}\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}=\pmb{X}\pmb{y}$$
Differentiating $J$ (the scalar $\pmb{y}^T\pmb{X}^T\pmb{\theta}$ equals $\pmb{\theta}^T\pmb{X}\pmb{y}$, so it also has derivative $\pmb{X}\pmb{y}$, and the quadratic term $\pmb{\theta}^T\pmb{X}\pmb{X}^T\pmb{\theta}$ has derivative $2\pmb{X}\pmb{X}^T\pmb{\theta}$):
$$\frac{\partial J}{\partial\pmb{\theta}}=\frac{\partial(\pmb{\theta}^T\pmb{X}\pmb{X}^T\pmb{\theta}-\pmb{\theta}^T\pmb{X}\pmb{y}-\pmb{y}^T\pmb{X}^T\pmb{\theta}+\pmb{y}^T\pmb{y})}{\partial\pmb{\theta}}=2\pmb{X}\pmb{X}^T\pmb{\theta}-2\pmb{X}\pmb{y}=0$$
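The gradient formula $2\pmb{X}\pmb{X}^T\pmb{\theta}-2\pmb{X}\pmb{y}$ can be checked against central finite differences. This is a sketch on random data; the names `X_chk`, `y_chk`, `theta_chk`, the helper `J_chk`, and the step size `eps` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 3
X_chk = rng.normal(size=(d, n))        # one column per sample
y_chk = rng.normal(size=(n, 1))
theta_chk = rng.normal(size=(d, 1))

def J_chk(t):
    """Squared 2-norm of the residual X^T t - y."""
    r = X_chk.T @ t - y_chk
    return float((r ** 2).sum())

# Gradient from the derivation: 2 X X^T theta - 2 X y
analytic = 2 * X_chk @ X_chk.T @ theta_chk - 2 * X_chk @ y_chk

# Central finite differences, one coordinate of theta at a time
eps = 1e-6
numeric = np.zeros_like(theta_chk)
for k in range(d):
    step = np.zeros_like(theta_chk)
    step[k, 0] = eps
    numeric[k, 0] = (J_chk(theta_chk + step) - J_chk(theta_chk - step)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
```

Because $J$ is quadratic in $\pmb{\theta}$, central differences agree with the analytic gradient up to rounding error.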
$$\pmb{X}\pmb{X}^T\pmb{\theta}-\pmb{X}\pmb{y}=0\;\Rightarrow\;\pmb{\theta}=(\pmb{X}\pmb{X}^T)^{-1}\pmb{X}\pmb{y}\quad(\text{if }\pmb{X}\pmb{X}^T\text{ is invertible})$$
For $\pmb{X}\pmb{X}^T$ to be invertible: $\pmb{S}=\pmb{X}\pmb{X}^T$ is a $d\times d$ matrix, so it must have full rank, $R(\pmb{X}\pmb{X}^T)=d$. Since $R(\pmb{X}\pmb{X}^T)=R(\pmb{X})\le\min(d,n)$, the sample matrix $\pmb{X}$ must contain at least as many samples as dimensions ($n\ge d$), and its $d$ rows must be linearly independent.
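The rank condition can be illustrated with two small made-up sample matrices, one with fewer samples than dimensions and one with more. The names `X_few` and `X_enough` are hypothetical; the rank of $\pmb{X}$ is computed because it equals the rank of $\pmb{X}\pmb{X}^T$.

```python
import numpy as np

# Hypothetical sample matrices, one column per sample (shape d x n)
X_few = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])               # d = 3, n = 2
X_enough = np.array([[1.0, 1.0, 1.0, 1.0],
                     [0.0, 1.0, 2.0, 3.0],
                     [0.0, 1.0, 4.0, 9.0]])  # d = 3, n = 4

for X_demo in (X_few, X_enough):
    d_demo, n_demo = X_demo.shape
    rank = np.linalg.matrix_rank(X_demo)     # equals the rank of X X^T
    print(f"d={d_demo}, n={n_demo}, rank={rank}, X X^T invertible: {rank == d_demo}")
```

With only two samples in three dimensions the rank cannot reach 3, so the closed-form expression is not usable for `X_few`.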
# Closed-form solution: theta = (X X^T)^{-1} X y, reusing X and Y from section 1
theta_2 = np.linalg.inv(X.dot(X.T)).dot(X).dot(Y)

# Regression line
p_x2 = np.array([-1, 4])
p_y2 = theta_2[0, 0] + theta_2[1, 0] * p_x2

# Plot the data together with the fitted line
plt.plot(XY[0,:], XY[1,:], 'x', c='b')
plt.plot(p_x2, p_y2, c='r')
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.tick_params(axis='both', labelsize=14)
plt.axis([-1, 4, -1, 4])
plt.show()
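Forming an explicit inverse is not the only way to evaluate the closed-form solution. As a hedged sketch on synthetic data (the names `X_ls`, `Y_ls` and the generating line are assumptions, not the data used above), the same parameters can be obtained with `np.linalg.solve` on the normal equations or with `np.linalg.lstsq` on $\pmb{X}^T\pmb{\theta}\approx\pmb{y}$, both of which are generally considered numerically safer than `np.linalg.inv`.

```python
import numpy as np

rng = np.random.default_rng(2)
m_ls = 30
x_ls = rng.normal(1.5, 1.0, size=m_ls)
X_ls = np.vstack([np.ones(m_ls), x_ls])                     # shape (2, m), x_0 = 1
Y_ls = (0.5 + 0.9 * x_ls + 0.1 * rng.normal(size=m_ls)).reshape(-1, 1)

# Explicit inverse, as in the formula theta = (X X^T)^{-1} X y
theta_inv = np.linalg.inv(X_ls @ X_ls.T) @ X_ls @ Y_ls

# Solve the normal equations X X^T theta = X y without forming the inverse
theta_solve = np.linalg.solve(X_ls @ X_ls.T, X_ls @ Y_ls)

# Least-squares solver applied directly to X^T theta ~ y
theta_lstsq, *_ = np.linalg.lstsq(X_ls.T, Y_ls, rcond=None)

print(np.allclose(theta_inv, theta_solve), np.allclose(theta_inv, theta_lstsq))
```

On a well-conditioned problem like this one all three approaches agree; the difference matters mainly when $\pmb{X}\pmb{X}^T$ is close to singular.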