The input and output satisfy a linear relationship, and the output takes continuous values.
Hypothesis function:

$$
h(\vec{x}) = \vec{\theta}^T\vec{x}
$$
where:

$$
\begin{aligned}
\vec{x} &= [x_0, x_1, \dots, x_n]^T \in \mathbb{R}^{(n+1)\times 1} \\
\vec{\theta} &= [\theta_0, \theta_1, \dots, \theta_n]^T \in \mathbb{R}^{(n+1)\times 1}
\end{aligned}
$$

with $x_0 = 1$ and $n$ the number of features.
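As a minimal sketch, the hypothesis is a single dot product in NumPy (the values here are illustrative, not from the original):

```python
import numpy as np

# Hypothetical example with n = 2 features plus the intercept entry x_0 = 1.
x = np.array([1.0, 2.0, 3.0])        # x = [x_0, x_1, x_2]^T
theta = np.array([0.5, 1.0, -2.0])   # theta = [theta_0, theta_1, theta_2]^T

h = theta @ x                        # h(x) = theta^T x
print(h)                             # 0.5*1 + 1.0*2 - 2.0*3 = -3.5
```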
Cost function:

$$
J(\vec{\theta}) = \frac{1}{2m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)^2
$$
where:

$$
\vec{y} = [y^{(1)}, y^{(2)}, \dots, y^{(m)}]^T \in \mathbb{R}^{m\times 1}
$$

with $m$ the number of training samples.
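A minimal vectorized sketch of this cost, assuming a design matrix `X` whose $i$-th row is $(\vec{x}^{(i)})^T$ (variable names are illustrative):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (theta^T x_i - y_i)^2, vectorized."""
    m = y.shape[0]
    residuals = X @ theta - y               # shape (m,)
    return residuals @ residuals / (2 * m)

# Tiny check: a perfect fit has zero cost.
X = np.array([[1.0, 1.0], [1.0, 2.0]])      # first column is x_0 = 1
y = np.array([3.0, 5.0])                    # y = 1 + 2*x_1 exactly
print(cost(np.array([1.0, 2.0]), X, y))     # 0.0
```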
Gradient descent updates $\theta$ along the negative gradient direction:

$$
\theta_j := \theta_j - \alpha\frac{\partial J(\vec{\theta})}{\partial \theta_j}
$$
Because $J(\vec{\theta})$ is convex, it has a global minimum. When $\partial J(\vec{\theta})/\partial \theta_j > 0$, $\theta_j$ lies to the right of the optimum and the update decreases it; when $\partial J(\vec{\theta})/\partial \theta_j < 0$, $\theta_j$ lies to the left of the optimum and the update increases it. Either way the update moves $\theta_j$ toward the minimum.
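The sign argument can be checked on a one-dimensional quadratic; this toy sketch (not from the original) starts on both sides of the minimum at $\theta = 2$ and converges from either direction:

```python
# Toy cost J(theta) = (theta - 2)^2, gradient dJ/dtheta = 2*(theta - 2).
alpha = 0.3
for theta in (5.0, -1.0):        # start right, then left, of the optimum
    for _ in range(20):
        theta -= alpha * 2 * (theta - 2)
    print(theta)                 # both runs end up close to 2.0
```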
$$
\begin{aligned}
\theta_j :&= \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right)x_j^{(i)} \\
&= \theta_j - \frac{\alpha}{m}
\begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} & \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} & \cdots & \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix}
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix} \\
&= \theta_j - \frac{\alpha}{m}
\begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} \\ \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} \\ \vdots \\ \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix}
\end{aligned}
$$
Therefore, stacking all $n+1$ components:

$$
\begin{aligned}
\begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T
&:= \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T
- \frac{\alpha}{m}
\begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} \\ \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} \\ \vdots \\ \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix}^T
\begin{pmatrix}
x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\
x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)}
\end{pmatrix} \\
&:= \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T
- \frac{\alpha}{m}
\begin{pmatrix} (\vec{x}^{(1)})^T\vec{\theta}-y^{(1)} \\ (\vec{x}^{(2)})^T\vec{\theta}-y^{(2)} \\ \vdots \\ (\vec{x}^{(m)})^T\vec{\theta}-y^{(m)} \end{pmatrix}^T
X
\end{aligned}
$$

where $X \in \mathbb{R}^{m\times(n+1)}$ is the design matrix whose $i$-th row is $(\vec{x}^{(i)})^T$.
This gives the vectorized update for $\vec{\theta}$:

$$
\begin{aligned}
\vec{\theta}^T &:= \vec{\theta}^T - \frac{\alpha}{m}(X\vec{\theta}-\vec{y})^T X \\
\vec{\theta} &:= \vec{\theta} - \frac{\alpha}{m}X^T(X\vec{\theta}-\vec{y})
\end{aligned}
$$
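The equivalence of the summation form and this matrix form can be sanity-checked numerically; the sketch below (illustrative names, random data) compares one component-wise update loop against the vectorized update:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 5, 3, 0.1
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # column x_0 = 1
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

# Component form: theta_j -= alpha/m * sum_i (theta^T x_i - y_i) * x_j^(i)
residuals = X @ theta - y                  # computed once: simultaneous update
loop_theta = theta.copy()
for j in range(n + 1):
    loop_theta[j] -= alpha / m * sum(residuals[i] * X[i, j] for i in range(m))

# Vectorized form: theta := theta - alpha/m * X^T (X theta - y)
vec_theta = theta - alpha / m * X.T @ (X @ theta - y)

print(np.allclose(loop_theta, vec_theta))  # True
```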
The code implementation is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt


def linear_regression(x_in, y_in, alpha=0.01, epsilon=1e-5):
    sample_num = y_in.shape[0]
    return gradient_descent(sample_num, x_in, y_in, cost_function_lr, alpha, epsilon)


def cost_function_lr(sample_num, theta, x_in, y_in):
    diff = (x_in * theta) - y_in                    # residuals X*theta - y
    j_theta = diff.T * diff / (2 * sample_num)      # J(theta) = 1/(2m) * sum(diff^2)
    partial_theta = (x_in.T * diff) / sample_num    # gradient 1/m * X^T * diff
    return (j_theta, partial_theta)


def gradient_descent(sample_num, x_in, y_in, cost_function, alpha, epsilon):
    theta = np.mat(np.zeros((x_in.shape[1], 1)))
    pre_j_theta = np.inf
    count = 5000
    while count:
        (j_theta, partial_theta) = cost_function(sample_num, theta, x_in, y_in)
        # Stop when the cost, or every gradient component, is small enough.
        if j_theta < epsilon or (np.fabs(partial_theta) < epsilon).all():
            break
        theta -= alpha * partial_theta
        # If the cost increased, the step was too large: shrink it.
        if j_theta > pre_j_theta:
            alpha /= 10
        pre_j_theta = j_theta
        count -= 1
    if not count:
        print('get max count')
    return theta


if __name__ == '__main__':
    # Build m samples of y = 5 + 0.5*x1 plus Gaussian noise.
    m = 30
    x0 = np.ones((m, 1))                    # intercept column x_0 = 1
    x1 = np.arange(1, m + 1).reshape(m, 1)
    x_in = np.mat(np.hstack((x0, x1)))      # design matrix X, shape (m, 2)
    theta = np.mat([5, 0.5]).reshape(2, 1)
    y_in = x_in * theta + np.random.randn(m).reshape(m, 1)

    res_theta_gd = linear_regression(x_in, y_in, 0.001, 1e-5)

    plt.scatter(x1, np.array(y_in))
    plt.plot(x1, np.array(x_in * res_theta_gd), color='r')
    diff = x_in * res_theta_gd - y_in
    plt.title('cost: %f' % ((diff.T * diff)[0, 0] / (2 * m)))
    plt.show()
```
Result analysis: the script plots the sample points together with the fitted line, and reports the final cost in the figure title.
Normalization: rescaling each feature of $\vec{x}$ to a common range speeds up convergence.

$$
x_j = \frac{x_j - \min_i\left(x_j^{(i)}\right)}{\max_i\left(x_j^{(i)}\right) - \min_i\left(x_j^{(i)}\right)}
$$
Standardization:

$$
x_j = \frac{x_j - \mu}{\sigma}
$$
where:

$$
\begin{aligned}
&i = 1, 2, \dots, m && \text{($m$ is the number of samples)} \\
&j = 1, 2, \dots, n && \text{($n$ is the number of features; the intercept feature $x_0 = 1$ is left unscaled)} \\
&\mu \text{ is the feature mean, } \sigma \text{ the feature standard deviation}
\end{aligned}
$$
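A minimal sketch of both rescalings, applied column-wise to the non-intercept features (function names are illustrative, not from the original):

```python
import numpy as np

def min_max_scale(X):
    """Map each feature column to [0, 1]; skip the intercept column X[:, 0]."""
    out = X.copy().astype(float)
    cols = out[:, 1:]
    out[:, 1:] = (cols - cols.min(axis=0)) / (cols.max(axis=0) - cols.min(axis=0))
    return out

def standardize(X):
    """Shift each feature column to zero mean and unit variance; skip X[:, 0]."""
    out = X.copy().astype(float)
    cols = out[:, 1:]
    out[:, 1:] = (cols - cols.mean(axis=0)) / cols.std(axis=0)
    return out

X = np.array([[1.0, 10.0], [1.0, 20.0], [1.0, 40.0]])
print(min_max_scale(X))   # second column becomes [0, 1/3, 1]
print(standardize(X))     # second column has mean 0 and std 1
```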