Assume the input and output satisfy a linear relationship, and the output takes continuous values. The hypothesis function is:
$$h(\vec{x}) = \vec{\theta}^T\vec{x}$$
where:
$$\begin{aligned} \vec{x}&=[x_0, x_1, \dots, x_n]^T\in\mathbb{R}^{(n+1)\times1} \\ \vec{\theta}&=[\theta_0, \theta_1, \dots, \theta_n]^T\in\mathbb{R}^{(n+1)\times1} \end{aligned}$$

($x_0=1$; $n$ is the number of features)
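As a quick shape check, here is a minimal numpy sketch (the numbers and names are illustrative, not from the text) of evaluating $h(\vec{x})=\vec{\theta}^T\vec{x}$ for one sample, with the bias entry $x_0=1$ prepended:

```python
import numpy as np

# n = 2 features, plus the bias entry x_0 = 1 at the front
x = np.array([1.0, 3.5, -1.2])      # shape (n+1,)
theta = np.array([0.5, 2.0, 1.0])   # shape (n+1,)

h = theta @ x                       # theta^T x, a scalar prediction
print(h)                            # 0.5*1 + 2.0*3.5 + 1.0*(-1.2) = 6.3
```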
The cost function:
$$J(\vec{\theta}) = \frac{1}{2m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)^2$$
where:
$$\vec{y}=[y^{(1)},y^{(2)},\dots,y^{(m)}]^T\in\mathbb{R}^{m\times1}$$

($m$ is the number of training samples)
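With these definitions the cost can be written in one vectorized expression, $J(\vec{\theta})=\frac{1}{2m}\lVert X\vec{\theta}-\vec{y}\rVert^2$. A minimal sketch (the function and variable names are my own):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1 / 2m) * sum of squared residuals, fully vectorized."""
    m = len(y)
    residual = X @ theta - y
    return (residual @ residual) / (2 * m)
```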
To minimize the cost function we look for a stationary point, where the derivative vanishes:
$$\begin{aligned} \frac{dJ(\vec{\theta})}{d\vec{\theta}} &= 0 \\ \frac{d}{d\vec{\theta}}\frac{1}{2m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right)^2 &= 0 \\ \frac{1}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right)\frac{d\left(\vec{\theta}^T\vec{x}^{(i)}\right)}{d\vec{\theta}} &= 0 \end{aligned}$$
Since
$$\frac{d\left(\vec{\theta}^T\vec{x}^{(i)}\right)}{d\vec{\theta}} =\frac{d\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \dots + \theta_n x_n^{(i)}\right)}{d\vec{\theta}}$$
it follows that
$$\frac{\partial\left(\vec{\theta}^T\vec{x}^{(i)}\right)}{\partial\theta_j} = x_j^{(i)}$$
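Putting the last two displays together, the full gradient in matrix form is $\frac{1}{m}X^T(X\vec{\theta}-\vec{y})$. A quick finite-difference sanity check of this expression, on made-up data (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack((np.ones((5, 1)), rng.normal(size=(5, 2))))  # m=5, n=2
y = rng.normal(size=5)
theta = rng.normal(size=3)
m = len(y)

def J(t):
    r = X @ t - y
    return (r @ r) / (2 * m)

analytic = X.T @ (X @ theta - y) / m   # gradient from the derivation above

eps = 1e-6                             # central differences, one coordinate at a time
numeric = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric))  # True
```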
So, ideally, the cost is minimized when every residual vanishes (in general $X\vec{\theta}=\vec{y}$ is overdetermined, so these equations hold only in the least-squares sense):
$$\begin{aligned} \vec{\theta}^T\vec{x}^{(i)}-y^{(i)} &= 0 \\ (\vec{x}^{(i)})^T \vec{\theta}=(y^{(i)})^T&=y^{(i)} \end{aligned}$$
Stacking these equations over all $m$ samples gives
$$\begin{pmatrix} (\vec{x}^{(1)})^T \vec{\theta} \\ (\vec{x}^{(2)})^T \vec{\theta} \\ \vdots \\ (\vec{x}^{(m)})^T \vec{\theta} \end{pmatrix} = \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{pmatrix}$$
that is,
$$X\vec{\theta}= \vec{y} \qquad (X\in\mathbb{R}^{m\times (n+1)})$$
Solving for $\vec{\theta}$: left-multiplying both sides by $X^T$ gives $X^TX\vec{\theta}=X^T\vec{y}$, and when $X^TX$ is invertible this yields the normal equation:
$$\vec{\theta}=(X^TX)^{-1}X^T\vec{y}$$
Python implementation:
```python
import numpy as np
import matplotlib.pyplot as plt


def linear_regression(x_in, y_in):
    # Normal equation: theta = (X^T X)^(-1) X^T y.
    # pinv (pseudo-inverse) still works when X^T X is singular.
    return np.linalg.pinv(x_in.T @ x_in) @ x_in.T @ y_in


if __name__ == '__main__':
    m = 30
    x0 = np.ones((m, 1))                         # bias column, x_0 = 1
    x1 = np.arange(1, m + 1).reshape(m, 1)       # single feature
    x_in = np.hstack((x0, x1))                   # design matrix X, shape (m, 2)
    theta = np.array([[5.0], [0.5]])             # true parameters
    y_in = x_in @ theta + np.random.randn(m, 1)  # targets with Gaussian noise
    plt.scatter(x1, y_in)
    res_theta_formula = linear_regression(x_in, y_in)
    plt.plot(x1, x_in @ res_theta_formula, color='r')
    diff = x_in @ res_theta_formula - y_in
    plt.title('cost: %f' % ((diff.T @ diff)[0, 0] / (2 * m)))
    plt.show()
```
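The fitted parameters should land close to the true values `[5, 0.5]`, up to the injected noise. For comparison, the same least-squares fit can be obtained without forming $X^TX$ at all via `np.linalg.lstsq`, which is generally the more numerically stable route; a minimal sketch on data generated the same way:

```python
import numpy as np

m = 30
X = np.hstack((np.ones((m, 1)), np.arange(1, m + 1).reshape(m, 1)))
y = X @ np.array([[5.0], [0.5]]) + np.random.randn(m, 1)

theta_lstsq, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(theta_lstsq.ravel())   # close to [5.0, 0.5]
```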
Analysis of the result:
Advantages: 1. No feature normalization is required. 2. No iterative optimization is required.
Disadvantages: 1. Computation is slow when $n$ is very large ($n > 10^6$), since forming and inverting $X^TX$ costs roughly $O(n^3)$. 2. The inverse of $X^TX$ does not necessarily exist.
Reasons $X^TX$ can be non-invertible: 1. Some features are linearly dependent (remove the redundant features). 2. $m < n$, i.e. fewer samples than features (handled via regularization; see the sketch below).
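A common concrete form of that regularization is ridge regression, which replaces the normal equation with $\vec{\theta}=(X^TX+\lambda I)^{-1}X^T\vec{y}$; for $\lambda>0$ the matrix $X^TX+\lambda I$ is positive definite and therefore always invertible. A minimal sketch (the parameter name `lam` is mine):

```python
import numpy as np

def ridge_regression(X, y, lam=1.0):
    # (X^T X + lam * I) is positive definite for lam > 0, so the solve
    # succeeds even when features are dependent or m < n.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
```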