1. Matrix form of the hypothesis function
Define the sample matrix (m samples, each with n features):

$$
X=\left[ \begin{array}{c} (x^{(1)})^{T}\\ (x^{(2)})^{T}\\ \vdots\\ (x^{(m)})^{T} \end{array} \right],\quad \text{where } x^{(i)}= \left[ \begin{array}{c} 1\\ x_{i1}\\ x_{i2}\\ \vdots\\ x_{in} \end{array} \right]
$$
Define the target vector and the parameter vector (one parameter per feature, plus the intercept $\theta_{0}$):

$$
Y=\left[ \begin{array}{c} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)} \end{array} \right],\quad\theta=\left[ \begin{array}{c} \theta_{0}\\ \theta_{1}\\ \vdots\\ \theta_{n} \end{array} \right]
$$
Then, for a single sample,

$$
h_{\theta}(x^{(i)})=(x^{(i)})^{T}\theta=[1\quad x_{i1}\quad\cdots\quad x_{in}] \left[ \begin{array}{c} \theta_{0}\\ \theta_{1}\\ \vdots\\ \theta_{n} \end{array} \right]=\theta_{0}+\theta_{1}x_{i1}+\cdots+\theta_{n}x_{in}
$$
So the hypothesis function for all samples can be written as

$$
h_{\theta}(X)=X\theta=\left[ \begin{array}{c} (x^{(1)})^{T}\theta\\ (x^{(2)})^{T}\theta\\ \vdots\\ (x^{(m)})^{T}\theta \end{array} \right]=\left[ \begin{array}{c} h_{\theta}(x^{(1)})\\ h_{\theta}(x^{(2)})\\ \vdots\\ h_{\theta}(x^{(m)}) \end{array} \right]
$$
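To make the matrix form concrete, here is a minimal NumPy sketch. The toy data (`features`, `theta`) are made-up values of my own, not from any real dataset; the point is only that a single product `X @ theta` gives the same numbers as evaluating $(x^{(i)})^{T}\theta$ sample by sample.

```python
import numpy as np

# Hypothetical toy data: m = 3 samples, n = 2 features.
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])
theta = np.array([0.5, 1.0, 2.0])  # [theta_0, theta_1, theta_2]

# Prepend a constant 1 to every sample so that theta_0 acts as the intercept.
X = np.hstack([np.ones((features.shape[0], 1)), features])  # shape (m, n + 1)

# Vectorised hypothesis: one matrix-vector product for all samples.
h_matrix = X @ theta

# Per-sample hypothesis: (x^(i))^T theta, one row at a time.
h_loop = np.array([X[i] @ theta for i in range(X.shape[0])])

print(h_matrix)
print(np.allclose(h_matrix, h_loop))
```

Each row of `X` is one $(x^{(i)})^{T}$, so the layout matches the definition of $X$ above.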
2. Matrix form of the cost function
The least-mean-squares (LMS) cost function is

$$
J(\theta)=\frac{1}{2}\sum_{i=1}^{m}[h_{\theta}(x^{(i)})-y^{(i)}]^{2}=\frac{1}{2}(X\theta-Y)^{T}(X\theta-Y)
$$
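The equality between the sum and the matrix product can be checked numerically. The sketch below uses two helper functions whose names (`lms_cost_sum`, `lms_cost_matrix`) are my own, and a tiny made-up dataset; it is an illustration, not part of the original script.

```python
import numpy as np

def lms_cost_sum(X, Y, theta):
    """J(theta) as an explicit sum over the m samples."""
    return 0.5 * sum((X[i] @ theta - Y[i]) ** 2 for i in range(X.shape[0]))

def lms_cost_matrix(X, Y, theta):
    """J(theta) as 1/2 (X theta - Y)^T (X theta - Y)."""
    residual = X @ theta - Y
    return 0.5 * (residual @ residual)

# Hypothetical toy data: first column of X is the constant 1.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
Y = np.array([1.0, 3.0, 5.0])
theta = np.array([0.0, 1.0])

cost_sum = lms_cost_sum(X, Y, theta)
cost_matrix = lms_cost_matrix(X, Y, theta)
print(cost_sum, cost_matrix)
```

Here the residuals are $-1, -2, -3$, so both forms give $\frac{1}{2}(1+4+9)=7$.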
3. Closed-form solution of LMS
Compute the gradient of the LMS cost using matrix calculus:

$$
\begin{aligned}
\nabla_{\theta}J(\theta)&=\nabla_{\theta}\frac{1}{2}(X\theta-Y)^{T}(X\theta-Y)\\
&=\frac{1}{2}\nabla_{\theta}(\theta^{T}X^{T}X\theta-\theta^{T}X^{T}Y-Y^{T}X\theta+Y^{T}Y)\\
&=\frac{1}{2}\nabla_{\theta}(\theta^{T}X^{T}X\theta-\theta^{T}X^{T}Y-Y^{T}X\theta) \quad \quad {\color{red}\frac{\partial (Y^{T}Y)}{\partial \theta}=0}\\
&=\frac{1}{2}\nabla_{\theta}\operatorname{tr}(\theta^{T}X^{T}X\theta-\theta^{T}X^{T}Y-Y^{T}X\theta) \quad \quad {\color{red}\text{the expression is a scalar, and } \operatorname{tr}(a)=a \text{ for } a\in \mathbb{R}}\\
&=\frac{1}{2}\nabla_{\theta}[\operatorname{tr}(\theta^{T}X^{T}X\theta)-2\operatorname{tr}(Y^{T}X\theta)] \quad \quad {\color{red}\operatorname{tr}(A)=\operatorname{tr}(A^{T}), \text{ so } \operatorname{tr}(\theta^{T}X^{T}Y)=\operatorname{tr}(Y^{T}X\theta)}\\
&=\frac{1}{2}\nabla_{\theta}\operatorname{tr}(\theta^{T}X^{T}X\theta)-X^{T}Y \quad \quad {\color{red}\frac{\partial \operatorname{tr}(AB)}{\partial A}=\frac{\partial \operatorname{tr}(BA)}{\partial A}=B^{T}}\\
&=\frac{1}{2}\left[X^{T}X\theta+(X^{T}X)^{T}\theta\right]-X^{T}Y \quad \quad {\color{red}\nabla_{\theta}(\theta^{T}A\theta)=(A+A^{T})\theta}\\
&=X^{T}X\theta-X^{T}Y \quad \quad {\color{red}(X^{T}X)^{T}=X^{T}X}
\end{aligned}
$$
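A quick way to gain confidence in the result $\nabla_{\theta}J(\theta)=X^{T}X\theta-X^{T}Y$ is to compare it with a finite-difference approximation. The sketch below is my own check on random data; the function names and the step size `eps` are assumptions chosen for illustration.

```python
import numpy as np

def lms_cost(X, Y, theta):
    """J(theta) = 1/2 (X theta - Y)^T (X theta - Y)."""
    residual = X @ theta - Y
    return 0.5 * (residual @ residual)

def lms_gradient(X, Y, theta):
    """Analytic gradient from the derivation: X^T X theta - X^T Y."""
    return X.T @ X @ theta - X.T @ Y

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Random toy problem: 5 samples, 2 features plus the constant column.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
Y = rng.normal(size=5)
theta = rng.normal(size=3)

analytic = lms_gradient(X, Y, theta)
numeric = numerical_gradient(lambda t: lms_cost(X, Y, t), theta)
print(np.max(np.abs(analytic - numeric)))
```

Because $J$ is quadratic in $\theta$, the central difference is exact up to rounding, so the printed difference should be very small.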
Setting the gradient to zero gives the closed-form solution:

$$
\theta^{\ast}=(X^{T}X)^{-1}X^{T}Y
$$

PS: $(X^{T}X)^{-1}$ is sometimes hard to compute, for example when $X^{T}X$ is singular or ill-conditioned.
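Since the inverse can be hard to compute, one common alternative is to avoid forming it at all. The sketch below is my own addition, not the method used in these notes: it compares the explicit inverse with `np.linalg.solve` on the normal equations $X^{T}X\theta=X^{T}Y$ and with `np.linalg.lstsq`, which works on $X$ directly. The toy data are made up so that the exact answer is known.

```python
import numpy as np

# Hypothetical toy data lying exactly on y = 1 + 2 x.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 3.0, 5.0, 7.0])

# 1) Literal closed form with an explicit inverse.
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ Y

# 2) Solve the normal equations X^T X theta = X^T Y without inverting.
theta_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# 3) Least-squares solver applied to X itself; it also copes with rank deficiency.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(theta_inv, theta_solve, theta_lstsq)
```

All three agree on well-conditioned data like this; they start to differ in accuracy as $X^{T}X$ approaches singularity.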
4. Python implementation
The code is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt


# Load the data: years and the corresponding housing prices
def load_data():
    X = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013]
    X_p = np.array(X)
    Y = [2.000, 2.500, 2.900, 3.147, 4.515, 4.903, 5.365, 5.704, 6.853, 7.971, 8.561, 10.000, 11.280, 12.900]
    Y_p = np.array(Y)
    return X_p, Y_p


# Compute the closed-form solution, plot the fit, and predict the price for 2014
def close_form(X, Y):
    X = np.array([X])                 # shape (1, m)
    one = np.ones((1, X.shape[1]))    # row of ones for the intercept term
    vx = np.concatenate([one, X])     # shape (2, m); this is X^T in the notation above
    # theta = (X^T X)^{-1} X^T Y, with the pseudo-inverse in place of the inverse
    theta = np.dot(np.dot(np.linalg.pinv(np.dot(vx, vx.T)), vx), Y)
    print(theta)
    theta0 = theta[0]
    theta1 = theta[1]
    y = X[0] * theta1 + theta0
    # Plot the data points and the fitted line
    plt.title('Close Form')
    plt.xlabel('years')
    plt.ylabel('prices')
    plt.scatter(X[0], Y, c='#FF0000')
    plt.plot(X[0], y)
    plt.show()
    # Predict the price for 2014
    print("the housing price in 2014 is %f" % (2014 * theta1 + theta0))


if __name__ == "__main__":
    X, Y = load_data()
    print("-----------------close form-------------------")
    close_form(X, Y)
```
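As a sanity check on the script, the same fit can be reproduced without plotting and compared against `np.polyfit`. This is a sketch of my own: shifting the years so that 2000 becomes 0 is an assumption I add to keep $X^{T}X$ well-conditioned (raw values near 2000 next to a column of ones make it close to singular), and the variable names are hypothetical.

```python
import numpy as np

years = np.arange(2000, 2014)
prices = np.array([2.000, 2.500, 2.900, 3.147, 4.515, 4.903, 5.365,
                   5.704, 6.853, 7.971, 8.561, 10.000, 11.280, 12.900])

# Shift the years (assumption: not in the original script) to improve conditioning.
shifted = years - 2000.0
X = np.column_stack([np.ones_like(shifted), shifted])  # rows are [1, year - 2000]

# Closed-form solution via the normal equations.
theta = np.linalg.solve(X.T @ X, X.T @ prices)
pred_closed = theta[0] + theta[1] * (2014 - 2000)

# Reference: degree-1 polynomial fit on the raw years; returns [slope, intercept].
slope, intercept = np.polyfit(years, prices, 1)
pred_polyfit = slope * 2014 + intercept

print(pred_closed, pred_polyfit)
```

The two predictions for 2014 should agree to several decimal places; a noticeable gap would point to a numerical problem in one of the two routes.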
The final fitted result:
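(These are my own notes from studying machine learning; if you spot any mistakes, please let me know so I can correct them.)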