Least Squares: Study Notes
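My understanding of the least squares method is that it fits the parameters of a function whose form is already known by minimizing the sum of squared errors, where the function is linear in those parameters. If either of these two conditions (a known functional form, and linearity in the parameters) does not hold, the formulas have to be re-derived.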
Under these two assumptions, let the parameters be $\theta$ and let the function take the form

$$H(x)=\theta_{0}+\theta_{1}h_{1}(x)+\cdots+\theta_{n}h_{n}(x)$$

where each $h_{i}(x)$ is a known function of the input $x$.
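The form of $H(x)$ can be sketched in a few lines of Python. The basis functions `basis` and the parameter values below are hypothetical, chosen only for illustration; the notes do not prescribe any particular $h_{i}$.

```python
import math

# Hypothetical basis functions h_1, h_2, h_3 (illustration only).
basis = [lambda x: x, lambda x: x * x, math.sin]

def H(x, theta):
    """Evaluate H(x) = theta[0] + sum_i theta[i] * h_i(x)."""
    assert len(theta) == len(basis) + 1  # n basis functions need n + 1 parameters
    return theta[0] + sum(t * h(x) for t, h in zip(theta[1:], basis))

theta = [1.0, 2.0, 0.5, -1.0]
value = H(2.0, theta)  # 1 + 2*2 + 0.5*4 - sin(2)
```

Note that `H` may be nonlinear in `x` (here through `x * x` and `math.sin`) while remaining linear in `theta`, which is the condition the method relies on.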
Suppose the input data consists of $m$ samples. They can be written as a matrix with $m$ rows, and by the form of $H(x)$ the matrix has $n+1$ columns. The input matrix $X$ is:

$$X=\begin{bmatrix} 1 & h_{1}(x_{1}) & h_{2}(x_{1}) & \cdots & h_{n}(x_{1}) \\ 1 & h_{1}(x_{2}) & h_{2}(x_{2}) & \cdots & h_{n}(x_{2}) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & h_{1}(x_{m}) & h_{2}(x_{m}) & \cdots & h_{n}(x_{m}) \end{bmatrix}$$
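As a minimal sketch, the matrix $X$ can be built row by row from the samples. The function name `design_matrix`, the two basis functions, and the sample inputs are assumptions made for this example.

```python
def design_matrix(xs, basis):
    """Return the m x (n+1) matrix whose rows are [1, h_1(x), ..., h_n(x)]."""
    return [[1.0] + [h(x) for h in basis] for x in xs]

basis = [lambda x: x, lambda x: x * x]  # hypothetical h_1, h_2 (n = 2)
xs = [0.0, 1.0, 2.0, 3.0]               # m = 4 sample inputs
X = design_matrix(xs, basis)
```

The leading column of ones is what multiplies $\theta_{0}$, so the constant term needs no special handling later.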
The target values to fit can likewise be written as a column vector $Y$:

$$Y=\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{bmatrix}$$
The parameters are collected in the vector $\theta$, which has $n+1$ entries (starting from $\theta_{0}$) so that it matches the $n+1$ columns of $X$:

$$\theta=\begin{bmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{bmatrix}$$
The error $E$ can then be written as:

$$\begin{aligned} E&=\left\| X\theta -Y\right\|^{2}\\ &=(X\theta -Y)^{T}(X\theta -Y)\\ &=[(X\theta)^{T}-Y^{T}](X\theta-Y)\\ &=(\theta^{T}X^{T}-Y^{T})(X\theta-Y)\\ &=\theta^{T}X^{T}X\theta-(X\theta)^{T}Y-Y^{T}X\theta+Y^{T}Y\\ &=\theta^{T}X^{T}X\theta-2Y^{T}X\theta+Y^{T}Y \end{aligned}$$
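Here $X\theta$ and $Y$ are both column vectors, so $(X\theta)^{T}Y$ is a scalar and equals its own transpose $Y^{T}X\theta$. This is why the two cross terms combine into $-2Y^{T}X\theta$.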
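The expansion of $E$ can be checked numerically by computing the squared norm directly and via the expanded three-term form. This is only a sketch; the helper names and the small data set are made up for the example.

```python
def matvec(A, v):
    """Matrix-vector product A v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def error_direct(X, theta, Y):
    """E = ||X theta - Y||^2, computed from the residual vector."""
    r = [p - y for p, y in zip(matvec(X, theta), Y)]
    return dot(r, r)

def error_expanded(X, theta, Y):
    """E = theta^T X^T X theta - 2 Y^T X theta + Y^T Y."""
    Xt = matvec(X, theta)
    return dot(Xt, Xt) - 2.0 * dot(Y, Xt) + dot(Y, Y)

# Hypothetical data: three samples, model H(x) = theta_0 + theta_1 * x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [1.0, 3.0, 4.0]
theta = [0.5, 2.0]
```

For these values both functions agree up to floating-point rounding.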
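To minimize the error $E$, its derivative with respect to $\theta$ must be zero. Because $E$ is a convex quadratic function of $\theta$, such a stationary point is a global minimum:

$$\frac{\partial E}{\partial \theta}=0$$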
The derivative of $E$ with respect to $\theta$ is:

$$\frac{\partial E}{\partial \theta}=2X^{T}X\theta-2X^{T}Y$$
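One way to gain confidence in the gradient formula is to compare it with a central finite difference of $E$. The sketch below uses the factored form $2X^{T}(X\theta-Y)$, which is algebraically the same expression; the function names, the step size `eps`, and the data are assumptions for illustration.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def error(X, theta, Y):
    """E = ||X theta - Y||^2."""
    return sum((p - y) ** 2 for p, y in zip(matvec(X, theta), Y))

def gradient(X, theta, Y):
    """Analytic gradient 2 X^T X theta - 2 X^T Y = 2 X^T (X theta - Y)."""
    r = [p - y for p, y in zip(matvec(X, theta), Y)]
    return [2.0 * sum(row[j] * ri for row, ri in zip(X, r))
            for j in range(len(theta))]

def numerical_gradient(X, theta, Y, eps=1e-6):
    """Central-difference approximation of dE/dtheta, one component at a time."""
    grad = []
    for j in range(len(theta)):
        up, down = theta[:], theta[:]
        up[j] += eps
        down[j] -= eps
        grad.append((error(X, up, Y) - error(X, down, Y)) / (2.0 * eps))
    return grad

# Hypothetical data: three samples, model H(x) = theta_0 + theta_1 * x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [1.0, 3.0, 4.0]
theta = [0.5, 2.0]
analytic = gradient(X, theta, Y)
numeric = numerical_gradient(X, theta, Y)
```

The two results should match to within the finite-difference tolerance.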
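Setting this derivative to zero gives $X^{T}X\theta=X^{T}Y$. Provided $X^{T}X$ is invertible (that is, the columns of $X$ are linearly independent), $\theta$ is:

$$\theta=(X^{T}X)^{-1}X^{T}Y$$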
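The closed-form solution can be tried on a small example. The sketch below forms $X^{T}X$ and $X^{T}Y$ and solves the resulting linear system by Gaussian elimination rather than computing the inverse explicitly, which yields the same $\theta$ when $X^{T}X$ is invertible. The function names and data are assumptions; in practice a numerical library routine such as NumPy's `numpy.linalg.lstsq` would normally be preferred.

```python
def normal_equations(X, Y):
    """Return A = X^T X and b = X^T Y for a list-of-rows matrix X."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(n)]
    return A, b

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]       # columns: 1, h_1(x) = x
Y = [1.0 + 2.0 * x for x in xs]
theta = solve(*normal_equations(X, Y))
```

Because the data were generated from $y=1+2x$ without noise, the recovered `theta` should be close to `[1.0, 2.0]`.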