Introduction
The least squares model addresses two classes of practical problems.
The first class: regression. In data processing one frequently needs to find a regression equation, that is, to establish from a set of experimental data a statistical dependence between two or more physical quantities (commonly called factors). For example, a quantity $y$ may depend on one or several quantities $t_1, \cdots, t_l$. The general formulation is as follows. Suppose we wish to relate $y$ to the $l$ quantities $t_1, \cdots, t_l$ through the equation

$$y = F(t_1, t_2, \cdots, t_l;\ x_1, x_2, \cdots, x_n)$$
where the form of $F$ is specified in advance and $x_1, \cdots, x_n$ are parameters to be determined. We have $m\ (>n)$ sets of experimental data:

$$\left[t_1^{(i)},\ t_2^{(i)},\ \cdots,\ t_l^{(i)};\ y^{(i)}\right]^T, \quad i = 1, 2, \cdots, m$$
where $t_1, \cdots, t_l$ are values set in the experiment and $y$ is the measured result. The problem is how to determine the $n$ parameters $x_1, x_2, \cdots, x_n$ so as to establish the regression equation. Substituting the independent variables $t_1^{(i)}, t_2^{(i)}, \cdots, t_l^{(i)}$ of the $i$-th experimental point into the equation yields the corresponding function value
$$\tilde{y}^{(i)} = F\left(t_1^{(i)}, t_2^{(i)}, \cdots, t_l^{(i)};\ x_1, x_2, \cdots, x_n\right)$$

Here $\tilde{y}^{(i)}$ is the model's prediction of the measured value $y^{(i)}$, and naturally we want the deviations between the two to be small in absolute value.
It is therefore entirely natural to solve

$$\min \sum_{i=1}^{m}\left[F\left(t_1^{(i)}, t_2^{(i)}, \cdots, t_l^{(i)};\ x_1, x_2, \cdots, x_n\right) - y^{(i)}\right]^2$$

This is the least squares model. Setting
$$f_i(x_1, \cdots, x_n) = F\left(t_1^{(i)}, \cdots, t_l^{(i)};\ x_1, \cdots, x_n\right) - y^{(i)}$$

the least squares model becomes
$$\min \sum_{i=1}^{m} f_i^{2}(x_1, \cdots, x_n)$$
Writing $x = (x_1, x_2, \cdots, x_n)^T$ and $f(x) = \left(f_1(x), \cdots, f_m(x)\right)^T$, this in turn becomes
$$\min\ f(x)^T f(x)$$
Finding its optimal solution is called solving the least squares problem. Substituting the optimal solution $x^* = (x_1^*, x_2^*, \cdots, x_n^*)^T$ into $y = F(t_1, t_2, \cdots, t_l;\ x_1, x_2, \cdots, x_n)$ gives

$$y = F\left(t_1, t_2, \cdots, t_l;\ x_1^*, x_2^*, \cdots, x_n^*\right)$$

which is the regression equation. The least squares problem is clearly an unconstrained optimization problem, but its special structure admits specialized solution methods.
When every $f_i(x_1, \cdots, x_n) = F\left(t_1^{(i)}, \cdots, t_l^{(i)};\ x_1, \cdots, x_n\right) - y^{(i)}$ is a linear function of $x_1, x_2, \cdots, x_n$, the problem is called a linear least squares problem; otherwise it is a nonlinear least squares problem.
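As a concrete instance of the first class, consider a linear model of the hypothetical form $y = x_1 + x_2 t$ (this model and the data below are illustrative assumptions, not from the text). A minimal sketch of recovering the parameters from noisy data with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment: m = 50 data points from y = 2 + 3 t plus noise.
t = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * t + 0.01 * rng.standard_normal(t.size)

# Design matrix for the model y = x1 + x2 * t (each row is one experiment).
A = np.column_stack([np.ones_like(t), t])

# Solve min ||A x - y||^2; x_star holds the fitted parameters (x1*, x2*).
x_star, *_ = np.linalg.lstsq(A, y, rcond=None)

print(x_star)  # close to [2, 3]
```

Each $f_i$ here is linear in $x_1, x_2$, so this is a linear least squares problem.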
The second class: solving a system of equations (a mathematical problem)
$$\left.\begin{array}{l} f_1(x_1, x_2, \cdots, x_n) = 0 \\ f_2(x_1, x_2, \cdots, x_n) = 0 \\ \quad\vdots \\ f_m(x_1, x_2, \cdots, x_n) = 0 \end{array}\right\}$$
If the system is linear, it reads $Ax - b = 0$. When $R(A) < R(A, b)$, i.e., the coefficient matrix has lower rank than the augmented matrix, the system has no exact solution, yet in practice an answer is still needed, and ordinary linear algebra offers none. If the system is nonlinear, computing an iterative solution directly is also cumbersome. In either case it is natural to solve instead

$$\min \sum_{i=1}^{m} f_i^{2}(x_1, \cdots, x_n)$$

which is again a least squares problem.
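For example (an illustrative system, not from the text), the inconsistent system $x_1 = 0$, $x_1 = 1$, $x_1 + x_2 = 1$ has $R(A) < R(A, b)$, so no exact solution exists, yet the least squares minimizer is well defined:

```python
import numpy as np

# Illustrative inconsistent system: x1 = 0, x1 = 1, x1 + x2 = 1.
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0]])
b = np.array([0.0, 1.0, 1.0])

# R(A) < R(A, b): no exact solution exists.
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3

# But min sum f_i^2 = min ||A x - b||^2 still has a unique answer.
x_star, residual, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_star)  # [0.5, 0.5], with residual sum of squares 0.5
```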
Solving least squares problems
(1) The linear least squares problem
When $f(x)$ takes the linear form $f(x) = Ax - b$ with $A \in \mathbb{R}^{m \times n}$, the linear least squares problem is $\min\ (Ax - b)^T (Ax - b)$. A point $x^*$ is a minimizer if and only if it satisfies the normal equations $A^T A x^* = A^T b$.
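The necessary and sufficient condition can be checked numerically: solving the normal equations $A^T A x = A^T b$ directly yields the same minimizer as a generic least squares routine. The data below are arbitrary, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))  # arbitrary full-column-rank A, m = 20 > n = 3
b = rng.standard_normal(20)

# Minimizer via the normal equations A^T A x* = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Same minimizer via an orthogonal-factorization least squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True
```

In numerical practice, solvers based on QR or SVD (as `lstsq` uses) are preferred over explicitly forming $A^T A$, which squares the condition number.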
(2) The nonlinear least squares problem
Suppose that, starting from an initial point $x_0$, after $k$ iterations we have obtained $x_k$; we now consider how to compute $x_{k+1}$. In the spirit of Newton's method, we linearize $f(x)$ and use the solution of a linear least squares problem to approximate the solution of the nonlinear one. The procedure is as follows.
Expand the $i$-th component of $f(x)$ in a Taylor series about the point $x_k$:

$$f_i(x) \approx f_i(x_k) + \nabla f_i(x_k)^T (x - x_k), \quad i = 1, 2, \cdots, m$$
That is,

$$\left\{\begin{array}{l} f_1(x) \approx f_1(x_k) + \left(\dfrac{\partial f_1(x_k)}{\partial x_1}, \cdots, \dfrac{\partial f_1(x_k)}{\partial x_n}\right)(x - x_k) \\ \quad\vdots \\ f_m(x) \approx f_m(x_k) + \left(\dfrac{\partial f_m(x_k)}{\partial x_1}, \cdots, \dfrac{\partial f_m(x_k)}{\partial x_n}\right)(x - x_k) \end{array}\right.$$
With $f(x) = \left(f_1(x), \cdots, f_m(x)\right)^T$ and

$$A_k = A(x_k) = \begin{bmatrix} \dfrac{\partial f_1(x_k)}{\partial x_1} & \dfrac{\partial f_1(x_k)}{\partial x_2} & \cdots & \dfrac{\partial f_1(x_k)}{\partial x_n} \\ \dfrac{\partial f_2(x_k)}{\partial x_1} & \dfrac{\partial f_2(x_k)}{\partial x_2} & \cdots & \dfrac{\partial f_2(x_k)}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_m(x_k)}{\partial x_1} & \dfrac{\partial f_m(x_k)}{\partial x_2} & \cdots & \dfrac{\partial f_m(x_k)}{\partial x_n} \end{bmatrix}$$
the expansion can be written in matrix-vector form as

$$f(x) \approx f(x_k) + A_k (x - x_k)$$
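For a concrete $f$, the matrix $A_k$ can be written analytically or approximated by forward differences. A minimal sketch of the latter (the helper name, step size, and test function are illustrative assumptions):

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Forward-difference approximation of the m x n Jacobian of f at x."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        xh = x.copy()
        xh[j] += h          # perturb one coordinate at a time
        J[:, j] = (f(xh) - fx) / h
    return J

# Illustrative residual vector f(x) = (x1^2 - x2, x1 + x2)^T at x_k = (1, 2).
f = lambda x: np.array([x[0] ** 2 - x[1], x[0] + x[1]])
x_k = np.array([1.0, 2.0])
A_k = jacobian_fd(f, x_k)
print(A_k)  # close to the analytic Jacobian [[2, -1], [1, 1]]
```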
$A_k$ is the Jacobian matrix of the vector-valued function $f(x) = \left(f_1(x), \cdots, f_m(x)\right)^T$ at the point $x_k$. We therefore solve
$$\min\ \left(f_k + A_k(x - x_k)\right)^T \left(f_k + A_k(x - x_k)\right)$$
and take its optimal solution as the next iterate $x_{k+1}$. Clearly

$$\begin{array}{l} \min\ \left(f_k + A_k(x - x_k)\right)^T \left(f_k + A_k(x - x_k)\right) \\ = \min\ \left(A_k x - (A_k x_k - f_k)\right)^T \left(A_k x - (A_k x_k - f_k)\right) \end{array}$$
is a linear least squares problem, solvable by the method of the previous subsection. Hence $x_{k+1}$ must satisfy

$$A_k^T A_k x = A_k^T (A_k x_k - f_k) = A_k^T A_k x_k - A_k^T f_k$$
If $A_k^T A_k$ is invertible, then

$$x_{k+1} = \left(A_k^T A_k\right)^{-1}\left[A_k^T A_k x_k - A_k^T f_k\right] = x_k - \left(A_k^T A_k\right)^{-1} A_k^T f_k$$
This is equivalent to a step $x_{k+1} = x_k + p_k$ with search direction

$$p_k = -\left(A_k^T A_k\right)^{-1} A_k^T f_k$$

and step length 1.
This is the Gauss-Newton iteration for nonlinear least squares, and the algorithm it generates is the Gauss-Newton method. When $f(x)$ satisfies suitable conditions and $x_0$ is sufficiently close to the minimizer $x^*$, the Gauss-Newton method converges.
Note, however, that even if $A_k^T A_k$ is invertible at every iteration, the algorithm is not guaranteed to be a descent algorithm; in particular, when the initial point $x_0$ is far from the minimizer $x^*$, it may well diverge. Nevertheless, whenever $A_k^T A_k$ is invertible, the resulting $p_k$ is a descent direction of the objective function at $x_k$.
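The Gauss-Newton iteration can be sketched as follows. The model, data, and stopping rule are illustrative assumptions (an exponential fit $y = x_1 e^{x_2 t}$ with exact data), and each step's linear least squares subproblem is solved with `np.linalg.lstsq` rather than by explicitly inverting $A_k^T A_k$:

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, max_iter=100, tol=1e-10):
    """Minimize f(x)^T f(x) via x_{k+1} = x_k + p_k with step length 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        f_k = residual(x)
        A_k = jacobian(x)
        # p_k solves the linear subproblem min ||A_k p + f_k||^2,
        # i.e. the normal equations A_k^T A_k p = -A_k^T f_k.
        p_k, *_ = np.linalg.lstsq(A_k, -f_k, rcond=None)
        x = x + p_k
        if np.linalg.norm(p_k) < tol:
            break
    return x

# Illustrative regression: fit y = x1 * exp(x2 * t) to exact data from (2, 0.5).
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * t)
residual = lambda x: x[0] * np.exp(x[1] * t) - y
jacobian = lambda x: np.column_stack([np.exp(x[1] * t),
                                      x[0] * t * np.exp(x[1] * t)])

x_star = gauss_newton(residual, jacobian, x0=[1.5, 0.3])
print(x_star)  # close to [2, 0.5]
```

The starting point here is deliberately close to the minimizer; as noted above, a distant $x_0$ may cause this undamped iteration to diverge.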