Unconstrained Optimization (5): Methods for Least-Squares Problems


Introduction

The least-squares model can address two classes of practical problems.

The first class of problems: in data processing one frequently needs to find a regression equation, i.e., to establish, from a set of experimental data, a statistical dependence between two or more physical quantities (commonly called factors). For example, a quantity $y$ may depend on one or several quantities $t_1, \cdots, t_l$. The general description is as follows. Suppose we want to establish a dependence between the quantity $y$ and the $l$ quantities $t_1, \cdots, t_l$, with the equation

$$y = F(t_1, t_2, \cdots, t_l;\; x_1, x_2, \cdots, x_n)$$
where the form of $F$ is specified in advance and $x_1, \cdots, x_n$ are parameters to be determined. There are $m\ (> n)$ sets of experimental data:

$$\left[t_{1}^{(i)}, t_{2}^{(i)}, \cdots, t_{l}^{(i)};\; y^{(i)}\right]^{T}, \quad i=1,2,\cdots,m$$
where $t_1^{(i)}, \cdots, t_l^{(i)}$ are known quantities set in the experiment and $y^{(i)}$ is the measured result. The question is how to determine the $n$ parameters $x_1, x_2, \cdots, x_n$ so as to establish the regression equation. Substituting the independent variables $t_1^{(i)}, t_2^{(i)}, \cdots, t_l^{(i)}$ of the $i$-th experimental point into the model gives the corresponding function value

$$\tilde{y}^{(i)}=F\left(t_{1}^{(i)}, t_{2}^{(i)}, \cdots, t_{l}^{(i)};\; x_{1}, x_{2}, \cdots, x_{n}\right)$$
Here $\tilde{y}^{(i)}$ is the model's predicted value of the measured value $y^{(i)}$. Naturally, we want the differences between them to be as small as possible, so it is entirely natural to solve

$$\min \sum_{i=1}^{m}\left[F\left(t_{1}^{(i)}, t_{2}^{(i)}, \cdots, t_{l}^{(i)};\; x_{1}, x_{2}, \cdots, x_{n}\right)-y^{(i)}\right]^{2}$$

This is the least-squares model. Let
$$f_{i}\left(x_{1}, \cdots, x_{n}\right)=F\left(t_{1}^{(i)}, \cdots, t_{l}^{(i)};\; x_{1}, \cdots, x_{n}\right)-y^{(i)}$$

Then the least-squares model becomes

$$\min \sum_{i=1}^{m} f_{i}^{2}\left(x_{1}, \cdots, x_{n}\right)$$
Letting $x=(x_1, x_2, \cdots, x_n)^T$ and $f(x)=\left(f_{1}(x), \cdots, f_{m}(x)\right)^{T}$, this can be written as

$$\min\; f(x)^{T} f(x)$$

Finding its minimizer is called solving the least-squares problem. Substituting the optimal solution $x^{*}=(x_1^{*}, x_2^{*}, \cdots, x_n^{*})^T$ into $y = F(t_1, t_2, \cdots, t_l;\; x_1, x_2, \cdots, x_n)$ yields $y = F(t_1, t_2, \cdots, t_l;\; x_1^{*}, x_2^{*}, \cdots, x_n^{*})$, which is the regression equation. A least-squares problem is clearly an unconstrained program, but its special structure admits specialized solution methods.

When every $f_{i}\left(x_{1}, \cdots, x_{n}\right)=F\left(t_{1}^{(i)}, \cdots, t_{l}^{(i)};\; x_{1}, \cdots, x_{n}\right)-y^{(i)}$ is a linear function of $x_1, x_2, \cdots, x_n$, the problem is called a linear least-squares problem; otherwise it is a nonlinear least-squares problem.

The second class of problems: solving a system of equations (a mathematical problem)

$$\left.\begin{array}{l}{f_{1}\left(x_{1}, x_{2}, \cdots, x_{n}\right)=0} \\ {f_{2}\left(x_{1}, x_{2}, \cdots, x_{n}\right)=0} \\ {\vdots} \\ {f_{m}\left(x_{1}, x_{2}, \cdots, x_{n}\right)=0}\end{array}\right\}$$
If the system is linear, it reads $Ax-b=0$. When $R(A) < R(A,b)$, i.e., the rank of $A$ is less than that of the augmented matrix, the system has no exact solution, yet in practice one still needs to "solve" it, and linear algebra alone offers no way forward. If the system is nonlinear, computing an iterative solution directly is also quite troublesome. In either case it is natural to solve instead

$$\min \sum_{i=1}^{m} f_{i}^{2}\left(x_{1}, \cdots, x_{n}\right)$$

which is again a least-squares problem.
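As a small numerical sketch of this idea (the matrix and right-hand side below are made up for illustration), the following NumPy snippet sets up an inconsistent $3 \times 2$ system with $R(A) < R(A,b)$ and minimizes $\|Ax-b\|^2$ instead of solving $Ax=b$ exactly:

```python
import numpy as np

# An inconsistent system: rank(A) = 2, but rank([A, b]) = 3, so Ax = b has no exact solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# np.linalg.lstsq minimizes ||Ax - b||^2 over x.
x, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x)           # least-squares solution, here (2/3, 1/2)
print(residual)    # sum of squared residuals ||Ax - b||^2, here 1/6
```

The returned `x` is exactly the solution of the normal equations discussed in the next section.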

Methods for Solving Least-Squares Problems

(1) Linear least-squares problems

When $f(x)$ is linear, i.e. $f(x)=Ax-b$ with $A$ an $m \times n$ matrix, the linear least-squares problem is $\min\,(Ax-b)^{T}(Ax-b)$. A point $x^{*}$ is a minimizer if and only if it satisfies the normal equations $A^{T} A x^{*}=A^{T} b$.
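This characterization can be checked numerically. The sketch below (random data chosen purely for illustration) solves the normal equations $A^{T}Ax = A^{T}b$ directly and confirms that the result agrees with NumPy's SVD-based least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # m = 20 observations, n = 3 parameters, full column rank
b = rng.standard_normal(20)

# Solve the normal equations  A^T A x = A^T b  directly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))   # the two solutions coincide
```

In practice, forming $A^TA$ squares the condition number of $A$, which is why library solvers prefer QR or SVD factorizations; for a well-conditioned $A$ the two routes agree.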

(2) Nonlinear least-squares problems

Suppose that, starting from an initial point $x_0$, we have obtained $x_k$ after $k$ iterations and now consider how to find $x_{k+1}$. In a spirit similar to Newton's method, we linearize $f(x)$ and use the solution of a linear least-squares problem to approximate the solution of the nonlinear one. The details are as follows.

Expand the $i$-th component of $f(x)$ in a Taylor series at the point $x_k$:

$$f_{i}(x) \approx f_{i}\left(x_{k}\right)+\nabla f_{i}\left(x_{k}\right)^{T}\left(x-x_{k}\right), \quad i=1,2,\cdots,m$$
That is,

$$\left\{\begin{array}{l}{f_{1}(x) \approx f_{1}\left(x_{k}\right)+\left(\frac{\partial f_{1}\left(x_{k}\right)}{\partial x_{1}}, \cdots, \frac{\partial f_{1}\left(x_{k}\right)}{\partial x_{n}}\right)\left(x-x_{k}\right)} \\ {\vdots} \\ {f_{m}(x) \approx f_{m}\left(x_{k}\right)+\left(\frac{\partial f_{m}\left(x_{k}\right)}{\partial x_{1}}, \cdots, \frac{\partial f_{m}\left(x_{k}\right)}{\partial x_{n}}\right)\left(x-x_{k}\right)}\end{array}\right.$$
Let $f(x)=\left(f_{1}(x), \cdots, f_{m}(x)\right)^{T}$ and

$$A_{k}=A\left(x_{k}\right)=\left[\begin{array}{cccc}{\frac{\partial f_{1}\left(x_{k}\right)}{\partial x_{1}}} & {\frac{\partial f_{1}\left(x_{k}\right)}{\partial x_{2}}} & {\cdots} & {\frac{\partial f_{1}\left(x_{k}\right)}{\partial x_{n}}} \\ {\frac{\partial f_{2}\left(x_{k}\right)}{\partial x_{1}}} & {\frac{\partial f_{2}\left(x_{k}\right)}{\partial x_{2}}} & {\cdots} & {\frac{\partial f_{2}\left(x_{k}\right)}{\partial x_{n}}} \\ {\vdots} & {\vdots} & {} & {\vdots} \\ {\frac{\partial f_{m}\left(x_{k}\right)}{\partial x_{1}}} & {\frac{\partial f_{m}\left(x_{k}\right)}{\partial x_{2}}} & {\cdots} & {\frac{\partial f_{m}\left(x_{k}\right)}{\partial x_{n}}}\end{array}\right]$$
Then the expansion above can be written in matrix-vector form as

$$f(x) \approx f\left(x_{k}\right)+A_{k}\left(x-x_{k}\right)$$
$A_k$ is called the Jacobian matrix of the vector-valued function $f(x)=\left(f_{1}(x), \cdots, f_{m}(x)\right)^{T}$ at the point $x_k$. We therefore solve

$$\min\,\left(f_{k}+A_{k}\left(x-x_{k}\right)\right)^{T}\left(f_{k}+A_{k}\left(x-x_{k}\right)\right)$$
and take its optimal solution as the next iterate $x_{k+1}$. Clearly

$$\begin{array}{l}{\min \left(f_{k}+A_{k}\left(x-x_{k}\right)\right)^{T}\left(f_{k}+A_{k}\left(x-x_{k}\right)\right)} \\ {=\min \left(A_{k} x-\left(A_{k} x_{k}-f_{k}\right)\right)^{T}\left(A_{k} x-\left(A_{k} x_{k}-f_{k}\right)\right)}\end{array}$$
is a linear least-squares problem, solvable by the method of the previous section. Hence $x_{k+1}$ must satisfy the equation

$$A_{k}^{T} A_{k} x=A_{k}^{T}\left(A_{k} x_{k}-f_{k}\right)=A_{k}^{T} A_{k} x_{k}-A_{k}^{T} f_{k}$$
If $A_{k}^{T} A_{k}$ is invertible, then

$$x_{k+1}=\left(A_{k}^{T} A_{k}\right)^{-1}\left[A_{k}^{T} A_{k} x_{k}-A_{k}^{T} f_{k}\right]=x_{k}-\left(A_{k}^{T} A_{k}\right)^{-1} A_{k}^{T} f_{k}$$
This is equivalent to the step $x_{k+1}=x_{k}+p_{k}$ with search direction $p_{k}=-\left(A_{k}^{T} A_{k}\right)^{-1} A_{k}^{T} f_{k}$ and step length 1.

This is the Gauss-Newton iteration formula for nonlinear least-squares problems, and the algorithm it generates is called the Gauss-Newton method. When $f(x)$ satisfies certain conditions and $x_0$ is sufficiently close to the minimizer $x^{*}$, the Gauss-Newton method converges.

It should be pointed out that even if $A_{k}^{T} A_{k}$ is invertible at every iteration, the algorithm is not guaranteed to be a descent algorithm; in particular, when the initial point $x_0$ is far from the minimizer $x^{*}$, the algorithm may well diverge. However, when $A_{k}^{T} A_{k}$ is invertible, the resulting $p_k$ is a descent direction for the objective function at $x_k$.
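The whole Gauss-Newton iteration can be sketched in a few lines of NumPy. The example below is a minimal illustration, not a production solver: the regression model $y = x_1 e^{x_2 t}$, the synthetic noise-free data, and the stopping rule are all assumptions made for this sketch.

```python
import numpy as np

def gauss_newton(f, jac, x0, tol=1e-10, max_iter=50):
    """Gauss-Newton iteration: x_{k+1} = x_k - (A_k^T A_k)^{-1} A_k^T f_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fk, Ak = f(x), jac(x)
        # Solve the normal equations  A_k^T A_k p = -A_k^T f_k  for the step p_k.
        p = np.linalg.solve(Ak.T @ Ak, -Ak.T @ fk)
        x = x + p
        if np.linalg.norm(p) < tol:
            break
    return x

# Hypothetical model y = x1 * exp(x2 * t), with synthetic data generated at (2.0, -1.5).
t = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(-1.5 * t)

def residual(x):        # f_i(x) = F(t_i; x) - y_i
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x):        # A_k: the m x n matrix of partial derivatives
    return np.column_stack([np.exp(x[1] * t),
                            x[0] * t * np.exp(x[1] * t)])

x_star = gauss_newton(residual, jacobian, x0=[1.0, 0.0])
print(x_star)   # close to the true parameters (2.0, -1.5)
```

Because the data here are noise-free, the residual at the solution is zero and the iteration converges rapidly from this starting point; as the text notes, a poorly chosen $x_0$ can make the same plain iteration diverge, which is what damped variants (e.g. Levenberg-Marquardt) address.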

My WeChat official account: 小小何先生
About the account: it mainly shares content on deep learning, machine game-playing, reinforcement learning, and related topics. Your follow is welcome; let's learn, exchange ideas, and make progress together!
