Fundamentals of Unconstrained Optimization

1. Gradient and Hessian Matrix

Definition 1: Let the $n$-variable function $f(x)$ have first-order partial derivatives with respect to each component $x_i$ of $x=(x_1,x_2,\dots,x_n)^T$:
$$\frac{\partial f(x)}{\partial x_i},\quad i=1,2,\dots,n.$$
Then the vector
$$\nabla f(x)=\left(\frac{\partial f(x)}{\partial x_1},\frac{\partial f(x)}{\partial x_2},\dots,\frac{\partial f(x)}{\partial x_n}\right)^T$$
is called the first derivative, or gradient, of $f(x)$ at $x$.
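As a numerical sanity check of Definition 1, each partial derivative can be approximated by a central difference. This is a minimal sketch in plain Python with no third-party libraries; the helper name `numerical_gradient`, the step size `h`, and the example function are illustrative assumptions, not part of the original notes.

```python
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, h: float = 1e-6) -> Vector:
    """Approximate the gradient of f at x by central differences.

    Component i is (f(x + h*e_i) - f(x - h*e_i)) / (2h), a finite-difference
    stand-in for the partial derivative df/dx_i of Definition 1.
    """
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x: Vector) -> float:
    # Illustrative function: f(x) = x1^2 + 3*x1*x2, exact gradient (2*x1 + 3*x2, 3*x1)^T
    return x[0] ** 2 + 3 * x[0] * x[1]


g = numerical_gradient(f, [1.0, 2.0])
print(g)  # close to the exact gradient [8.0, 3.0]
```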

Definition 2: Let the $n$-variable function $f(x)$ have second-order partial derivatives with respect to the components of $x=(x_1,x_2,\dots,x_n)^T$:
$$\frac{\partial^2 f(x)}{\partial x_i\partial x_j},\quad i,j=1,2,\dots,n.$$
Then the matrix
$$\nabla^2 f(x)=\begin{pmatrix}
\frac{\partial^2 f(x)}{\partial x_1^2}&\frac{\partial^2 f(x)}{\partial x_1\partial x_2}&\cdots&\frac{\partial^2 f(x)}{\partial x_1\partial x_n}\\
\frac{\partial^2 f(x)}{\partial x_2\partial x_1}&\frac{\partial^2 f(x)}{\partial x_2^2}&\cdots&\frac{\partial^2 f(x)}{\partial x_2\partial x_n}\\
\vdots&\vdots&&\vdots\\
\frac{\partial^2 f(x)}{\partial x_n\partial x_1}&\frac{\partial^2 f(x)}{\partial x_n\partial x_2}&\cdots&\frac{\partial^2 f(x)}{\partial x_n^2}
\end{pmatrix}$$
is called the second-derivative matrix, or Hessian matrix, of $f(x)$ at $x$.
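The Hessian can be approximated the same way, one mixed partial at a time. The sketch below is an assumption-laden illustration (plain Python; the helper `numerical_hessian`, the step `h`, and the cubic test function are hypothetical choices). For this smooth example the computed matrix also comes out symmetric, in line with Note (2) below.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def numerical_hessian(f: Callable[[Vector], float], x: Vector, h: float = 1e-4) -> Matrix:
    """Approximate the Hessian of f at x entry by entry.

    Entry (i, j) uses the central-difference formula for d^2 f / (dx_i dx_j):
    [f(x+h e_i+h e_j) - f(x+h e_i-h e_j) - f(x-h e_i+h e_j) + f(x-h e_i-h e_j)] / (4 h^2).
    For i == j this reduces to the usual second difference with step 2h.
    """
    n = len(x)

    def shifted(si: int, i: int, sj: int, j: int) -> float:
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return f(y)

    return [
        [
            (shifted(1, i, 1, j) - shifted(1, i, -1, j)
             - shifted(-1, i, 1, j) + shifted(-1, i, -1, j)) / (4 * h * h)
            for j in range(n)
        ]
        for i in range(n)
    ]


def f(x: Vector) -> float:
    # Illustrative function: f(x) = x1^3 + x1*x2^2
    # Exact Hessian: [[6*x1, 2*x2], [2*x2, 2*x1]]
    return x[0] ** 3 + x[0] * x[1] ** 2


H = numerical_hessian(f, [1.0, 2.0])
print(H)  # close to [[6.0, 4.0], [4.0, 2.0]]
```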

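Definition 3: If every component function of the gradient of $f(x)$ is continuous at $x$, then $f(x)$ is said to be continuously differentiable at $x$; if every entry of the Hessian matrix of $f(x)$ is continuous at $x$, then $f(x)$ is said to be twice continuously differentiable at $x$.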

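Definition 4: If $f$ is continuously differentiable at every point of an open set $D$, then $f$ is said to be (once) continuously differentiable on $D$; if $f$ is twice continuously differentiable at every point of the open set $D$, then $f$ is said to be twice continuously differentiable on $D$.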

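Notes: (1) Definition 4 uses an open set $D$ rather than a closed set because a boundary point of a closed set has no neighborhood contained in the set, so the usual (two-sided) partial derivatives, and hence differentiability, are not defined there.
(2) If $f(x)$ is twice continuously differentiable at $x$, then
$$\frac{\partial^2 f(x)}{\partial x_i\partial x_j}=\frac{\partial^2 f(x)}{\partial x_j\partial x_i},$$
that is, $\nabla^2 f(x)$ is a symmetric matrix.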

Example 1: Let $A\in\mathbb{R}^{n\times n}$ and $b\in\mathbb{R}^n$. Find the gradient and the Hessian matrix at $x$ of the quadratic function
$$f(x)=\frac{1}{2}x^TAx+b^Tx.$$
Solution: Since
$$f(x)=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_ix_j+\sum_{i=1}^{n}b_ix_i,$$
for $k=1,2,\dots,n$ we have (the diagonal term $\frac{1}{2}a_{kk}x_k^2$ contributes $a_{kk}x_k$)
$$\begin{aligned}
\frac{\partial f(x)}{\partial x_k}&=\frac{1}{2}\sum_{j=1,j\neq k}^{n}a_{kj}x_j+\frac{1}{2}\sum_{i=1,i\neq k}^{n}a_{ik}x_i+a_{kk}x_k+b_k\\
&=\frac{1}{2}\sum_{j=1}^{n}a_{kj}x_j+\frac{1}{2}\sum_{i=1}^{n}a_{ik}x_i+b_k.
\end{aligned}$$
Hence $\nabla f(x)=\frac{1}{2}(A+A^T)x+b$.
Differentiating $\partial f(x)/\partial x_k$ once more with respect to $x_l$ gives $\frac{1}{2}(a_{kl}+a_{lk})$, so $\nabla^2 f(x)=\frac{1}{2}(A+A^T)$. In particular, when $A$ is symmetric, $\nabla f(x)=Ax+b$ and $\nabla^2 f(x)=A$.
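The result of Example 1 can be checked numerically. In this sketch (plain Python; the particular $A$, $b$, the point $x$, and the helper names `quad`, `grad_formula`, `hess_formula`, `central_diff_grad` are illustrative assumptions) $A$ is deliberately non-symmetric, so it also shows why the gradient is $\frac{1}{2}(A+A^T)x+b$ rather than $Ax+b$ in general.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

A: Matrix = [[1.0, 2.0], [0.0, 3.0]]  # deliberately non-symmetric
b: Vector = [1.0, -1.0]


def quad(x: Vector) -> float:
    # f(x) = 1/2 x^T A x + b^T x, written as the double sum used in Example 1
    n = len(x)
    xAx = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return 0.5 * xAx + sum(b[i] * x[i] for i in range(n))


def grad_formula(x: Vector) -> Vector:
    # Closed form from Example 1: 1/2 (A + A^T) x + b
    n = len(x)
    return [0.5 * sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) + b[k] for k in range(n)]


def hess_formula() -> Matrix:
    # Closed form from Example 1: 1/2 (A + A^T), independent of x
    n = len(A)
    return [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]


def central_diff_grad(x: Vector, h: float = 1e-6) -> Vector:
    # Finite-difference approximation of each partial derivative of quad
    out = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        out.append((quad(xp) - quad(xm)) / (2 * h))
    return out


x = [1.0, 2.0]
print(grad_formula(x))       # [4.0, 6.0]
print(central_diff_grad(x))  # approximately [4.0, 6.0]
print(hess_formula())        # [[1.0, 1.0], [1.0, 3.0]]
```

With this $A$, the naive guess $Ax+b$ would give $(6,5)^T$, which does not match the finite-difference gradient.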

2. Directional Derivative

Definition 5: Let $f:\mathbb{R}^n\rightarrow\mathbb{R}$ be continuously differentiable on an open set $D$. For $x\in D$ and $d\in\mathbb{R}^n$, the directional derivative of $f$ at $x$ in the direction $d$ is defined as
$$\frac{\partial f}{\partial d}(x)=\lim_{\theta\rightarrow 0}\frac{f(x+\theta d)-f(x)}{\theta}.$$
The directional derivative defined above equals $\nabla f(x)^Td$, where $\nabla f(x)$ is the gradient of $f$ at $x$ and $d$ is the direction (this is the case $t=0$ of the first formula in Section 4).

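Notes: (1) The directional derivative clearly generalizes the partial derivative: a partial derivative measures the rate of change of a function along a coordinate axis, whereas a directional derivative measures it along an arbitrary direction.
(2) The definition of the directional derivative used here is the one given in some of the books I consulted (listed at the end). When I first saw it I thought it was wrong, and that the directional derivative should instead be defined as
$$\frac{\partial f}{\partial d}(x)=\lim_{\theta\rightarrow 0}\frac{f(x+\theta d)-f(x)}{\theta\Vert d\Vert},$$
with the Euclidean norm, or equivalently that the original definition should only use unit directions. Later I found that Wikipedia's article on the directional derivative accepts both conventions, and on reflection my view had been too narrow: the normalized limit equals $\nabla f(x)^Td/\Vert d\Vert$, so the two definitions differ only by the factor $\Vert d\Vert$ and coincide whenever $\Vert d\Vert=1$. If you are reading this post, I encourage you to think this through as well.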
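The limit in Definition 5 and the factor $\Vert d\Vert$ separating the two conventions can both be watched numerically. This is a sketch under stated assumptions: the function $f(x)=e^{x_1}+x_1x_2^2$, the point $x=(0,1)^T$, the non-unit direction $d=(2,-1)^T$, and the helper names are illustrative choices, not from the original text.

```python
import math
from typing import List

Vector = List[float]


def f(x: Vector) -> float:
    # Illustrative function: f(x) = exp(x1) + x1 * x2^2
    return math.exp(x[0]) + x[0] * x[1] ** 2


def grad_f(x: Vector) -> Vector:
    # Exact gradient of the illustrative f
    return [math.exp(x[0]) + x[1] ** 2, 2 * x[0] * x[1]]


def difference_quotient(x: Vector, d: Vector, theta: float) -> float:
    # (f(x + theta*d) - f(x)) / theta, the quotient in Definition 5
    x_step = [xi + theta * di for xi, di in zip(x, d)]
    return (f(x_step) - f(x)) / theta


x = [0.0, 1.0]
d = [2.0, -1.0]  # not a unit vector: ||d|| = sqrt(5)
norm_d = math.sqrt(sum(di * di for di in d))
exact = sum(g * di for g, di in zip(grad_f(x), d))  # grad f(x)^T d

for theta in (1e-1, 1e-3, 1e-5):
    q = difference_quotient(x, d, theta)
    # Unnormalized quotient tends to grad f(x)^T d;
    # dividing by ||d|| gives the normalized convention from Note (2)
    print(theta, q, q / norm_d)

print(exact, exact / norm_d)
```

The unnormalized column approaches $\nabla f(x)^Td$ and the normalized column approaches $\nabla f(x)^Td/\Vert d\Vert$ as $\theta$ shrinks.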

3. Taylor's Formula for Multivariate Functions

Definition 6: If $f(x)$ is continuously differentiable on $D$, then for any $x,x+d\in D$ (with the segment between them contained in $D$),
$$f(x+d)=f(x)+\nabla f(x)^Td+o(\Vert d\Vert)\qquad\text{(Peano remainder)}$$
$$f(x+d)=f(x)+\nabla f(x+td)^Td\ \text{ for some } t\in(0,1)\qquad\text{(mean-value / Lagrange remainder)}$$
$$f(x+d)=f(x)+\int_{0}^{1}\nabla f(x+td)^Td\,\mathrm{d}t\qquad\text{(integral remainder)}$$
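Two of the first-order forms are easy to test numerically: the integral remainder should match $f(x+d)-f(x)$ up to quadrature error, and the Peano remainder divided by $\Vert d\Vert$ should shrink as $d\to 0$. A minimal sketch, assuming the illustrative $f(x)=e^{x_1}+x_1x_2^2$, an arbitrary point and direction, and composite Simpson's rule as the (swapped-in) quadrature; all helper names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def f(x: Vector) -> float:
    # Illustrative function: f(x) = exp(x1) + x1 * x2^2
    return math.exp(x[0]) + x[0] * x[1] ** 2


def grad_f(x: Vector) -> Vector:
    return [math.exp(x[0]) + x[1] ** 2, 2 * x[0] * x[1]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def along(x: Vector, d: Vector, t: float) -> Vector:
    # The point x + t*d
    return [xi + t * di for xi, di in zip(x, d)]


def simpson(g: Callable[[float], float], a: float, b: float, m: int = 200) -> float:
    # Composite Simpson's rule on [a, b] with an even number m of subintervals
    h = (b - a) / m
    s = g(a) + g(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


x = [0.5, 1.0]
d = [0.3, -0.2]

# Integral remainder: f(x+d) - f(x) versus the integral of grad f(x+td)^T d over [0, 1]
lhs = f(along(x, d, 1.0)) - f(x)
rhs = simpson(lambda t: dot(grad_f(along(x, d, t)), d), 0.0, 1.0)
print(lhs, rhs)


def peano_ratio(s: float) -> float:
    # [f(x+sd) - f(x) - grad f(x)^T (sd)] / ||sd||, which should tend to 0 as s -> 0
    step = [s * di for di in d]
    r = f(along(x, step, 1.0)) - f(x) - dot(grad_f(x), step)
    return r / math.sqrt(dot(step, step))


for s in (1e-1, 1e-2, 1e-3):
    print(s, peano_ratio(s))
```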
Definition 7: If $f(x)$ is twice continuously differentiable on $D$, then for any $x,x+d\in D$ (with the segment between them contained in $D$),
$$f(x+d)=f(x)+\nabla f(x)^Td+\frac{1}{2}d^T\nabla^2 f(x)d+o(\Vert d\Vert^2)\qquad\text{(Peano remainder)}$$
$$f(x+d)=f(x)+\nabla f(x)^Td+\frac{1}{2}d^T\nabla^2 f(x+td)d\ \text{ for some } t\in(0,1)\qquad\text{(mean-value / Lagrange remainder)}$$
$$f(x+d)=f(x)+\nabla f(x)^Td+\int_{0}^{1}(1-t)\left[d^T\nabla^2 f(x+td)d\right]\mathrm{d}t\qquad\text{(integral remainder)}$$
Proof (of the integral-remainder form, since it is not entirely obvious):
We reduce it to a function of one variable. Let $\phi(t)=f(x+td)$.
Then $\phi'(t)=\nabla f(x+td)^Td$ and $\phi''(t)=d^T\nabla^2 f(x+td)d$. Since $\phi(1)-\phi(0)=\int_{0}^{1}\phi'(t)\,\mathrm{d}t$, integrating by parts gives
$$\begin{aligned}
f(x+d)-f(x)&=\int_{0}^{1}\left[\nabla f(x+td)^Td\right]\mathrm{d}t=-\int_{0}^{1}\left[\nabla f(x+td)^Td\right]\mathrm{d}(1-t)\\
&=(t-1)\nabla f(x+td)^Td\Big|_{0}^{1}+\int_{0}^{1}(1-t)\,\mathrm{d}\left[\nabla f(x+td)^Td\right]\\
&=\nabla f(x)^Td+\int_{0}^{1}(1-t)\left[d^T\nabla^2 f(x+td)d\right]\mathrm{d}t,
\end{aligned}$$
where the last step uses $\mathrm{d}\left[\nabla f(x+td)^Td\right]=\phi''(t)\,\mathrm{d}t=d^T\nabla^2 f(x+td)d\,\mathrm{d}t$.
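The identity just proved can be checked numerically: with an exact Hessian, the right-hand side should reproduce $f(x+d)$ up to quadrature error. This sketch assumes a different illustrative function, $f(x)=\sin(x_1)\,x_2+x_2^3$, an arbitrary $x$ and $d$, and again uses composite Simpson's rule; the helper names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def f(x: Vector) -> float:
    # Illustrative function: f(x) = sin(x1) * x2 + x2^3
    return math.sin(x[0]) * x[1] + x[1] ** 3


def grad_f(x: Vector) -> Vector:
    return [math.cos(x[0]) * x[1], math.sin(x[0]) + 3 * x[1] ** 2]


def hess_f(x: Vector) -> Matrix:
    return [[-math.sin(x[0]) * x[1], math.cos(x[0])],
            [math.cos(x[0]), 6 * x[1]]]


def quad_form(H: Matrix, d: Vector) -> float:
    # d^T H d
    n = len(d)
    return sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))


def simpson(g: Callable[[float], float], a: float, b: float, m: int = 200) -> float:
    # Composite Simpson's rule with an even number m of subintervals
    h = (b - a) / m
    s = g(a) + g(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


x = [0.4, 1.2]
d = [0.5, -0.3]
x_plus_d = [xi + di for xi, di in zip(x, d)]

# Integral remainder: integral over [0, 1] of (1 - t) d^T Hess f(x + t d) d
remainder = simpson(
    lambda t: (1 - t) * quad_form(hess_f([xi + t * di for xi, di in zip(x, d)]), d), 0.0, 1.0
)
lhs = f(x_plus_d)
rhs = f(x) + sum(g * di for g, di in zip(grad_f(x), d)) + remainder
print(lhs, rhs)
```

Because the formula is exact, any gap between `lhs` and `rhs` reflects only the quadrature error.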

4. Proofs of Two Basic Formulas

I added this section on a whim; you probably won't find it spelled out in many books. The main result is:
Definition 8: If $f(x)$ is twice continuously differentiable on an open set $D\subseteq\mathbb{R}^n$, then for any $x,x+td\in D$,
$$\frac{\mathrm{d} f(x+td)}{\mathrm{d} t}=\nabla f(x+td)^Td,$$
$$\frac{\mathrm{d}^2 f(x+td)}{\mathrm{d} t^2}=d^T\nabla^2 f(x+td)d.$$
We used these formulas in the proof above, but they are not entirely obvious, so here is a proof.
Proof: Write $f(x+td)=f(x_1+td_1,\dots,x_n+td_n)$. By the chain rule, since $\frac{\mathrm{d}(x_i+td_i)}{\mathrm{d}t}=d_i$,
$$\begin{aligned}
\frac{\mathrm{d} f(x+td)}{\mathrm{d} t}&=\frac{\mathrm{d}f(x_1+td_1,x_2+td_2,\dots,x_n+td_n)}{\mathrm{d}t}\\
&=\frac{\partial f(x+td)}{\partial (x_1+td_1)}d_1+\frac{\partial f(x+td)}{\partial (x_2+td_2)}d_2+\cdots+\frac{\partial f(x+td)}{\partial (x_n+td_n)}d_n\\
&=\left(\frac{\partial f(x+td)}{\partial (x_1+td_1)},\frac{\partial f(x+td)}{\partial (x_2+td_2)},\cdots,\frac{\partial f(x+td)}{\partial (x_n+td_n)}\right)\begin{pmatrix}d_1\\d_2\\\vdots\\d_n\end{pmatrix}\\
&=\left(\frac{\partial f(x+td)}{\partial x_1},\frac{\partial f(x+td)}{\partial x_2},\cdots,\frac{\partial f(x+td)}{\partial x_n}\right)\begin{pmatrix}d_1\\d_2\\\vdots\\d_n\end{pmatrix}\\
&=\nabla f(x+td)^Td,
\end{aligned}$$
where $\frac{\partial f(x+td)}{\partial x_i}$ denotes the partial derivative $\frac{\partial f}{\partial x_i}$ evaluated at the point $x+td$.
Applying the first formula to each $\frac{\partial f}{\partial x_i}$ in place of $f$ gives
$$\begin{aligned}
\frac{\mathrm{d}^2 f(x+td)}{\mathrm{d} t^2}&=\frac{\mathrm{d}}{\mathrm{d}t}\sum_{i=1}^{n}\frac{\partial f(x+td)}{\partial x_i}d_i=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 f(x+td)}{\partial x_i\partial x_j}d_id_j\\
&=\frac{\partial^2 f(x+td)}{\partial x_1^2}d_1^2+\frac{\partial^2 f(x+td)}{\partial x_1\partial x_2}d_1d_2+\cdots+\frac{\partial^2 f(x+td)}{\partial x_1\partial x_n}d_1d_n\\
&\quad+\frac{\partial^2 f(x+td)}{\partial x_2\partial x_1}d_2d_1+\frac{\partial^2 f(x+td)}{\partial x_2^2}d_2^2+\cdots+\frac{\partial^2 f(x+td)}{\partial x_2\partial x_n}d_2d_n\\
&\qquad\vdots\\
&\quad+\frac{\partial^2 f(x+td)}{\partial x_n\partial x_1}d_nd_1+\frac{\partial^2 f(x+td)}{\partial x_n\partial x_2}d_nd_2+\cdots+\frac{\partial^2 f(x+td)}{\partial x_n^2}d_n^2\\
&=\begin{pmatrix}d_1&d_2&\cdots&d_n\end{pmatrix}
\begin{pmatrix}
\frac{\partial^2 f(x+td)}{\partial x_1^2}&\frac{\partial^2 f(x+td)}{\partial x_1\partial x_2}&\cdots&\frac{\partial^2 f(x+td)}{\partial x_1\partial x_n}\\
\frac{\partial^2 f(x+td)}{\partial x_2\partial x_1}&\frac{\partial^2 f(x+td)}{\partial x_2^2}&\cdots&\frac{\partial^2 f(x+td)}{\partial x_2\partial x_n}\\
\vdots&\vdots&&\vdots\\
\frac{\partial^2 f(x+td)}{\partial x_n\partial x_1}&\frac{\partial^2 f(x+td)}{\partial x_n\partial x_2}&\cdots&\frac{\partial^2 f(x+td)}{\partial x_n^2}
\end{pmatrix}
\begin{pmatrix}d_1\\d_2\\\vdots\\d_n\end{pmatrix}\\
&=d^T\nabla^2 f(x+td)d.
\end{aligned}$$
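Both formulas of Definition 8 can be compared against finite differences of $\phi(t)=f(x+td)$ in the single variable $t$. The sketch below assumes a three-variable illustrative function $f(x)=x_1^2x_2+e^{x_3}x_2$ with its hand-computed gradient and Hessian, plus an arbitrary $x$, $d$, $t$ and step `h`; none of these come from the original notes.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def f(x: Vector) -> float:
    # Illustrative function of three variables: f(x) = x1^2 * x2 + exp(x3) * x2
    return x[0] ** 2 * x[1] + math.exp(x[2]) * x[1]


def grad_f(x: Vector) -> Vector:
    return [2 * x[0] * x[1], x[0] ** 2 + math.exp(x[2]), math.exp(x[2]) * x[1]]


def hess_f(x: Vector) -> Matrix:
    e = math.exp(x[2])
    return [[2 * x[1], 2 * x[0], 0.0],
            [2 * x[0], 0.0, e],
            [0.0, e, e * x[1]]]


x = [1.0, 0.5, 0.2]
d = [0.3, -0.1, 0.4]
t = 0.7


def phi(s: float) -> float:
    # phi(s) = f(x + s d), the one-variable restriction used in the proofs
    return f([xi + s * di for xi, di in zip(x, d)])


y = [xi + t * di for xi, di in zip(x, d)]  # the point x + t d
formula_1 = sum(g * di for g, di in zip(grad_f(y), d))  # grad f(x+td)^T d
H = hess_f(y)
formula_2 = sum(d[i] * H[i][j] * d[j] for i in range(3) for j in range(3))  # d^T Hess f(x+td) d

h = 1e-4
fd_1 = (phi(t + h) - phi(t - h)) / (2 * h)              # central difference for phi'(t)
fd_2 = (phi(t + h) - 2 * phi(t) + phi(t - h)) / h ** 2  # central difference for phi''(t)
print(formula_1, fd_1)
print(formula_2, fd_2)
```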


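References for this post:
[1] 倪勤 (Ni Qin), 最优化方法与程序设计 (Optimization Methods and Program Design).
[2] 袁亚湘 (Yuan Yaxiang), 孙文瑜 (Sun Wenyu), 最优化理论与方法 (Optimization Theory and Methods).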

Author: 多情剑客无情剑yu