4.3 Gradient-Based Optimization
1. Vector calculus
Reference: 向量微积分基础 (Vector Calculus Basics), 文剑木然的专栏, CSDN博客
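Common derivative formulas (based on what these notes call the denominator layout, in which the derivative of a scalar is a row vector; transposing a result gives the other layout, called the numerator layout here). Note that many references use these two names the other way around.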
$$\frac{\partial \mathbf{A} \mathbf{x}}{\partial \mathbf{x}}=\mathbf{A}\tag{1}$$
$$\frac{\partial \mathbf{x}^{\top} \mathbf{A}}{\partial \mathbf{x}}=\mathbf{A}^{\top}\tag{2}$$
$$\frac{\partial \mathbf{x}^{\top} \mathbf{x}}{\partial \mathbf{x}}=2 \mathbf{x}^{\top}\tag{3}$$
$$\frac{\partial \mathbf{x}^{\top} \mathbf{A} \mathbf{x}}{\partial \mathbf{x}}=\mathbf{x}^{\top}\left(\mathbf{A}+\mathbf{A}^{\top}\right)\tag{4}$$
$$\frac{\partial(\mathbf{u}+\mathbf{v})}{\partial \mathbf{x}}=\frac{\partial \mathbf{u}}{\partial \mathbf{x}}+\frac{\partial \mathbf{v}}{\partial \mathbf{x}}\tag{5}$$
$$\frac{\partial(\mathbf{u} \cdot \mathbf{v})}{\partial \mathbf{x}}=\frac{\partial \mathbf{u}^{\top} \mathbf{v}}{\partial \mathbf{x}}=\mathbf{u}^{\top} \frac{\partial \mathbf{v}}{\partial \mathbf{x}}+\mathbf{v}^{\top} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\tag{6}$$
$$\frac{\partial \mathbf{f}(\mathbf{u})}{\partial \mathbf{x}}=\frac{\partial \mathbf{f}(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\tag{7}$$
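Formulas like (4) are easy to get wrong by a transpose, so a numerical check can help. The following is a minimal sketch in plain Python, not part of the original notes: the matrix `A`, the point `x`, and the helper names are all hypothetical choices made for illustration. It compares the row vector $\mathbf{x}^{\top}(\mathbf{A}+\mathbf{A}^{\top})$ with central finite differences of $\mathbf{x}^{\top}\mathbf{A}\mathbf{x}$.

```python
def quad_form(A, x):
    """Return the scalar x^T A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def grad_formula(A, x):
    """Return the row vector x^T (A + A^T), i.e. the right-hand side of (4)."""
    n = len(x)
    return [sum(x[i] * (A[i][j] + A[j][i]) for i in range(n)) for j in range(n)]


def grad_numeric(A, x, h=1e-6):
    """Approximate each partial derivative of x^T A x by a central difference."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_plus[k] += h
        x_minus = list(x)
        x_minus[k] -= h
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return grad


# Hypothetical test data; A is deliberately non-symmetric so that
# A + A^T differs from 2A.
A = [[1.0, 2.0], [0.0, 3.0]]
x = [1.0, -2.0]

analytic = grad_formula(A, x)
numeric = grad_numeric(A, x)
print(analytic, numeric)
```

With a non-symmetric `A`, mistaking the result for $2\mathbf{x}^{\top}\mathbf{A}$ would show up as a visible mismatch between the two vectors.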
2. Directional derivative
Reference: 数学篇-方向导数 (a very accessible explanation), 知乎 (zhihu.com)
If the function $f(x,y)$ is differentiable at the point $P_0(x_0,y_0)$, then its directional derivative at that point exists along every direction $l$, and
$$\left.\frac{\partial f}{\partial l}\right|_{\left(x_{0}, y_{0}\right)}=f_{x}\left(x_{0}, y_{0}\right) \cos \alpha+f_{y}\left(x_{0}, y_{0}\right) \cos \beta\tag{8}$$
where $\cos\alpha$ and $\cos\beta$ are the direction cosines of $l$.
Proof: By assumption, $f(x,y)$ is differentiable at $(x_0,y_0)$, so
$$\begin{array}{c} f\left(x_{0}+\Delta x, y_{0}+\Delta y\right)-f\left(x_{0}, y_{0}\right) \\=f_{x}\left(x_{0}, y_{0}\right) \Delta x+f_{y}\left(x_{0}, y_{0}\right) \Delta y+o\left(\sqrt{(\Delta x)^{2}+(\Delta y)^{2}}\right) \end{array}\tag{9}$$
When the point $(x_0+\Delta x,y_0+\Delta y)$ lies on the ray $l$ starting at $(x_0,y_0)$, we have, for some $t \ge 0$,
$$\begin{array}{c} \Delta x=t \cos \alpha, \quad \Delta y=t \cos \beta \\ \sqrt{(\Delta x)^{2}+(\Delta y)^{2}}=t \end{array}\tag{10}$$
Substituting (10) into (9) and dividing by $t$, the remainder becomes $o(t)/t$, which vanishes as $t \rightarrow 0^{+}$. Therefore
$$\begin{array}{c} \lim\limits_{t \rightarrow 0^{+}} \dfrac{f\left(x_{0}+t \cos \alpha, y_{0}+t \cos \beta\right)-f\left(x_{0}, y_{0}\right)}{t} \\ =f_{x}\left(x_{0}, y_{0}\right) \cos \alpha+f_{y}\left(x_{0}, y_{0}\right) \cos \beta \end{array}\tag{11}$$
This proves that the directional derivative exists and that its value is
$$\left.\frac{\partial f}{\partial l}\right|_{\left(x_{0}, y_{0}\right)}=f_{x}\left(x_{0}, y_{0}\right) \cos \alpha+f_{y}\left(x_{0}, y_{0}\right) \cos \beta\tag{12}$$
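The result of the proof can be checked numerically. The sketch below is an illustration only: the function `f`, the base point, and the angle `theta` are hypothetical choices, not taken from the original notes. For a direction at angle `theta` to the x-axis, the direction cosines are $\cos\alpha=\cos\theta$ and $\cos\beta=\sin\theta$. The code compares the formula for the directional derivative with the one-sided difference quotient whose limit defines it.

```python
import math


def f(x, y):
    """Hypothetical smooth test function."""
    return x ** 2 * y + math.sin(y)


def f_x(x, y):
    """Partial derivative of f with respect to x."""
    return 2 * x * y


def f_y(x, y):
    """Partial derivative of f with respect to y."""
    return x ** 2 + math.cos(y)


def directional_formula(x0, y0, theta):
    """f_x * cos(alpha) + f_y * cos(beta) with cos(alpha)=cos(theta), cos(beta)=sin(theta)."""
    return f_x(x0, y0) * math.cos(theta) + f_y(x0, y0) * math.sin(theta)


def directional_quotient(x0, y0, theta, t=1e-6):
    """One-sided difference quotient along the ray, for a small positive t."""
    cos_a, cos_b = math.cos(theta), math.sin(theta)
    return (f(x0 + t * cos_a, y0 + t * cos_b) - f(x0, y0)) / t


x0, y0, theta = 1.0, 2.0, math.pi / 3
formula_value = directional_formula(x0, y0, theta)
quotient_value = directional_quotient(x0, y0, theta)
print(formula_value, quotient_value)
```

The two printed values should agree to several decimal places; the remaining gap comes from using a finite `t` instead of the limit.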
Writing $x$ for a multi-dimensional vector, $u$ for the unit direction vector, and $\alpha$ in place of the step length $t$, we obtain (with the derivative evaluated at $\alpha=0$; the $\alpha$ inside $\cos\alpha$ on the right is still the direction angle from (12), not the step length)
$$\frac{\partial}{\partial \alpha}f(x+\alpha u) = u^T \nabla_x f(x) = f_x(x_0,y_0) \cos\alpha+f_y(x_0,y_0) \cos\beta\tag{13}$$
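The first equality in (13) is the one given in the 花书 (the "flower book", i.e. Deep Learning by Goodfellow, Bengio and Courville). I was still unsure about it, so here is my own derivation.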
Let
$$t = x+\alpha u\tag{14}$$
Then
$$\frac{\partial}{\partial \alpha} f(x+\alpha u)=\frac{\partial f(t)}{\partial \alpha}=\frac{\partial f(t)}{\partial t} \cdot \frac{\partial t}{\partial \alpha}\tag{15}$$
$$=\frac{\partial f(t)}{\partial t} \cdot \frac{\partial (x+\alpha u)}{\partial \alpha}\tag{16}$$
$$=\frac{\partial f(t)}{\partial t} \cdot u\tag{17}$$
$$=\nabla_{x} f(x)^T \cdot u \quad (\text{taking } \alpha=0)\tag{18}$$
If anyone can see where I went wrong, please point it out.
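Update: the question is resolved. My formulas are based on the denominator layout (horizontal, i.e. the derivative of a scalar is a row vector), so the expression obtained in going from (17) to (18) should be $\nabla_{x} f(x)^T \cdot u$ (I have corrected it above so as not to mislead anyone; the transpose was originally missing). The derivations in the 花书 follow the other convention, which I refer to as the numerator layout, so the book's expression and mine differ by exactly a transpose; since the quantity is a scalar, $\nabla_{x} f(x)^T u = u^T \nabla_{x} f(x)$ and the two agree. Also, all vectors in the book are column vectors, the gradient in particular.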
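As a worked check of the first equality in (13), here is a short sketch under an assumed example that is not in the original notes: take $f(x)=x^{\top}Ax$ for a constant square matrix $A$ and expand along the line $x+\alpha u$.

$$\begin{aligned}
f(x+\alpha u) &= x^{\top}Ax+\alpha\left(u^{\top}Ax+x^{\top}Au\right)+\alpha^{2}\,u^{\top}Au,\\
\frac{\partial}{\partial\alpha}f(x+\alpha u) &= u^{\top}Ax+x^{\top}Au+2\alpha\,u^{\top}Au,\\
\left.\frac{\partial}{\partial\alpha}f(x+\alpha u)\right|_{\alpha=0} &= u^{\top}Ax+u^{\top}A^{\top}x=u^{\top}\left(A+A^{\top}\right)x .
\end{aligned}$$

By formula (4), the gradient written as a column vector is $\nabla_x f(x)=(A+A^{\top})x$, so the last line equals $u^{\top}\nabla_x f(x)$. This matches (13) at $\alpha=0$, and it also shows that for $\alpha \neq 0$ an extra term $2\alpha\,u^{\top}Au$ appears, which is why the derivative has to be evaluated at $\alpha=0$.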
3. Gradient descent
A first-order optimization method.
(Omitted.)
4. Newton's method
A second-order optimization method.
Reference: 数值优化(Numerical Optimization)(3)-牛顿法, 知乎 (zhihu.com)
Expand $f$ to second order around the current iterate $\boldsymbol{x_k}$, where $H$ is the Hessian of $f$ at $\boldsymbol{x_k}$:
$$f(\boldsymbol x) \approx f(\boldsymbol{x_{k}})+\left(\boldsymbol x-\boldsymbol{x_{k}}\right)^T\nabla f\left(\boldsymbol{x_{k}}\right)+\frac{1}{2}\left(\boldsymbol x-\boldsymbol{x_{k}}\right)^{T} H\left(\boldsymbol x-\boldsymbol{x_{k}}\right)\tag{19}$$
To find the minimizer of $f(\boldsymbol x)$, differentiate this quadratic approximation of $f$ with respect to $\boldsymbol x$:
$$\begin{array}{c} f'(\boldsymbol x) =\nabla f(\boldsymbol{x_k})^T +\frac{1}{2}(\boldsymbol x-\boldsymbol{x_k})^T(H+H^T) \\ = \nabla f(\boldsymbol{x_k})^T +(\boldsymbol x-\boldsymbol{x_{k}})^TH \end{array}\tag{20}$$
Set $f'(\boldsymbol x)=\boldsymbol 0$. Since $H$ is symmetric, i.e. $H = H^T$ (and therefore $H^{-1}$ is symmetric as well), solving for $\boldsymbol x-\boldsymbol{x_k}$ gives
$$\boldsymbol x-\boldsymbol{x_{k}} =- \left(\nabla f(\boldsymbol{x_k})^T H^{-1}\right)^T =- H^{-1}\nabla f(\boldsymbol{x_k})\tag{21}$$
$$\boldsymbol{x_{k+1}} = \boldsymbol{x_k}-H^{-1}\nabla f(\boldsymbol{x_k})\tag{22}$$
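When $f$ is a positive-definite quadratic function, a single application of (22) jumps straight to the minimum of the function.
If $f$ is not truly quadratic but can be approximated locally by a positive-definite quadratic, Newton's method has to be applied iteratively.
Newton's method is appropriate only when the nearby critical point is a minimum; near a saddle point it can be harmful.
I still need to think these statements over carefully.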
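The three statements about update (22) can be tried out on small examples. The following is a minimal sketch in plain Python for two variables, not a general implementation. The three test functions, the starting points, and the helper names (`solve_2x2`, `newton_step`, `newton`) are all hypothetical choices made for illustration.

```python
import math


def solve_2x2(H, g):
    """Solve H d = g for a 2x2 matrix H using Cramer's rule."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("Hessian is singular")
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def newton_step(x, grad, hess):
    """One application of (22): x_{k+1} = x_k - H^{-1} grad f(x_k)."""
    step = solve_2x2(hess(x), grad(x))
    return [x[0] - step[0], x[1] - step[1]]


def newton(x, grad, hess, tol=1e-10, max_iter=50):
    """Repeat (22) until the gradient is tiny; return the point and the step count."""
    for k in range(max_iter):
        g = grad(x)
        if max(abs(g[0]), abs(g[1])) < tol:
            return x, k
        x = newton_step(x, grad, hess)
    return x, max_iter


# Case 1: positive-definite quadratic f(x, y) = x^2 + x*y + 2*y^2 - 3*x.
def grad_quad(p):
    return [2 * p[0] + p[1] - 3, p[0] + 4 * p[1]]


def hess_quad(p):
    return [[2.0, 1.0], [1.0, 4.0]]


# Case 2: convex but not quadratic, f(x, y) = exp(x) - x + y^2.
def grad_exp(p):
    return [math.exp(p[0]) - 1, 2 * p[1]]


def hess_exp(p):
    return [[math.exp(p[0]), 0.0], [0.0, 2.0]]


# Case 3: saddle, f(x, y) = x^2 - y^2, whose only critical point is (0, 0).
def grad_saddle(p):
    return [2 * p[0], -2 * p[1]]


def hess_saddle(p):
    return [[2.0, 0.0], [0.0, -2.0]]


x_quad, steps_quad = newton([5.0, -1.0], grad_quad, hess_quad)
x_exp, steps_exp = newton([1.0, 1.0], grad_exp, hess_exp)
x_saddle = newton_step([1.0, 0.5], grad_saddle, hess_saddle)
print(x_quad, steps_quad)
print(x_exp, steps_exp)
print(x_saddle)
```

In case 1 a single step reaches the minimizer $(12/7, -3/7)$. In case 2 several steps are needed before the gradient falls below the tolerance. In case 3 one step lands on the saddle point at the origin, which is a critical point but not a minimum: the update only looks for a zero of the gradient and does not distinguish minima from saddles.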