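Its derivation is based on the first-order Taylor expansion
Welcome
The derivation below is based on the multivariate Taylor expansion. Corrections are welcome. This is also my first blog post, so please contact me if anything is inappropriate.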
TEXT
Let $X$ be a vector of $n$ variables:
$$X=\left[ x_1,x_2,x_3,\dots ,x_n \right]$$
and let $f$ be a function such that
$$f\left( X \right) =f\left( x_1,x_2,x_3,\dots ,x_n \right)$$
Then, expanding $f$ around a point $X_0$, we have:
$$f\left( X \right) =f\left( X_0 \right) +\left( \nabla f\left( X \right) \right) ^T\cdot \left( X-X_0 \right) +O\left( \left\| X-X_0 \right\| ^2 \right)$$
Here $\nabla f(X)$ denotes the gradient of $f$ evaluated at the expansion point $X_0$.
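The expansion above is accurate only when $X-X_0$ is small (the argument is intended for convex functions).
Let $X-X_0=\alpha \cdot u$, where $\alpha > 0$ is the learning rate (the step length) and $u$ is the unit vector in the direction of $X-X_0$.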
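To make the "small $X-X_0$" requirement concrete, here is a minimal sketch that measures how far the first-order expansion is from the true value as the step length $\alpha$ shrinks. The test function `f`, its gradient `grad_f`, the point `x0`, and the direction `u` are all assumptions chosen for illustration; they do not come from the derivation itself.

```python
def f(x):
    """Hypothetical test function f(x1, x2) = x1^2 + 3*x2^2."""
    return x[0] ** 2 + 3.0 * x[1] ** 2


def grad_f(x):
    """Analytic gradient of the hypothetical f."""
    return [2.0 * x[0], 6.0 * x[1]]


def first_order_error(x0, u, alpha):
    """Return f(x0 + alpha*u) - f(x0) - alpha * grad_f(x0)^T u."""
    x = [xi + alpha * ui for xi, ui in zip(x0, u)]
    linear_term = alpha * sum(gi * ui for gi, ui in zip(grad_f(x0), u))
    return f(x) - f(x0) - linear_term


x0 = [1.0, 2.0]   # expansion point (assumed)
u = [0.6, 0.8]    # unit direction, since 0.6^2 + 0.8^2 = 1 (assumed)

# Error of the linear approximation for decreasing step lengths.
errors = {alpha: first_order_error(x0, u, alpha) for alpha in (0.1, 0.01, 0.001)}
for alpha, err in errors.items():
    print(f"alpha={alpha:<6} error={err:.3e}  error/alpha^2={err / alpha ** 2:.4f}")
```

For this particular quadratic the neglected term is exactly $\alpha^2 (u_1^2 + 3u_2^2) = 2.28\,\alpha^2$, so the ratio `error/alpha^2` stays essentially constant while the error itself drops by a factor of 100 each time $\alpha$ shrinks by 10.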
Therefore, when $X-X_0$ is small, the second-order term can be neglected and the expansion becomes:
$$f\left( X \right) -f\left( X_0 \right) \approx \left( \nabla f\left( X \right) \right) ^T\alpha \cdot u$$
In the expression above, $f(X_0)$ can be regarded as the value of the loss function at the previous step, and $f(X)$ as its value at the current step.
We want the loss to decrease:
$$f\left( X \right) -f\left( X_0 \right) <0$$
and we want the decrease to be as large as possible, that is, the difference should be as negative as possible:
$$\min \left( f\left( X \right) -f\left( X_0 \right) \right)$$
Since the learning rate $\alpha$ is a positive scalar, it can be dropped, and the problem becomes finding the unit vector $u$ that attains:
$$\min_u \left( u^T\cdot \nabla f\left( X \right) \right)$$
In $n$-dimensional space, this inner product is smallest when the two vectors point in opposite directions, because:
$$u^T\cdot \nabla f\left( X \right) =|u|\cdot |\nabla f\left( X \right) |\cos \langle u,\nabla f\left( X \right) \rangle$$
The minimum is attained when the angle $\langle u,\nabla f\left( X \right) \rangle =\pi$, i.e. $\cos \langle u,\nabla f\left( X \right) \rangle =-1$, which gives:
$$\min \left( f\left( X \right) -f\left( X_0 \right) \right) =-\alpha \cdot |u|\cdot |\nabla f\left( X \right) |=-\alpha \cdot |\nabla f\left( X \right) |$$
where the last step uses $|u|=1$.
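The claim that the inner product $u^T\cdot \nabla f(X)$ is minimized by the direction opposite to the gradient can be checked numerically. The sketch below is only an illustration in two dimensions: the gradient value `g` is a made-up example, and the brute-force scan over one unit vector per degree is an assumption used for the check, not part of the derivation.

```python
import math

# Hypothetical gradient value, e.g. of x1^2 + 3*x2^2 at the point (1, 2).
g = [2.0, 12.0]
g_norm = math.hypot(g[0], g[1])


def dot(a, b):
    """Inner product of two 2-D vectors."""
    return a[0] * b[0] + a[1] * b[1]


# Candidate unit vectors u = (cos t, sin t), one per degree.
candidates = [
    (math.cos(math.radians(d)), math.sin(math.radians(d))) for d in range(360)
]

# Brute-force search for the candidate with the smallest u^T g.
best_u = min(candidates, key=lambda u: dot(u, g))
best_value = dot(best_u, g)

# Direction opposite to the gradient: u = -g / |g|.
u_star = (-g[0] / g_norm, -g[1] / g_norm)
star_value = dot(u_star, g)

print("best sampled u :", best_u, "value:", best_value)
print("u = -g/|g|     :", u_star, "value:", star_value)
print("-|g|           :", -g_norm)
```

No sampled direction does better than $-\nabla f/|\nabla f|$, and the best sampled direction lies within the one-degree grid spacing of it.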
Hence:
$$u=-\frac{\nabla f\left( X \right)}{|\nabla f\left( X \right) |}$$
That is:
$$X-X_0=-\alpha \cdot \frac{\nabla f\left( X \right)}{|\nabla f\left( X \right) |}=-\beta \cdot \nabla f\left( X \right)$$
where $\beta =\alpha /|\nabla f\left( X \right) |$ absorbs the norm of the gradient into the step size.
Q.E.D.
References:
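The resulting update $X = X_0-\beta \cdot \nabla f$ can be applied repeatedly as an iteration. The following is a minimal sketch under stated assumptions: the loss `f`, its gradient `grad_f`, the starting point, and the two values of `beta` are hypothetical choices made for illustration, and the helper `gradient_descent` is not taken from any library.

```python
def f(x):
    """Hypothetical convex loss f(x1, x2) = x1^2 + 3*x2^2."""
    return x[0] ** 2 + 3.0 * x[1] ** 2


def grad_f(x):
    """Analytic gradient of the hypothetical loss."""
    return [2.0 * x[0], 6.0 * x[1]]


def gradient_descent(x0, beta, steps):
    """Apply X <- X0 - beta * grad_f(X0) `steps` times.

    Returns the final point and the loss recorded before and after each step.
    """
    x = list(x0)
    losses = [f(x)]
    for _ in range(steps):
        g = grad_f(x)
        x = [xi - beta * gi for xi, gi in zip(x, g)]
        losses.append(f(x))
    return x, losses


# A small step size keeps each step inside the region where the
# first-order approximation is reliable, so the loss keeps decreasing.
x_small, losses_small = gradient_descent([1.0, 2.0], beta=0.1, steps=100)

# A step size that is too large violates the "X - X0 is small" assumption,
# and for this loss the iterates move away from the minimum.
x_large, losses_large = gradient_descent([1.0, 2.0], beta=0.5, steps=10)

print("beta=0.1 final loss:", losses_small[-1])
print("beta=0.5 final loss:", losses_large[-1])
```

The second run shows why the derivation needs $X-X_0$ to be small: the direction $-\nabla f$ is still a descent direction, but the first-order argument only guarantees a decrease for sufficiently small steps.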
https://blog.csdn.net/qq_38262266/article/details/100750998?depth_1-utm_source=distribute.pc_relevant.none-task-blog-OPENSEARCH-1&utm_source=distribute.pc_relevant.none-task-blog-OPENSEARCH-1