Mathematical Formula Derivation: Gradient
Core update rule:
$$\theta _{t+1}=\theta _t-\alpha _t\nabla f\left( \theta _t \right) \tag{1}$$
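As a minimal sketch of update (1) in code (the names `gradient_descent` and `grad_f` are hypothetical; a callable returning $\nabla f(\theta)$ and a constant step size $\alpha_t = \alpha$ are assumed):

```python
import numpy as np

def gradient_descent(grad_f, theta0, alpha=0.1, steps=100):
    """Iterate theta_{t+1} = theta_t - alpha * grad_f(theta_t), per Eq. (1)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - alpha * grad_f(theta)  # Eq. (1) with constant alpha_t
    return theta
```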
Example
Function:
$$J\left( \theta _1,\theta _2 \right) =\theta _{1}^{2}+\theta _{2}^{2}\tag{2}$$
Objective:
$$\underset{\theta _1,\theta _2}{\min}J\left( \theta _1,\theta _2 \right) \tag{3}$$
Update rules:
$$\theta _1:=\theta _1-\alpha \frac{\partial}{\partial \theta _1}J\left( \theta _1,\theta _2 \right) \tag{4}$$

$$\theta _2:=\theta _2-\alpha \frac{\partial}{\partial \theta _2}J\left( \theta _1,\theta _2 \right) \tag{5}$$
Derivatives:
$$\frac{\partial}{\partial \theta _1}J\left( \theta _1,\theta _2 \right) =\frac{\partial}{\partial \theta _1}\theta _{1}^{2}+\frac{\partial}{\partial \theta _1}\theta _{2}^{2}=2\theta _1\tag{6}$$

$$\frac{\partial}{\partial \theta _2}J\left( \theta _1,\theta _2 \right) =\frac{\partial}{\partial \theta _2}\theta _{1}^{2}+\frac{\partial}{\partial \theta _2}\theta _{2}^{2}=2\theta _2\tag{7}$$
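Putting (4)–(7) together, a short sketch that runs the updates on $J(\theta_1,\theta_2)=\theta_1^2+\theta_2^2$ (the starting point and learning rate below are arbitrary illustrative choices):

```python
import numpy as np

def grad_J(theta):
    # Gradient of J(theta1, theta2) = theta1^2 + theta2^2, per Eqs. (6)-(7)
    return 2.0 * theta

theta = np.array([3.0, -4.0])  # arbitrary starting point
alpha = 0.1                    # fixed learning rate
for t in range(100):
    theta = theta - alpha * grad_J(theta)  # simultaneous updates (4)-(5)

print(theta)  # both components decay toward the minimizer (0, 0)
```

Each step scales $\theta$ by $(1-2\alpha)$, so with $\alpha=0.1$ the iterate shrinks geometrically toward the origin.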
Gradients of Common Functions
| Type | Function $f(w,b)$ | Gradient $\nabla f=\left( \frac{\partial f}{\partial w},\frac{\partial f}{\partial b} \right)$ |
| --- | --- | --- |
| Linear | $xw+b$ | $\left( x,\ 1 \right)$ |
| Quadratic | $xw^2+b^2$ | $\left( 2wx,\ 2b \right)$ |
| Exponential | $xe^w+e^b$ | $\left( xe^w,\ e^b \right)$ |
| Composite | $\left[ y-\left( wx+b \right) \right] ^2$ | $\left( -2x\left( y-\left( wx+b \right) \right),\ -2\left( y-\left( wx+b \right) \right) \right)$ |
| Logarithmic | $y\log \left( wx+b \right)$ | $\left( \frac{xy}{wx+b},\ \frac{y}{wx+b} \right)$ |
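One way to sanity-check the table entries is a central-difference approximation. The helper `num_grad` and the sample values of $x$, $y$, $w$, $b$ below are hypothetical choices for illustration, applied here to the composite row:

```python
def num_grad(f, w, b, eps=1e-6):
    """Central-difference estimate of (df/dw, df/db)."""
    dw = (f(w + eps, b) - f(w - eps, b)) / (2 * eps)
    db = (f(w, b + eps) - f(w, b - eps)) / (2 * eps)
    return dw, db

x, y, w, b = 2.0, 1.5, 0.7, 0.3  # arbitrary sample values

def f(w, b):
    return (y - (w * x + b)) ** 2  # composite row of the table

analytic = (-2 * x * (y - (w * x + b)), -2 * (y - (w * x + b)))
print(analytic)           # ~ (0.8, 0.4)
print(num_grad(f, w, b))  # should agree to about 1e-6
```

The same check works for the other rows by swapping in the corresponding function and analytic gradient.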