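While studying backpropagation in neural networks, I ran into the derivative of the sigmoid function. What goes around comes around: the calculus I threw away has to be picked back up, so here is a record of how the sigmoid derivative is derived.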
Prerequisites
- Reciprocal rule for derivatives: if $g(x)=\frac{1}{f(x)}$, where $f$ is differentiable and $f(x)\neq 0$, then $g'(x)=-\frac{f'(x)}{f(x)^2}$
- $f(x) = e^x$, $f'(x) = e^x$
Proof 1:
$$g'(x) = \left(\frac{1}{f(x)}\right)'=\lim_{\Delta x\rightarrow0}\frac{\frac{1}{f(x+\Delta x)}-\frac{1}{f(x)}}{\Delta x}=\lim_{\Delta x\rightarrow0}-\frac{f(x+\Delta x)-f(x)}{f(x+\Delta x)f(x)\Delta x}=\left(\lim_{\Delta x\rightarrow0}-\frac{f(x+\Delta x) - f(x)}{\Delta x}\right)\left(\lim_{\Delta x\rightarrow0}\frac{1}{f(x+\Delta x)f(x)}\right)$$
Since $f$ is differentiable at $x$, it is also continuous there, so $f(x+\Delta x) \rightarrow f(x)$ as $\Delta x \rightarrow 0$, which gives:
$$\lim_{\Delta x\rightarrow0}\frac{1}{f(x+\Delta x)}\cdot\frac{1}{f(x)}=\frac{1}{f(x)^2}$$
That is:
$$g'(x) = \left(\frac{1}{f(x)}\right)'=\left(\lim_{\Delta x\rightarrow0}-\frac{f(x+\Delta x) - f(x)}{\Delta x}\right)\left(\lim_{\Delta x\rightarrow0}\frac{1}{f(x+\Delta x)f(x)}\right)=-\frac{f'(x)}{f(x)^2}$$
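The reciprocal rule is easy to sanity-check numerically. The sketch below is only an illustration: the test function $f(x)=x^2+1$, the helper names, and the step size `h` are all my own arbitrary choices, not part of the proof. It compares the formula $-f'(x)/f(x)^2$ against a central finite difference of $g(x)=1/f(x)$.

```python
def f(x: float) -> float:
    # Arbitrary test function (an assumption for illustration); never zero.
    return x * x + 1.0

def f_prime(x: float) -> float:
    # Exact derivative of the test function above.
    return 2.0 * x

def g(x: float) -> float:
    # g(x) = 1 / f(x)
    return 1.0 / f(x)

def g_prime_rule(x: float) -> float:
    # Derivative of g predicted by the reciprocal rule.
    return -f_prime(x) / f(x) ** 2

def numeric_derivative(func, x: float, h: float = 1e-5) -> float:
    # Central finite difference approximation of func'(x).
    return (func(x + h) - func(x - h)) / (2.0 * h)

if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.5, 3.0):
        print(x, g_prime_rule(x), numeric_derivative(g, x))
```

The two printed columns should agree to many decimal places; any visible difference comes from the finite-difference approximation, not from the rule.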
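Proof 2:
OTL, this one is too hard for me. The derivation is fairly involved, so I'll just point to an expert's write-up, which explains it very well.
Reference: Zhihu, "为什么e^x 的导数是还是其自身?" (Why is the derivative of e^x still e^x itself?)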
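For readers who only want the outline, here is a short sketch of the usual argument. It is not taken from the referenced article, and it assumes the standard limit $\lim_{\Delta x\to 0}\frac{e^{\Delta x}-1}{\Delta x}=1$ without proving it, which is where the real difficulty lies.

```latex
\begin{align*}
\left(e^{x}\right)'
  &= \lim_{\Delta x\to 0}\frac{e^{x+\Delta x}-e^{x}}{\Delta x} \\
  &= e^{x}\,\lim_{\Delta x\to 0}\frac{e^{\Delta x}-1}{\Delta x}
     && \text{(factor out } e^{x}\text{, which does not depend on } \Delta x\text{)} \\
  &= e^{x}\cdot 1 = e^{x}
     && \text{(assumed limit)}
\end{align*}
```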
Okay, now let's derive the formula for the sigmoid derivative.
Starting from the definition:
$$S(x)=\frac{1}{1+e^{-x}}$$
we have
$$S'(x)=\left(\frac{1}{1+e^{-x}}\right)'=-\frac{(1+e^{-x})'}{(1+e^{-x})^2}=\frac{e^{-x}}{(1+e^{-x})^2}=\frac{1}{1+e^{-x}}\cdot\frac{1+e^{-x}-1}{1+e^{-x}}=\frac{1}{1+e^{-x}}\left(1-\frac{1}{1+e^{-x}}\right)=S(x)(1-S(x))$$
The third equality uses $(1+e^{-x})'=-e^{-x}$, which follows from the chain rule applied to $e^{-x}$.
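Since the point of all this is backpropagation, here is a minimal sketch of how the result $S'(x)=S(x)(1-S(x))$ might look in code, checked against a finite difference. The function names (`sigmoid`, `sigmoid_grad`, `central_difference`) and the sample points are my own choices for illustration, and the branch on the sign of `x` is just one common way to avoid overflow in `exp`, not something the derivation requires.

```python
import math

def sigmoid(x: float) -> float:
    """S(x) = 1 / (1 + e^{-x}), evaluated without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x),
    # so exp() is never called with a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    """Closed-form derivative S'(x) = S(x) * (1 - S(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Central finite difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 0.5, 3.0):
        print(x, sigmoid_grad(x), central_difference(sigmoid, x))
```

A practical benefit of the closed form is that the backward pass can reuse the value of $S(x)$ already computed in the forward pass, so no extra exponential is needed.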