Overview:
ReLU is applied after the convolution, so, like tanh and sigmoid, it is a nonlinear activation function. The derivative of ReLU is identically 1 on the positive part, so using ReLU in a deep network does not by itself cause vanishing or exploding gradients. ReLU is also cheap to compute, which speeds up training. However, if an overly large gradient pushes many pre-activations into the negative region, where the output is 0, those neurons can no longer be activated (the "dying ReLU" problem); using a smaller learning rate helps mitigate this.
1.ReLU:
\operatorname{ReLU}(x)=\max (x, 0)=\left\{\begin{array}{ll}{0} & {x \leq 0} \\ {x} & {x>0}\end{array}\right.
Backward derivation: let the output of layer l be z^l, and the output after the activation function be z^{l+1}. Write the partial derivative of the loss L with respect to the layer-l output z^l as δ^l = ∂L/∂z^l. Then the partial derivative of L with respect to layer l is:
\delta^{l}=\frac{\partial L}{\partial z^{l+1}} \frac{\partial z^{l+1}}{\partial z^{l}}=\delta^{l+1} \frac{\partial \operatorname{ReLU}\left(z^{l}\right)}{\partial z^{l}}=\delta^{l+1}\left\{\begin{array}{ll}{1} & {z^{l}>0} \\ {0} & {z^{l} \leq 0}\end{array}\right.
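As a minimal sketch (not part of the original post), this forward pass and backward rule can be written directly in NumPy; the name delta_next stands for the upstream gradient δ^{l+1}:

import numpy as np

def relu_forward(z):
    # ReLU(z) = max(z, 0), element-wise
    return np.maximum(z, 0)

def relu_backward(delta_next, z):
    # delta^l = delta^{l+1} * 1{z^l > 0}
    return delta_next * (z > 0).astype(z.dtype)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
delta_next = np.ones_like(z)           # pretend the upstream gradient is all ones
print(relu_forward(z))                 # [0.  0.  0.  0.5 2. ]
print(relu_backward(delta_next, z))    # [0. 0. 0. 1. 1.]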
2.LeakyReLU:
\operatorname{LeakyReLU}(z)=\left\{\begin{array}{ll}{z} & {z>0} \\ {\alpha z} & {z \leq 0}\end{array}\right., \quad \alpha=0.1
The negative part is given a small slope. By the same reasoning as for ReLU, the partial derivative of the loss L with respect to layer l is:
\delta^{l}=\left\{\begin{array}{ll}{\delta^{l+1}} & {z^{l}>0} \\ {\alpha \delta^{l+1}} & {z^{l} \leq 0}\end{array}\right., \quad \alpha=0.1
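A quick numerical sanity check of this derivative (again a sketch, with α = 0.1 as above), using a central finite difference:

import numpy as np

alpha = 0.1

def leaky_relu(z):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z):
    # dLeakyReLU/dz = 1 for z > 0, alpha for z <= 0
    return np.where(z > 0, 1.0, alpha)

z = np.array([-3.0, -0.2, 0.7, 4.0])    # stay away from the kink at 0
eps = 1e-6
numeric = (leaky_relu(z + eps) - leaky_relu(z - eps)) / (2 * eps)
print(np.allclose(numeric, leaky_relu_grad(z)))   # True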
3.PReLU:
The expression is the same as for LeakyReLU, except that α is learnable. The partial derivative of the loss L with respect to the parameter α is:
\frac{\partial L}{\partial \alpha}=\frac{\partial L}{\partial z^{l+1}} \frac{\partial z^{l+1}}{\partial \alpha}=\delta^{l+1} \frac{\partial \operatorname{PReLU}\left(z^{l}\right)}{\partial \alpha}=\delta^{l+1}\left\{\begin{array}{ll}{0} & {z^{l}>0} \\ {z^{l}} & {z^{l} \leq 0}\end{array}\right.=\left\{\begin{array}{ll}{0} & {z^{l}>0} \\ {\delta^{l+1} z^{l}} & {z^{l} \leq 0}\end{array}\right.
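To make the α-update concrete, here is a hypothetical sketch of one gradient step for a layer sharing a single learnable α; the name delta_next and the learning rate 0.01 are illustrative, not from the post:

import numpy as np

alpha = 0.1

def prelu(z, alpha):
    return np.where(z > 0, z, alpha * z)

def prelu_grad_alpha(delta_next, z):
    # dL/dalpha = sum of delta^{l+1} * z^l over the elements where z^l <= 0
    return np.sum(np.where(z > 0, 0.0, delta_next * z))

z = np.array([-1.5, -0.3, 0.4, 2.0])
delta_next = np.array([0.2, -0.5, 0.1, 0.3])   # upstream gradient delta^{l+1}
grad_alpha = prelu_grad_alpha(delta_next, z)
alpha -= 0.01 * grad_alpha                     # one SGD step on the shared alpha
print(grad_alpha, alpha)                       # -0.15  0.1015
print(prelu(z, alpha))                         # forward pass with the updated alpha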
4.ELU:
f(z)=\left\{\begin{array}{ll}{z} & {z>0} \\ {\alpha(\exp (z)-1)} & {z \leq 0}\end{array}\right.
By the same reasoning as for LeakyReLU, the partial derivative of the loss L with respect to layer l is:
\delta^{l}=\left\{\begin{array}{ll}{\delta^{l+1}} & {z^{l}>0} \\ {\alpha \delta^{l+1} \exp \left(z^{l}\right)} & {z^{l} \leq 0}\end{array}\right.
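The same pattern for ELU, as a sketch; note that on the negative side the local derivative α·exp(z) equals ELU(z) + α, so the forward output can be reused in practice:

import numpy as np

alpha = 0.2

def elu(z):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

def elu_backward(delta_next, z):
    # delta^l = delta^{l+1}                   for z^l > 0
    #         = delta^{l+1} * alpha * e^{z^l} for z^l <= 0
    return delta_next * np.where(z > 0, 1.0, alpha * np.exp(z))

z = np.array([-2.0, -0.1, 0.3, 1.5])
print(elu(z))
print(elu_backward(np.ones_like(z), z))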
5.SELU:
\operatorname{SELU}(z)=\lambda\left\{\begin{array}{ll}{z} & {z>0} \\ {\alpha(\exp (z)-1)} & {z \leq 0}\end{array}\right.
By the same reasoning as for ELU, the partial derivative of the loss L with respect to layer l is:
\delta^{l}=\lambda\left\{\begin{array}{ll}{\delta^{l+1}} & {z^{l}>0} \\ {\alpha \delta^{l+1} \exp \left(z^{l}\right)} & {z^{l} \leq 0}\end{array}\right.
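SELU only adds the λ factor. A sketch using the commonly cited self-normalizing constants λ ≈ 1.0507 and α ≈ 1.6733 (the plotting code below uses different, purely illustrative values):

import numpy as np

lam, alpha = 1.0507, 1.6733      # constants from the SELU paper

def selu(z):
    return lam * np.where(z > 0, z, alpha * (np.exp(z) - 1))

def selu_backward(delta_next, z):
    # delta^l = lambda * delta^{l+1} * (1 if z^l > 0 else alpha * e^{z^l})
    return lam * delta_next * np.where(z > 0, 1.0, alpha * np.exp(z))

z = np.array([-1.0, 0.0, 1.0])
print(selu(z))                           # approx. [-1.1113  0.      1.0507]
print(selu_backward(np.ones_like(z), z))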
Summary: when the mean of the activations is nonzero, they pass a bias on to the next layer; if the activations do not cancel each other out (i.e., their mean is nonzero), the units of the next layer suffer a bias shift, and the more units are stacked, the larger the bias shift becomes. Apart from ReLU, the activation functions above push the mean output toward 0, which speeds up convergence, an effect similar to Batch Normalization but cheaper to compute. Although LeakyReLU and PReLU also take negative values, they do not guarantee robustness to noise in the inactive state (i.e., when the input is negative). ELU, by contrast, saturates softly for large negative inputs, which improves its robustness to noise.
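A quick illustration of the bias-shift point (a sketch, not from the original post): for zero-mean Gaussian pre-activations, ReLU outputs have a clearly positive mean, whereas ELU outputs stay closer to zero:

import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)             # zero-mean pre-activations

relu_out = np.maximum(z, 0)
elu_out = np.where(z > 0, z, np.exp(z) - 1)  # ELU with alpha = 1

print(relu_out.mean())   # ~0.40: a positive bias is passed to the next layer
print(elu_out.mean())    # ~0.16: noticeably closer to zero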
Code for the title figure:
## WeChat Official Account: 深度学习视觉
## Author: Fain
## Blog: Fainke.com
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
%matplotlib inline

sns.set(style="darkgrid")
fig = plt.figure(figsize=(12, 6))
plt.xlim([-10, 10])
plt.ylim([-1, 1.6])

# Input values
x = np.linspace(-10, 10, 1000)

# ReLU
relu = [max(item, 0) for item in x]

# LeakyReLU
alpha = 0.1
leakyRelu = [item if item > 0 else item * alpha for item in x]

# PReLU: same form as LeakyReLU, but alpha is a learnable parameter;
# with alpha = 0.1 its curve coincides with LeakyReLU, so it is not plotted separately
alpha = 0.1
prelu = [item if item > 0 else item * alpha for item in x]

# ELU
alpha = 0.2
elu = [item if item > 0 else (np.exp(item) - 1) * alpha for item in x]

# SELU (alpha and the scale r are chosen here for illustration only)
alpha = 1
r = 0.5
selu = [item if item > 0 else (np.exp(item) - 1) * alpha for item in x]
selu = [item * r for item in selu]

# Plot
plt.plot(x, relu, color="#ff0000", label=r"ReLU", marker='*')
plt.plot(x, leakyRelu, color="#0000ff", label=r"LeakyReLU")
plt.plot(x, elu, color="#00ff00", label=r"ELU")
plt.plot(x, selu, color="#00ffee", label=r"SELU")
plt.legend(prop={'family': 'Times New Roman', 'size': 16})
plt.show()