Hyper-parameters in Action! Part II: Weight Initializers

This post examines how different activation functions interact with different weight initialization schemes in deep learning: sigmoid, tanh, and ReLU, each paired with normal and uniform distributions. The experiments show that ReLU with He initialization (especially the uniform variant) performs well, while Glorot initialization works well with tanh. The summary recommends He initialization with a uniform distribution for ReLU-activated networks.
from deepreplay.callbacks import ReplayData
from deepreplay.replay import Replay
from deepreplay.plot import compose_plots
from deepreplay.datasets.ball import load_data
from keras.initializers import normal
from matplotlib import pyplot as plt

# X, y and build_model come from the original article's companion code:
# load_data returns the 10-dimensional ball dataset, and build_model
# stacks n_layers Dense layers with the given units, activation and
# initializer, plus a single sigmoid output unit.
X, y = load_data(n_dims=10)

filename = 'part2_weight_initializers.h5'
group_name = 'sigmoid_stdev_0.01'

# Uses normal initializer
initializer = normal(mean=0, stddev=0.01, seed=13)

# Builds BLOCK model
model = build_model(n_layers=5, input_dim=10, units=100, 
                    activation='sigmoid', initializer=initializer)

# Since we only need initial weights, we don't even need to train the model! 
# We still use the ReplayData callback, but we can pass the model as argument instead
replaydata = ReplayData(X, y, filename=filename, group_name=group_name, model=model)

# Now we feed the data to the actual Replay object
# so we can build the visualizations
replay = Replay(replay_filename=filename, group_name=group_name)

# Using subplot2grid to assemble a complex figure...
fig = plt.figure(figsize=(12, 6))
ax_zvalues = plt.subplot2grid((2, 2), (0, 0))
ax_weights = plt.subplot2grid((2, 2), (0, 1))
ax_activations = plt.subplot2grid((2, 2), (1, 0))
ax_gradients = plt.subplot2grid((2, 2), (1, 1))

wv = replay.build_weights(ax_weights)
gv = replay.build_gradients(ax_gradients)
# Z-values
zv = replay.build_outputs(ax_zvalues, before_activation=True, 
                          exclude_outputs=True, include_inputs=False)
# Activations
av = replay.build_outputs(ax_activations, exclude_outputs=True, include_inputs=False)

# Finally, we use compose_plots to update all
# visualizations at once
fig = compose_plots([zv, wv, av, gv], 
                    epoch=0, 
                    title=r'Activation: sigmoid - Initializer: Normal $\sigma = 0.01$')

sigmoid + normal + stddev 0.01: fails (X)

sigmoid + normal + stddev 0.1: fails (X)

sigmoid + normal + stddev 1: fails (X)
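The failure mode behind these sigmoid runs can be sketched with plain numpy (a hypothetical 5-layer, 100-unit forward pass, not the deepreplay code above): with weights drawn from N(0, 0.01), the z-values stay tiny, so every activation gets pinned near sigmoid(0) = 0.5 with almost no spread, leaving nothing for the gradients to work with.

```python
import numpy as np

rng = np.random.default_rng(13)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 1000 random inputs through 5 hidden layers of 100 units,
# with weights drawn from N(0, 0.01) as in the experiment above
a = rng.standard_normal((1000, 100))
z_stds = []
for _ in range(5):
    W = rng.normal(0.0, 0.01, size=(100, 100))
    z = a @ W
    z_stds.append(z.std())
    a = sigmoid(z)

# The z-values never grow past ~0.1 in magnitude, so every
# activation sits near sigmoid(0) = 0.5 with almost no spread
print(z_stds)
print(a.mean(), a.std())
```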

Trying a different Activation Function


tanh + normal + stddev 0.01: fails (X)

tanh + normal + stddev 1: fails (X)

tanh + normal + stddev 0.1: works
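A rough numpy sketch of why the middle setting is the one that survives (again a hypothetical 5-layer, 100-unit forward pass, not the article's code): with tanh and N(0, 0.1) weights the activation variance settles at a usable scale, while N(0, 0.01) drives it toward zero.

```python
import numpy as np

rng = np.random.default_rng(13)

def forward_second_moment(stddev, n_layers=5, units=100, n_samples=1000):
    """Second moment of the activations after n_layers tanh layers."""
    a = rng.standard_normal((n_samples, units))
    for _ in range(n_layers):
        W = rng.normal(0.0, stddev, size=(units, units))
        a = np.tanh(a @ W)
    return (a ** 2).mean()

# stddev 0.1 keeps the signal alive; stddev 0.01 crushes it
print(forward_second_moment(0.1), forward_second_moment(0.01))
```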

Xavier / Glorot Initialization Scheme


tanh + Glorot normal: works

tanh + Glorot uniform: works
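Both Glorot variants draw weights with variance 2 / (fan_in + fan_out), which is what keeps the variance of the z-values (and of the gradients) roughly constant from layer to layer. A minimal numpy sketch of the two formulas (these are the standard definitions, matching what keras's glorot_normal and glorot_uniform compute):

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    # stddev = sqrt(2 / (fan_in + fan_out))
    stddev = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, stddev, size=(fan_in, fan_out))

def glorot_uniform(fan_in, fan_out, rng):
    # limit = sqrt(6 / (fan_in + fan_out)); since Var(U(-l, l)) = l^2 / 3,
    # the variance matches the normal variant exactly
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(13)
Wn = glorot_normal(100, 100, rng)
Wu = glorot_uniform(100, 100, rng)
# Both have variance close to 2 / (100 + 100) = 0.01
print(Wn.var(), Wu.var())
```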

Rectified Linear Unit (ReLU) Activation Function


relu + Glorot normal: fails (X)

relu + Glorot uniform: fails (X)

He Initialization Scheme


relu + He normal: works

relu + He uniform: works
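He's scheme keeps only the fan-in and doubles the variance (stddev = sqrt(2 / fan_in)), compensating for ReLU zeroing out roughly half of each layer's outputs. A hypothetical numpy check that the signal survives five ReLU layers under this scheme:

```python
import numpy as np

rng = np.random.default_rng(13)

# 1000 random inputs through 5 ReLU layers of 100 units each
a = rng.standard_normal((1000, 100))
for _ in range(5):
    # He normal: stddev = sqrt(2 / fan_in)
    W = rng.normal(0.0, np.sqrt(2.0 / 100), size=(100, 100))
    a = np.maximum(a @ W, 0.0)  # ReLU

# The second moment of the activations stays on the order of 1
# across all five layers instead of collapsing toward zero
print((a ** 2).mean())
```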

So, we need not only a similar variance along all the layers, but also a proper scale for the gradients. The scale is quite important, as it will, together with the learning rate, define how fast the weights are going to be updated. If the gradients are way too small, the learning (that is, the update of the weights) will be extremely slow.
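As a back-of-the-envelope illustration of that point (the numbers here are made up, chosen only for their orders of magnitude): with weights around 1e-2 and gradients around 1e-7, a learning rate of 0.01 moves each weight by about one ten-millionth of its own magnitude per step.

```python
# Hypothetical orders of magnitude, just to illustrate the scale argument
weight = 1e-2
gradient = 1e-7
learning_rate = 0.01

update = learning_rate * gradient   # about 1e-9 per step
relative_change = update / weight   # about 1e-7: effectively frozen
print(update, relative_change)
```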

Showdown — Normal vs Uniform and Glorot vs He!

To be honest, Glorot vs He actually means Tanh vs ReLU and we all know the answer to this match (spoiler alert!): ReLU wins!

And what about Normal vs Uniform? Uniform wins! Let’s check the plot below:


[Figure: Uniform wins]

In summary

For a ReLU-activated network, the He initialization scheme using a uniform distribution is a pretty good choice 😉

https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404

https://zhuanlan.zhihu.com/p/38315135
