from deepreplay.callbacks import ReplayData
from deepreplay.replay import Replay
from deepreplay.plot import compose_plots
from keras.initializers import normal
from matplotlib import pyplot as plt
filename = 'part2_weight_initializers.h5'
group_name = 'sigmoid_stdev_0.01'
# Uses normal initializer
initializer = normal(mean=0, stddev=0.01, seed=13)
# Builds BLOCK model
model = build_model(n_layers=5, input_dim=10, units=100,
                    activation='sigmoid', initializer=initializer)
# Since we only need initial weights, we don't even need to train the model!
# We still use the ReplayData callback, but we can pass the model as argument instead
replaydata = ReplayData(X, y, filename=filename, group_name=group_name, model=model)
# Now we feed the data to the actual Replay object
# so we can build the visualizations
replay = Replay(replay_filename=filename, group_name=group_name)
# Using subplot2grid to assemble a complex figure...
fig = plt.figure(figsize=(12, 6))
ax_zvalues = plt.subplot2grid((2, 2), (0, 0))
ax_weights = plt.subplot2grid((2, 2), (0, 1))
ax_activations = plt.subplot2grid((2, 2), (1, 0))
ax_gradients = plt.subplot2grid((2, 2), (1, 1))
wv = replay.build_weights(ax_weights)
gv = replay.build_gradients(ax_gradients)
# Z-values
zv = replay.build_outputs(ax_zvalues, before_activation=True,
                          exclude_outputs=True, include_inputs=False)
# Activations
av = replay.build_outputs(ax_activations, exclude_outputs=True, include_inputs=False)
# Finally, we use compose_plots to update all
# visualizations at once
fig = compose_plots([zv, wv, av, gv],
                    epoch=0,
                    title=r'Activation: sigmoid - Initializer: Normal $\sigma = 0.01$')
Trying a different Activation Function
Xavier / Glorot Initialization Scheme
Rectified Linear Unit (ReLU) Activation Function
He Initialization Scheme
So, we need not only a similar variance along all the layers, but also a proper scale for the gradients. The scale is quite important, as it will, together with the learning rate, define how fast the weights are going to be updated. If the gradients are way too small, the learning (that is, the update of the weights) will be extremely slow.
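To see why a tiny initialization scale starves the gradients, here is a minimal NumPy sketch, independent of DeepReplay, using the same setup as the code above (5 sigmoid layers, 100 units, weights drawn from Normal with a standard deviation of 0.01). The spread of the z-values collapses layer after layer, and vanishingly small z-value spread is exactly what translates into vanishingly small gradients during backprop:

```python
import numpy as np

rng = np.random.RandomState(13)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same configuration as the model above: 5 layers, 100 units,
# weights ~ Normal(0, 0.01). Inputs are standard Gaussian.
n_layers, input_dim, units = 5, 10, 100
X = rng.randn(1000, input_dim)

a = X
z_stds = []
for layer in range(n_layers):
    fan_in = a.shape[1]
    W = rng.normal(loc=0.0, scale=0.01, size=(fan_in, units))
    z = a @ W
    # spread of the z-values across samples, averaged over units
    z_stds.append(z.std(axis=0).mean())
    a = sigmoid(z)

print(z_stds)  # shrinks by orders of magnitude at every layer
```

With these weights, each layer multiplies the signal's spread by roughly `0.01 * sqrt(units) * 0.25` (the 0.25 being the maximum slope of the sigmoid), so by the fifth layer there is essentially no variation left for the gradients to work with.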
Showdown — Normal vs Uniform and Glorot vs He!
To be honest, Glorot vs He actually means Tanh vs ReLU, and we all know the answer to this match (spoiler alert!): ReLU wins!
And what about Normal vs Uniform? Uniform wins! Let’s check the plot below:
In summary
For a ReLU activated network, the He initialization scheme using a Uniform distribution is a pretty good choice 😉
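For reference, here is a small NumPy sketch of the two schemes in their Uniform flavors (the helper names are mine; in Keras the equivalent built-in initializers are `he_uniform` and `glorot_uniform`). He uses `limit = sqrt(6 / fan_in)`, which yields a weight variance of `2 / fan_in`; Glorot uses `limit = sqrt(6 / (fan_in + fan_out))`, yielding a variance of `2 / (fan_in + fan_out)`:

```python
import numpy as np

rng = np.random.RandomState(13)

def he_uniform(fan_in, fan_out, rng):
    # He initialization, Uniform flavor:
    # limit = sqrt(6 / fan_in)  ->  Var(W) = 2 / fan_in
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier initialization, Uniform flavor:
    # limit = sqrt(6 / (fan_in + fan_out))  ->  Var(W) = 2 / (fan_in + fan_out)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W_he = he_uniform(100, 100, rng)
W_glorot = glorot_uniform(100, 100, rng)
print(W_he.var())      # ≈ 2 / 100 = 0.02
print(W_glorot.var())  # ≈ 2 / 200 = 0.01
```

The extra factor of 2 in He's variance compensates for ReLU zeroing out half of its inputs, which is why it keeps the variance stable across ReLU layers where Glorot would let it decay.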
https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404