Theoretical derivation
def _compute_fans(shape):
  """Computes the number of input and output units for a weight shape.

  Args:
    shape: Integer shape tuple or TF tensor shape.

  Returns:
    A tuple of scalars (fan_in, fan_out).
  """
  if len(shape) < 1:  # Just to avoid errors for constants.
    fan_in = fan_out = 1
  elif len(shape) == 1:
    fan_in = fan_out = shape[0]
  elif len(shape) == 2:
    fan_in = shape[0]
    fan_out = shape[1]
  else:
    # Assuming convolution kernels (2D, 3D, or more).
    # kernel shape: (..., input_depth, depth)
    receptive_field_size = 1.
    for dim in shape[:-2]:
      receptive_field_size *= dim
    fan_in = shape[-2] * receptive_field_size
    fan_out = shape[-1] * receptive_field_size
  return fan_in, fan_out
- for a fully connected layer:
  fan_in = shape[0]
  fan_out = shape[1]
- for a bias:
  fan_in = fan_out = shape[0]
- for a conv2d kernel [filter_height, filter_width, in_channels, out_channels]:
  fan_in = (filter_height * filter_width) * in_channels
  fan_out = (filter_height * filter_width) * out_channels
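The rules above can be checked by hand against the two example shapes used later in these notes (a (50, 30) dense weight and a (3, 3, 256, 36) conv2d kernel):

```python
# Dense weight: fan_in / fan_out are just the two dimensions.
weight_shape = (50, 30)
dense_fan_in, dense_fan_out = weight_shape[0], weight_shape[1]

# Conv2d kernel [filter_height, filter_width, in_channels, out_channels]:
# the receptive field size multiplies the channel counts.
kernel_shape = (3, 3, 256, 36)
receptive_field_size = kernel_shape[0] * kernel_shape[1]  # 3 * 3 = 9
conv_fan_in = receptive_field_size * kernel_shape[2]      # 9 * 256 = 2304
conv_fan_out = receptive_field_size * kernel_shape[3]     # 9 * 36  = 324

print(dense_fan_in, dense_fan_out, conv_fan_in, conv_fan_out)
```

These match what `_compute_fans` returns for the same shapes.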
truncated_normal_initializer
These values are similar to values from a random_normal_initializer
except that values more than two standard deviations from the mean
are discarded and re-drawn.
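A minimal NumPy sketch of that discard-and-redraw behaviour (rejection sampling; this is an illustration, not TF's actual implementation, and the `truncated_normal` helper name is my own):

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=1.0, rng=None):
    """Sample a normal distribution, re-drawing any value more than
    two standard deviations from the mean."""
    rng = np.random.default_rng(42) if rng is None else rng
    samples = rng.normal(mean, stddev, size=shape)
    bad = np.abs(samples - mean) > 2 * stddev
    while bad.any():
        # Re-draw only the rejected entries until all are within bounds.
        samples[bad] = rng.normal(mean, stddev, size=int(bad.sum()))
        bad = np.abs(samples - mean) > 2 * stddev
    return samples

w = truncated_normal((50, 30))
print(np.abs(w).max() <= 2.0)  # True: nothing beyond 2 stddev survives
```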
import tensorflow as tf
import numpy as np
tf.set_random_seed(42)
weight_shape = (50, 30)
kernel_shape = (3, 3, 256, 36)
1. xavier initializer
This initializer is designed to keep the scale of the gradients roughly the
same in all layers.
- In uniform distribution this ends up being the range:
  x = sqrt(6. / (fan_in + fan_out)); weights are sampled uniformly from [-x, x].
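A NumPy sketch of the uniform variant, using the Glorot limit sqrt(6 / (fan_in + fan_out)) and the (50, 30) weight shape from above (the `xavier_uniform` helper name is my own, not a TF API):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Glorot/Xavier uniform: limit x = sqrt(6 / (fan_in + fan_out)),
    # weights drawn uniformly from [-x, x].
    rng = np.random.default_rng(42) if rng is None else rng
    x = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-x, x, size=(fan_in, fan_out))

w = xavier_uniform(50, 30)
print(np.sqrt(6.0 / (50 + 30)))  # the limit x for this shape
print(np.abs(w).max())           # always strictly below that limit
```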