I. Network Setup
%matplotlib inline
import tensorflow as tf
import helper
from tensorflow.examples.tutorials.mnist import input_data
print('Getting MNIST Dataset...')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print('Data Extracted.')
# Save the shapes of weights for each layer
layer_1_weight_shape = (mnist.train.images.shape[1], 256)
layer_2_weight_shape = (256, 128)
layer_3_weight_shape = (128, mnist.train.labels.shape[1])
II. Weight Initialization Methods
1. All Zeros or All Ones
Because every weight has the same value, every neuron in a layer computes the same output and receives the same gradient during backpropagation, so the weights can barely update and never differentiate from one another.
all_zero_weights = [
tf.Variable(tf.zeros(layer_1_weight_shape)),
tf.Variable(tf.zeros(layer_2_weight_shape)),
tf.Variable(tf.zeros(layer_3_weight_shape))
]
all_one_weights = [
tf.Variable(tf.ones(layer_1_weight_shape)),
tf.Variable(tf.ones(layer_2_weight_shape)),
tf.Variable(tf.ones(layer_3_weight_shape))
]
helper.compare_init_weights(
mnist,
'All Zeros vs All Ones',
[
(all_zero_weights, 'All Zeros'),
(all_one_weights, 'All Ones')])
As the results below show, both models perform very poorly.
After 858 Batches (2 Epochs):
Validation Accuracy
11.260% -- All Zeros
9.900% -- All Ones
Loss
2.300 -- All Zeros
372.644 -- All Ones
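The symmetry problem can be seen in a small NumPy sketch (illustrative only; the tiny network, shapes, and trivial loss below are hypothetical, not the notebook's model). With identical initial weights, every hidden unit produces the same activation and receives the same gradient, so backpropagation cannot break the tie:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))        # one input sample with 4 features
W1 = np.ones((4, 3))             # hidden layer: all weights identical
W2 = np.ones((3, 1))             # output layer: all weights identical

h = np.tanh(x @ W1)              # all 3 hidden activations are identical
y = h @ W2                       # scalar output

# Backprop a trivial upstream gradient dL/dy = 1 to the hidden-layer weights:
dh = (W2 @ np.array([[1.0]])).ravel() * (1 - h**2)
dW1 = np.outer(x, dh)            # gradient w.r.t. W1

# Every column of dW1 (one column per hidden unit) is identical,
# so all hidden units take the exact same update step:
print(np.allclose(dW1[:, 0], dW1[:, 1]) and np.allclose(dW1[:, 1], dW1[:, 2]))
```

Random initialization breaks exactly this symmetry, which is why the experiments below move to random distributions.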
2. Uniform Distribution
helper.hist_dist('Random Uniform (minval=-3, maxval=3)', tf.random_uniform([1000], -3, 3))
tf.random_uniform(shape, minval=0, maxval=None, dtype=tf.float32, seed=None, name=None)
- shape: shape of the output tensor
- minval: lower bound of the range of random values (defaults to 0)
- maxval: upper bound of the range of random values (defaults to 1)
- dtype: output type: float32, float64, int32, or int64
- seed: seed used to create the random distribution
- name: an optional name for the operation
(1) Weights in the range [0, 1)
# Default for tf.random_uniform is minval=0 and maxval=1
basline_weights = [
tf.Variable(tf.random_uniform(layer_1_weight_shape)),
tf.Variable(tf.random_uniform(layer_2_weight_shape)),
tf.Variable(tf.random_uniform(layer_3_weight_shape))
]
helper.compare_init_weights(
mnist,
'Baseline',
[(basline_weights, 'tf.random_uniform [0, 1)')])
After 858 Batches (2 Epochs):
Validation Accuracy
65.340% -- tf.random_uniform [0, 1)
Loss
64.356 -- tf.random_uniform [0, 1)
Accuracy improves from roughly 10% to 65%, which shows that random weight initialization helps.
(2) Weights in the range [-1, 1)
uniform_neg1to1_weights = [
tf.Variable(tf.random_uniform(layer_1_weight_shape, -1, 1)),
tf.Variable(tf.random_uniform(layer_2_weight_shape, -1, 1)),
tf.Variable(tf.random_uniform(layer_3_weight_shape, -1, 1))
]
helper.compare_init_weights(
mnist,
'[0, 1) vs [-1, 1)',
[
(basline_weights, 'tf.random_uniform [0, 1)'),
(uniform_neg1to1_weights, 'tf.random_uniform [-1, 1)')])
After 858 Batches (2 Epochs):
Validation Accuracy
73.840% -- tf.random_uniform [0, 1)
89.360% -- tf.random_uniform [-1, 1)
Loss
13.700 -- tf.random_uniform [0, 1)
5.470 -- tf.random_uniform [-1, 1)
So drawing weights from a range that spans both negative and positive values works better.
3. Weight Range
(1) Try three narrower ranges: [-0.1, 0.1), [-0.01, 0.01), and [-0.001, 0.001)
uniform_neg01to01_weights = [
tf.Variable(tf.random_uniform(layer_1_weight_shape, -0.1, 0.1)),
tf.Variable(tf.random_uniform(layer_2_weight_shape, -0.1, 0.1)),
tf.Variable(tf.random_uniform(layer_3_weight_shape, -0.1, 0.1))
]
uniform_neg001to001_weights = [
tf.Variable(tf.random_uniform(layer_1_weight_shape, -0.01, 0.01)),
tf.Variable(tf.random_uniform(layer_2_weight_shape, -0.01, 0.01)),
tf.Variable(tf.random_uniform(layer_3_weight_shape, -0.01, 0.01))
]
uniform_neg0001to0001_weights = [
tf.Variable(tf.random_uniform(layer_1_weight_shape, -0.001, 0.001)),
tf.Variable(tf.random_uniform(layer_2_weight_shape, -0.001, 0.001)),
tf.Variable(tf.random_uniform(layer_3_weight_shape, -0.001, 0.001))
]
helper.compare_init_weights(
mnist,
'[-1, 1) vs [-0.1, 0.1) vs [-0.01, 0.01) vs [-0.001, 0.001)',
[
(uniform_neg1to1_weights, '[-1, 1)'),
(uniform_neg01to01_weights, '[-0.1, 0.1)'),
(uniform_neg001to001_weights, '[-0.01, 0.01)'),
(uniform_neg0001to0001_weights, '[-0.001, 0.001)')],
plot_n_batches=None)
After 858 Batches (2 Epochs):
Validation Accuracy
91.000% -- [-1, 1)
97.220% -- [-0.1, 0.1)
95.680% -- [-0.01, 0.01)
94.400% -- [-0.001, 0.001)
Loss
2.425 -- [-1, 1)
0.098 -- [-0.1, 0.1)
0.133 -- [-0.01, 0.01)
0.190 -- [-0.001, 0.001)
The range (-0.1, 0.1) performs best of the four.
(2) Compare (-0.1, 0.1) with the conventional general rule
The general rule sets the range to (-y, y), where y = 1/sqrt(n) and n is the number of inputs to a given neuron:
import numpy as np
general_rule_weights = [
tf.Variable(tf.random_uniform(layer_1_weight_shape, -1/np.sqrt(layer_1_weight_shape[0]), 1/np.sqrt(layer_1_weight_shape[0]))),
tf.Variable(tf.random_uniform(layer_2_weight_shape, -1/np.sqrt(layer_2_weight_shape[0]), 1/np.sqrt(layer_2_weight_shape[0]))),
tf.Variable(tf.random_uniform(layer_3_weight_shape, -1/np.sqrt(layer_3_weight_shape[0]), 1/np.sqrt(layer_3_weight_shape[0])))
]
helper.compare_init_weights(
mnist,
'[-0.1, 0.1) vs General Rule',
[
(uniform_neg01to01_weights, '[-0.1, 0.1)'),
(general_rule_weights, 'General Rule')],
plot_n_batches=None)
The (-0.1, 0.1) range performs on par with the general rule.
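A quick computation (using the layer shapes defined earlier: 784 inputs, then hidden sizes 256 and 128) shows why the two match: the general-rule bound y = 1/sqrt(n) lands close to 0.1 for every layer in this network.

```python
import numpy as np

# n is the number of inputs to each layer, taken from the weight shapes above.
for name, n in [('layer_1', 784), ('layer_2', 256), ('layer_3', 128)]:
    y = 1 / np.sqrt(n)
    print(f'{name}: n={n}, uniform range (-{y:.4f}, {y:.4f})')
```

All three bounds (about 0.036, 0.063, and 0.088) are close to the hand-picked 0.1, so similar results are expected.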
4. Normal Distribution
(1) TensorFlow's normal-distribution function
tf.random_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)
- shape: shape of the output tensor
- mean: mean of the normal distribution
- stddev: standard deviation of the normal distribution
- dtype: output type
- seed: seed used to create the random distribution
- name: an optional name for the operation
helper.hist_dist('Random Normal (mean=0.0, stddev=1.0)', tf.random_normal([1000]))
Compare against the earlier (-0.1, 0.1) uniform weights:
normal_01_weights = [
tf.Variable(tf.random_normal(layer_1_weight_shape, stddev=0.1)),
tf.Variable(tf.random_normal(layer_2_weight_shape, stddev=0.1)),
tf.Variable(tf.random_normal(layer_3_weight_shape, stddev=0.1))
]
helper.compare_init_weights(
mnist,
'Uniform [-0.1, 0.1) vs Normal stddev 0.1',
[
(uniform_neg01to01_weights, 'Uniform [-0.1, 0.1)'),
(normal_01_weights, 'Normal stddev 0.1')])
After 858 Batches (2 Epochs):
Validation Accuracy
96.920% -- Uniform [-0.1, 0.1)
97.200% -- Normal stddev 0.1
Loss
0.103 -- Uniform [-0.1, 0.1)
0.099 -- Normal stddev 0.1
The normal distribution performs slightly better.
(2) Truncated normal distribution
In a truncated normal distribution, any value more than two standard deviations from the mean is discarded and re-drawn from within the allowed range.
For example, with mean 0 and standard deviation 1, any value greater than 2 or less than -2 is thrown away and re-sampled.
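The re-sampling idea can be sketched in NumPy (an illustrative rejection-sampling sketch, not TensorFlow's actual implementation; the function name `truncated_normal` here is our own):

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=1.0, rng=None):
    """Draw from N(mean, stddev), re-drawing any value that falls more
    than two standard deviations from the mean."""
    if rng is None:
        rng = np.random.default_rng(0)
    samples = rng.normal(mean, stddev, size=shape)
    out_of_range = np.abs(samples - mean) > 2 * stddev
    while out_of_range.any():
        # Re-draw only the rejected values until all are within bounds.
        samples[out_of_range] = rng.normal(mean, stddev, size=out_of_range.sum())
        out_of_range = np.abs(samples - mean) > 2 * stddev
    return samples

w = truncated_normal((1000,), stddev=0.1)
print(w.min() >= -0.2 and w.max() <= 0.2)   # every value within two stddevs
```

This mirrors what the histogram below shows: the tails beyond two standard deviations are simply absent.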
tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)
- shape: shape of the output tensor
- mean: mean of the truncated normal distribution
- stddev: standard deviation of the truncated normal distribution
- dtype: output type
- seed: seed used to create the random distribution
- name: an optional name for the operation
helper.hist_dist('Truncated Normal (mean=0.0, stddev=1.0)', tf.truncated_normal([1000]))
As the histogram shows, values beyond -2 and 2 have been cut off.
Compare the truncated normal against the plain normal distribution above:
trunc_normal_01_weights = [
tf.Variable(tf.truncated_normal(layer_1_weight_shape, stddev=0.1)),
tf.Variable(tf.truncated_normal(layer_2_weight_shape, stddev=0.1)),
tf.Variable(tf.truncated_normal(layer_3_weight_shape, stddev=0.1))
]
helper.compare_init_weights(
mnist,
'Normal vs Truncated Normal',
[
(normal_01_weights, 'Normal'),
(trunc_normal_01_weights, 'Truncated Normal')])
After 858 Batches (2 Epochs):
Validation Accuracy
97.020% -- Normal
97.480% -- Truncated Normal
Loss
0.088 -- Normal
0.034 -- Truncated Normal
The two look nearly identical here, but only because our dataset is small and the model has few parameters; on larger models the difference becomes visible.
For small models either a normal or a truncated normal works; for large models, prefer the truncated normal.
The truncated normal gives the best results overall; compare it against the original baseline:
helper.compare_init_weights(
mnist,
'Baseline vs Truncated Normal',
[
(basline_weights, 'Baseline'),
(trunc_normal_01_weights, 'Truncated Normal')])
After 858 Batches (2 Epochs):
Validation Accuracy
66.100% -- Baseline
97.040% -- Truncated Normal
Loss
24.090 -- Baseline
0.075 -- Truncated Normal
The truncated normal initialization performs dramatically better than the baseline.
Reference:
http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf