[Deep Learning Image Recognition Course] Weight Initialization

Copyright notice: this is an original article by the author, released under the CC 4.0 BY-SA license. Please include a link to the original and this notice when reposting.
Original link: https://blog.csdn.net/weixin_41770169/article/details/80328914

I. Network Setup

%matplotlib inline

import tensorflow as tf
import helper

from tensorflow.examples.tutorials.mnist import input_data

print('Getting MNIST Dataset...')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print('Data Extracted.')
# Save the shapes of weights for each layer
layer_1_weight_shape = (mnist.train.images.shape[1], 256)
layer_2_weight_shape = (256, 128)
layer_3_weight_shape = (128, mnist.train.labels.shape[1])

 

II. Ways to Initialize the Weights

1. All Zeros or All Ones

When every weight starts with the same value, all units in a layer compute identical activations, so backpropagation assigns them identical gradients. This symmetry is never broken, and the weights barely move.

all_zero_weights = [
    tf.Variable(tf.zeros(layer_1_weight_shape)),
    tf.Variable(tf.zeros(layer_2_weight_shape)),
    tf.Variable(tf.zeros(layer_3_weight_shape))
]

all_one_weights = [
    tf.Variable(tf.ones(layer_1_weight_shape)),
    tf.Variable(tf.ones(layer_2_weight_shape)),
    tf.Variable(tf.ones(layer_3_weight_shape))
]

helper.compare_init_weights(
    mnist,
    'All Zeros vs All Ones',
    [
        (all_zero_weights, 'All Zeros'),
        (all_one_weights, 'All Ones')])

As the results below show, both models perform very poorly.

 

After 858 Batches (2 Epochs):
Validation Accuracy
   11.260% -- All Zeros
    9.900% -- All Ones
Loss
    2.300  -- All Zeros
  372.644  -- All Ones
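The symmetry problem behind these numbers can be illustrated with a small NumPy sketch (separate from the TensorFlow code above): with identical weights, every hidden unit computes the same activation and receives the same weight gradient, so no unit can ever specialize.

```python
import numpy as np

# One input example and a 3-unit linear layer whose weights are all equal.
x = np.array([0.5, -1.0, 2.0])   # input vector (3 features)
W = np.full((3, 3), 0.7)         # every weight identical
h = x @ W                        # forward pass: all activations equal

# Upstream gradient arriving at the layer output. In a full network with
# identical weights everywhere, symmetry forces this to be the same for
# every unit; here we simply assume it.
grad_h = np.array([1.0, 1.0, 1.0])
grad_W = np.outer(x, grad_h)     # gradient of the loss w.r.t. W

# Every column of grad_W is identical, so after any number of gradient
# steps the columns of W remain equal: the units never differentiate.
print(h)                         # identical entries
print(grad_W)                    # identical columns
```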

 

2. Uniform Distribution

helper.hist_dist('Random Uniform (minval=-3, maxval=3)', tf.random_uniform([1000], -3, 3))

tf.random_uniform(shape, minval=0, maxval=None, dtype=tf.float32, seed=None, name=None)

  • shape: shape of the output tensor
  • minval: lower bound of the random range; defaults to 0
  • maxval: upper bound of the random range; defaults to 1
  • dtype: output type: float32, float64, int32, or int64
  • seed: seed for the random number generator
  • name: an optional name for the operation

(1) Weights in the range [0, 1)

# Default for tf.random_uniform is minval=0 and maxval=1
basline_weights = [
    tf.Variable(tf.random_uniform(layer_1_weight_shape)),
    tf.Variable(tf.random_uniform(layer_2_weight_shape)),
    tf.Variable(tf.random_uniform(layer_3_weight_shape))
]

helper.compare_init_weights(
    mnist,
    'Baseline',
    [(basline_weights, 'tf.random_uniform [0, 1)')])

 

After 858 Batches (2 Epochs):
Validation Accuracy
   65.340% -- tf.random_uniform [0, 1)
Loss
   64.356  -- tf.random_uniform [0, 1)

Accuracy jumps from roughly 10% to about 65%, confirming that random initialization helps.

 

(2) Weights in the range [-1, 1)

 

uniform_neg1to1_weights = [
    tf.Variable(tf.random_uniform(layer_1_weight_shape, -1, 1)),
    tf.Variable(tf.random_uniform(layer_2_weight_shape, -1, 1)),
    tf.Variable(tf.random_uniform(layer_3_weight_shape, -1, 1))
]

helper.compare_init_weights(
    mnist,
    '[0, 1) vs [-1, 1)',
    [
        (basline_weights, 'tf.random_uniform [0, 1)'),
        (uniform_neg1to1_weights, 'tf.random_uniform [-1, 1)')])

 

 

 

After 858 Batches (2 Epochs):
Validation Accuracy
   73.840% -- tf.random_uniform [0, 1)
   89.360% -- tf.random_uniform [-1, 1)
Loss
   13.700  -- tf.random_uniform [0, 1)
    5.470  -- tf.random_uniform [-1, 1)

This shows that a symmetric range spanning both negative and positive values works better.

 

3. Choosing the Range

(1) Three more ranges: -0.1 to 0.1, -0.01 to 0.01, and -0.001 to 0.001

 

uniform_neg01to01_weights = [
    tf.Variable(tf.random_uniform(layer_1_weight_shape, -0.1, 0.1)),
    tf.Variable(tf.random_uniform(layer_2_weight_shape, -0.1, 0.1)),
    tf.Variable(tf.random_uniform(layer_3_weight_shape, -0.1, 0.1))
]

uniform_neg001to001_weights = [
    tf.Variable(tf.random_uniform(layer_1_weight_shape, -0.01, 0.01)),
    tf.Variable(tf.random_uniform(layer_2_weight_shape, -0.01, 0.01)),
    tf.Variable(tf.random_uniform(layer_3_weight_shape, -0.01, 0.01))
]

uniform_neg0001to0001_weights = [
    tf.Variable(tf.random_uniform(layer_1_weight_shape, -0.001, 0.001)),
    tf.Variable(tf.random_uniform(layer_2_weight_shape, -0.001, 0.001)),
    tf.Variable(tf.random_uniform(layer_3_weight_shape, -0.001, 0.001))
]

helper.compare_init_weights(
    mnist,
    '[-1, 1) vs [-0.1, 0.1) vs [-0.01, 0.01) vs [-0.001, 0.001)',
    [
        (uniform_neg1to1_weights, '[-1, 1)'),
        (uniform_neg01to01_weights, '[-0.1, 0.1)'),
        (uniform_neg001to001_weights, '[-0.01, 0.01)'),
        (uniform_neg0001to0001_weights, '[-0.001, 0.001)')],
    plot_n_batches=None)

 

 

 

After 858 Batches (2 Epochs):
Validation Accuracy
   91.000% -- [-1, 1)
   97.220% -- [-0.1, 0.1)
   95.680% -- [-0.01, 0.01)
   94.400% -- [-0.001, 0.001)
Loss
    2.425  -- [-1, 1)
    0.098  -- [-0.1, 0.1)
    0.133  -- [-0.01, 0.01)
    0.190  -- [-0.001, 0.001)

Finding: (-0.1, 0.1) is the best of the ranges tried.

 

(2) Comparing (-0.1, 0.1) with the conventional general rule

The general rule sets the range to (-y, y), where y = 1/√n and n is the number of inputs feeding a given neuron.

 

import numpy as np

general_rule_weights = [
    tf.Variable(tf.random_uniform(layer_1_weight_shape, -1/np.sqrt(layer_1_weight_shape[0]), 1/np.sqrt(layer_1_weight_shape[0]))),
    tf.Variable(tf.random_uniform(layer_2_weight_shape, -1/np.sqrt(layer_2_weight_shape[0]), 1/np.sqrt(layer_2_weight_shape[0]))),
    tf.Variable(tf.random_uniform(layer_3_weight_shape, -1/np.sqrt(layer_3_weight_shape[0]), 1/np.sqrt(layer_3_weight_shape[0])))
]

helper.compare_init_weights(
    mnist,
    '[-0.1, 0.1) vs General Rule',
    [
        (uniform_neg01to01_weights, '[-0.1, 0.1)'),
        (general_rule_weights, 'General Rule')],
    plot_n_batches=None)

The (-0.1, 0.1) range performs essentially the same as the general rule.
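As a sanity check, the rule's bounds for the three layer shapes defined earlier can be computed directly:

```python
# Inputs per layer in the network above: 784 -> 256 -> 128 -> 10.
for n_inputs in (784, 256, 128):
    y = n_inputs ** -0.5   # general rule: y = 1 / sqrt(n)
    print(f"n = {n_inputs:3d}  ->  uniform range (-{y:.4f}, {y:.4f})")
```

All three bounds (about 0.036, 0.063, and 0.088) land near 0.1, which explains why (-0.1, 0.1) and the general rule behave almost identically.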

 

4. Normal Distributions as Weights

(1) TensorFlow's normal-distribution function

 

tf.random_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)

  • shape: shape of the output tensor
  • mean: mean of the normal distribution
  • stddev: standard deviation of the normal distribution
  • dtype: output type
  • seed: seed for the random number generator
  • name: an optional name for the operation

helper.hist_dist('Random Normal (mean=0.0, stddev=1.0)', tf.random_normal([1000]))

 

Compare a normal distribution with stddev 0.1 against the earlier (-0.1, 0.1) uniform range:

 

 

normal_01_weights = [
    tf.Variable(tf.random_normal(layer_1_weight_shape, stddev=0.1)),
    tf.Variable(tf.random_normal(layer_2_weight_shape, stddev=0.1)),
    tf.Variable(tf.random_normal(layer_3_weight_shape, stddev=0.1))
]

helper.compare_init_weights(
    mnist,
    'Uniform [-0.1, 0.1) vs Normal stddev 0.1',
    [
        (uniform_neg01to01_weights, 'Uniform [-0.1, 0.1)'),
        (normal_01_weights, 'Normal stddev 0.1')])

 

After 858 Batches (2 Epochs):
Validation Accuracy
   96.920% -- Uniform [-0.1, 0.1)
   97.200% -- Normal stddev 0.1
Loss
    0.103  -- Uniform [-0.1, 0.1)
    0.099  -- Normal stddev 0.1

The normal distribution does slightly better.

 

(2) Truncated normal distribution

A truncated normal distribution discards any value drawn more than two standard deviations from the mean and resamples it until it falls within range.

For example, with mean 0 and standard deviation 1, any sample greater than 2 or less than -2 is thrown away and redrawn.
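The resampling idea can be sketched in NumPy (a hypothetical re-implementation for illustration, not TensorFlow's actual code):

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=1.0, seed=None):
    """Sketch of truncated-normal sampling: redraw any value that falls
    more than two standard deviations from the mean."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, stddev, size=shape)
    out_of_range = np.abs(samples - mean) > 2 * stddev
    while out_of_range.any():
        samples[out_of_range] = rng.normal(mean, stddev, size=out_of_range.sum())
        out_of_range = np.abs(samples - mean) > 2 * stddev
    return samples

w = truncated_normal((256, 128), stddev=0.1, seed=0)
print(w.min(), w.max())   # every value lies within (-0.2, 0.2)
```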

 

tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)

  • shape: shape of the output tensor
  • mean: mean of the truncated normal distribution
  • stddev: standard deviation of the truncated normal distribution
  • dtype: output type
  • seed: seed for the random number generator
  • name: an optional name for the operation

helper.hist_dist('Truncated Normal (mean=0.0, stddev=1.0)', tf.truncated_normal([1000]))

In the resulting histogram, everything outside (-2, 2) has been cut off.

Comparing the truncated normal against the plain normal above:

 

trunc_normal_01_weights = [
    tf.Variable(tf.truncated_normal(layer_1_weight_shape, stddev=0.1)),
    tf.Variable(tf.truncated_normal(layer_2_weight_shape, stddev=0.1)),
    tf.Variable(tf.truncated_normal(layer_3_weight_shape, stddev=0.1))
]

helper.compare_init_weights(
    mnist,
    'Normal vs Truncated Normal',
    [
        (normal_01_weights, 'Normal'),
        (trunc_normal_01_weights, 'Truncated Normal')])

 

 

 

After 858 Batches (2 Epochs):
Validation Accuracy
   97.020% -- Normal
   97.480% -- Truncated Normal
Loss
    0.088  -- Normal
    0.034  -- Truncated Normal

The two look nearly identical here, but only because our dataset is small and the network has few parameters; the difference does not get a chance to show.

On a small model either a normal or a truncated normal works; for large models, prefer the truncated normal.
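The practical difference is the tail mass a plain normal carries: roughly 5% of draws fall more than two standard deviations from the mean, and in a model with millions of weights those outliers add up. A quick empirical check:

```python
import numpy as np

# Estimate the fraction of standard-normal samples that a truncated
# normal would have discarded (i.e. beyond two standard deviations).
rng = np.random.default_rng(0)
draws = rng.normal(0.0, 1.0, size=1_000_000)
frac_outliers = np.mean(np.abs(draws) > 2.0)
print(f"{frac_outliers:.3%} of samples lie beyond two standard deviations")
```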

 

The truncated normal gives the best results overall. Comparing it against the baseline:

 

helper.compare_init_weights(
    mnist,
    'Baseline vs Truncated Normal',
    [
        (basline_weights, 'Baseline'),
        (trunc_normal_01_weights, 'Truncated Normal')])

 

 

 

After 858 Batches (2 Epochs):
Validation Accuracy
   66.100% -- Baseline
   97.040% -- Truncated Normal
Loss
   24.090  -- Baseline
    0.075  -- Truncated Normal

The truncated normal clearly outperforms the baseline.

 

Reference papers:

Glorot & Bengio (2010), Understanding the Difficulty of Training Deep Feedforward Neural Networks: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

He et al. (2015), Delving Deep into Rectifiers: https://arxiv.org/pdf/1502.01852v1.pdf

Ioffe & Szegedy (2015), Batch Normalization: https://arxiv.org/pdf/1502.03167v2.pdf
