TensorFlow parameter initialization: identity initialization

imperfect00

Category: TensorFlow study notes

Article link: https://blog.csdn.net/u011961856/article/details/77893088

When initializing the weights of a convolutional layer, the following methods are commonly used:

1. Random Uniform distribution

The initializer class is:

class RandomUniform(Initializer):
  """Initializer that generates tensors with a uniform distribution.

  Args:
    minval: A python scalar or a scalar tensor. Lower bound of the range
      of random values to generate.
    maxval: A python scalar or a scalar tensor. Upper bound of the range
      of random values to generate.  Defaults to 1 for float types.
    seed: A Python integer. Used to create random seeds. See
      @{tf.set_random_seed}
      for behavior.
    dtype: The data type.
  """

  def __init__(self, minval=0, maxval=None, seed=None, dtype=dtypes.float32):
    self.minval = minval
    self.maxval = maxval
    self.seed = seed
    self.dtype = dtype

  def __call__(self, shape, dtype=None, partition_info=None):
    if dtype is None:
      dtype = self.dtype
    return random_ops.random_uniform(shape, self.minval, self.maxval,
                                     dtype, seed=self.seed)

This initializes w with values drawn uniformly at random from the half-open range [minval, maxval).
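As a rough illustration, here is a NumPy analogue of the same idea (an assumption for illustration, not TensorFlow's implementation). NumPy's `Generator.uniform(low, high, size)` also samples from [low, high), although its documentation notes that floating-point rounding can occasionally produce `high` itself. The helper name `random_uniform_kernel`, the kernel shape, and the bounds are hypothetical example values:

```python
import numpy as np

def random_uniform_kernel(shape, minval=0.0, maxval=1.0, seed=None):
    """Sample a weight tensor uniformly from [minval, maxval) (NumPy sketch)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(low=minval, high=maxval, size=shape)

# Hypothetical 3x3 conv kernel with 16 input and 32 output channels.
w = random_uniform_kernel([3, 3, 16, 32], minval=-0.05, maxval=0.05, seed=0)
print(w.shape)  # (3, 3, 16, 32)
print(w.min(), w.max())  # both lie inside the requested bounds
```

The shape follows the TensorFlow convolution-kernel layout [k_h, k_w, in_channels, out_channels] used in the examples later in this post.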

2. Random Normal distribution

The class is defined as:

class RandomNormal(Initializer):
  """Initializer that generates tensors with a normal distribution.

  Args:
    mean: a python scalar or a scalar tensor. Mean of the random values
      to generate.
    stddev: a python scalar or a scalar tensor. Standard deviation of the
      random values to generate.
    seed: A Python integer. Used to create random seeds. See
      @{tf.set_random_seed}
      for behavior.
    dtype: The data type. Only floating point types are supported.
  """

  def __init__(self, mean=0.0, stddev=1.0, seed=None, dtype=dtypes.float32):
    self.mean = mean
    self.stddev = stddev
    self.seed = seed
    self.dtype = _assert_float_dtype(dtype)

  def __call__(self, shape, dtype=None, partition_info=None):
    if dtype is None:
      dtype = self.dtype
    return random_ops.random_normal(shape, self.mean, self.stddev,
                                    dtype, seed=self.seed)

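This initializes w with values drawn from a Gaussian (normal) distribution with the given mean and standard deviation stddev. Note that stddev is the standard deviation, not the variance; the variance is stddev².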
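To make the standard deviation / variance distinction concrete, here is a quick NumPy check. It is an illustrative analogue, not TensorFlow code, and the mean and stddev values are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(42)
mean, stddev = 0.0, 0.02  # example parameters
w = rng.normal(loc=mean, scale=stddev, size=(3, 3, 64, 64))

# With ~37k samples the sample statistics sit close to the requested values.
print(w.mean())  # close to 0.0
print(w.std())   # close to 0.02, the standard deviation
print(w.var())   # close to 0.0004 = 0.02 ** 2, the variance
```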

3. Truncated Normal distribution

The class is:

class TruncatedNormal(Initializer):
  """Initializer that generates a truncated normal distribution.

  These values are similar to values from a `random_normal_initializer`
  except that values more than two standard deviations from the mean
  are discarded and re-drawn. This is the recommended initializer for
  neural network weights and filters.

  Args:
    mean: a python scalar or a scalar tensor. Mean of the random values
      to generate.
    stddev: a python scalar or a scalar tensor. Standard deviation of the
      random values to generate.
    seed: A Python integer. Used to create random seeds. See
      @{tf.set_random_seed}
      for behavior.
    dtype: The data type. Only floating point types are supported.
  """

  def __init__(self, mean=0.0, stddev=1.0, seed=None, dtype=dtypes.float32):
    self.mean = mean
    self.stddev = stddev
    self.seed = seed
    self.dtype = _assert_float_dtype(dtype)

  def __call__(self, shape, dtype=None, partition_info=None):
    if dtype is None:
      dtype = self.dtype
    return random_ops.truncated_normal(shape, self.mean, self.stddev,
                                       dtype, seed=self.seed)

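Like Random Normal, Truncated Normal draws the weights from a normal distribution. The difference is that any value more than two standard deviations from the mean is discarded and re-drawn rather than kept. As its docstring states, it is the recommended initializer for neural network weights and filters.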
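The discard-and-redraw rule can be sketched in NumPy as follows. This is a hypothetical helper written only to illustrate the rule described in the docstring, not TensorFlow's own (C++) kernel. A side effect worth knowing is that cutting the tails shrinks the spread: a normal distribution truncated at ±2 standard deviations has a standard deviation of roughly 0.88 × stddev.

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=1.0, seed=None):
    """Draw from N(mean, stddev^2), re-drawing any value more than two
    standard deviations from the mean (NumPy sketch of the rule above)."""
    rng = np.random.default_rng(seed)
    values = rng.normal(mean, stddev, size=shape)
    bad = np.abs(values - mean) > 2 * stddev
    while bad.any():
        # Re-sample only the rejected entries until all fall inside the band.
        values[bad] = rng.normal(mean, stddev, size=bad.sum())
        bad = np.abs(values - mean) > 2 * stddev
    return values

w = truncated_normal((3, 3, 64, 64), mean=0.0, stddev=0.02, seed=0)
print(np.abs(w).max() <= 0.04)  # True: nothing lies beyond 2 * stddev
print(w.std())  # roughly 0.88 * 0.02, smaller than the requested stddev
```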

The three initializers can be used in TensorFlow as follows:

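import tensorflow as tf  # TF 1.x API
# k_h, k_w, input_, output_dim, m and stddev are defined elsewhere. The three calls
# are alternatives: repeating tf.get_variable('w', ...) in one scope raises a ValueError.
w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                    initializer=tf.random_uniform_initializer(minval=0.0, maxval=1.0))

w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                    initializer=tf.random_normal_initializer(mean=m, stddev=stddev))

w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                    initializer=tf.truncated_normal_initializer(mean=m, stddev=stddev))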

Identity initialization

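In a CNN we sometimes want to initialize the weights so that the previous layer's feature map passes through to the next layer unchanged. That is, for the convolution $F_2 = F_1 * w$ we want to choose w so that $F_2 = F_1$. This weight initialization is called identity initialization. It requires the layer to have as many output channels as input channels, and a 'SAME'-padded convolution with an odd kernel size so that the spatial size and alignment are preserved.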
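To see why such a w exists, write out one output entry of the convolution. TensorFlow's conv2d actually computes a cross-correlation. Assume 'SAME' zero padding, an odd $k_h \times k_w$ kernel, and kernel centre $(c_x, c_y) = (\lfloor k_h/2 \rfloor, \lfloor k_w/2 \rfloor)$:

$$F_2(x, y, j) = \sum_{u=0}^{k_h-1} \sum_{v=0}^{k_w-1} \sum_{i} F_1(x + u - c_x,\; y + v - c_y,\; i)\, w(u, v, i, j)$$

If $w(u, v, i, j) = 1$ when $u = c_x$, $v = c_y$, $i = j$, and $0$ otherwise, only the single term $u = c_x$, $v = c_y$, $i = j$ survives:

$$F_2(x, y, j) = F_1(x, y, j)$$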

A TensorFlow (1.x) implementation of identity initialization:

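import numpy as np
import tensorflow as tf  # TF 1.x API

def identity_initializer():
    # tf.get_variable in TF 1.x passes partition_info, so the initializer must accept it.
    def _initializer(shape, dtype=tf.float32, partition_info=None):
        if len(shape) == 1:
            # Biases: all zeros.
            return tf.constant(0., dtype=dtype, shape=shape)
        elif len(shape) == 2 and shape[0] == shape[1]:
            # Square fully connected weights: identity matrix.
            return tf.constant(np.identity(shape[0]), dtype=dtype)
        elif len(shape) == 4 and shape[2] == shape[3]:
            # Conv kernel [k_h, k_w, in, out]: identity at the spatial centre, zeros elsewhere.
            array = np.zeros(shape, dtype=float)
            cx, cy = shape[0] // 2, shape[1] // 2  # integer division: indices must be ints
            for i in range(shape[2]):
                array[cx, cy, i, i] = 1
            return tf.constant(array, dtype=dtype)
        else:
            raise ValueError('identity_initializer: unsupported shape %s' % (shape,))
    return _initializer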

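# Simplified variant that handles only 4-D conv kernels with shape[2] == shape[3];
# defining it replaces any earlier identity_initializer.
def identity_initializer():
    def _initializer(shape, dtype=tf.float32, partition_info=None):
        array = np.zeros(shape, dtype=float)
        cx, cy = shape[0] // 2, shape[1] // 2
        for i in range(shape[2]):
            array[cx, cy, i, i] = 1
        return tf.constant(array, dtype=dtype)
    return _initializer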
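You can check that such a kernel passes a feature map through unchanged without TensorFlow. The sketch below is a NumPy re-implementation with hypothetical helpers `identity_kernel` and `conv2d_same`, which are not part of TensorFlow. It assumes an odd kernel size, equal input and output channel counts, and 'SAME' zero padding. It also covers the shape=[3,3,8,8] case, where the centre index is [1, 1].

```python
import numpy as np

def identity_kernel(k_h, k_w, channels):
    """NumPy version of the 4-D branch above: zeros everywhere except an
    identity matrix at the spatial centre [k_h // 2, k_w // 2, :, :]."""
    kernel = np.zeros((k_h, k_w, channels, channels))
    cx, cy = k_h // 2, k_w // 2
    kernel[cx, cy] = np.eye(channels)
    return kernel

def conv2d_same(x, kernel):
    """Naive 'SAME' cross-correlation for odd kernel sizes.
    x: [H, W, C_in], kernel: [k_h, k_w, C_in, C_out] -> [H, W, C_out]."""
    k_h, k_w, _, c_out = kernel.shape
    ph, pw = k_h // 2, k_w // 2
    padded = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))  # zero padding
    height, width = x.shape[:2]
    out = np.zeros((height, width, c_out))
    for u in range(k_h):
        for v in range(k_w):
            # Shifted input window multiplied by the kernel tap at (u, v).
            out += padded[u:u + height, v:v + width, :] @ kernel[u, v]
    return out

kernel = identity_kernel(3, 3, 8)
print(np.array_equal(kernel[1, 1], np.eye(8)))  # True

x = np.random.default_rng(0).normal(size=(5, 5, 8))
print(np.allclose(conv2d_same(x, kernel), x))  # True
```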

After initialization, every entry of array is 0 except array[cx, cy, :, :], which is an identity matrix. For example, with shape=[3,3,8,8] we get cx = cy = 1, and array[1,1,:,:] is:

[[ 1. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 1.]]

Example usage:

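import tensorflow.contrib.slim as slim  # TF 1.x only (tf.contrib was removed in TF 2)
# input, gm (output channels; equal to the input channels for an identity mapping), lrelu, nm: defined elsewhere
net = slim.conv2d(input, gm, [3, 3], rate=1, activation_fn=lrelu, normalizer_fn=nm,
                  weights_initializer=identity_initializer(), scope='g_conv1')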