tf.contrib.layers.batch_norm

Reference: tf.contrib.layers.batch_norm - Tencent Cloud+ Community

Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167

tf.contrib.layers.batch_norm(
    inputs,
    decay=0.999,
    center=True,
    scale=False,
    epsilon=0.001,
    activation_fn=None,
    param_initializers=None,
    param_regularizers=None,
    updates_collections=tf.GraphKeys.UPDATE_OPS,
    is_training=True,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    batch_weights=None,
    fused=None,
    data_format=DATA_FORMAT_NHWC,
    zero_debias_moving_mean=False,
    scope=None,
    renorm=False,
    renorm_clipping=None,
    renorm_decay=0.99,
    adjustment=None
)

"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"

Sergey Ioffe, Christian Szegedy

Can be used as a normalizer function for conv2d and fully_connected. The normalization is over all but the last dimension if data_format is NHWC and all but the second dimension if data_format is NCHW. In case of a 2D tensor this corresponds to the batch dimension, while in case of a 4D tensor this corresponds to the batch and space dimensions.
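
For example, a minimal sketch of the normalizer_fn usage mentioned above, assuming a TF 1.x graph; the 4-D NHWC input shape and layer sizes are illustrative assumptions, not part of the original text:

```python
import tensorflow as tf

# Hypothetical NHWC input: [batch, height, width, channels].
images = tf.placeholder(tf.float32, [None, 32, 32, 3])

# conv2d applies batch_norm as its normalizer_fn; arguments such as
# is_training and decay are forwarded through normalizer_params.
net = tf.contrib.layers.conv2d(
    images, num_outputs=64, kernel_size=3,
    normalizer_fn=tf.contrib.layers.batch_norm,
    normalizer_params={'is_training': True, 'decay': 0.9})
```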

Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. For example:

  # Collect the ops that update moving_mean and moving_variance, and make
  # the training step depend on them so they run on every iteration.
  update_ops = tf.compat.v1.get_collection(tf.GraphKeys.UPDATE_OPS)
  with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

One can set updates_collections=None to force the updates in place, but that can have a speed penalty, especially in distributed settings.
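
The in-place alternative described above might look like the following sketch (a hedged illustration; the input shape is an assumption):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 128])  # hypothetical 2-D input

# With updates_collections=None the moving statistics are updated as part
# of the forward pass itself, so no control dependency on train_op is
# needed, at the cost of a possible speed penalty.
y = tf.contrib.layers.batch_norm(
    x, is_training=True, updates_collections=None, decay=0.9)
```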

Args:

  • inputs: A tensor with 2 or more dimensions, where the first dimension has batch_size. The normalization is over all but the last dimension if data_format is NHWC and all but the second dimension if data_format is NCHW.
  • decay: Decay for the moving average. Reasonable values for decay are close to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc. Lower the decay value (try decay=0.9) if the model shows reasonably good training performance but poor validation and/or test performance. Try zero_debias_moving_mean=True for improved stability.
  • center: If True, add offset of beta to normalized tensor. If False, beta is ignored.
  • scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
  • epsilon: Small float added to variance to avoid dividing by zero.
  • activation_fn: Activation function, default set to None to skip it and maintain a linear activation.
  • param_initializers: Optional initializers for beta, gamma, moving mean and moving variance.
  • param_regularizers: Optional regularizer for beta and gamma.
  • updates_collections: Collections to collect the update ops for computation. The updates_ops need to be executed with the train_op. If None, a control dependency would be added to make sure the updates are computed in place.
  • is_training: Whether or not the layer is in training mode. In training mode it accumulates the statistics of the moments into moving_mean and moving_variance using an exponential moving average with the given decay. When it is not in training mode, it uses the values of moving_mean and moving_variance instead (see the sketch after this list).
  • reuse: Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
  • variables_collections: Optional collections for the variables.
  • outputs_collections: Collections to add the outputs.
  • trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
  • batch_weights: An optional tensor of shape [batch_size], containing a frequency weight for each batch item. If present, then the batch normalization uses weighted mean and variance. (This can be used to correct for bias in training example selection.)
  • fused: if None or True, use a faster, fused implementation if possible. If False, use the system recommended implementation.
  • data_format: A string. NHWC (default) and NCHW are supported.
  • zero_debias_moving_mean: Use zero_debias for moving_mean. It creates a new pair of variables 'moving_mean/biased' and 'moving_mean/local_step'.
  • scope: Optional scope for variable_scope.
  • renorm: Whether to use Batch Renormalization (https://arxiv.org/abs/1702.03275). This adds extra variables during training. The inference is the same for either value of this parameter.
  • renorm_clipping: A dictionary that may map keys 'rmax', 'rmin', 'dmax' to scalar Tensors used to clip the renorm correction. The correction (r, d) is used as corrected_value = normalized_value * r + d, with r clipped to [rmin, rmax], and d to [-dmax, dmax]. Missing rmax, rmin, dmax are set to inf, 0, inf, respectively.
  • renorm_decay: Momentum used to update the moving means and standard deviations with renorm. Unlike momentum, this affects training and should be neither too small (which would add noise) nor too large (which would give stale estimates). Note that decay is still applied to get the means and variances for inference.
  • adjustment: A function taking the Tensor containing the (dynamic) shape of the input tensor and returning a pair (scale, bias) to apply to the normalized values (before gamma and beta), only during training. For example, adjustment = lambda shape: ( tf.random.uniform(shape[-1:], 0.93, 1.07), tf.random.uniform(shape[-1:], -0.1, 0.1)) will scale the normalized value by up to 7% up or down, then shift the result by up to 0.1 (with independent scaling and bias for each feature but shared across all examples), and finally apply gamma and/or beta. If None, no adjustment is applied.
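
Building on the is_training argument above, here is a minimal sketch of driving a single graph in both training and inference mode through a boolean placeholder (the placeholder, shapes, and layer size are illustrative assumptions, not from the original):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool, [])  # flipped per session.run call

# In training mode batch statistics are used and the moving averages are
# accumulated; in inference mode the stored moving_mean/moving_variance
# are used instead.
h = tf.contrib.layers.fully_connected(
    x, 128,
    normalizer_fn=tf.contrib.layers.batch_norm,
    normalizer_params={'is_training': is_training,
                       'updates_collections': None})
```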

Returns:

  • A Tensor representing the output of the operation.

Raises:

  • ValueError: If data_format is neither NHWC nor NCHW.
  • ValueError: If the rank of inputs is undefined.
  • ValueError: If rank or channels dimension of inputs is undefined.

`tf.contrib.layers.variance_scaling_initializer()` is a TensorFlow initializer function for initializing the weights of a neural network. It implements "variance scaling initialization", which helps mitigate vanishing and exploding gradients and improves training. Its signature is:

```
tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_IN', uniform=False, seed=None, dtype=tf.float32)
```

Parameters:

  • factor: Factor used to scale the standard deviation of the initialized values. Defaults to 2.0.
  • mode: Determines which fan size the scaling is based on. One of "FAN_IN" (number of input units), "FAN_OUT" (number of output units), or "FAN_AVG" (average of the input and output unit counts). Defaults to "FAN_IN".
  • uniform: If True, sample from a uniform distribution; otherwise sample from a normal distribution. Defaults to False.
  • seed: Seed for the random number generator. Defaults to None.
  • dtype: Data type of the initialized values. Defaults to tf.float32.

Usage example:

```python
import tensorflow as tf

# A fully connected layer whose weights use variance scaling initialization
# (x is an existing input tensor)
fc1 = tf.layers.dense(inputs=x, units=256, activation=tf.nn.relu,
                      kernel_initializer=tf.contrib.layers.variance_scaling_initializer())

# A convolutional layer whose kernel uses variance scaling initialization
conv1 = tf.layers.conv2d(inputs=x, filters=32, kernel_size=[3, 3], padding="same",
                         activation=tf.nn.relu,
                         kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
```
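
A further minimal sketch, assuming a TF 1.x graph: the initializer can also be passed directly to tf.get_variable (the variable name and shape here are illustrative):

```python
import tensorflow as tf

# He-style initialization (factor=2.0, mode='FAN_IN') for a 784 -> 256
# weight matrix; the fan-in used for scaling here is 784.
init = tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_IN')
w = tf.get_variable("w", shape=[784, 256], initializer=init)
```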
