Gradient normalization strategies in deeplearning4j
deeplearning4j provides the following strategies:
package org.deeplearning4j.nn.conf;
public enum GradientNormalization {
    None,
    RenormalizeL2PerLayer,
    RenormalizeL2PerParamType,
    ClipElementWiseAbsoluteValue,
    ClipL2PerLayer,
    ClipL2PerParamType
}
These strategies are applied in the preApply method of BaseMultiLayerUpdater, in the package org.deeplearning4j.nn.updater:
/**
 * Pre-apply: Apply gradient normalization/clipping
 *
 * @param layer     Layer to apply gradient normalization/clipping for
 * @param gradient  Gradient to update
 * @param iteration The current iteration (i.e., number of parameter updates so far)
 */
public void preApply(Layer layer, Gradient gradient, int iteration)
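For context, the strategy is normally selected when building the network configuration rather than by calling preApply directly. A sketch using the DL4J layer builder API (the DenseLayer and its parameters here are illustrative; gradientNormalization and gradientNormalizationThreshold are the relevant builder methods):

```java
// Sketch: choosing a gradient normalization strategy for a layer.
// The surrounding network configuration is omitted.
DenseLayer layer = new DenseLayer.Builder()
        .nIn(784).nOut(100)
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(5.0)   // used by the Clip* strategies
        .build();
```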
RenormalizeL2PerLayer
rescale gradients by dividing by the L2 norm of all gradients for the layer.
- Compute the L2 norm of the layer's gradient.
- Divide the gradient by that L2 norm.
if (layerGradientView != null) {
    double l2 = layerGradientView.norm2Number().doubleValue();
    layerGradientView.divi(l2);
}
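The effect of this strategy can be sketched in plain Java, using a double array in place of an INDArray (this is an illustration of the math, not the DL4J implementation):

```java
import java.util.Arrays;

// Sketch of RenormalizeL2PerLayer: every element of the layer's gradient
// is divided by the L2 norm of the whole layer gradient.
public class RenormalizeL2PerLayerSketch {
    // Returns a new array equal to g / ||g||_2
    public static double[] renormalize(double[] layerGradient) {
        double sumSq = 0.0;
        for (double g : layerGradient) {
            sumSq += g * g;
        }
        double l2 = Math.sqrt(sumSq);
        double[] out = new double[layerGradient.length];
        for (int i = 0; i < layerGradient.length; i++) {
            out[i] = layerGradient[i] / l2;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] grad = {3.0, 4.0};  // L2 norm = 5
        System.out.println(Arrays.toString(renormalize(grad)));  // [0.6, 0.8]
    }
}
```

After renormalization the layer gradient always has unit L2 norm, so only the direction of the update is kept.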
RenormalizeL2PerParamType
rescale gradients by dividing by the L2 norm of the gradients, separately for each type of parameter within the layer.
This differs from RenormalizeL2PerLayer in that here, each parameter type (weight, bias etc) is normalized separately.
For example, in a MLP/FeedForward network (where G is the gradient vector), the output is as follows:
GOut_weight = G_weight / l2(G_weight)
GOut_bias = G_bias / l2(G_bias)
For each parameter type in a layer, the gradient is divided by the L2 norm of that parameter type's own gradient. In a simple feed-forward network, each layer has only two parameter types, the weight (Weight) and the bias (Bias); if their gradients are G_weight and G_bias, the normalization is:
GOut_weight = G_weight / l2(G_weight)
GOut_bias = G_bias / l2(G_bias)
for (INDArray g : gradient.gradientForVariable().values()) {
    double l2 = Nd4j.getExecutioner().execAndReturn(new Norm2(g)).getFinalResult().doubleValue();
    g.divi(l2);
}
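The per-parameter-type loop above can be sketched in plain Java, with a Map from parameter name to gradient array standing in for gradientForVariable() (again an illustration, not DL4J code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of RenormalizeL2PerParamType: each parameter type ("W", "b", ...)
// is divided by the L2 norm of that type's gradient only.
public class RenormalizePerParamTypeSketch {
    public static Map<String, double[]> renormalize(Map<String, double[]> grads) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : grads.entrySet()) {
            double sumSq = 0.0;
            for (double g : e.getValue()) {
                sumSq += g * g;
            }
            double l2 = Math.sqrt(sumSq);
            double[] scaled = new double[e.getValue().length];
            for (int i = 0; i < scaled.length; i++) {
                scaled[i] = e.getValue()[i] / l2;
            }
            out.put(e.getKey(), scaled);
        }
        return out;
    }
}
```

Note that the weight gradient and the bias gradient end up with unit norm independently, whereas RenormalizeL2PerLayer would scale both by a single shared norm.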
ClipElementWiseAbsoluteValue
clip the gradients on a per-element basis.
For each gradient g, set g <- sign(g) * min(maxAllowedValue, |g|).
i.e., if a parameter gradient has absolute value greater than the threshold, truncate it to the threshold.
This method clips the gradient element-wise against a configured threshold:
any element whose absolute value exceeds the threshold is replaced by the threshold, keeping its original sign.
For example, suppose we set the threshold to threshold=5.
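With threshold=5, the element-wise rule sign(g) * min(threshold, |g|) can be sketched as follows (plain Java, not the DL4J implementation):

```java
import java.util.Arrays;

// Sketch of ClipElementWiseAbsoluteValue: each element g becomes
// sign(g) * min(threshold, |g|), so values inside [-threshold, threshold]
// pass through unchanged and values outside are truncated.
public class ClipElementWiseSketch {
    public static double[] clip(double[] gradient, double threshold) {
        double[] out = new double[gradient.length];
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            out[i] = Math.signum(g) * Math.min(threshold, Math.abs(g));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] grad = {-8.0, 3.0, 7.0};
        System.out.println(Arrays.toString(clip(grad, 5.0)));  // [-5.0, 3.0, 5.0]
    }
}
```

So -8 is clipped to -5, 3 is left unchanged, and 7 is clipped to 5.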