Gradient normalization strategies in deeplearning4j
deeplearning4j provides the following strategies:
package org.deeplearning4j.nn.conf;
public enum GradientNormalization {
    None,
    RenormalizeL2PerLayer,
    RenormalizeL2PerParamType,
    ClipElementWiseAbsoluteValue,
    ClipL2PerLayer,
    ClipL2PerParamType
}
These strategies are applied in the preApply method of BaseMultiLayerUpdater, in the package org.deeplearning4j.nn.updater:
/**
 * Pre-apply: Apply gradient normalization/clipping
 *
 * @param layer     Layer to apply gradient normalization/clipping for
 * @param gradient  Gradient to update
 * @param iteration The current iteration (i.e., number of parameter updates so far)
 */
public void preApply(Layer layer, Gradient gradient, int iteration)
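For context, the strategy is normally selected when building the network configuration rather than by calling preApply directly. A sketch using the DL4J layer builder API (the DenseLayer and its parameters here are illustrative; gradientNormalization and gradientNormalizationThreshold are the relevant builder methods):

```java
// Sketch: choosing a gradient normalization strategy for a layer.
// The surrounding network configuration is omitted.
DenseLayer layer = new DenseLayer.Builder()
        .nIn(784).nOut(100)
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(5.0)   // used by the Clip* strategies
        .build();
```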
RenormalizeL2PerLayer
rescale gradients by dividing by the L2 norm of all gradients for the layer.
- Compute the L2 norm of the layer's gradient.
- Divide the gradient by that L2 norm.
if (layerGradientView != null) {
    double l2 = layerGradientView.norm2Number().doubleValue();
    layerGradientView.divi(l2);
}
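The effect of this strategy can be sketched in plain Java, using a double array in place of an INDArray (this is an illustration of the math, not the DL4J implementation):

```java
import java.util.Arrays;

// Sketch of RenormalizeL2PerLayer: every element of the layer's gradient
// is divided by the L2 norm of the whole layer gradient.
public class RenormalizeL2PerLayerSketch {
    // Returns a new array equal to g / ||g||_2
    public static double[] renormalize(double[] layerGradient) {
        double sumSq = 0.0;
        for (double g : layerGradient) {
            sumSq += g * g;
        }
        double l2 = Math.sqrt(sumSq);
        double[] out = new double[layerGradient.length];
        for (int i = 0; i < layerGradient.length; i++) {
            out[i] = layerGradient[i] / l2;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] grad = {3.0, 4.0};  // L2 norm = 5
        System.out.println(Arrays.toString(renormalize(grad)));  // [0.6, 0.8]
    }
}
```

After renormalization the layer gradient always has unit L2 norm, so only the direction of the update is kept.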
RenormalizeL2PerParamType
rescale gradients by dividing by the L2 norm of the gradients, separately for each type of parameter within the layer.
This differs from RenormalizeL2PerLayer in that here, each parameter type (weight, bias etc) is normalized separately.
For example, in a MLP/FeedForward network (where G is the gradient vector), the output is as follows:
GOut_weight = G_weight / l2(G_weight)
GOut_bias = G_bias / l2(G_bias)
For each parameter type in a layer, the gradient is divided by the L2 norm of that parameter type's own gradient. In a simple feed-forward network, each layer has only two parameter types, the weight (Weight) and the bias (Bias); if their gradients are G_weight and G_bias, the normalization is:
GOut_weight = G_weight / l2(G_weight)
GOut_bias = G_bias / l2(G_bias)
for (INDArray g : gradient.gradientForVariable().values()) {
    double l2 = Nd4j.getExecutioner().execAndReturn(new Norm2(g)).getFinalResult().doubleValue();
    g.divi(l2);
}
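The per-parameter-type loop above can be sketched in plain Java, with a Map from parameter name to gradient array standing in for gradientForVariable() (again an illustration, not DL4J code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of RenormalizeL2PerParamType: each parameter type ("W", "b", ...)
// is divided by the L2 norm of that type's gradient only.
public class RenormalizePerParamTypeSketch {
    public static Map<String, double[]> renormalize(Map<String, double[]> grads) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : grads.entrySet()) {
            double sumSq = 0.0;
            for (double g : e.getValue()) {
                sumSq += g * g;
            }
            double l2 = Math.sqrt(sumSq);
            double[] scaled = new double[e.getValue().length];
            for (int i = 0; i < scaled.length; i++) {
                scaled[i] = e.getValue()[i] / l2;
            }
            out.put(e.getKey(), scaled);
        }
        return out;
    }
}
```

Note that the weight gradient and the bias gradient end up with unit norm independently, whereas RenormalizeL2PerLayer would scale both by a single shared norm.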
ClipElementWiseAbsoluteValue
clip the gradients on a per-element basis.
For each gradient g, set g <- sign(g) * min(maxAllowedValue, |g|).
i.e., if a parameter gradient has absolute value greater than the threshold, truncate it to the threshold.
This method clips the gradient element-wise against a configured threshold:
any element whose absolute value exceeds the threshold is replaced by the threshold, keeping its original sign.
For example, suppose we set the threshold to threshold=5.
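With threshold=5, the element-wise rule sign(g) * min(threshold, |g|) can be sketched as follows (plain Java, not the DL4J implementation):

```java
import java.util.Arrays;

// Sketch of ClipElementWiseAbsoluteValue: each element g becomes
// sign(g) * min(threshold, |g|), so values inside [-threshold, threshold]
// pass through unchanged and values outside are truncated.
public class ClipElementWiseSketch {
    public static double[] clip(double[] gradient, double threshold) {
        double[] out = new double[gradient.length];
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            out[i] = Math.signum(g) * Math.min(threshold, Math.abs(g));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] grad = {-8.0, 3.0, 7.0};
        System.out.println(Arrays.toString(clip(grad, 5.0)));  // [-5.0, 3.0, 5.0]
    }
}
```

So -8 is clipped to -5, 3 is left unchanged, and 7 is clipped to 5.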