2.3.3.1 gradientAndScore();
This method obtains the gradient and the score for the current iteration.
@Override
public Pair<Gradient, Double> gradientAndScore() {
    oldScore = score;
    model.computeGradientAndScore();

    if (iterationListeners != null && iterationListeners.size() > 0) {
        for (IterationListener l : iterationListeners) {
            if (l instanceof TrainingListener) {
                ((TrainingListener) l).onGradientCalculation(model);
            }
        }
    }

    Pair<Gradient, Double> pair = model.gradientAndScore();
    score = pair.getSecond();
    updateGradientAccordingToParams(pair.getFirst(), model, model.batchSize());
    return pair;
}
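The listener-notification pattern used above (walk all IterationListeners, but invoke the richer callback only on those that also implement TrainingListener, filtered by an instanceof check) can be sketched stand-alone. The minimal interfaces and the notifyGradientCalculation helper below are hypothetical stand-ins, not DL4J's real types:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal listener interfaces sketching the pattern in
// gradientAndScore(): every listener is an IterationListener, but only
// those that also implement TrainingListener receive the gradient callback.
interface IterationListener { }

interface TrainingListener extends IterationListener {
    void onGradientCalculation(Object model);
}

public class ListenerDispatchSketch {
    // Returns how many listeners actually received the callback.
    static int notifyGradientCalculation(List<IterationListener> listeners, Object model) {
        int notified = 0;
        for (IterationListener l : listeners) {
            if (l instanceof TrainingListener) {
                ((TrainingListener) l).onGradientCalculation(model);
                notified++;
            }
        }
        return notified;
    }

    public static void main(String[] args) {
        List<IterationListener> listeners = new ArrayList<>();
        listeners.add(new IterationListener() { });  // plain listener: skipped by the instanceof filter
        listeners.add((TrainingListener) m -> System.out.println("gradient callback"));
        System.out.println(notifyGradientCalculation(listeners, new Object())); // only one listener qualifies
    }
}
```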
2.3.3.2 model.computeGradientAndScore()
@Override
public void computeGradientAndScore() {
    //Calculate activations (which are stored in each layer, and used in backprop)
    if (layerWiseConfigurations.getBackpropType() == BackpropType.TruncatedBPTT) {
        List<INDArray> activations = rnnActivateUsingStoredState(getInput(), true, true);
        if (trainingListeners.size() > 0) {
            for (TrainingListener tl : trainingListeners) {
                tl.onForwardPass(this, activations);
            }
        }
        truncatedBPTTGradient();
    } else {
        //First: do a feed-forward through the network
        //Note that we don't actually need to do the full forward pass through the output layer right now; but we do
        // need the input to the output layer to be set (such that backprop can be done)
        List<INDArray> activations = feedForwardToLayer(layers.length - 2, true);
        if (trainingListeners.size() > 0) {
            //TODO: We possibly do want output layer activations in some cases here...
            for (TrainingListener tl : trainingListeners) {
                tl.onForwardPass(this, activations);
            }
        }
        INDArray actSecondLastLayer = activations.get(activations.size() - 1);
        if (layerWiseConfigurations.getInputPreProcess(layers.length - 1) != null)
            actSecondLastLayer = layerWiseConfigurations.getInputPreProcess(layers.length - 1)
                            .preProcess(actSecondLastLayer, getInputMiniBatchSize());
        getOutputLayer().setInput(actSecondLastLayer);
        //Then: compute gradients
        backprop();
    }

    //Calculate score
    if (!(getOutputLayer() instanceof IOutputLayer)) {
        throw new IllegalStateException(
                        "Cannot calculate gradient and score with respect to labels: final layer is not an IOutputLayer");
    }
    score = ((IOutputLayer) getOutputLayer()).computeScore(calcL1(true), calcL2(true), true);

    //Listeners
    if (trainingListeners.size() > 0) {
        for (TrainingListener tl : trainingListeners) {
            tl.onBackwardPass(this);
        }
    }
}
In dl4j, unless you change the backpropagation type with the .backpropType(BackpropType.TruncatedBPTT) method while building the network, the default backpropagation type is always BackpropType.Standard (this holds for RNNs and LSTMs as well).
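For reference, switching to truncated BPTT would typically be done at configuration time, roughly as in the fragment below. This is a sketch only: the layer definitions are elided, the truncation lengths are arbitrary example values, and the builder method names (backpropType, tBPTTForwardLength, tBPTTBackwardLength) are from the DL4J configuration API.

```java
// Sketch: enabling truncated BPTT when building a MultiLayerNetwork configuration.
// Without the backpropType(...) call, the configuration defaults to BackpropType.Standard.
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        // ... layer(0, ...), layer(1, ...) etc. elided ...
        .backpropType(BackpropType.TruncatedBPTT)
        .tBPTTForwardLength(50)   // forward truncation length (example value)
        .tBPTTBackwardLength(50)  // backward truncation length (example value)
        .build();
```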
Once the backpropagation type for this pass is determined to be BackpropType.Standard, the following statement is executed.
//First: do a feed-forward through the network
//Note: we don't need a full forward pass all the way through the output layer here,
//but we do need to compute the input to the output layer (so that backprop can run)
List<INDArray> activations = feedForwardToLayer(layers.length - 2, true);
Next, we step into the function body:
/** Compute the activations from the input to the specified layer, using the currently set input for the network.<br>
 * To compute activations for all layers, use feedForward(...) methods<br>
 * Note: output list includes the original input. So list.get(0) is always the original input, and
 * list.get(i+1) is the activations of the ith layer.
 * @param layerNum Index of the last layer to calculate activations for. Layers are zero-indexed.
 *                 feedForwardToLayer(i,input) will return the activations for layers 0..i (inclusive)
 * @param train true for training, false for test (i.e., false if using network after training)
 * @return list of activations.
 */
public List<INDArray> feedForwardToLayer(int layerNum, boolean train) {
    INDArray currInput = input;
    List<INDArray> activations = new ArrayList<>();
    activations.add(currInput);

    for (int i = 0; i <= layerNum; i++) {
        currInput = activationFromPrevLayer(i, currInput, train);
        //applies drop connect to the activation
        activations.add(currInput);
    }
    return activations;
}
This function computes the activations of every layer up to the given index (here, everything except the output layer) and collects them into a List of INDArrays; note that the list also contains the original input at index 0. We then step into activationFromPrevLayer(i, currInput, train) to follow the network's forward-pass computation.
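The indexing convention (list.get(0) is the original input, list.get(i+1) is layer i's activation) can be reproduced with a minimal stand-alone sketch. The double[] arrays and the toy "doubling" layers below are stand-ins for INDArray and real DL4J layers, not the library's actual types:

```java
import java.util.ArrayList;
import java.util.List;

public class FeedForwardSketch {
    // Toy stand-in for activationFromPrevLayer: every layer simply doubles its input.
    static double[] activationFromPrevLayer(int layerIndex, double[] input) {
        double[] out = new double[input.length];
        for (int j = 0; j < input.length; j++) {
            out[j] = 2.0 * input[j];
        }
        return out;
    }

    // Mirrors feedForwardToLayer: the returned list starts with the original
    // input, followed by the activation of each layer 0..layerNum in order.
    static List<double[]> feedForwardToLayer(double[] input, int layerNum) {
        double[] currInput = input;
        List<double[]> activations = new ArrayList<>();
        activations.add(currInput);
        for (int i = 0; i <= layerNum; i++) {
            currInput = activationFromPrevLayer(i, currInput);
            activations.add(currInput);
        }
        return activations;
    }

    public static void main(String[] args) {
        List<double[]> acts = feedForwardToLayer(new double[] {1.0, 3.0}, 2);
        System.out.println(acts.size());    // 4: the input plus 3 layer activations
        System.out.println(acts.get(0)[0]); // 1.0: the original input is preserved at index 0
        System.out.println(acts.get(3)[0]); // 8.0: doubled three times by layers 0..2
    }
}
```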
/**
 * Calculate activation from previous layer including pre processing where necessary
 *
 * @param curr  the current layer
 * @param input the input
 * @return the activation from the previous layer
 */
public INDArray activationFromPrevLayer(int curr, INDArray input, boolean training) {
    if (getLayerWiseConfigurations().getInputPreProcess(curr) != null)
        input = getLayerWiseConfigurations().getInputPreProcess(curr).preProcess(input, getInputMiniBatchSize());

    INDArray ret = layers[curr].activate(input, training);
    return ret;
}
The output of the previous layer is used as the input to the current layer; if an input preprocessor is configured for the layer, it is applied first. The method then calls the current layer's activate() method to compute the result. The call chain looks like this:
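The "optional preprocessing, then activate" chain can be sketched with plain functions; the add-one preprocessor and the doubling layer below are made-up stand-ins, not DL4J components:

```java
import java.util.function.UnaryOperator;

public class PrevLayerActivationSketch {
    // Mirrors activationFromPrevLayer: apply the layer's input preprocessor
    // (when one is configured) before the layer's own activation.
    static double[] activationFromPrevLayer(UnaryOperator<double[]> preProcessor,
                                            UnaryOperator<double[]> layerActivate,
                                            double[] input) {
        if (preProcessor != null) {
            input = preProcessor.apply(input);
        }
        return layerActivate.apply(input);
    }

    public static void main(String[] args) {
        UnaryOperator<double[]> addOne = in -> {     // stand-in preprocessor
            double[] out = new double[in.length];
            for (int i = 0; i < in.length; i++) out[i] = in[i] + 1.0;
            return out;
        };
        UnaryOperator<double[]> doubleAll = in -> {  // stand-in layer activation
            double[] out = new double[in.length];
            for (int i = 0; i < in.length; i++) out[i] = 2.0 * in[i];
            return out;
        };
        // With the preprocessor: (1 + 1) * 2 = 4; without it: 1 * 2 = 2
        System.out.println(activationFromPrevLayer(addOne, doubleAll, new double[] {1.0})[0]);
        System.out.println(activationFromPrevLayer(null, doubleAll, new double[] {1.0})[0]);
    }
}
```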
@Override
public INDArray activate(INDArray input, boolean training) {
    setInput(input);
    return activate(training);
}

@Override
public INDArray activate(boolean training) {
    INDArray z = preOutput(training);
    //INDArray ret = Nd4j.getExecutioner().execAndReturn(Nd4j.getOpFactory().createTransform(
    //                conf.getLayer().getActivationFunction(), z, conf.getExtraArgs() ));
    INDArray ret = conf().getLayer().getActivationFn().getActivation(z, training);

    if (maskArray != null) {
        ret.muliColumnVector(maskArray);
    }

    return ret;
}
The preOutput() call here is the heart of the network's forward pass: it computes the pre-activation z (for a dense layer, essentially W·x + b), to which the activation function is then applied.
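As a rough sketch of what preOutput() and activate() compute for a single dense layer, using plain arrays instead of INDArray (the weight values and the choice of sigmoid are illustrative assumptions, not DL4J internals):

```java
public class DenseActivateSketch {
    // preOutput: z = W·x + b for one dense layer.
    static double[] preOutput(double[][] W, double[] b, double[] x) {
        double[] z = new double[W.length];
        for (int i = 0; i < W.length; i++) {
            double sum = b[i];
            for (int j = 0; j < x.length; j++) {
                sum += W[i][j] * x[j];
            }
            z[i] = sum;
        }
        return z;
    }

    // activate: apply the activation function (sigmoid here) element-wise to z,
    // mirroring getActivationFn().getActivation(z, training).
    static double[] activate(double[][] W, double[] b, double[] x) {
        double[] z = preOutput(W, b, x);
        double[] out = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            out[i] = 1.0 / (1.0 + Math.exp(-z[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] W = {{1.0, -1.0}};
        double[] b = {0.0};
        double[] x = {2.0, 2.0};
        // z = 1*2 + (-1)*2 + 0 = 0, and sigmoid(0) = 0.5
        System.out.println(activate(W, b, x)[0]);
    }
}
```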