DL4J fit(DataSetIterator iterator) Source Code Reading (Part 3)

2.3.3.1 gradientAndScore();

This method computes and returns the gradient and the score (the loss value):

@Override
public Pair<Gradient, Double> gradientAndScore() {
    oldScore = score;                  // remember the previous score for convergence checks
    model.computeGradientAndScore();   // forward + backward pass; results are stored in the model

    // Notify any TrainingListeners that the gradient has just been computed
    if (iterationListeners != null && iterationListeners.size() > 0) {
        for (IterationListener l : iterationListeners) {
            if (l instanceof TrainingListener) {
                ((TrainingListener) l).onGradientCalculation(model);
            }
        }
    }

    // Fetch the freshly computed (gradient, score) pair from the model
    Pair<Gradient, Double> pair = model.gradientAndScore();
    score = pair.getSecond();
    // Apply the configured updater (learning rate, momentum, etc.) and mini-batch scaling
    updateGradientAccordingToParams(pair.getFirst(), model, model.batchSize());
    return pair;
}
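
For context, here is a minimal sketch of where gradientAndScore() sits in the optimizer. It paraphrases the shape of DL4J's StochasticGradientDescent.optimize() rather than quoting it, so treat the details (e.g., stepFunction and the return value) as assumptions:

// Sketch, not verbatim DL4J source: one optimization step built on gradientAndScore()
public boolean optimize() {
    Pair<Gradient, Double> pair = gradientAndScore(); // forward + backward pass
    Gradient gradient = pair.getFirst();

    INDArray params = model.params();                 // flattened parameter vector
    stepFunction.step(params, gradient.gradient());   // assumed step rule applying the gradient
    model.setParams(params);                          // write the updated parameters back

    return true;                                      // the real code also checks termination conditions
}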
2.3.3.2 model.computeGradientAndScore()
@Override
public void computeGradientAndScore() {
    //Calculate activations (which are stored in each layer, and used in backprop)
    if (layerWiseConfigurations.getBackpropType() == BackpropType.TruncatedBPTT) {
        List<INDArray> activations = rnnActivateUsingStoredState(getInput(), true, true);
        if (trainingListeners.size() > 0) {
            for (TrainingListener tl : trainingListeners) {
                tl.onForwardPass(this, activations);
            }
        }
        truncatedBPTTGradient();
    } else {
        //First: do a feed-forward through the network
        //Note that we don't actually need to do the full forward pass through the output layer right now; but we do
        // need the input to the output layer to be set (such that backprop can be done)
        List<INDArray> activations = feedForwardToLayer(layers.length - 2, true);
        if (trainingListeners.size() > 0) {
            //TODO: We possibly do want output layer activations in some cases here...
            for (TrainingListener tl : trainingListeners) {
                tl.onForwardPass(this, activations);
            }
        }
        INDArray actSecondLastLayer = activations.get(activations.size() - 1);
        if (layerWiseConfigurations.getInputPreProcess(layers.length - 1) != null)
            actSecondLastLayer = layerWiseConfigurations.getInputPreProcess(layers.length - 1)
                            .preProcess(actSecondLastLayer, getInputMiniBatchSize());
        getOutputLayer().setInput(actSecondLastLayer);
        //Then: compute gradients
        backprop();
    }

    //Calculate score
    if (!(getOutputLayer() instanceof IOutputLayer)) {
        throw new IllegalStateException(
                        "Cannot calculate gradient and score with respect to labels: final layer is not an IOutputLayer");
    }
    score = ((IOutputLayer) getOutputLayer()).computeScore(calcL1(true), calcL2(true), true);

    //Listeners
    if (trainingListeners.size() > 0) {
        for (TrainingListener tl : trainingListeners) {
            tl.onBackwardPass(this);
        }
    }
}
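
A note on the score line above: for an IOutputLayer, computeScore(fullNetworkL1, fullNetworkL2, training) evaluates the loss function on the current labels and output, averages it over the mini-batch, and then adds the regularization penalties. Roughly: score = loss(labels, output) / miniBatchSize + l1 + l2, which is why calcL1(true) and calcL2(true) are passed in.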

In DL4J, unless the backpropagation type is changed via .backpropType(BackpropType.TruncatedBPTT) while building the network configuration, the default is always BackpropType.Standard, including for RNNs and LSTMs; a configuration sketch follows below.
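
For reference, here is a configuration sketch showing where the backpropagation type is set when building a recurrent model. The layer choices, sizes, and truncation lengths are placeholder values, assuming the 0.9.x-era API this series appears to trace:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(0, new GravesLSTM.Builder().nIn(10).nOut(20).build())
        .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nIn(20).nOut(5).build())
        .backpropType(BackpropType.TruncatedBPTT) // the default would otherwise be BackpropType.Standard
        .tBPTTForwardLength(50)                   // truncation lengths: illustrative values
        .tBPTTBackwardLength(50)
        .build();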
Once the backpropagation type here resolves to BackpropType.Standard, the following statement is executed:

//First: do a feed-forward pass through the network
//Note: we do not actually need the full forward pass through the output layer here,
//but we do need the input to the output layer to be set (so that backprop can be done)
List<INDArray> activations = feedForwardToLayer(layers.length - 2, true);

Next, step into the body of this function:

 /** Compute the activations from the input to the specified layer, using the currently set input for the network.<br>
 * To compute activations for all layers, use feedForward(...) methods<br>
 * Note: output list includes the original input. So list.get(0) is always the original input, and
 * list.get(i+1) is the activations of the ith layer.
 * @param layerNum Index of the last layer to calculate activations for. Layers are zero-indexed.
 *                 feedForwardToLayer(i,input) will return the activations for layers 0..i (inclusive)
 * @param train true for training, false for test (i.e., false if using network after training)
 * @return list of activations.
 */
public List<INDArray> feedForwardToLayer(int layerNum, boolean train) {
    INDArray currInput = input;
    List<INDArray> activations = new ArrayList<>();
    activations.add(currInput);

    for (int i = 0; i <= layerNum; i++) {
        currInput = activationFromPrevLayer(i, currInput, train);
        //applies drop connect to the activation
        activations.add(currInput);
    }
    return activations;
}

This function computes the activations for layers 0 through layerNum (here layers.length - 2, i.e., every layer except the output layer) and collects them into a List<INDArray> that also contains the original input at index 0.
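
Before drilling further, a quick usage sketch of this indexing contract (a runnable toy example, assuming the 0.9.x-era API this series appears to trace; the layer sizes are arbitrary):

import java.util.List;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class FeedForwardToLayerDemo {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .list()
                .layer(0, new DenseLayer.Builder().nIn(4).nOut(8).build())
                .layer(1, new DenseLayer.Builder().nIn(8).nOut(8).build())
                .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .nIn(8).nOut(3).build())
                .build();
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();

        net.setInput(Nd4j.rand(2, 4)); // mini-batch of 2 examples, 4 features each

        // Activations for layers 0..n-2, i.e. everything except the output layer
        List<INDArray> acts = net.feedForwardToLayer(net.getnLayers() - 2, true);

        System.out.println(acts.size()); // 3: original input + layer 0 + layer 1
        // acts.get(0) is the input; acts.get(i + 1) is layer i's activations
    }
}

Next, step into activationFromPrevLayer(i, currInput, train) to see how each layer's forward computation is actually done: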

/**
 * Calculate activation from previous layer including pre processing where necessary
 *
 * @param curr  the current layer
 * @param input the input 
 * @return the activation from the previous layer
 */
public INDArray activationFromPrevLayer(int curr, INDArray input, boolean training) {
    if (getLayerWiseConfigurations().getInputPreProcess(curr) != null)
        input = getLayerWiseConfigurations().getInputPreProcess(curr).preProcess(input, getInputMiniBatchSize());
    INDArray ret = layers[curr].activate(input, training);
    return ret;
}

Each layer consumes the previous layer's output as its input; if an InputPreProcessor is registered for the layer, preProcess() is applied first, as sketched below.
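
When a pre-processor is needed (for example, flattening CNN feature maps into a vector before a fully connected layer), it is attached to a layer index in the configuration. A minimal sketch, with placeholder shapes:

// Attach an InputPreProcessor to layer 1; its preProcess() runs before layer 1's activate()
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(0, new ConvolutionLayer.Builder(5, 5).nIn(1).nOut(64).build())
        .layer(1, new DenseLayer.Builder().nIn(7 * 7 * 64).nOut(256).build())
        .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(256).nOut(10).build())
        .inputPreProcessor(1, new CnnToFeedForwardPreProcessor(7, 7, 64)) // flatten 7x7x64 -> 3136
        .build();

After any pre-processing, the layer's own activate() method computes the output. Its call chain is as follows: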

@Override
public INDArray activate(INDArray input, boolean training) {
    setInput(input);
    return activate(training);
}

@Override
public INDArray activate(boolean training) {
    INDArray z = preOutput(training);
    //INDArray ret = Nd4j.getExecutioner().execAndReturn(Nd4j.getOpFactory().createTransform(
    //        conf.getLayer().getActivationFunction(), z, conf.getExtraArgs() ));
    INDArray ret = conf().getLayer().getActivationFn().getActivation(z, training);

    if (maskArray != null) {
        ret.muliColumnVector(maskArray);
    }

    return ret;
}

This preOutput() step is the heart of the network's forward propagation.
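
As a preview, here is a simplified sketch of what preOutput() computes for a standard fully connected layer. It paraphrases the core of DL4J's BaseLayer (the real method also handles dropout and mask arrays), so treat it as an illustration rather than the verbatim source:

// Simplified sketch of BaseLayer.preOutput() for a fully connected layer
public INDArray preOutput(boolean training) {
    INDArray W = getParam(DefaultParamInitializer.WEIGHT_KEY); // weight matrix, shape [nIn, nOut]
    INDArray b = getParam(DefaultParamInitializer.BIAS_KEY);   // bias row vector, shape [1, nOut]

    // z = input * W + b  (the bias is broadcast across the mini-batch rows)
    return input.mmul(W).addiRowVector(b);
}

The activation function retrieved via conf().getLayer().getActivationFn() is then applied to this z, which is exactly the INDArray ret = ... line in activate(boolean) above.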
