Theory
In Caffe, SoftmaxWithLoss is actually the combination of two layers:
softmaxWithLoss = Multinomial Logistic Loss Layer + Softmax Layer
Its core formula is:

$$L = -\log \hat{p}_k = -\left( z_k - m - \log \sum_j e^{z_j - m} \right), \qquad \hat{p}_k = \frac{e^{z_k - m}}{\sum_j e^{z_j - m}}$$

where $\hat{y}$ is the label value, $k$ is the neuron corresponding to the input image's label, $z_j$ are the layer's inputs, and $m = \max_j z_j$ is the maximum of the outputs, subtracted purely for numerical stability.
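To make the forward computation concrete, here is a minimal standalone C++ sketch of the formula above. This is an illustration for a single sample, not Caffe's actual implementation (which operates on blobs over a whole batch and spatial locations):

#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax loss for one sample:
// L = -log( exp(z_k - m) / sum_j exp(z_j - m) ), with m = max_j z_j.
double SoftmaxLoss(const std::vector<double>& z, int k) {
  double m = *std::max_element(z.begin(), z.end());  // stabilizer m
  double sum = 0.0;
  for (double zj : z) sum += std::exp(zj - m);
  return -(z[k] - m - std::log(sum));
}

Subtracting m leaves the result mathematically unchanged but keeps every exponent non-positive, so std::exp never overflows.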
During back-propagation, differentiating the loss with respect to the input $z_j$ gives:

$$\frac{\partial L}{\partial z_j} = \hat{p}_j - \mathbf{1}\{j = k\}$$

i.e., the softmax probability minus the one-hot encoding of the label.
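A matching sketch of the backward pass, under the same assumptions (and the same includes) as the snippet above:

// Gradient of the loss w.r.t. the inputs z:
// dL/dz_j = p_j - 1{j == k}, where p_j is the softmax probability.
std::vector<double> SoftmaxLossGrad(const std::vector<double>& z, int k) {
  double m = *std::max_element(z.begin(), z.end());
  double sum = 0.0;
  for (double zj : z) sum += std::exp(zj - m);
  std::vector<double> grad(z.size());
  for (int j = 0; j < static_cast<int>(z.size()); ++j) {
    grad[j] = std::exp(z[j] - m) / sum - (j == k ? 1.0 : 0.0);
  }
  return grad;
}

This is why fusing the two layers is numerically preferable to chaining SoftmaxLayer and MultinomialLogisticLossLayer: the combined gradient is simply "probability minus one-hot", with no division by a possibly tiny probability.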
Usage in Caffe
First, the layer is used in a prototxt as follows:
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
The parameters of the SoftmaxWithLoss layer are defined in caffe.proto as follows:
// Message that stores parameters shared by loss layers
message LossParameter {
  // If specified, ignore instances with the given label.
  // (i.e., samples carrying this label are skipped)
  optional int32 ignore_label = 1;
  // How to normalize the loss for loss layers that aggregate across batches,
  // spatial dimensions, or other dimensions. Currently only implemented in
  // SoftmaxWithLoss and SigmoidCrossEntropyLoss layers.
  enum NormalizationMode {
    // Divide by the number of examples in the batch times spatial dimensions.
    // Outputs that receive the ignore label will NOT be ignored in computing
    // the normalization factor.
    // (i.e., the loss of one forward pass is divided by the total label count)
    FULL = 0;
    // Divide by the total number of output locations that do not take the
    // ignore_label. If ignore_label is not set, this behaves like FULL.
    // (i.e., the loss of one forward pass is divided by the number of valid labels)
    VALID = 1;
    // Divide by the batch size.
    BATCH_SIZE = 2;
    // Do not normalize the loss.
    NONE = 3;
  }
  // For historical reasons, the default normalization for
  // SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID.
  optional NormalizationMode normalization = 3 [default = VALID];
  // Deprecated. Ignored if normalization is specified. If normalization
  // is not specified, then setting this to false will be equivalent to
  // normalization = BATCH_SIZE to be consistent with previous behavior.
  // (i.e., normalize == false => normalization = BATCH_SIZE;
  //       normalize == true  => normalization = VALID)
  optional bool normalize = 2;
}
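These parameters are set through the layer's loss_param field. For example, extending the prototxt above (ignore_label: 255 is just a common choice in segmentation tasks, not a required value):

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 255
    normalization: VALID
  }
}

With VALID normalization, pixels labeled 255 contribute neither to the loss nor to the normalization factor.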
Next, let's look at the SoftmaxWithLoss header file:
#ifndef CAFFE_SOFTMAX_WITH_LOSS_LAYER_HPP_
#define CAFFE_SOFTMAX_WITH_LOSS_LAYER_HPP_
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/layers/loss_layer.hpp"
#include "caffe/layers/softmax_layer.hpp"
namespace caffe {
/**
* @brief Computes the multinomial logistic loss for a one-of-many
* classification task, passing real-valued predictions through a
* softmax to get a probability distribution over classes.
*
* This layer should be preferred over separate
* SoftmaxLayer + MultinomialLogisticLossLayer
* as its gradient computation is more numerically stable.
* At test time, this layer can be replaced simply by a SoftmaxLayer.
*
* @param bottom input Blob vector (length 2)
* -# @f$ (N \times C \times H \times W) @f$
* the predictions @f$ x @f$, a Blob with values in
* @f$ [-\infty, +\infty] @f$ indicating the predicted score for each of
* the @f$ K = CHW @f$ classes. This layer maps these scores to a
* probability distribution over classes using the softmax function
* @f$ \hat{p}_{nk} = \exp(x_{nk}) /
* \left[\sum_{k'} \exp(x_{nk'})\right] @f$ (see SoftmaxLayer).
* -# @f$ (N \times 1 \times 1 \times 1) @f$
* the labels @f$ l @f$, an integer-valued Blob with values
* @f$ l_n \in [0, 1, 2, ..., K - 1] @f$
*