Caffe Loss Layers - LossLayers

AIHGF

Article link: https://blog.csdn.net/zziahgf/article/details/79155969

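Caffe Loss Layers

A loss layer measures the error between the network's predictions and the ground-truth labels; training minimizes this error to optimize the network.

The loss value is computed in the forward pass, and the gradient of the loss with respect to the layer inputs is computed in the backward pass.

Caffe provides the following main loss layers: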

1. SoftmaxWithLoss

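Used for one-of-many classification tasks. It computes the multinomial logistic loss, passing real-valued predictions through a softmax to obtain a probability distribution over the classes.

This layer is conceptually a SoftmaxLayer followed by a MultinomialLogisticLossLayer, but its gradient computation is more numerically stable.

At test time, this layer can be replaced by a SoftmaxLayer.

1.1 Forward parameters

Inputs:

Input 1 - predictions $x$, $(N × C × H × W)$, with values in $(-\infty, +\infty)$, giving a score for each of the $K = CHW$ classes.

The SoftmaxLayer maps the scores $x$ to a probability distribution over the classes: $\hat{p}_{nk} = \frac{\exp(x_{nk})}{\sum_{k'} \exp(x_{nk'})}$.
Input 2 - ground-truth labels $l$, $(N × 1 × 1 × 1)$, integer-valued with $l_n \in \{0, 1, 2, ..., K-1\}$, giving the correct class among the $K$ classes.

Outputs:

Output 1 - the computed cross-entropy classification loss, $(1 × 1 × 1 × 1)$: $E = \frac{-1}{N} \sum_{n=1}^{N} \log(\hat{p}_{n, l_n})$, where $\hat{p}$ is the class probability output by the softmax. [Note: SoftmaxLayer only outputs the per-class probabilities; it does not compare them with the labels.]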
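The forward computation described above can be sketched in plain Python. This is only an illustration of the formula, not Caffe's implementation: the function names `softmax` and `softmax_with_loss` are hypothetical, and each sample is assumed to be a flat list of $K$ scores.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one sample's K class scores."""
    m = max(scores)                      # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_loss(batch_scores, labels):
    """Mean cross-entropy E = -(1/N) * sum_n log(p_hat[n][l_n])."""
    n = len(batch_scores)
    loss = 0.0
    for scores, label in zip(batch_scores, labels):
        loss -= math.log(softmax(scores)[label])
    return loss / n

# Two samples, three classes. Uniform scores cost log(3); the second
# sample is classified with near certainty and costs (almost) nothing.
scores = [[0.0, 0.0, 0.0], [1000.0, 0.0, 0.0]]
labels = [2, 0]
loss = softmax_with_loss(scores, labels)
```

Subtracting the maximum score before exponentiating is one common way to get the numerical stability mentioned above; a naive `math.exp(1000.0)` would overflow.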

1.2 Backward parameters

Computes the gradient of the softmax loss with respect to the predictions.

No gradient is computed with respect to the label input (bottom[1]).

template <typename Dtype>
void caffe::SoftmaxWithLossLayer<Dtype>::Backward_cpu(
    const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom)

Parameters:

top - $(1 × 1 × 1 × 1)$; its diff is the loss_weight $\lambda$, because $\lambda$ is the coefficient of this layer's output $l_i$ in the overall network loss $E = \lambda_i l_i + \text{other loss terms}$, so $\frac{\partial E}{\partial l_i} = \lambda_i$.
propagate_down[1] - must be false, since no gradient is computed for the labels.
bottom - [0] $(N × C × H × W)$, the predictions $x$; backward computes the diff $\frac{\partial E}{\partial x}$.

[1] $(N × 1 × 1 × 1)$, the labels; ignored, no gradient is computed.
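For the cross-entropy loss of Section 1.1, the gradient with respect to the scores has the well-known closed form $\frac{\partial E}{\partial x_{nk}} = \frac{\lambda}{N}(\hat{p}_{nk} - \mathbb{1}[k = l_n])$. The sketch below illustrates that formula only; `softmax_loss_backward` is a hypothetical name, normalization by the batch size $N$ is assumed, and ignore_label handling is left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one sample's K class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss_backward(batch_scores, labels, loss_weight=1.0):
    """Gradient of E w.r.t. the scores: loss_weight / N * (p_hat - one_hot(label))."""
    n = len(batch_scores)
    grads = []
    for scores, label in zip(batch_scores, labels):
        probs = softmax(scores)
        row = [loss_weight * (p - (1.0 if k == label else 0.0)) / n
               for k, p in enumerate(probs)]
        grads.append(row)
    return grads

# Two samples, two classes, loss_weight (top diff) of 2.
grads = softmax_loss_backward([[0.0, 0.0], [0.0, 0.0]], [0, 1], loss_weight=2.0)
```

Each row of the gradient sums to zero, because the probabilities sum to one and exactly one entry has 1 subtracted from it.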

1.3 prototxt definition

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 0  # label value to ignore when computing the loss
    normalize: true  # if true, divide by the number of labels (excluding ignored ones); if false, divide by the batch size only
  }
}

2. EuclideanLoss

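Computes the sum of squared differences between its two inputs.

Used for real-valued regression tasks.

The Euclidean loss is computed as: $E = \frac{1}{2N} \sum_{n=1}^{N} ||\hat{y}_n - y_n||_2^2$.

It can be used for least-squares regression. Feeding the output of an InnerProductLayer into an EuclideanLossLayer gives a linear least-squares regression problem.

2.1 Forward parameters

Inputs:

Input 1 - $(N × C × H × W)$, predictions $\hat{y} \in (-\infty, +\infty)$
Input 2 - $(N × C × H × W)$, targets $y \in (-\infty, +\infty)$

Outputs:

Output 1 - $(1 × 1 × 1 × 1)$, the computed Euclidean loss.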

2.2 Backward parameters

Computes the gradient of the Euclidean loss with respect to the inputs.

template <typename Dtype>
void caffe::EuclideanLossLayer<Dtype>::Backward_cpu(
    const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom)

Parameters:

top - as in Section 1.2: $(1 × 1 × 1 × 1)$, whose diff is the loss_weight $\lambda$.
propagate_down - unlike most loss layers, EuclideanLossLayer can also compute the gradient with respect to the label input (bottom[1]).
bottom - [0] $(N × C × H × W)$, predictions $\hat{y}$; backward computes the diff $\frac{\partial E}{\partial \hat{y}_n} = \frac{1}{N}(\hat{y}_n - y_n)$.

[1] $(N × C × H × W)$, targets $y$; backward computes the diff $\frac{\partial E}{\partial y_n} = \frac{1}{N}(y_n - \hat{y}_n)$.
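The Euclidean loss and its two gradients can be checked with a small worked example. This is a minimal sketch of the formulas only; the names `euclidean_loss` and `euclidean_loss_backward` are hypothetical, and each sample is assumed to be flattened into one list.

```python
def euclidean_loss(pred, target):
    """E = 1/(2N) * sum_n ||pred_n - target_n||^2, one flat vector per sample."""
    n = len(pred)
    total = sum((p - t) ** 2
                for ps, ts in zip(pred, target)
                for p, t in zip(ps, ts))
    return total / (2.0 * n)

def euclidean_loss_backward(pred, target, loss_weight=1.0):
    """Per-sample gradients: (pred_n - target_n) / N for pred, the negation for target."""
    n = len(pred)
    d_pred = [[loss_weight * (p - t) / n for p, t in zip(ps, ts)]
              for ps, ts in zip(pred, target)]
    d_target = [[-g for g in row] for row in d_pred]
    return d_pred, d_target

pred = [[1.0, 2.0], [3.0, 5.0]]
target = [[1.0, 0.0], [0.0, 1.0]]
loss = euclidean_loss(pred, target)          # (0 + 4 + 9 + 16) / 4 = 7.25
d_pred, d_target = euclidean_loss_backward(pred, target)
```

The gradient with respect to the targets is just the negated gradient with respect to the predictions, which is why this layer can propagate to both bottoms.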

2.3 prototxt definition

layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
  loss_weight: 1
}

3. MultinomialLogisticLoss

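Multinomial logistic loss layer for one-of-many classification tasks. It takes the predicted probability distribution directly as its input.

When the predictions are not a probability distribution, use SoftmaxWithLossLayer instead, which maps the predictions to a probability distribution with a SoftmaxLayer before computing the multinomial logistic loss.

3.1 Forward parameters

Inputs:

Input 1 - predictions $\hat{p}$, $(N × C × H × W)$, with values in $[0, 1]$, giving the predicted probability of each of the $K = CHW$ classes.

Each prediction vector $\hat{p}_n$ should sum to 1: $\forall n, \sum_{k=1}^{K} \hat{p}_{nk} = 1$.
Input 2 - ground-truth labels $l$, $(N × 1 × 1 × 1)$, integer-valued with $l_n \in \{0, 1, 2, ..., K-1\}$, giving the correct class among the $K$ classes.

Outputs:

Output 1 - $(1 × 1 × 1 × 1)$, the computed multinomial logistic loss: $E = \frac{-1}{N} \sum_{n=1}^{N} \log(\hat{p}_{n, l_n})$.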
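As a rough illustration, the multinomial logistic loss can be written in a few lines of Python. The function name and the `eps` floor are assumptions made for this sketch: the floor only keeps `log(0)` from occurring when a predicted probability is exactly zero.

```python
import math

def multinomial_logistic_loss(batch_probs, labels, eps=1e-20):
    """E = -(1/N) * sum_n log(p_hat[n][l_n]); inputs must already be probabilities."""
    n = len(batch_probs)
    loss = 0.0
    for probs, label in zip(batch_probs, labels):
        # eps is a hypothetical floor that avoids taking log(0)
        loss -= math.log(max(probs[label], eps))
    return loss / n

probs = [[0.7, 0.2, 0.1], [0.25, 0.5, 0.25]]
loss = multinomial_logistic_loss(probs, [0, 1])
```

Unlike the SoftmaxWithLoss sketch, no softmax is applied here, so passing raw scores instead of probabilities would give a meaningless result.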

3.2 prototxt definition

layer {
  name: "loss"
  type: "MultinomialLogisticLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 0
    normalize: true
    # A normalization mode (e.g. normalization: FULL) can be set instead; see LossParameter below.
  }
}

message LossParameter {
  optional int32 ignore_label = 1;
  enum NormalizationMode {
    FULL = 0;
    VALID = 1;
    BATCH_SIZE = 2;
    NONE = 3;
  }
  optional NormalizationMode normalization = 3 [default = VALID];
  optional bool normalize = 2;
}
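The four NormalizationMode values choose the divisor applied to the summed loss. The sketch below is one reading of those modes, not Caffe's code: `get_normalizer` and its arguments are hypothetical names, with `outer_num` standing for the batch size and `inner_num` for the number of spatial positions per sample.

```python
def get_normalizer(mode, outer_num, inner_num, valid_count=None):
    """Return the divisor applied to the summed loss.

    outer_num:   batch size N
    inner_num:   spatial positions per sample (e.g. H * W)
    valid_count: positions whose label is not ignore_label, or None if
                 no ignore_label is set
    """
    if mode == "FULL":
        # every position counts, including ignored ones
        return float(outer_num * inner_num)
    if mode == "VALID":
        if valid_count is None:
            return float(outer_num * inner_num)   # behaves like FULL
        return float(max(valid_count, 1))         # assumed guard against dividing by zero
    if mode == "BATCH_SIZE":
        return float(outer_num)
    if mode == "NONE":
        return 1.0
    raise ValueError("unknown normalization mode: %s" % mode)

# Batch of 2 samples with 3 positions each, 4 of which carry a valid label.
full = get_normalizer("FULL", 2, 3)
valid = get_normalizer("VALID", 2, 3, valid_count=4)
batch = get_normalizer("BATCH_SIZE", 2, 3)
none = get_normalizer("NONE", 2, 3)
```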

4. InfogainLoss

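Information gain loss function.

InfogainLossLayer is a generalization of MultinomialLogisticLossLayer.

It uses an "information gain" (infogain) matrix to specify the "value" of every pair of labels.

Besides the predicted per-class probabilities for each sample, it also takes the infogain matrix as input.

When the infogain matrix is the identity, it is equivalent to MultinomialLogisticLossLayer.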

message InfogainLossParameter {
  // Specify the infogain matrix source.
  optional string source = 1;
  optional int32 axis = 2 [default = 1]; // axis of prob
}

4.1 Forward parameters

Inputs:

Input 1 - predictions $\hat{p}$, $(N × C × H × W)$, with values in $[0, 1]$, giving the predicted probability of each of the $K = CHW$ classes.

Each prediction vector $\hat{p}_n$ should sum to 1: $\forall n, \sum_{k=1}^{K} \hat{p}_{nk} = 1$.
Input 2 - ground-truth labels $l$, $(N × 1 × 1 × 1)$, integer-valued with $l_n \in \{0, 1, 2, ..., K-1\}$, giving the correct class among the $K$ classes.
Input 3 - (optional) $(1 × 1 × K × K)$, the infogain matrix $H$.

Outputs:

Output 1 - $(1 × 1 × 1 × 1)$, the computed infogain multinomial logistic loss: $E = \frac{-1}{N} \sum_{n=1}^{N} H_{l_n} \log(\hat{p}_n) = \frac{-1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} H_{l_n, k} \log(\hat{p}_{n, k})$.

where $H_{l_n}$ denotes row $l_n$ of the infogain matrix $H$.
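The double sum above can be illustrated with a short Python sketch. The function name and the `eps` floor are assumptions for illustration; `H` is given as a plain list of rows.

```python
import math

def infogain_loss(batch_probs, labels, H, eps=1e-20):
    """E = -(1/N) * sum_n sum_k H[l_n][k] * log(p_hat[n][k])."""
    n = len(batch_probs)
    loss = 0.0
    for probs, label in zip(batch_probs, labels):
        for k, p in enumerate(probs):
            # eps is a hypothetical floor that avoids taking log(0)
            loss -= H[label][k] * math.log(max(p, eps))
    return loss / n

probs = [[0.8, 0.2], [0.4, 0.6]]
labels = [0, 1]

# With the identity matrix only the true-class term survives, so the
# result matches the multinomial logistic loss.
identity = [[1.0, 0.0], [0.0, 1.0]]
loss_identity = infogain_loss(probs, labels, identity)

# Doubling the row for class 1 doubles the cost of samples labeled 1.
weighted = [[1.0, 0.0], [0.0, 2.0]]
loss_weighted = infogain_loss(probs, labels, weighted)
```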

4.2 prototxt definition

layer {
    bottom: "score"
    bottom: "label"
    top: "infoGainLoss"
    name: "infoGainLoss"
    type: "InfogainLoss"
    infogain_loss_param {
        source: "/.../infogainH.binaryproto"
        axis: 1  # compute loss and probability along axis
    }
}

5. HingeLoss

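Used for one-of-many classification tasks.

It is sometimes called the max-margin loss, and it also appears in the SVM objective function.

For example, in the binary case:

$\ell(y) = \max(0, 1 - t \cdot y)$

where $y$ is the predicted score and $t \in \{+1, -1\}$ is the target.

When $t$ and $y$ have the same sign and $|y| \geq 1$, that is, when a correctly classified sample lies at a distance of at least 1 from the decision boundary, the loss is zero and the sample earns no further reward. This keeps the classifier from over-focusing on certain classes and directs it toward the overall classification loss.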
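A few values make the binary hinge loss concrete. The helper name `binary_hinge` is hypothetical; the body is just the formula above.

```python
def binary_hinge(t, y):
    """Binary hinge loss max(0, 1 - t * y) for target t in {+1, -1} and score y."""
    return max(0.0, 1.0 - t * y)

confident_correct = binary_hinge(+1, 2.5)   # beyond the margin: no loss
inside_margin = binary_hinge(+1, 0.3)       # correct side, but within the margin
wrong_side = binary_hinge(-1, 0.3)          # wrong side of the boundary
```

The loss grows linearly as the score moves toward, and then past, the wrong side of the boundary, and it is flat at zero once the margin of 1 is reached.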

message HingeLossParameter {
  enum Norm {
    L1 = 1;
    L2 = 2;
  }
  // Specify the Norm to use L1 or L2
  optional Norm norm = 1 [default = L1];
}

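5.1 Forward parameters

Inputs:

Input 1 - predictions $t$, $(N × C × H × W)$, with values in $(-\infty, +\infty)$, giving the predicted score for each of the $K = CHW$ classes.

In an SVM, given $D$-dimensional features $X \in \mathcal{R}^{D × N}$ and learned hyperplane parameters $W \in \mathcal{R}^{D × K}$, $t$ is the inner product $X^{T}W$.

Therefore, a network consisting of a single InnerProductLayer (with num_output = K, one output per class) whose predictions feed a HingeLossLayer, with no other learnable parameters or losses, is equivalent to an SVM.
Input 2 - ground-truth labels $l$, $(N × 1 × 1 × 1)$, integer-valued with $l_n \in \{0, 1, 2, ..., K-1\}$, giving the correct class among the $K$ classes.

Outputs:

Output 1 - $(1 × 1 × 1 × 1)$, the computed hinge loss: $E = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} [\max(0, 1 - \delta\{l_n = k\} t_{nk})]^p$,

for the $L^p$ norm; the default is $p = 1$ (L1 norm), and $p = 2$ gives the L2 norm, as in an L2-SVM.

$\delta\{\text{condition}\} = 1$ if the condition holds, and $\delta\{\text{condition}\} = -1$ otherwise.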
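The multi-class formula treats every class as its own one-vs-rest problem: the true class should score above +1 and every other class below -1. A minimal sketch, with the hypothetical name `hinge_loss` and one flat list of $K$ scores per sample:

```python
def hinge_loss(batch_scores, labels, p=1):
    """E = (1/N) * sum_n sum_k max(0, 1 - delta(l_n == k) * t[n][k]) ** p."""
    n = len(batch_scores)
    loss = 0.0
    for scores, label in zip(batch_scores, labels):
        for k, t in enumerate(scores):
            sign = 1.0 if k == label else -1.0   # delta{l_n = k}
            loss += max(0.0, 1.0 - sign * t) ** p
    return loss / n

# One sample, three classes, true class 0.
# Class 0 scores 2.0 (>= 1) and class 1 scores -3.0 (<= -1): no loss.
# Class 2 scores 0.5, which violates the margin by 1.5.
scores = [[2.0, -3.0, 0.5]]
l1 = hinge_loss(scores, [0], p=1)
l2 = hinge_loss(scores, [0], p=2)
```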

5.2 prototxt definition

# L1 Norm
layer {
  name: "loss"
  type: "HingeLoss"
  bottom: "pred"
  bottom: "label"
}

# L2 Norm
layer {
  name: "loss"
  type: "HingeLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
  hinge_loss_param {
    norm: L2
  }
}

6. ContrastiveLoss

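Caffe's Siamese network example uses the ContrastiveLoss function, which handles paired data effectively.

See, for example, Caffe - mnist_siamese.ipynb.

The contrastive loss is computed as:

$E = \frac{1}{2N} \sum_{n=1}^{N} \left[ y_n d_n^2 + (1 - y_n) \max(margin - d_n, 0)^2 \right]$

where $d_n = ||a_n - b_n||_2$, and $y_n$ is 1 for a similar pair and 0 for a dissimilar pair.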

message ContrastiveLossParameter {
  // margin for dissimilar pair
  optional float margin = 1 [default = 1.0];
  // The first implementation of this cost did not exactly match the cost of
  // Hadsell et al 2006 -- using (margin - d^2) instead of (margin - d)^2.
  // legacy_version = false (the default) uses (margin - d)^2 as proposed in the
  // Hadsell paper. New models should probably use this version.
  // legacy_version = true uses (margin - d^2). This is kept to support /
  // reproduce existing models and results
  optional bool legacy_version = 2 [default = false];
}
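The contrastive loss formula and the legacy_version switch described in the comments above can be compared side by side. This is a sketch under stated assumptions: `contrastive_loss` is a hypothetical name, features are flat lists, and the legacy branch follows the `(margin - d^2)` wording of the comment.

```python
import math

def contrastive_loss(a, b, sim, margin=1.0, legacy_version=False):
    """E = 1/(2N) * sum_n [ y * d^2 + (1 - y) * max(margin - d, 0)^2 ]."""
    n = len(a)
    loss = 0.0
    for a_n, b_n, y in zip(a, b, sim):
        dist_sq = sum((x - z) ** 2 for x, z in zip(a_n, b_n))
        if y:
            # similar pair: penalize any distance
            loss += dist_sq
        elif legacy_version:
            # legacy form from the comment above: (margin - d^2)
            loss += max(margin - dist_sq, 0.0)
        else:
            # Hadsell et al. form: (margin - d)^2
            loss += max(margin - math.sqrt(dist_sq), 0.0) ** 2
    return loss / (2.0 * n)

# Two pairs at distance d = 0.5: the first similar, the second dissimilar.
a = [[0.0, 0.0], [0.0, 0.0]]
b = [[0.3, 0.4], [0.3, 0.4]]
loss = contrastive_loss(a, b, [1, 0])
legacy = contrastive_loss(a, b, [1, 0], legacy_version=True)
```

With the default form the dissimilar pair contributes $(1 - 0.5)^2 = 0.25$; with the legacy form it contributes $1 - 0.25 = 0.75$, so the two versions give different losses for the same data.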

6.1 Forward parameters

Inputs:

Input 1 - $(N × C × 1 × 1)$, features $a \in (-\infty, +\infty)$
Input 2 - $(N × C × 1 × 1)$, features $b \in (-\infty, +\infty)$
Input 3 - $(N × 1 × 1 × 1)$, binary similarity $s \in \{0, 1\}$

Outputs:

Output 1 - $(1 × 1 × 1 × 1)$, the computed contrastive loss $E$, used to train Siamese networks.

6.2 prototxt definition

layer {
  name: "loss"
  type: "ContrastiveLoss"
  bottom: "feat"
  bottom: "feat_p"
  bottom: "sim"
  top: "loss"
  contrastive_loss_param {
    margin: 1
  }
}

From mnist_siamese_train_test.prototxt

7. Accuracy

Computes the classification accuracy for one-of-many classification tasks.

There is no backward computation.

message AccuracyParameter {
  // Top-k accuracy: a prediction counts as correct if the true label is among the k highest-scoring classes.
  optional uint32 top_k = 1 [default = 1];

  // The "label" axis of the prediction blob, whose argmax corresponds to the
  // predicted label -- may be negative to index from the end (e.g., -1 for the
  // last axis).  For example, if axis == 1 and the predictions are
  // (N x C x H x W), the label blob is expected to contain N*H*W ground truth
  // labels with integer values in {0, 1, ..., C-1}.
  optional int32 axis = 2 [default = 1];

  // If specified, instances with this label are ignored when computing accuracy.
  optional int32 ignore_label = 3;
}
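The effect of top_k and ignore_label can be sketched as follows. This is an illustrative simplification, not Caffe's code: the name `accuracy` is hypothetical, each sample is a flat list of class scores, and a prediction counts as correct when fewer than top_k classes score strictly higher than the true class.

```python
def accuracy(batch_scores, labels, top_k=1, ignore_label=None):
    """Fraction of non-ignored samples whose true class is among the top_k scores."""
    correct = 0
    counted = 0
    for scores, label in zip(batch_scores, labels):
        if ignore_label is not None and label == ignore_label:
            continue                      # skipped samples do not count at all
        counted += 1
        better = sum(1 for s in scores if s > scores[label])
        if better < top_k:
            correct += 1
    return correct / counted if counted else 0.0

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = accuracy(scores, labels)                   # only the first sample is right
top2 = accuracy(scores, labels, top_k=2)          # the second sample now counts too
top1_ignore0 = accuracy(scores, labels, ignore_label=0)
```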