Caffe Layer Walkthrough Series: hinge_loss

————— Definition of Hinge Loss —————

Hinge loss targets "maximum-margin" classification problems, which makes it especially well suited to SVM-style classifiers.

Hinge loss is defined as:

$$l(y) = \max(0,\ 1 - t\cdot y)$$

———— How Caffe Defines Hinge Loss ————

Caffe's definition reverses the sign convention relative to the formula above; the following describes exactly how Caffe implements it.

Caffe provides two hinge-loss variants, L1 and L2 (note that the L2 variant is the *squared* 2-norm, i.e. a sum of squares):

$$l(y) = \Vert H\Vert_1$$

$$l(y) = \Vert H\Vert_2^2$$

$$H_i = \max(0,\ 1 + t_i\cdot y_i), \quad \text{where } t_i = -1 \text{ if } i = \text{label},\ \text{otherwise } t_i = 1$$

For a single sample with 5 classes, where the true label is ID 3 (so $t_3 = -1$):

| ID  | 1     | 2     | 3     | 4     | 5    |
|-----|-------|-------|-------|-------|------|
| $y$ | -1.73 | -1.24 | 0.89  | -0.99 | 0.05 |
| $t$ | 1     | 1     | -1    | 1     | 1    |
| $H$ | 0.00  | 0.00  | 0.11  | 0.01  | 1.05 |

$$l(y) = \Vert H\Vert_1 = \sum_{i=1}^{5}H_i = 1.17$$

$$l(y) = \Vert H\Vert_2^2 = \sum_{i=1}^{5}H_{i}^{2} = 0.11^2 + 0.01^2 + 1.05^2 = 1.1147$$

The Caffe source for the forward pass is as follows:

```cpp
template <typename Dtype>
void HingeLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  const Dtype* label = bottom[1]->cpu_data();
  int num = bottom[0]->num();    // batch size
  int count = bottom[0]->count();
  int dim = count / num;         // number of classes

  // Copy the scores into the diff blob, then flip the sign of each
  // sample's true-class score, so every entry holds t * y.
  caffe_copy(count, bottom_data, bottom_diff);
  for (int i = 0; i < num; ++i) {
    bottom_diff[i * dim + static_cast<int>(label[i])] *= -1;
  }
  // H_i = max(0, 1 + t * y)
  for (int i = 0; i < num; ++i) {
    for (int j = 0; j < dim; ++j) {
      bottom_diff[i * dim + j] = std::max(
          Dtype(0), 1 + bottom_diff[i * dim + j]);
    }
  }
  Dtype* loss = top[0]->mutable_cpu_data();
  switch (this->layer_param_.hinge_loss_param().norm()) {
  case HingeLossParameter_Norm_L1:
    // L1: sum of |H_i|, averaged over the batch
    loss[0] = caffe_cpu_asum(count, bottom_diff) / num;
    break;
  case HingeLossParameter_Norm_L2:
    // L2: sum of H_i^2, averaged over the batch
    loss[0] = caffe_cpu_dot(count, bottom_diff, bottom_diff) / num;
    break;
  default:
    LOG(FATAL) << "Unknown Norm";
  }
}
```

———— How Caffe Differentiates Hinge Loss ————

Differentiating the hinge loss is straightforward:

$$\frac{\partial H_i}{\partial y_i} = 0, \quad \text{if } H_i = 0$$

$$\frac{\partial H_i}{\partial y_i} = \frac{\partial (1+t_i\cdot y_i)}{\partial y_i} = t_i, \quad \text{if } H_i\neq 0$$

For the L2 variant, the chain rule adds a factor of $2H_i$: $\partial H_i^2/\partial y_i = 2H_i t_i$.

Continuing the worked example:

| ID                            | 1    | 2    | 3     | 4    | 5    |
|-------------------------------|------|------|-------|------|------|
| $H$                           | 0.00 | 0.00 | 0.11  | 0.01 | 1.05 |
| L1: $\partial l/\partial y_i$ | 0.00 | 0.00 | -1.00 | 1.00 | 1.00 |
| L2: $\partial l/\partial y_i$ | 0.00 | 0.00 | -0.22 | 0.02 | 2.10 |

The Caffe source for the backward pass is as follows. Note that `bottom_diff` still holds the $H_i$ values computed during the forward pass, so flipping the sign at the label index turns each entry into $t_i H_i$:

```cpp
// From HingeLossLayer<Dtype>::Backward_cpu.
if (propagate_down[0]) {
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  const Dtype* label = bottom[1]->cpu_data();
  int num = bottom[0]->num();
  int count = bottom[0]->count();
  int dim = count / num;

  // bottom_diff holds H_i from the forward pass; restore the sign so the
  // true-class entry becomes -H_i and all others stay +H_i, i.e. t * H_i.
  for (int i = 0; i < num; ++i) {
    bottom_diff[i * dim + static_cast<int>(label[i])] *= -1;
  }

  const Dtype loss_weight = top[0]->cpu_diff()[0];
  switch (this->layer_param_.hinge_loss_param().norm()) {
  case HingeLossParameter_Norm_L1:
    // L1 gradient: sign(t * H_i) = t where H_i > 0, and 0 elsewhere
    caffe_cpu_sign(count, bottom_diff, bottom_diff);
    caffe_scal(count, loss_weight / num, bottom_diff);
    break;
  case HingeLossParameter_Norm_L2:
    // L2 gradient: 2 * H_i * t, averaged over the batch
    caffe_scal(count, loss_weight * 2 / num, bottom_diff);
    break;
  default:
    LOG(FATAL) << "Unknown Norm";
  }
}
```