CosineFace Implementation

huneng1991

Category: Face SDK. Tags: face recognition, machine learning, deep learning

Article link: https://blog.csdn.net/huneng1991/article/details/105938881

Introduction

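CosFace: Large Margin Cosine Loss for Deep Face Recognition is a well-known face recognition paper from Tencent. Its innovation lies in the loss function: it achieves high-accuracy face recognition with a classification approach. A GitHub search turns up many implementations, and some of them are quite good. None of them suited me, though. First, I need a Caffe version. Second, the implementations all differ from one another, which makes me uneasy about relying on them. Third, I have incorporated some ideas that differ from the other implementations.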

Normalize layer

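Methods like cosine face are all designed on the premise of normalized features. The forward formula of normalization is simple; the backward derivative is the more complicated part. The math is written up here: https://blog.csdn.net/Iriving_shu/article/details/78300192. If you replace y with x in that derivation, it corresponds to this implementation: https://github.com/weiliu89/caffe/tree/ssd. Other implementations, such as caffe-windows, are written in a very complicated way, and checking whether they are correct takes some time. The implementation above still needs one modification: turn off learning of scale so that it stays a constant 1.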
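To make the forward and backward formulas concrete, here is a minimal standalone sketch of L2 normalization for a single feature vector. It is not the code of the linked Caffe layer; the function names `l2_normalize` and `l2_normalize_backward` and the `eps` parameter are assumptions made for illustration. The forward pass is y = x / ||x||, and the backward pass is dx = (dy - y (y . dy)) / ||x||.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward: y = x / ||x||_2. The eps term (an assumption of this sketch)
// keeps the division safe for near-zero inputs.
std::vector<double> l2_normalize(const std::vector<double>& x,
                                 double eps = 1e-10) {
    double sq = 0.0;
    for (double v : x) sq += v * v;
    const double norm = std::sqrt(sq + eps);
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] / norm;
    return y;
}

// Backward: dx = (dy - y * (y . dy)) / ||x||_2.
// The component of dy along y is removed, so dx is orthogonal to x.
std::vector<double> l2_normalize_backward(const std::vector<double>& x,
                                          const std::vector<double>& dy,
                                          double eps = 1e-10) {
    assert(x.size() == dy.size());
    double sq = 0.0;
    for (double v : x) sq += v * v;
    const double norm = std::sqrt(sq + eps);
    const std::vector<double> y = l2_normalize(x, eps);
    double y_dot_dy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) y_dot_dy += y[i] * dy[i];
    std::vector<double> dx(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        dx[i] = (dy[i] - y[i] * y_dot_dy) / norm;
    }
    return dx;
}
```

A quick sanity check on any such implementation is that the returned gradient is orthogonal to the input, since moving along x does not change the normalized output.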

LMCL

For the math, see the paper; this post only explains the code, which is available at: https://github.com/huneng/cosine_loss


template <typename Dtype>
void LMCLLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
        const vector<Blob<Dtype>*>& top) {
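    // Mean L2 norm of the weight rows; only meaningful in the TRAIN phase.
    Dtype scale = Dtype(1.0);
    // Normalize each weight row to unit length (training only), then
    // rescale all rows by the mean of their original norms.
    if(this->phase_ == TRAIN){
        Dtype *weight = this->blobs_[0]->mutable_cpu_data();

        scale = Dtype(0.0f);
        for(int i = 0; i < N_; i ++){
            // L2 norm of row i; the epsilon guards against division by zero
            Dtype dot;
            dot = caffe_cpu_dot(K_, weight, weight);
            dot = sqrt(dot + 1e-10);
            caffe_scal(K_, Dtype(1.0) / dot, weight);
            weight += K_;

            scale += dot;
        }

        scale /= N_;

        caffe_scal(N_ * K_, scale, this->blobs_[0]->mutable_cpu_data());
    }

    // y = x * W^T, giving M_ x N_ logits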
    caffe_cpu_gemm(CblasNoTrans, CblasTrans, M_, N_, K_,
            Dtype(1.0), bottom[0]->cpu_data(), this->blobs_[0]->cpu_data(),
            Dtype(0.0), top[0]->mutable_cpu_data());

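    // Subtract the margin from the target-class cosine (training only).
    if(this->phase_ == TRAIN && margin_ > Dtype(1e-5f)){
        Dtype *y = top[0]->mutable_cpu_data();
        const Dtype *ptrLabel = bottom[1]->cpu_data();

        for(int i = 0; i < M_; i ++){
            int label = int(ptrLabel[i]);
            CHECK_GE(label, 0);
            CHECK_LT(label, N_);
            // For unit-norm input, y[label] is scale * cos(theta);
            // shift the cosine by margin_ and scale it back.
            y[label] = ((y[label] / scale) - margin_) * scale;
            y += N_;
        }
    }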
}
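The two training-time steps of the forward pass can also be tried outside Caffe. The following is a minimal sketch with plain loops, assuming row-major storage and a single input sample; the function names `normalize_rows_to_mean_norm` and `lmcl_logits` are hypothetical and do not appear in the repository.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rescale every row of W (N x K, row-major) to the mean of the original
// row norms. Returns that mean, which plays the role of `scale` above.
double normalize_rows_to_mean_norm(std::vector<double>& W, int N, int K) {
    assert(N > 0 && K > 0);
    assert(W.size() == static_cast<std::size_t>(N) * K);
    double mean_norm = 0.0;
    for (int j = 0; j < N; ++j) {
        double sq = 0.0;
        for (int k = 0; k < K; ++k) sq += W[j * K + k] * W[j * K + k];
        const double norm = std::sqrt(sq + 1e-10);
        for (int k = 0; k < K; ++k) W[j * K + k] /= norm;  // unit length
        mean_norm += norm;
    }
    mean_norm /= N;
    for (double& w : W) w *= mean_norm;  // every row now has norm mean_norm
    return mean_norm;
}

// Logits for one sample x (length K): y = W * x, then the target logit is
// reduced by margin * scale, which equals ((y / scale) - margin) * scale.
std::vector<double> lmcl_logits(const std::vector<double>& x,
                                const std::vector<double>& W,
                                int N, int K, int label,
                                double scale, double margin) {
    assert(label >= 0 && label < N);
    assert(x.size() == static_cast<std::size_t>(K));
    std::vector<double> y(N, 0.0);
    for (int j = 0; j < N; ++j) {
        for (int k = 0; k < K; ++k) y[j] += W[j * K + k] * x[k];
    }
    y[label] -= margin * scale;
    return y;
}
```

With a unit-norm input, each logit equals scale times the cosine between the input and the corresponding weight row, so only the target class is shifted by the margin.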



template <typename Dtype>
void LMCLLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
        const vector<bool>& propagate_down,
        const vector<Blob<Dtype>*>& bottom) {
    // Gradient w.r.t. the weights: dW (N_ x K_) = dy^T * x.
    // Note: beta is 0 here, so dW is overwritten; Caffe's InnerProductLayer
    // uses beta = 1 to accumulate into the existing diff.
    if (this->param_propagate_down_[0]) {
        const Dtype* dy = top[0]->cpu_diff();
        const Dtype* x = bottom[0]->cpu_data();
        Dtype *dw = this->blobs_[0]->mutable_cpu_diff();

        caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
                N_, K_, M_,
                (Dtype)1., dy, x, (Dtype)0., dw);
    }

    // Gradient w.r.t. the input: dx (M_ x K_) = dy * W.
    if (propagate_down[0]) {
        const Dtype* dy = top[0]->cpu_diff();
        const Dtype* w = this->blobs_[0]->cpu_data();
        Dtype *dx = bottom[0]->mutable_cpu_diff();

        caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
                M_, K_, N_,
                (Dtype)1., dy, w, (Dtype)0., dx);
    }
}
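The two matrix products in the backward pass can be written out with plain loops to check shapes and indices. This is a standalone sketch assuming row-major matrices; `backward_weights` and `backward_input` are hypothetical names used only here. The margin is a constant offset on the logits, so it does not appear in either gradient.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dW (N x K) = dy^T (N x M) * x (M x K); all matrices are row-major.
std::vector<double> backward_weights(const std::vector<double>& dy,
                                     const std::vector<double>& x,
                                     int M, int N, int K) {
    assert(dy.size() == static_cast<std::size_t>(M) * N);
    assert(x.size() == static_cast<std::size_t>(M) * K);
    std::vector<double> dW(static_cast<std::size_t>(N) * K, 0.0);
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                dW[j * K + k] += dy[i * N + j] * x[i * K + k];
    return dW;
}

// dx (M x K) = dy (M x N) * W (N x K); all matrices are row-major.
std::vector<double> backward_input(const std::vector<double>& dy,
                                   const std::vector<double>& W,
                                   int M, int N, int K) {
    assert(dy.size() == static_cast<std::size_t>(M) * N);
    assert(W.size() == static_cast<std::size_t>(N) * K);
    std::vector<double> dx(static_cast<std::size_t>(M) * K, 0.0);
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                dx[i * K + k] += dy[i * N + j] * W[j * K + k];
    return dx;
}
```

These loops mirror the two `caffe_cpu_gemm` calls: the first transposes dy, the second uses dy and W as stored.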

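A few implementation details to note

1 In theory the weights must be normalized to unit length, but this step is not needed when the model is actually deployed, so the layer normalizes the weights only during training;

2 Many implementations are reported to converge slowly. I found the cause: once the weights are unit-normalized, the magnitude (value range) of the gradients changes abruptly. I therefore compute the norm of each weight vector, take the mean, and multiply the weight matrix by it as a scaling factor. Mathematically this does not affect the subsequent loss, and it converges quickly;

3 The margin, which is the core of the algorithm, has to be applied during training according to the input label, so a conditional check is required;

4 By the design of the algorithm, the backward pass is the same as for inner product.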
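The notes above can be summarized as a worked formula. As a sketch, assume the input feature x has unit norm (from the Normalize layer with scale fixed at 1), every weight row w_j has been rescaled to the mean norm s, and a softmax loss follows this layer (the last point is an assumption; the post does not show the network definition). Then the logits and the loss for a sample with label y take the LMCL form, with s set by the mean weight norm rather than chosen by hand:

```latex
\begin{aligned}
z_j &= w_j^{\top} x = s \cos\theta_j, \qquad j \neq y, \\
z_y &= \left(\frac{w_y^{\top} x}{s} - m\right) s = s\,(\cos\theta_y - m), \\
L &= -\log \frac{e^{\,s(\cos\theta_y - m)}}
                {e^{\,s(\cos\theta_y - m)} + \sum_{j \neq y} e^{\,s\cos\theta_j}} .
\end{aligned}
```

Here m corresponds to `margin_` and s to `scale` in the forward code.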