The recently proposed angular margin achieves quite good results on the LFW face dataset.
I went back and re-read the large-margin softmax paper, and the two feel cut from the same cloth.
The core idea is identical: take two overlapping class regions and push them apart as far as possible.
GitHub implementation of Large-Margin Softmax:
https://github.com/wy1iu/LargeMargin_Softmax_Loss
Starting from this code, one can implement the angular margin (A-Softmax / SphereFace) from CVPR 2017.
Paper link (the ICML 2016 L-Softmax paper): http://jmlr.org/proceedings/papers/v48/liud16.pdf
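To recap the shared idea before diving into the config: L-Softmax replaces the target-class logit ||W_y||*||x||*cos(theta_y) with ||W_y||*||x||*psi(theta_y), where psi shrinks the score unless the angle theta_y is small; A-Softmax additionally normalizes ||W_y|| = 1 so the margin becomes purely angular. Below is a minimal NumPy sketch of psi (my own illustration, not code from the repository):

import numpy as np

def psi(theta, m):
    # Margin function from the ICML'16 L-Softmax paper:
    #   psi(theta) = (-1)^k * cos(m*theta) - 2k,  theta in [k*pi/m, (k+1)*pi/m]
    # For m = 1 it reduces to cos(theta), i.e. plain softmax; for m > 1 it lies
    # below cos(theta), which is what pushes overlapping class regions apart.
    k = np.floor(theta * m / np.pi)   # segment index for each angle
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

theta = np.linspace(0.0, np.pi, 7)
for m in (1, 2, 3, 4):                # SINGLE / DOUBLE / TRIPLE / QUADRUPLE
    print(f"m={m}:", np.round(psi(theta, m), 3))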
Now let's look at what the parameters in the author's Large-Margin implementation actually mean:
layer {
  name: "ip2"
  type: "LargeMarginInnerProduct"
  bottom: "bn_ip"
  bottom: "label"
  top: "ip2"
  top: "lambda"
  param {
    name: "ip2"
    lr_mult: 1
  }
  largemargin_inner_product_param {
    num_output: 10      # MNIST has 10 classes
    type: QUADRUPLE
    base: 1000
    gamma: 0.000025
    power: 35
    iteration: 0
    lambda_min: 0
    weight_filler {
      type: "msra"
    }
  }
  include {
    phase: TRAIN
  }
}
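Two things are worth noting before the README quotes below: the second top blob, "lambda", simply exposes the current value of the annealing coefficient, and base, gamma, power, iteration and lambda_min all feed its decay schedule. While lambda is nonzero, the target-class logit is a blend of the plain inner product and the large-margin one. A rough sketch of that blending, reusing psi() from above (this is my reading of the optimization trick described in the ICML'16 paper, not the layer's actual C++):

def lsoftmax_target_logit(w_norm, x_norm, theta, m, lam):
    # Annealed target-class logit:
    #   f_y = (lam * |W||x|cos(theta) + |W||x|psi(theta)) / (1 + lam)
    # Large lam        -> close to the ordinary softmax inner product.
    # lam ~ lambda_min -> close to the full large-margin logit.
    plain = w_norm * x_norm * np.cos(theta)
    margin = w_norm * x_norm * psi(theta, m)
    return (lam * plain + margin) / (1.0 + lam)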
The author's explanation of these parameters, from the repository README:
- L-Softmax loss is the combination of "LargeMarginInnerProduct" layer and "SoftmaxWithLoss" layer.
- If the type of the layer is SINGLE/DOUBLE/TRIPLE/QUADRUPLE, then m is set as 1/2/3/4 respectively.
- The mnist example can be run directly after compilation. cifar10 and cifar10+ require the datasets to be downloaded first.
- base, gamma, power and lambda_min are parameters for exponential lambda descent.
- lambda represents the approximation level to the proposed L-Softmax loss (refer to the experimental details in the ICML'16 paper). lambda is decayed according to: lambda = max(lambda_min, base*(1+gamma*iteration)^(-power)).
- It is strongly recommended that the user visualize the lambda descent function before using the loss (see the short plotting script after this list). The parameter selection is very flexible. Typically, when the optimization is finished, lambda should be a sufficiently small value. Also note that lambda is not always necessary: for the MNIST dataset, the L-Softmax loss can work perfectly without lambda. Setting base to 0 removes lambda entirely.
- lambda_min can vary according to the difficulty of datasets. For easy datasets such as mnist and cifar10, lambda_min can be zero. For large and difficult datasets, you should first try to set lambda_min as 5 or 10. There is no specific rule to set lambda_min, but generally, it should be as small as possible.
- Both ReLU and PReLU work well with L-Softmax loss. Empirically, PReLU helps L-Softmax converge more easily.
- Batch normalization can help the L-Softmax network converge much more easily. It is strongly recommended.
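Following the README's advice to visualize the lambda descent before training, here is a throwaway matplotlib script plugged with the prototxt values above (the 20000-iteration range is an arbitrary choice of mine; match it to your solver's max_iter):

import numpy as np
import matplotlib.pyplot as plt

# lambda = max(lambda_min, base*(1+gamma*iteration)^(-power))
base, gamma, power, lambda_min = 1000.0, 0.000025, 35.0, 0.0

iters = np.arange(20000)
lam = np.maximum(lambda_min, base * (1.0 + gamma * iters) ** (-power))

plt.semilogy(iters, lam)
plt.xlabel("iteration")
plt.ylabel("lambda (log scale)")
plt.title("Exponential lambda descent for the config above")
plt.show()

With these values, lambda starts at 1000 (essentially plain softmax) and falls below 1e-3 by iteration 20000, which matches the README's rule of thumb that lambda should be sufficiently small when optimization finishes.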