The recently proposed angular margin achieves quite good results on the LFW face dataset.
I went back and re-read the large-margin softmax paper, and the two feel cut from the same cloth.
The core idea is identical: take two overlapping class regions and push them apart as far as possible.
GitHub implementation of Large-Margin Softmax:
https://github.com/wy1iu/LargeMargin_Softmax_Loss
Starting from this code, one can implement the angular margin (A-Softmax / SphereFace) from CVPR 2017.
Paper link (the ICML 2016 L-Softmax paper): http://jmlr.org/proceedings/papers/v48/liud16.pdf
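To recap the shared idea before diving into the config: L-Softmax replaces the target-class logit ||W_y||*||x||*cos(theta_y) with ||W_y||*||x||*psi(theta_y), where psi shrinks the score unless the angle theta_y is small; A-Softmax additionally normalizes ||W_y|| = 1 so the margin becomes purely angular. Below is a minimal NumPy sketch of psi (my own illustration, not code from the repository):

import numpy as np

def psi(theta, m):
    # Margin function from the ICML'16 L-Softmax paper:
    #   psi(theta) = (-1)^k * cos(m*theta) - 2k,  theta in [k*pi/m, (k+1)*pi/m]
    # For m = 1 it reduces to cos(theta), i.e. plain softmax; for m > 1 it lies
    # below cos(theta), which is what pushes overlapping class regions apart.
    k = np.floor(theta * m / np.pi)   # segment index for each angle
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

theta = np.linspace(0.0, np.pi, 7)
for m in (1, 2, 3, 4):                # SINGLE / DOUBLE / TRIPLE / QUADRUPLE
    print(f"m={m}:", np.round(psi(theta, m), 3))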
Now let's look at what the parameters in the author's Large-Margin implementation actually mean:
layer {
  name: "ip2"
  type: "LargeMarginInnerProduct"
  bottom: "bn_ip"
  bottom: "label"
  top: "ip2"
  top: "lambda"
  param {
    name: "ip2"
    lr_mult: 1
  }
  largemargin_inner_product_param {
    num_output: 10      # MNIST has 10 classes
    type: QUADRUPLE
    base: 1000
    gamma: 0.000025
    power: 35
    iteration: 0
    lambda_min: 0
    weight_filler {
      type: "msra"
    }
  }
  include {
    phase: TRAIN
  }
}
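Two things are worth noting before the README quotes below: the second top blob, "lambda", simply exposes the current value of the annealing coefficient, and base, gamma, power, iteration and lambda_min all feed its decay schedule. While lambda is nonzero, the target-class logit is a blend of the plain inner product and the large-margin one. A rough sketch of that blending, reusing psi() from above (this is my reading of the optimization trick described in the ICML'16 paper, not the layer's actual C++):

def lsoftmax_target_logit(w_norm, x_norm, theta, m, lam):
    # Annealed target-class logit:
    #   f_y = (lam * |W||x|cos(theta) + |W||x|psi(theta)) / (1 + lam)
    # Large lam        -> close to the ordinary softmax inner product.
    # lam ~ lambda_min -> close to the full large-margin logit.
    plain = w_norm * x_norm * np.cos(theta)
    margin = w_norm * x_norm * psi(theta, m)
    return (lam * plain + margin) / (1.0 + lam)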
The author's explanation of these parameters, from the repository README:
- L-Softmax loss is the combination of "LargeMarginInnerProduct" layer and "SoftmaxWithLoss" layer.
- If the type of the layer is SINGLE/DOUBLE/TRIPLE/QUADRUPLE, then m is set as 1/2/3/4 respectively.
- The mnist example can be run directly after compilation. cifar10 and cifar10+ require the datasets to be downloaded first.
- base, gamma, power and lambda_min are parameters for exponential lambda descent.
- lambda represents the approximation level to the proposed L-Softmax loss (refer to the experimental details in the ICML'16 paper). lambda is decayed according to: lambda = max(lambda_min, base*(1+gamma*iteration)^(-power)).
- It is strongly recommended that the user visualize the lambda descent function before using the loss (see the short plotting script after this list). The parameter selection is very flexible. Typically, when the optimization is finished, lambda should be a sufficiently small value. Also note that lambda is not always necessary: for the MNIST dataset, the L-Softmax loss can work perfectly without lambda. Setting base to 0 removes lambda entirely.
- lambda_min can vary according to the difficulty of datasets. For easy datasets such as mnist and cifar10, lambda_min can be zero. For large and difficult datasets, you should first try to set lambda_min as 5 or 10. There is no specific rule to set lambda_min, but generally, it should be as small as possible.
- Both ReLU and PReLU work well with L-Softmax loss. Empirically, PReLU helps L-Softmax converge more easily.
- Batch normalization can help the L-Softmax network converge much more easily. It is strongly recommended.
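Following the README's advice to visualize the lambda descent before training, here is a throwaway matplotlib script plugged with the prototxt values above (the 20000-iteration range is an arbitrary choice of mine; match it to your solver's max_iter):

import numpy as np
import matplotlib.pyplot as plt

# lambda = max(lambda_min, base*(1+gamma*iteration)^(-power))
base, gamma, power, lambda_min = 1000.0, 0.000025, 35.0, 0.0

iters = np.arange(20000)
lam = np.maximum(lambda_min, base * (1.0 + gamma * iters) ** (-power))

plt.semilogy(iters, lam)
plt.xlabel("iteration")
plt.ylabel("lambda (log scale)")
plt.title("Exponential lambda descent for the config above")
plt.show()

With these values, lambda starts at 1000 (essentially plain softmax) and falls below 1e-3 by iteration 20000, which matches the README's rule of thumb that lambda should be sufficiently small when optimization finishes.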