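Training was done with a third-party Caffe implementation of the combined margin layer (https://github.com/gehaocool/CombinedMargin-caffe) on the VGGFace2 dataset, which after filtering contains 2.44M+ face images of 8,631 identities.
The following issues came up during training and are recorded here.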
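For reference, InsightFace-style combined margin losses change only the target-class logit, to s·(cos(m1·θ + m2) − m3). Here θ is the angle between the normalized feature and the normalized class weight. The sketch below assumes that formula and the commonly cited values m1 = 1.0, m2 = 0.3, m3 = 0.2, s = 64. These values were not read from the CombinedMargin-caffe prototxt, and the helper names are hypothetical. The sketch also omits the fallback that real implementations apply when m1·θ + m2 exceeds π.

```python
import math

# Assumed default margins and scale (hypothetical; check the project's prototxt).
M1, M2, M3, S = 1.0, 0.3, 0.2, 64.0


def combined_margin_logit(cos_theta, m1=M1, m2=M2, m3=M3, s=S):
    """Target-class logit s * (cos(m1*theta + m2) - m3)."""
    # Clamp into acos's domain to guard against floating-point drift.
    c = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(c)
    return s * (math.cos(m1 * theta + m2) - m3)


def combined_margin_logits(cosines, label, m1=M1, m2=M2, m3=M3, s=S):
    """Scale all cosines by s; apply the combined margin only to the label's entry."""
    logits = [s * c for c in cosines]
    logits[label] = combined_margin_logit(cosines[label], m1, m2, m3, s)
    return logits


if __name__ == "__main__":
    # The target logit is pushed below s*cos(theta), which makes training harder.
    print(combined_margin_logits([0.1, 0.8, -0.2], label=1))
```

The resulting logits feed an ordinary softmax cross-entropy, which corresponds to the `softmax_loss` output in the training log.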
1. Training with the res36-E network from the open-source project
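Training went fairly smoothly. The number of epochs was cut down to finish sooner, but the final loss still dropped to about 2, essentially the same as the loss reported by the project. Classification accuracy during training held steady between 0.7 and 0.8.
The figure below shows the loss and accuracy curves during training. The x-axis values must be multiplied by 50 to get iteration numbers, because the log file prints one entry every 50 iterations.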
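The curves can be rebuilt from a training log like the excerpt that follows. The sketch below assumes the line format shown there: "Iteration N, loss = X" followed by a "Train net output #k: accuracy = Y" line. It pairs each loss with the next accuracy value. The function and regex names are illustrative.

```python
import re

# Matches "Iteration 277750, loss = 2.11202" but not the "..., lr = 1e-05" line.
ITER_LOSS = re.compile(r"Iteration (\d+), loss = ([\d.eE+-]+)")
# Matches the accuracy output line; the softmax_loss output line is ignored.
ACCURACY = re.compile(r"Train net output #\d+: accuracy = ([\d.eE+-]+)")


def parse_caffe_log(lines):
    """Return (iteration, loss, accuracy) tuples from Caffe training-log lines."""
    records = []
    iteration = loss = None
    for line in lines:
        m = ITER_LOSS.search(line)
        if m:
            iteration, loss = int(m.group(1)), float(m.group(2))
            continue
        m = ACCURACY.search(line)
        if m and iteration is not None:
            records.append((iteration, loss, float(m.group(1))))
            iteration = loss = None
    return records


if __name__ == "__main__":
    import sys
    # Usage: python parse_log_sketch.py train.log
    with open(sys.argv[1]) as f:
        for it, loss, acc in parse_caffe_log(f):
            print(it, loss, acc)
```

The iteration number is read from the log line itself, so plotting against it needs no ×50 rescaling. Caffe also ships `tools/extra/parse_log.py` for the same job.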
I1222 12:24:23.999287 2126 solver.cpp:228] Iteration 277750, loss = 2.11202
I1222 12:24:23.999418 2126 solver.cpp:244] Train net output #0: accuracy = 0.734375
I1222 12:24:23.999433 2126 solver.cpp:244] Train net output #1: softmax_loss = 1.72783 (* 1 = 1.72783 loss)
I1222 12:24:23.999442 2126 sgd_solver.cpp:106] Iteration 277750, lr = 1e-05
I1222 12:25:15.731434 2126 solver.cpp:228] Iteration 277800, loss = 1.79811
I1222 12:25:15.731550 2126 solver.cpp:244] Train net output #0: accuracy = 0.78125
I1222 12:25:15.731564 2126 solver.cpp:244] Train net output #1: softmax_loss = 1.98719 (* 1 = 1.98719 loss)
I1222 12:25:15.731572 2126 sgd_solver.cpp:106] Iteration 277800, lr = 1e-05
I1222 12:26:07.457556 2126 solver.cpp:228] Iteration 277850, loss = 2.63742
I1222 12:26:07.457665 2126 solver.cpp:244] Train net output #0: accuracy = 0.65625
I1222 12:26:07.457679 2126 solver.cpp:244] Train net output #1: softmax_loss = 2.20569 (* 1 = 2.20569 loss)
I1222 12:26:07.457686 2126 sgd_solver.cpp:106] Iteration 277850, lr = 1e-05
I1222 12:26:59.228655 2126 solver.cpp:228] Iterati