At its core, this comes down to one question: how should the learning rate be set?
1. What does each field of the train command's output mean?
The line is printed by void train_classifier(char *datacfg, char *cfgfile, char *weightfile, int *gpus, int ngpus, int clear, int dont_show, int mjpeg_port, int calc_topk) in classifier.c:
printf("%d, %.3f: %f, %f avg, %f rate, %lf seconds, %ld images\n", get_current_batch(net), (float)(*net.seen)/ train_images_num, loss, avg_loss, get_current_rate(net), sec(clock()-time), *net.seen);
1.1 get_current_batch(net): the number of weight updates (backward passes) performed so far
int get_current_batch(network net)
{
    int batch_num = (*net.seen)/(net.batch*net.subdivisions);
    return batch_num;
}
net.seen: the number of images the network has processed so far.
epoch: one epoch means training once on every sample in the training set; informally, the epoch count is how many passes have been made over the whole dataset.
If the training set contains num images in total, then net.seen = epoch*num.
batch_num: in deep learning, the loss driving each parameter update is computed not from a single sample but from a group of samples; the size of that group is the batch size. Both batch and subdivisions are cfg-file parameters.
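A minimal numeric sketch of this counter (the helper name and the example cfg values batch=64, subdivisions=16 are mine; in darknet the parsed net.batch already holds the cfg batch divided by subdivisions):

```c
/* Mirror of get_current_batch(): one weight update consumes
 * batch*subdivisions images, so dividing the images seen so far
 * by that product gives the number of updates performed. */
int current_batch(long seen, int batch, int subdivisions)
{
    return (int)(seen / (batch * subdivisions));
}
```

With cfg batch=64 and subdivisions=16 the parsed mini-batch is 4, so net.seen = 128000 corresponds to batch number 128000/64 = 2000.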
1.2 (*net.seen)/train_images_num: roughly the epoch, i.e. how many times the whole training set has been trained
train_images_num: the total number of training images.
1.3 loss
float loss = 0;
#ifdef GPU
if(ngpus == 1){
    loss = train_network(net, train);
} else {
    loss = train_networks(nets, ngpus, train, 4);
}
#else
loss = train_network(net, train);
#endif
1.4 avg_loss
if(avg_loss == -1 || isnan(avg_loss) || isinf(avg_loss)) avg_loss = loss;
avg_loss = avg_loss*.9 + loss*.1;
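Those two lines implement an exponential moving average with decay 0.9, seeded by the first valid loss. A standalone sketch (the function name is mine, not darknet's):

```c
#include <math.h>

/* Smoothed loss shown in the "avg" column: 90% old average, 10% new loss.
 * A sentinel of -1 (or a NaN/Inf average) is replaced by the current loss. */
float update_avg_loss(float avg_loss, float loss)
{
    if (avg_loss == -1 || isnan(avg_loss) || isinf(avg_loss)) avg_loss = loss;
    return avg_loss * 0.9f + loss * 0.1f;
}
```

Seeding with loss 5.0 returns 5.0; a following loss of 4.0 only moves the average to 4.9, so single noisy batches barely shift the curve.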
1.5 rate
rate is computed by get_current_rate(net):
float get_current_rate(network net)
{
    int batch_num = get_current_batch(net);
    int i;
    float rate;
    if (batch_num < net.burn_in) return net.learning_rate * pow((float)batch_num / net.burn_in, net.power);
    switch (net.policy) {
        case CONSTANT:
            return net.learning_rate;
        case STEP:
            return net.learning_rate * pow(net.scale, batch_num/net.step);
        case STEPS:
            rate = net.learning_rate;
            for(i = 0; i < net.num_steps; ++i){
                if(net.steps[i] > batch_num) return rate;
                rate *= net.scales[i];
                //if(net.steps[i] > batch_num - 1 && net.scales[i] > 1) reset_momentum(net);
            }
            return rate;
        case EXP:
            return net.learning_rate * pow(net.gamma, batch_num);
        case POLY:
            return net.learning_rate * pow(1 - (float)batch_num / net.max_batches, net.power);
            //if (batch_num < net.burn_in) return net.learning_rate * pow((float)batch_num / net.burn_in, net.power);
            //return net.learning_rate * pow(1 - (float)batch_num / net.max_batches, net.power);
        case RANDOM:
            return net.learning_rate * pow(rand_uniform(0,1), net.power);
        case SIG:
            return net.learning_rate * (1./(1.+exp(net.gamma*(batch_num - net.step))));
        case SGDR:
        {
            int last_iteration_start = 0;
            int cycle_size = net.batches_per_cycle;
            while ((last_iteration_start + cycle_size) < batch_num)
            {
                last_iteration_start += cycle_size;
                cycle_size *= net.batches_cycle_mult;
            }
            rate = net.learning_rate_min +
                0.5*(net.learning_rate - net.learning_rate_min)
                * (1. + cos((float)(batch_num - last_iteration_start)*3.14159265 / cycle_size));
            return rate;
        }
        default:
            fprintf(stderr, "Policy is weird!\n");
            return net.learning_rate;
    }
}
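To make the STEPS branch concrete, here is a standalone sketch (the helper name and the demo_steps/demo_scales values are mine, chosen in the spirit of a typical yolo cfg, not taken from this training run):

```c
/* Standalone version of the STEPS branch above: the base rate is
 * multiplied by scales[i] each time batch_num passes steps[i]. */
float steps_rate(float base, const int *steps, const float *scales,
                 int num_steps, int batch_num)
{
    float rate = base;
    for (int i = 0; i < num_steps; ++i) {
        if (steps[i] > batch_num) return rate;  /* this step not reached yet */
        rate *= scales[i];
    }
    return rate;
}

/* Illustrative schedule: drop the rate 10x at 400000 and again at 450000. */
static const int demo_steps[2] = {400000, 450000};
static const float demo_scales[2] = {0.1f, 0.1f};
```

With base 0.001, the rate stays 0.001 until iteration 400000, becomes 0.0001 from there, and 0.00001 from iteration 450000 on.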
net.burn_in is an optional cfg parameter; if it is not set in the cfg file, it defaults to 0.
★ The learning rate controls how quickly the weights are updated: set it too high and the result overshoots the optimum; too low and convergence is slow. If you tune purely by hand, you have to keep editing the rate. It is usual to start training with a relatively high rate and lower it after a certain number of iterations; in practice the rate is scheduled dynamically by iteration count. At the start of training, 0.01 ~ 0.001 is a reasonable range; after some number of iterations, decay it gradually.
Near the end of training, the rate should have decayed by a factor of 100 or more.
For more on learning-rate tuning, see https://blog.csdn.net/qq_33485434/article/details/80452941
★★★ Do not treat the schedule as set in stone. Adjust the rate during training according to the loss curve and other metrics: press Ctrl+C to stop the run, edit the learning rate, then resume from the weights just saved; that is all manual tuning takes. Judge from the training log: if the loss swings wildly, the rate is too high, so cut it to 1/5 or 1/10. If the loss barely moves, the network may have converged or be stuck in a local minimum, so try raising the rate a little. After every adjustment, train long enough to observe the effect properly. Tuning is delicate work; take your time.
★★ A small note: the effective learning rate depends on the number of GPUs. For example, if you set the rate to 0.001 and train on 4 GPUs, the real rate is 0.001*4:
nets[i].learning_rate *= ngpus;
learning_rate=0.001
★ While the iteration count is below burn_in, the learning rate follows the warm-up formula; only once it exceeds burn_in does the configured policy take over.
MATLAB code to plot the learning-rate curve:
maxbatches = 700000;
rate = 0.01;
x = 300019:maxbatches;
y=1-(x/maxbatches);
k=y.^4;
z= rate*k;
plot(x,z)
1. Current batch = 300019, base rate = 0.01, max_batches = 392480
learning_rate=0.01
policy=poly
power=4
max_batches=392480
2. Current batch = 300019, base rate = 0.01, max_batches = 592480
learning_rate=0.01
policy=poly
power=4
max_batches=592480
3. Current batch = 300019, base rate = 0.02, max_batches = 592480
learning_rate=0.02
policy=poly
power=4
max_batches=592480
4. Current batch = 300019, base rate = 0.01, max_batches = 700000
learning_rate=0.01
policy=poly
power=4
max_batches=700000
When choosing the learning rate, policy, and max_batches, it is best to try several parameter sets and plot the resulting learning-rate curves.
As the curves show, once the learning-rate parameters are fixed, the rate drops to 0 when batch_num grows past a certain value.
On 2019-11-01, the 700000-iteration run finished; the parameters were then changed and training resumed:
maxbatches = 1000000;
rate = 0.02;
x = 700000:maxbatches;
y=1-(x/maxbatches);
k=y.^4;
z= rate*k;
plot(x,z)
Set max_batches = 1000000. The resulting curve:
[learning-rate curve figure omitted]
1.6 seconds: the wall-clock time spent training the current batch
1.7 images: the running total of images trained on so far (*net.seen)