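Reposted from: http://blog.csdn.net/cham_3/article/details/53213033
For the past couple of days I have been training a network in Caffe. The solver originally used lr_policy: "fixed"; I later changed the policy to lr_policy: "step", and then training aborted with the following error: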
*** Aborted at 1479432790 (unix time) try "date -d @1479432790" if you are using GNU date ***
PC: @ 0x7fe47645db63 caffe::SGDSolver<>::GetLearningRate()
*** SIGFPE (@0x7fe47645db63) received by PID 13998 (TID 0x7fe476dca780) from PID 1984289635; stack trace: ***
@ 0x7fe47582c2f0 (unknown)
@ 0x7fe47645db63 caffe::SGDSolver<>::GetLearningRate()
@ 0x7fe47645dd72 caffe::SGDSolver<>::ApplyUpdate()
@ 0x7fe47646949f caffe::Solver<>::Step()
@ 0x7fe47646981f caffe::Solver<>::Solve()
@ 0x407471 train()
@ 0x404bcb main
@ 0x7fe475817a40 (unknown)
@ 0x405219 _start
@ 0x0 (unknown)
Floating point exception (core dumped)
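I searched through a lot of answers online, but none of them matched my situation (somehow I always run into the problems nobody else has). I spent two days repeatedly running make clean; make all, and none of it helped, which makes sense in hindsight because the problem was in the solver configuration, not in the build. Eventually I went back over what I had changed and found that I had simply left out stepsize: 10000. With stepsize unset it defaults to 0, and the "step" policy divides the current iteration by stepsize when computing the learning rate in GetLearningRate(). Despite its name, the SIGFPE ("Floating point exception") above is what an integer division by zero raises.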
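To make the failure concrete, here is a minimal Python sketch of the "step" policy, assuming the usual formula base_lr * gamma ^ floor(iter / stepsize). The function name step_learning_rate and the explicit check are hypothetical and only for illustration; this is not Caffe's actual code. In Python a zero divisor raises ZeroDivisionError instead of killing the process with SIGFPE, so the sketch checks stepsize up front.

```python
def step_learning_rate(base_lr, gamma, iteration, stepsize):
    """Illustrative "step" policy: base_lr * gamma ** (iteration // stepsize).

    Hypothetical helper, not part of Caffe's API.
    """
    # An unset stepsize is 0; dividing by it is the bug described above.
    if stepsize <= 0:
        raise ValueError(
            'lr_policy "step" needs a positive stepsize, got %d' % stepsize
        )
    current_step = iteration // stepsize  # integer division, as in the step policy
    return base_lr * gamma ** current_step


if __name__ == "__main__":
    # With stepsize: 10000 the rate drops by a factor of gamma every 10000 iterations.
    for it in (0, 9999, 10000, 25000):
        print(it, step_learning_rate(0.01, 0.1, it, 10000))
```

Calling the same function with stepsize=0 fails immediately with a readable message, which is the kind of check that would have saved the two days of rebuilding.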
All I can say is that I was careless. Lesson learned; I will pay closer attention next time.