Vanishing and Exploding Gradients

During training, the model's loss suddenly blew up to hundreds of times its previous value, even to infinity. After that the accuracy dropped sharply, and shortly afterwards the loss became NaN.

The log looked like this:

---------------epoch: 10---------------
Train Epoch: 10 [0/50000 (0.0%)] Loss: 0.820103
Train Epoch: 10 [6400/50000 (12.8%)] Loss: 0.607676
Train Epoch: 10 [12800/50000 (25.6%)] Loss: 0.547826
Train Epoch: 10 [19200/50000 (38.4%)] Loss: 0.667065
Train Epoch: 10 [25600/50000 (51.2%)] Loss: 0.826129
Train Epoch: 10 [32000/50000 (63.9%)] Loss: 0.608503
Train Epoch: 10 [38400/50000 (76.7%)] Loss: 0.701185
Train Epoch: 10 [44800/50000 (89.5%)] Loss: 0.809276


Test set: Average loss: 0.8979, Accuracy: 6990/10000 (69.9000%)


__is_best:False__best_prec1:76.33__prec1:69.9
save_checkpoint save!!
---------------epoch: 11---------------
Train Epoch: 11 [0/50000 (0.0%)] Loss: 0.603848
Train Epoch: 11 [6400/50000 (12.8%)] Loss: 0.561921
Train Epoch: 11 [12800/50000 (25.6%)] Loss: 0.635683
Train Epoch: 11 [19200/50000 (38.4%)] Loss: 0.642762
Train Epoch: 11 [25600/50000 (51.2%)] Loss: 43692104.000000
Train Epoch: 11 [32000/50000 (63.9%)] Loss: 4073269760.000000
Train Epoch: 11 [38400/50000 (76.7%)] Loss: 496331227136.000000
Train Epoch: 11 [44800/50000 (89.5%)] Loss: 570605830144.000000


Test set: Average loss: 98411573884957.4844, Accuracy: 1000/10000 (10.0000%)


__is_best:False__best_prec1:76.33__prec1:10.0
save_checkpoint save!!
...... (omitted) ......
---------------epoch: 23---------------
Train Epoch: 23 [0/50000 (0.0%)] Loss: 324058578944.000000
Train Epoch: 23 [6400/50000 (12.8%)] Loss: 267765661696.000000
Train Epoch: 23 [12800/50000 (25.6%)] Loss: 2.318092
Train Epoch: 23 [19200/50000 (38.4%)] Loss: 2.236400
Train Epoch: 23 [25600/50000 (51.2%)] Loss: 2.163205
Train Epoch: 23 [32000/50000 (63.9%)] Loss: 2.135540
Train Epoch: 23 [38400/50000 (76.7%)] Loss: 2.363119
Train Epoch: 23 [44800/50000 (89.5%)] Loss: 104550023168.000000


Test set: Average loss: 2.4836, Accuracy: 1000/10000 (10.0000%)


__is_best:False__best_prec1:76.33__prec1:10.0
save_checkpoint save!!
---------------epoch: 24---------------
Train Epoch: 24 [0/50000 (0.0%)] Loss: 6339813179392.000000
Train Epoch: 24 [6400/50000 (12.8%)] Loss: 2.126948
Train Epoch: 24 [12800/50000 (25.6%)] Loss: nan
Warning: NaN or Inf found in input tensor.
Train Epoch: 24 [19200/50000 (38.4%)] Loss: nan
Warning: NaN or Inf found in input tensor.
Train Epoch: 24 [25600/50000 (51.2%)] Loss: nan
Warning: NaN or Inf found in input tensor.
Train Epoch: 24 [32000/50000 (63.9%)] Loss: nan
Warning: NaN or Inf found in input tensor.
Train Epoch: 24 [38400/50000 (76.7%)] Loss: nan
Warning: NaN or Inf found in input tensor.
Train Epoch: 24 [44800/50000 (89.5%)] Loss: nan
Warning: NaN or Inf found in input tensor.


Test set: Average loss: nan, Accuracy: 1000/10000 (10.0000%)


__is_best:False__best_prec1:76.33__prec1:10.0
save_checkpoint save!!
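
Note that before the loss degenerates into NaN it passes through a stretch of absurdly large values, so it pays to fail fast inside the training loop rather than rely on the logger's "Warning: NaN or Inf found in input tensor." message. Below is a minimal PyTorch sketch of such a check, assuming a standard training loop; the model/loader/criterion names are placeholders, not the original code:

import torch

def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    for batch_idx, (data, target) in enumerate(loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        # Fail fast: once the loss is Inf/NaN, further optimizer steps only
        # corrupt the weights, so stop (or skip the batch) immediately.
        if not torch.isfinite(loss):
            raise RuntimeError(
                f"non-finite loss {loss.item()} at batch {batch_idx}")
        loss.backward()
        optimizer.step()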

  • Cause
    The initial learning rate was set to 0.1, which is relatively large.

  • Solution
    Lower it to 0.01 (see the sketch below).
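
A minimal sketch of the fix, assuming the usual torch.optim.SGD setup; the placeholder model and the choice of SGD are assumptions, since the original optimizer code is not shown:

import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)  # placeholder; the original network is not shown

# lr=0.1 made the early updates overshoot; lr=0.01 trains stably.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# A common complementary safeguard against exploding gradients (not used in
# the original post) is to clip the gradient norm before each step:
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
#   optimizer.step()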

For other perspectives on vanishing or exploding gradients, see the references below.

  • References
    在pytorch框架下,训练model过程中,loss=nan问题时该怎么解决? - JSLS_Hf的博客 - CSDN博客
    https://blog.csdn.net/JSLS_Hf/article/details/81743045

    警惕!损失Loss为Nan或者超级大的原因 - Oldpan的个人博客
    https://oldpan.me/archives/careful-train-loss-nan-inf

    网络训练loss为nan的解决的办法。 - blackx - 博客园
    https://www.cnblogs.com/hypnus-ly/p/9895885.html
