NaN in YOLOv5 training results on GTX 16XX-series GPUs

autoanchor: Analyzing anchors... anchors/target = 4.27, Best Possible Recall (BPR) = 0.9935
Image sizes 640 train, 640 val
Using 1 dataloader workers
Logging results to runs\train\test42
Starting training for 3 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
       0/2     1.86G       nan       nan       nan       113       640: 100%|██████████| 16/16 [00:23<00:00,  1.44s/it]
C:\Users\monst\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\optim\lr_scheduler.py:129: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|| 8/8 [00:03<00:00,  2.45
                 all        128          0          0          0          0          0

     Epoch   gpu_mem       box       obj       cls    labels  img_size
       1/2     2.45G       nan       nan       nan       128       640: 100%|██████████| 16/16 [00:17<00:00,  1.08s/it]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|| 8/8 [00:03<00:00,  2.48
                 all        128          0          0          0          0          0

     Epoch   gpu_mem       box       obj       cls    labels  img_size
       2/2     2.45G       nan       nan       nan       221       640: 100%|██████████| 16/16 [00:17<00:00,  1.09s/it]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|| 8/8 [00:03<00:00,  2.39
                 all        128          0          0          0          0          0

I found a solution:
https://github.com/ultralytics/yolov5/issues/4839

https://docs.nvidia.com/deeplearning/cudnn/release-notes/rel_8.html#rel-822

CUDA
https://developer.nvidia.cn/cuda-10.2-download-archive?target_os=Windows&target_arch=x86_64

cuDNN
https://developer.nvidia.com/rdp/cudnn-download

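Installing CUDA 10.2 on its own did not help. I noticed that other people were on PyTorch 1.9.1 while I was on 1.8.1, so I upgraded PyTorch through Anaconda, which automatically took it to 1.10.1.
After the upgrade, however, PyTorch could not find CUDA at all: torch.version.cuda showed None, i.e. the torch and CUDA versions did not match.
I then found in Anaconda that the cudatoolkit package was still at 11.1 and had not been downgraded, so I downgraded cudatoolkit from 11.1 to 10.2. That did not help either.
In the end I gave up, deleted the original environment, and created a fresh one with CUDA 10.2 and PyTorch 1.10.1.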
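To confirm that a rebuilt environment actually sees CUDA, a quick check along the following lines can help. This is only a minimal sketch: the function name describe_torch_cuda is made up for illustration, and the script assumes it is run inside the conda environment being tested. It reports the same torch.version.cuda value mentioned above, along with the cuDNN version PyTorch was built against.

```python
import importlib.util


def describe_torch_cuda():
    """Summarize the installed PyTorch build (hypothetical helper name).

    Returns None when torch is not installed in the current environment.
    """
    if importlib.util.find_spec("torch") is None:
        return None

    import torch

    return {
        # Version of the installed PyTorch package, e.g. "1.10.1".
        "torch": torch.__version__,
        # CUDA version PyTorch was built with; None for a CPU-only build.
        "cuda_build": torch.version.cuda,
        # cuDNN version PyTorch can load; None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = describe_torch_cuda()
    if info is None:
        print("torch is not installed in this environment")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

If cuda_build prints None, the installed package is a CPU-only build, which matches the symptom described above.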

PyTorch
https://pytorch.org/
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=10.2 -c pytorch
If your copy of the command has a space on each side of every == (six spaces in total), remove them before running it.

Installing PyTorch with Anaconda
https://blog.csdn.net/qq_45297730/article/details/121652951

All I can say is that reinstalling fixes everything.

More discussion of the NaN issue:
https://github.com/ultralytics/yolov5/issues/4084
https://github.com/ultralytics/yolov5/issues/1625
https://github.com/ultralytics/yolov5/issues/1749
