NaN training results with YOLOv5 on GTX 16XX-series GPUs

不含硫jun

Last modified 2023-08-13 20:45:45

Tags: deep learning, computer vision, pytorch

First published 2021-12-23 20:36:37

Article link: https://blog.csdn.net/sun1311523821/article/details/122115663

autoanchor: Analyzing anchors... anchors/target = 4.27, Best Possible Recall (BPR) = 0.9935
Image sizes 640 train, 640 val
Using 1 dataloader workers
Logging results to runs\train\test42
Starting training for 3 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
       0/2     1.86G       nan       nan       nan       113       640: 100%|██████████| 16/16 [00:23<00:00,  1.44s/it]
C:\Users\monst\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\optim\lr_scheduler.py:129: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|█| 8/8 [00:03<00:00,  2.45
                 all        128          0          0          0          0          0

     Epoch   gpu_mem       box       obj       cls    labels  img_size
       1/2     2.45G       nan       nan       nan       128       640: 100%|██████████| 16/16 [00:17<00:00,  1.08s/it]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|█| 8/8 [00:03<00:00,  2.48
                 all        128          0          0          0          0          0

     Epoch   gpu_mem       box       obj       cls    labels  img_size
       2/2     2.45G       nan       nan       nan       221       640: 100%|██████████| 16/16 [00:17<00:00,  1.09s/it]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|█| 8/8 [00:03<00:00,  2.39
                 all        128          0          0          0          0          0
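
The symptom in the log above is that the box, obj and cls loss columns are all nan from the very first epoch. As a minimal sketch, the helper below scans a YOLOv5 progress line and reports which loss columns are NaN. The function name `nan_losses` and the regular expression are illustrative, not part of YOLOv5, and the column order is assumed to match the header shown in this log.

```python
import math
import re

# Matches a progress line such as
# "  0/2  1.86G  nan  nan  nan  113  640: 100%|...".
# Assumed column order (from the header in the log above):
# epoch, gpu_mem, box, obj, cls, labels, img_size.
PROGRESS_RE = re.compile(
    r"^\s*(?P<epoch>\d+)/(?P<last>\d+)\s+(?P<gpu_mem>\S+)\s+"
    r"(?P<box>\S+)\s+(?P<obj>\S+)\s+(?P<cls>\S+)\s+\d+\s+\d+:"
)


def nan_losses(line):
    """Return the loss columns that are NaN, or None if this is not a progress line."""
    match = PROGRESS_RE.match(line)
    if match is None:
        return None  # header, validation row, warning, etc.
    bad = []
    for name in ("box", "obj", "cls"):
        try:
            value = float(match.group(name))
        except ValueError:
            return None  # the columns do not hold numbers, so not a loss row
        if math.isnan(value):
            bad.append(name)
    return bad


if __name__ == "__main__":
    sample = "       0/2     1.86G       nan       nan       nan       113       640: 100%"
    print(nan_losses(sample))
```

Feeding it each line of a saved training log makes it easy to spot the first epoch at which the losses break.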

I found a solution:
https://github.com/ultralytics/yolov5/issues/4839

https://docs.nvidia.com/deeplearning/cudnn/release-notes/rel_8.html#rel-822

CUDA
https://developer.nvidia.cn/cuda-10.2-download-archive?target_os=Windows&target_arch=x86_64

cuDNN
https://developer.nvidia.com/rdp/cudnn-download

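Installing CUDA 10.2 alone did not help. Other people were running PyTorch 1.9.1 while I was on 1.8.1, so I upgraded PyTorch through Anaconda, which automatically took it to 1.10.1.
After the upgrade, however, PyTorch could no longer find CUDA at all: torch.version.cuda showed None, so the torch and CUDA versions did not match.
I then noticed in Anaconda that the cudatoolkit package was still at 11.1 and had not been downgraded, so I downgraded it from 11.1 to 10.2. That still did not work.
In the end I gave up, deleted the old environment, and created a new one, this time with CUDA 10.2 and PyTorch 1.10.1.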
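
A quick way to see whether torch and CUDA match is to print the version facts PyTorch itself reports. This is a minimal sketch: the function name `describe_torch_env` is made up for illustration, while `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.backends.cudnn.version()` are standard PyTorch attributes. If torch is not installed, the function falls back to empty values instead of failing.

```python
def describe_torch_env():
    """Collect the PyTorch, CUDA and cuDNN versions visible from Python."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return {"torch": None, "cuda_build": None, "cuda_available": False, "cudnn": None}
    return {
        "torch": torch.__version__,                    # e.g. 1.10.1
        "cuda_build": torch.version.cuda,              # None on a CPU-only build
        "cuda_available": torch.cuda.is_available(),   # False if no usable GPU/driver
        "cudnn": torch.backends.cudnn.version(),       # None if cuDNN is unavailable
    }


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `cuda_build` is None, the installed PyTorch package is a CPU-only build, which is the state described above after the upgrade.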

PyTorch
https://pytorch.org/
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=10.2 -c pytorch
There must be no spaces on either side of the == signs.

Installing PyTorch with Anaconda
https://blog.csdn.net/qq_45297730/article/details/121652951

All I can say is that a reinstall fixes everything.

More discussion of the NaN problem:
https://github.com/ultralytics/yolov5/issues/4084
https://github.com/ultralytics/yolov5/issues/1625
https://github.com/ultralytics/yolov5/issues/1749
