解决YOLOV5出现全为nan和0的问题

 yolov5训练时,出现系数为nan和0的问题。

cpu跑没有问题,gpu出现nan和0的问题。一般问题cuda问题和显卡的原因。

显卡为GTX 16XX系列的在cuda使用较新版本时会出现该问题。

例如我自己的问题:飞行堡垒7锐龙版 显卡:GTX 1650  cuda11.3(cuda11.5调试过)都会出现该问题 pytorch为1.11.0 。

AutoAnchor: 6.13 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset 
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\train\exp7
Starting training for 100 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      0/99     1.88G       nan       nan       nan        10       640: 100%|██████████| 14/14 [00:35<00:00,  2.52s/it]
D:\19837\anaconda3\envs\pytorch\lib\site-packages\torch\optim\lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 7/7 [00:07<00:00,  1.09s/it]
                 all        106          0          0          0          0          0

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      1/99     1.96G       nan       nan       nan       104       640:   7%|▋         | 1/14 [00:02<00:38,  2.92s/it]
Process finished with exit code -1

解决方案为将cuda换为10.2的版本,链接如下,直接进行下载

CUDA Toolkit Archive | NVIDIA DeveloperPrevious releases of the CUDA Toolkit, GPU Computing SDK, documentation and developer drivers can be found using the links below. Please select the release you want from the list below, and be sure to check www.nvidia.com/drivers for more recent production drivers appropriate for your hardware configuration.https://developer.nvidia.com/cuda-toolkit-archive

cudnn下载:

cuDNN Archive | NVIDIA DeveloperNVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks.https://developer.nvidia.com/rdp/cudnn-archive#a-collapse51b选择对应的版本

安装cuda过后将cudnn里面的放入C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2这个路径下。根据自己的路径进行修改

然后继续安装pytorch cu102版本

pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 torchaudio==0.10.1 -f https://download.pytorch.org/whl/torch_stable.html

接下来回到运行程序阶段

AutoAnchor: 6.13 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset 
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\train\exp10
Starting training for 100 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      0/99     1.85G    0.1244    0.0515   0.06827        10       640: 100%|██████████| 14/14 [01:59<00:00,  8.53s/it]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 7/7 [00:11<00:00,  1.62s/it]
                 all        106        433    0.00107    0.00842   0.000487   0.000122

     Epoch   gpu_mem       box       obj       cls    labels  img_size
      1/99     1.96G    0.1171   0.06178   0.06603        63       640:  50%|█████     | 7/14 [00:30<00:30,  4.37s/it]

 至此就完成了

  • 42
    点赞
  • 133
    收藏
    觉得还不错? 一键收藏
  • 48
    评论
评论 48
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值