(Python 3.6 + CUDA 11.3 + torch 1.10.2 -> Python 3.9 + CUDA 10.2 + torch 1.9.0)
Originally I installed PyTorch 1.10.2 built for CUDA 11.3, using the command generated by the "Start Locally" selector on the Start Locally | PyTorch page, as shown in the figure below.
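In the trained model's results.png, the precision, mAP and other curves were all either 0 or nan, and every entry in the confusion matrix ended up as FN. Training also printed this warning: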
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
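PyTorch expects optimizer.step() to come before lr_scheduler.step(). The sketch below shows that order on a toy model; the model, data and StepLR settings are hypothetical stand-ins, not YOLOv5's code, and train(scheduler_first=True) reproduces the warning for comparison.

```python
import warnings

import torch
from torch import nn


def train(epochs=3, scheduler_first=False):
    """Toy training loop; returns the LR after each epoch and any warning messages.

    The model, data and StepLR settings are hypothetical stand-ins.
    """
    torch.manual_seed(0)
    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
    x, y = torch.randn(8, 4), torch.randn(8, 1)

    lrs = []
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(model(x), y)
            loss.backward()
            if scheduler_first:
                scheduler.step()  # wrong order: triggers the UserWarning
                optimizer.step()
            else:
                optimizer.step()  # update the weights first...
                scheduler.step()  # ...then advance the LR schedule
            lrs.append(optimizer.param_groups[0]["lr"])
    return lrs, [str(w.message) for w in caught]


lrs, msgs = train()
print(lrs)  # [0.05, 0.025, 0.0125]
```

If a loop already uses this order (YOLOv5's train.py steps the scheduler once per epoch, after the optimizer updates), the warning suggests optimizer.step() was skipped on the first iteration. One documented way that happens is torch.cuda.amp.GradScaler: scaler.step(optimizer) skips optimizer.step() when the scaled gradients contain inf or NaN, which would also be consistent with the 0/nan metrics described above.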
I then went to the Previous PyTorch Versions | PyTorch page and installed the older 1.9.0 release with this pip command:
pip install torch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0
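(This is actually the command listed for OSX.) With it, the metrics were back to normal. However, print(torch.version.cuda) reported CUDA 10.2 while nvcc -V reported 11.3, and there was no CUDA 10.2 installation under /usr/local, yet training still worked, which was strange. The next day it failed with: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. After that it was dead.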
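The 10.2 vs 11.3 mismatch is less odd than it looks: PyTorch pip wheels bundle their own CUDA runtime, so torch.version.cuda reports the CUDA version the wheel was built with (on Linux, the plain torch==1.9.0 wheel is the CUDA 10.2 build), while nvcc -V reports the toolkit installed on the system, which PyTorch does not use at runtime; only the NVIDIA driver has to be new enough. The sketch below prints both side by side. parse_nvcc_release and system_nvcc_version are hypothetical helper names, and the regex assumes nvcc's usual "release X.Y" line.

```python
import re
import shutil
import subprocess

import torch


def parse_nvcc_release(text):
    """Extract 'X.Y' from `nvcc -V` output, e.g. '... release 11.3, V11.3.109' -> '11.3'."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def system_nvcc_version():
    """Return the toolkit version reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    out = subprocess.run([nvcc, "-V"], capture_output=True, text=True).stdout
    return parse_nvcc_release(out)


# CUDA version the installed PyTorch wheel was compiled against (None for CPU-only builds)
print("torch built with CUDA:", torch.version.cuda)
# CUDA toolkit installed on the system (not used by pip-installed PyTorch at runtime)
print("system nvcc toolkit:  ", system_nvcc_version())
```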
Next I installed PyTorch 1.9.0 via pip with CUDA 11.3, which gave: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:115.)
return torch._C._cuda_getDeviceCount() > 0
This time the metrics were normal, but gpu_mem showed 0, i.e. no GPU was being used.
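Both CUDA_LAUNCH_BLOCKING (suggested by the earlier kernel error) and CUDA_VISIBLE_DEVICES (named in this warning) only take effect if they are set before CUDA is initialized, so a safe place is the very top of the script, before import torch. The sketch below does that and then falls back to the CPU with a visible message instead of silently training without a GPU. pick_device is a hypothetical helper, and "0" is only an example GPU index.

```python
import os

# Must run before torch initializes CUDA, so keep this at the very top of the script.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # debugging only: report kernel errors synchronously
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")  # example value: expose only GPU 0

import torch  # noqa: E402  (imported after the environment is configured on purpose)


def pick_device():
    """Use the first visible GPU if CUDA initialized correctly, otherwise fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda", 0)
    print("CUDA not available, falling back to CPU")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(2, 2, device=device)
print(device, x.sum().item())  # the sum is 4.0 on either device
```

CUDA_LAUNCH_BLOCKING=1 slows everything down, so it is meant for locating the failing call, not for normal training.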
Finally I installed CUDA 10.2 with PyTorch 1.9.0. import torch then failed, and switching Python to 3.9 solved it!
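The check I use myself:
import torch
import torchvision
print(torch.__version__)  # PyTorch version
print(torchvision.__version__)  # torchvision version
print(torch.version.cuda)  # CUDA version this PyTorch build was compiled with
print(torch.backends.cudnn.version())  # cuDNN version
print(torch.cuda.is_available())  # whether PyTorch can actually use a GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # GPU model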
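Printing versions only shows what is installed; it does not prove CUDA kernels actually run, which is what broke "the next day" above. A hedged extra check is to run one small matrix multiply on the GPU and compare it with the CPU result. cuda_smoke_test is a hypothetical helper, and the size and tolerance are arbitrary illustration values.

```python
import torch


def cuda_smoke_test(size=256, tol=1e-3):
    """Run one matmul on GPU 0 and compare it with the same matmul on the CPU.

    Returns None when no GPU is visible, otherwise True if the results match.
    """
    if not torch.cuda.is_available():
        return None
    # Keep full FP32 precision so the CPU comparison is meaningful
    # (some GPU/PyTorch combinations default to TF32 for matmul).
    torch.backends.cuda.matmul.allow_tf32 = False
    generator = torch.Generator().manual_seed(0)
    a = torch.randn(size, size, generator=generator)
    b = torch.randn(size, size, generator=generator)
    expected = a @ b
    # .cpu() waits for the GPU to finish, so an asynchronous kernel error surfaces here
    got = (a.cuda() @ b.cuda()).cpu()
    return torch.allclose(got, expected, rtol=tol, atol=tol)


print("GPU matmul matches CPU:", cuda_smoke_test())
```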
"yolov5训练(train)的时候 P R 值为0" (P and R are 0 when training YOLOv5), m0_59080342's blog on CSDN: the same problem.
"YOLOv5目标检测" (YOLOv5 object detection), 迷途小书童的Note: the tutorial I followed for the setup.