When running YOLOv5 6.1 and YOLOR, training failed in both cases with the following error:
AutoAnchor: 5.00 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\train\exp14
Starting training for 300 epochs...
Epoch gpu_mem box obj cls labels img_size
0%| | 0/8 [00:03<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 634, in <module>
main(opt)
File "train.py", line 525, in main
train(opt.hyp, opt, device, callbacks)
File "train.py", line 320, in train
loss, loss_items = compute_loss(pred, targets.to(device)) # loss scaled by batch_size
File "E:Deeplearn\yolov5\utils\loss.py", line 120, in __call__
tcls, tbox, indices, anchors = self.build_targets(p, targets) # targets
File "E:Deeplearn\yolov5\utils\loss.py", line 217, in build_targets
indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices
RuntimeError: result type Float can't be cast to the desired output type __int64
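I searched online for this error and found no solution, so I had to work out the cause myself.
The same code ran fine on my school computer but failed on my home computer, and the only difference between the two environments was the PyTorch version. I therefore suspected that PyTorch 1.12 was the cause. After downgrading PyTorch to 1.11 and running again, the problem went away.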
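PS: how to downgrade PyTorch:
If you manage your environments with Anaconda, just run the official install command for 1.11. PyTorch 1.12 supports CUDA 11.6, so many people (myself included) installed 1.12 because their CUDA version was 11.6. When downgrading, make sure your CUDA version matches one that 1.11 supports: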
# CUDA 10.2
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=10.2 -c pytorch
# CUDA 11.3
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
# CPU Only
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cpuonly -c pytorch
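After running one of the commands above, it can help to confirm which PyTorch build the environment actually picked up. The following is a minimal sketch, not part of the original fix: the helper names `parse_version` and `is_known_good` and the constant `KNOWN_GOOD` are made up for illustration, and "known good" only means the 1.11 release that worked in this post.

```python
import re

# Assumption for illustration: 1.11 is the release that worked in this post.
KNOWN_GOOD = (1, 11)


def parse_version(version: str) -> tuple:
    """Turn a string such as '1.11.0+cu113' into (1, 11, 0).

    The local build suffix ('+cu113', '+cpu') is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_known_good(version: str) -> bool:
    """True when major.minor equals the release that worked in this post."""
    return parse_version(version)[:2] == KNOWN_GOOD


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch:", torch.__version__)
        print("built for CUDA:", torch.version.cuda)  # None for CPU-only builds
        print("GPU usable:", torch.cuda.is_available())
        print("matches 1.11:", is_known_good(torch.__version__))
```

Run it inside the conda environment used for training. If the reported version is still 1.12, the downgrade did not take effect in that environment.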