Problems encountered when training and predicting with YOLOv10
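Training parameters and environment
- CUDA version (PyTorch build downloaded from the official site): 12.1
- Python version (virtual environment created with Anaconda): 3.9.19
- GPU: NVIDIA GTX 1650 (laptop)
- Output of nvidia-smi in cmd: Driver Version: 555.85, CUDA Version: 12.5
- train.py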
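from ultralytics import YOLOv10

if __name__ == "__main__":
    # Weights produced by an earlier training run
    pth_path = r"F:\PY\Yolov10\runs\detect\train_v1030\weights\best.pt"
    # Folder containing the images to run prediction on
    test_path = r"F:\PY\Yolov10\predicImage"
    model = YOLOv10(pth_path)  # load a custom model
    print(YOLOv10.__bases__)  # show which base class YOLOv10 inherits from
    # Discard detections with confidence below 0.5 and save the annotated images
    results = model.predict(test_path, conf=0.5, save=True)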
1. cls_loss, box_loss and other losses become NaN during YOLOv10 training
Training was run from the command line with the following command:
yolo detect train data=self.yaml model=yolov10n.yaml epochs=10 batch=16 imgsz=640 device=0
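[Screenshot: training log with nan in the loss columns]
A run like this is wasted, because the resulting weights are unusable. The fix I tried:
- Find the file ultralytics\cfg\default.yaml
- Change amp: True to amp: False in that file
After this change the problem still appeared when training from the command line, at least in my environment. I therefore switched to training from a Python script (train.py), and training then ran without problems. But what does changing amp actually do?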
amp: False # (bool) Automatic Mixed Precision (AMP) training, choices=[True, False], True runs AMP check
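As the comment says, amp switches mixed precision training on or off.
- amp=True: enables automatic mixed precision training. YOLO automatically chooses between 16-bit and 32-bit floating point during training, which speeds up training and reduces GPU memory use.
- amp=False: disables automatic mixed precision training. The whole run uses 32-bit floating point only, which gives higher numerical precision but may need more GPU memory and compute more slowly.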
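To get a feel for what "16-bit" means here, the following sketch round-trips a few numbers through IEEE 754 half precision using only Python's standard library. It is an illustration of the number format, not of YOLO's training code, and `to_fp16` is a hypothetical helper name. Values outside the 16-bit range are a common source of inf/NaN under mixed precision in general; whether that is what happened in my runs is an assumption I have not verified.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (16 bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


largest = to_fp16(65504.0)  # the largest finite half-precision value survives
tiny = to_fp16(1e-8)        # too small for 16 bits, so it is flushed to zero
rounded = to_fp16(0.1)      # stored only approximately

# Python's struct refuses to pack a value that is too large for 16 bits.
# On a GPU the same conversion would silently produce inf instead.
try:
    to_fp16(70000.0)
    overflowed = False
except OverflowError:
    overflowed = True

print(largest, tiny, rounded, overflowed)
```

With 32-bit floats all four of these values are representable without overflow or flushing to zero, which is why amp=False trades speed and memory for a wider numeric range.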
Setting amp to False therefore slows training down and increases GPU memory use, but it should not have much effect on the training result itself.
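The script-based training mentioned above could look roughly like the sketch below. It reuses the arguments from the command line in this section and passes amp=False explicitly instead of relying on default.yaml. The names `TRAIN_ARGS` and `train` are made up for this example, and the early return for a missing dataset file is my own addition.

```python
from pathlib import Path

# Same settings as the command line above, plus amp=False to disable mixed precision.
TRAIN_ARGS = dict(epochs=10, batch=16, imgsz=640, device=0, amp=False)


def train(data_yaml: str = "self.yaml", model_cfg: str = "yolov10n.yaml"):
    """Train YOLOv10 with AMP disabled; return None if the dataset file is missing."""
    if not Path(data_yaml).is_file():
        print(f"dataset config not found: {data_yaml}")
        return None
    # Imported here so the settings can be inspected without ultralytics installed.
    from ultralytics import YOLOv10

    model = YOLOv10(model_cfg)  # build the model from its YAML definition
    return model.train(data=data_yaml, **TRAIN_ARGS)


if __name__ == "__main__":  # needed on Windows, where data loader workers are spawned
    train()
```

In principle the same override can be passed on the command line by appending amp=False to the yolo command. I have not checked whether that avoids the NaN problem in this environment.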
2. ckpt = torch.load(file, map_location="cpu") triggers a warning during prediction
The warning is:
F:\YOLOV10-predict\ultralytics\nn\tasks.py:755: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ckpt = torch.load(file, map_location="cpu")
The fix is as follows:
- Locate the statement (ultralytics\nn\tasks.py, line 755)
- Change it as the warning suggests to
ckpt = torch.load(file, map_location="cpu", weights_only=True)
- This still fails, with the following error:
(1) Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with weights_only=True please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL ultralytics.nn.tasks.YOLOv10DetectionModel was not an allowed global by default. Please use torch.serialization.add_safe_globals([YOLOv10DetectionModel]) to allowlist this global if you trust this class/function.
- Add the following code directly above the line that was just changed
# Allowlist every class stored in the checkpoint so weights_only=True can unpickle it.
# Conv, C2f, SCDown, C2fCIB, RepVGGDW, SPPF, PSA, Concat, v10Detect, DFL and the
# loss classes are assumed to be imported at the top of tasks.py already.
from torch.nn import (BatchNorm2d, BCEWithLogitsLoss, Conv2d, Identity, MaxPool2d,
                      ModuleList, Sequential, SiLU, Upsample)
from ultralytics.nn.modules.block import Attention, Bottleneck, CIB
# These three are only needed if tasks.py does not import the names itself
from ultralytics.utils import IterableSimpleNamespace
from ultralytics.utils.loss import BboxLoss
from ultralytics.utils.tal import TaskAlignedAssigner

torch.serialization.add_safe_globals([
    YOLOv10DetectionModel, set, Sequential, Conv, Conv2d, BatchNorm2d, SiLU, C2f,
    ModuleList, Bottleneck, SCDown, Identity, C2fCIB, CIB, RepVGGDW, SPPF,
    MaxPool2d, PSA, Attention, Upsample, Concat, v10Detect, DFL,
    IterableSimpleNamespace, v10DetectLoss, v8DetectionLoss, BCEWithLogitsLoss,
    TaskAlignedAssigner, BboxLoss,
])
ckpt = torch.load(file, map_location="cpu", weights_only=True)
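With this change the warning no longer appears. The reason it appeared in the first place:
The warning points out that torch.load is being called with weights_only=False, so the checkpoint is deserialized with the full pickle module. This can be a security risk, especially when loading model files from untrusted sources, because pickle can execute arbitrary code during deserialization.
In a future PyTorch release the default of torch.load will change to weights_only=True, which loads only the weights rather than arbitrary Python objects. This restricts what pickle can do and so reduces the risk. Only objects explicitly allowed through torch.serialization.add_safe_globals can then be loaded.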
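To make the mechanism more concrete, here is a small sketch that uses only Python's standard pickle module. It is an analogy, not PyTorch's actual implementation of weights_only, and `ALLOWED`, `AllowlistUnpickler`, `safe_loads` and `Evil` are hypothetical names invented for this illustration. The idea is the same as add_safe_globals: a global is loaded only if it is on an explicit allowlist.

```python
import io
import pickle
from collections import OrderedDict

# Globals the loader is willing to reconstruct, similar in spirit to add_safe_globals.
ALLOWED = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not listed in ALLOWED."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")


def safe_loads(data: bytes):
    """Deserialize bytes with the allowlist-based unpickler."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Evil:
    """Object whose pickle asks the loader to call a function while loading."""

    def __reduce__(self):
        # print is a harmless stand-in for arbitrary code
        return (print, ("arbitrary code ran during unpickling",))


# A plain container of numbers loads fine because OrderedDict is allowlisted.
weights = OrderedDict(layer=[0.1, 0.2])
restored = safe_loads(pickle.dumps(weights))

# The malicious payload is rejected before print is ever called.
try:
    safe_loads(pickle.dumps(Evil()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(restored == weights, blocked)
```

Loading the same Evil payload with plain pickle.loads would call print during loading, which is the kind of behaviour the FutureWarning is warning about.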
3. A warning about the torch version being too new appears during training
Training was run with the following command:
yolo detect train data=self.yaml model=yolov10n.yaml epochs=10 batch=16 imgsz=640 device=0
The warning is:
WARNING ⚠️ Known issue with torch>=2.4.0 on Windows with CPU, recommend downgrading to torch<=2.3.1 to resolve https://github.com/ultralytics/ultralytics/issues/15049
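According to this warning, my torch version should be downgraded to 2.3.1 or lower. What I do not understand is the "with CPU" part: my training command clearly specifies device=0, which is the GPU.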
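To check whether an environment matches the conditions named in the warning text, a small helper like the one below may be handy. It only mirrors the wording of the warning (torch>=2.4.0 on Windows) and is not the check that ultralytics itself performs; `parse_version` and `matches_warning` are hypothetical names, and the parser only handles plain release versions such as 2.4.0+cu121.

```python
import platform


def parse_version(version: str) -> tuple:
    """Turn '2.4.0+cu121' into (2, 4, 0), ignoring the local build suffix."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def matches_warning(torch_version: str, system: str) -> bool:
    """True when the environment matches the warning text: torch>=2.4.0 on Windows."""
    return system == "Windows" and parse_version(torch_version) >= (2, 4, 0)


current_system = platform.system()  # e.g. 'Windows', 'Linux' or 'Darwin'

print(matches_warning("2.4.0+cu121", "Windows"))  # True
print(matches_warning("2.3.1+cu121", "Windows"))  # False
print(matches_warning("2.4.0", "Linux"))          # False
```

With torch installed, the real values can be checked with matches_warning(torch.__version__, platform.system()). Note that the helper does not look at the training device at all.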
This post is still being updated...