1. When training yolov8n, box_loss, cls_loss and dfl_loss come out as nan
Solution:
Set amp=False in model.train
args = dict(
    model='cfg/models/conv/xx.yaml',
    data='cfg/datasets/xx.yaml',
    imgsz=640,
    epochs=300,
    batch=8,
    workers=0,
    device=0,
    optimizer='SGD',  # SGD and AdamW both work here; other optimizers may keep the model from converging
    amp=False,  # turn off AMP (automatic mixed precision)
)
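To confirm the symptom before and after changing amp, the loss values can be checked for nan. The following is only a minimal sketch: `find_nan_losses` and the `epoch_losses` values are hypothetical, made up for illustration and not part of ultralytics; the numbers would have to be copied from your own training output.

```python
import math


def find_nan_losses(losses):
    """Return the names of the loss terms whose value is NaN."""
    return [name for name, value in losses.items() if math.isnan(value)]


# Hypothetical values, as if copied by hand from one line of training output.
epoch_losses = {
    "box_loss": float("nan"),
    "cls_loss": float("nan"),
    "dfl_loss": 1.23,
}

print(find_nan_losses(epoch_losses))
```

An empty list after retraining with amp=False means none of the checked loss terms is nan any more.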
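2. When training yolov8n, Box(P R mAP50 mAP50-95) are all 0, or training itself runs but an error is raised as soon as Box(P R mAP50 mAP50-95) is about to be printed
Solution:
In ultralytics/yolo/cfg/default.yaml, line 49, change half to False
Also, in ultralytics/yolo/engine/validator.py, comment out the line self.args.half = self.device.type != 'cpu' and set self.args.half to False instead. Simply deleting the line also works, because half has already been changed to False in default.yaml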
self.training = trainer is not None
if self.training:
    self.device = trainer.device
    self.data = trainer.data
    model = trainer.ema.ema or trainer.model
    # self.args.half = self.device.type != 'cpu'  # force FP16 val during training
    self.args.half = False
    self.model = model
    self.loss = torch.zeros_like(trainer.loss_items, device=trainer.device)
    self.args.plots = trainer.stopper.possible_stop or (trainer.epoch == trainer.epochs - 1)
    model.eval()
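Some articles also say the cause is a problem in NVIDIA's CUDA packages for GTX 16xx cards. Their fix is to change every .half() to .float(), or to set half to False: find val.py and validator.py, check the value of self.args.half, and change it to False in both.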
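If the GTX 16xx explanation applies, the decision to disable half precision could be made from the GPU name. This is a rough sketch under assumptions: `needs_fp32` and `precision_overrides` are hypothetical helpers, the name matching is a simple heuristic, and the device name is assumed to come from something like torch.cuda.get_device_name(0). It only reports which of the two switches from these notes (amp and half) to turn off; it does not patch ultralytics.

```python
import re


def needs_fp32(device_name):
    """Heuristic: True if the GPU name looks like a GTX 16xx card."""
    return re.search(r"GTX\s*16\d\d", device_name, flags=re.IGNORECASE) is not None


def precision_overrides(device_name):
    """Return the settings to force off on an affected card, else an empty dict."""
    if needs_fp32(device_name):
        return {"amp": False, "half": False}
    return {}


# Example device names, hard-coded here so the sketch runs without a GPU.
for name in ("NVIDIA GeForce GTX 1660 Ti", "NVIDIA GeForce RTX 3060"):
    print(name, precision_overrides(name))
```

On an affected card the returned amp value matches the amp=False setting from item 1, and half=False corresponds to the change made in default.yaml and validator.py in item 2.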