RuntimeError: CUDA error: device-side assert triggered
When training YOLOv8, the following error is reported:
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
0%| | 0/54 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2650,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2650,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
... ...
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
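Cause:
The class IDs in the labels do not match the actual number of classes: the training data contains labels that fall outside the configured class count. For example, if you configured 5 classes in total, the valid IDs are 0 to 4. If the labels in the training data contain [0,1,2,3,4,5], they actually cover 6 classes, the ID 5 indexes out of bounds, and this error is raised.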
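To find the offending labels, it can help to scan the label files before training. Below is a minimal sketch, assuming the labels are stored as YOLO-format `.txt` files (one object per line, with the class ID as the first field). The function name `find_out_of_range_labels` and the directory path in the usage part are hypothetical and not part of YOLOv8 itself.

```python
from pathlib import Path


def find_out_of_range_labels(label_dir, num_classes):
    """Return (file name, line number, class id) for every label whose
    class id is outside the valid range 0 .. num_classes - 1."""
    problems = []
    for path in sorted(Path(label_dir).rglob("*.txt")):
        for line_no, line in enumerate(path.read_text().splitlines(), start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            try:
                class_id = int(float(fields[0]))
            except (ValueError, OverflowError):
                continue  # first field is not a number, so not a label line
            if not 0 <= class_id < num_classes:
                problems.append((path.name, line_no, class_id))
    return problems


if __name__ == "__main__":
    # Hypothetical label directory and class count; substitute your own.
    for name, line_no, class_id in find_out_of_range_labels("datasets/my_data/labels", 5):
        print(f"{name}:{line_no}: class id {class_id} is out of range")
```

If the scan reports any lines, either correct those labels or raise the configured number of classes so that it matches the data.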