mmsegmentation CUDA kernel errors might be asynchronously reported at some other API call

qq_23968017

Last modified 2023-04-01 10:40:57

Tags: python, deep learning

First published 2023-03-26 20:19:40

Article link: https://blog.csdn.net/qq_23968017/article/details/129783831

When training mmsegmentation on an ADE20K-style or VOC-style dataset, error <1> appears:
correct = correct[:, target != ignore_index]
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
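
The message itself suggests CUDA_LAUNCH_BLOCKING=1. A minimal sketch of setting it from Python, assuming these lines sit at the very top of the training entry script so the variable is already in place when CUDA initializes. With it set, kernel launches run synchronously and the stack trace should point at the call that actually failed:

```python
import os

# Set this before the first CUDA call (safest: before importing torch),
# so the setting is visible when the CUDA context is created.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

This only makes the error easier to locate. The fixes are the label and num_classes checks described in the causes.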

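Cause 1 [core]: the mmsegmentation gray mask label problem
The dataset's RGB mask labels must be converted to the single-channel gray mask labels that mmsegmentation requires.
1=+= Convert the RGB mask labels to single-channel mmsegmentation gray mask labels
Note: the annotation has the same shape as the image, (H, W), and its pixel values lie in the range [0, num_classes - 1].
2=+= Semantic segmentation: convert the three-channel RGB label to a single channel
Code reference link
cv2 reads color images in B, G, R channel order, while PIL handles images in R, G, B order.
Note: the RGB colors of the original label mask must be written as BGR colors [pay special attention here, otherwise the verification step finds only the value 0]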

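		import numpy as np

		# Palette in BGR order, because cv2 reads images as B, G, R.
		# The row index is the class id written into the gray mask.
		cmap = np.array(
		        [   (0, 0, 0),
		            (181, 119, 53),
		 # RGB (53, 119, 181) in the label file ==> cv2: BGR (181, 119, 53)
		        ] )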
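
The conversion in step 2 can be sketched as follows. This is a minimal NumPy illustration, not the code behind the reference link: the helper name `color_mask_to_index` is hypothetical, the tiny array stands in for a mask read with cv2, and marking unmatched pixels with 255 assumes the usual ignore_index value.

```python
import numpy as np

# Same two-entry palette as in the article, in BGR order; row index = class id.
cmap = np.array([(0, 0, 0), (181, 119, 53)], dtype=np.uint8)


def color_mask_to_index(mask_bgr, palette, ignore_index=255):
    """Map an (H, W, 3) BGR color mask to an (H, W) uint8 class-index mask."""
    # Start with every pixel marked as ignored (assumed ignore_index = 255).
    index_mask = np.full(mask_bgr.shape[:2], ignore_index, dtype=np.uint8)
    for class_id, color in enumerate(palette):
        # True where all three channels equal this palette color.
        match = np.all(mask_bgr == color, axis=-1)
        index_mask[match] = class_id
    return index_mask


# Synthetic 2x2 mask standing in for the array cv2 would return for a label file.
mask_bgr = np.array(
    [[(0, 0, 0), (181, 119, 53)],
     [(181, 119, 53), (0, 0, 0)]], dtype=np.uint8)

index_mask = color_mask_to_index(mask_bgr, cmap)
print(index_mask.shape, np.unique(index_mask))
```

If the palette were left in RGB order, no pixel would match class 1 in a cv2-read mask, so the class would never appear in the output.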

3=+= Verify the converted pixel values
np.unique(img) # list the distinct pixel values
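
The check in step 3 can be turned into a small helper. This is a sketch under assumptions: `check_label_range` is a hypothetical name, the arrays are synthetic stand-ins for converted masks, and 255 is assumed to be the ignore_index.

```python
import numpy as np


def check_label_range(label, num_classes, ignore_index=255):
    """Return the label values that are neither valid class ids nor ignore_index."""
    values = np.unique(label)
    return [int(v) for v in values
            if v != ignore_index and not (0 <= v < num_classes)]


# A valid mask for num_classes=2, and one containing a stray value.
good = np.array([[0, 1], [1, 255]], dtype=np.uint8)
broken = np.array([[0, 1], [106, 1]], dtype=np.uint8)

print(check_label_range(good, num_classes=2))
print(check_label_range(broken, num_classes=2))
```

Any value returned here falls outside [0, num_classes - 1], which is the kind of label that can produce the illegal memory access on the GPU.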

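Cause 2: wrong num_classes
The configured number of classes does not match the number of classes actually present in the labels:
VOC-style dataset: num_classes + 1 (the background counts as a class)
dict(type='LoadAnnotations', reduce_zero_label=False)
ADE20K-style dataset: num_classes
dict(type='LoadAnnotations', reduce_zero_label=True)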
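
To see why the two settings pair with different class counts, here is a NumPy sketch of the label shift that reduce_zero_label=True is understood to apply. It is an illustration written for this note, not code taken from the library, and the function name is only a label for the sketch.

```python
import numpy as np


def reduce_zero_label(label):
    """Emulate the shift: 0 -> 255 (ignored), k -> k - 1 for k >= 1."""
    label = label.copy()
    label[label == 0] = 255      # the old class 0 becomes "ignore"
    label = label - 1            # shift all ids down; 255 becomes 254 here
    label[label == 254] = 255    # restore the ignore value
    return label


raw = np.array([[0, 1], [2, 150]], dtype=np.uint8)
shifted = reduce_zero_label(raw)
print(shifted.tolist())
```

After the shift, label 0 is no longer a trainable class, so num_classes excludes it. With reduce_zero_label=False, label 0 stays a real class and has to be counted.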

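Error <2> appears
[Problem 1]: ValueError: size shape must match input shape. Input is 2D, size is 3
1=+= Correct approach:
Convert the dataset's RGB masks to mmseg gray masks [see Cause 1 above]
2=+= Wrong approach <makes the segmentation training results nan>:
The incorrect code change
Location: mmseg\datasets\pipelines\loading.py [this file does not need to be modified]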

	# flag='unchanged' ==> flag='grayscale' [wrong approach]
	gt_semantic_seg = mmcv.imfrombytes(
	        img_bytes, flag='grayscale',
	        backend=self.imdecode_backend).squeeze().astype(np.uint8)
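
A short sketch of why decoding a color mask as grayscale does not yield class ids. It assumes the decoder uses the common BT.601 luminance weights, and the exact value may differ slightly between backends. The helper `bgr_to_gray` is hypothetical.

```python
def bgr_to_gray(b, g, r):
    """Luminance with BT.601 weights, rounded to the nearest integer."""
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


# The class-1 color from the palette: BGR (181, 119, 53).
gray_value = bgr_to_gray(181, 119, 53)
print(gray_value)
```

The resulting pixel value is a brightness, not the class index 1, so the labels fed to the loss no longer match [0, num_classes - 1].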