mmsegmentation 报错

Describe the bug
2022-03-01 07:52:37,584 - mmseg - INFO - workflow: [('train', 1)], max: 20000 iters
2022-03-01 07:52:37,585 - mmseg - INFO - Checkpoints will be saved to /data/mmsegmentation/work_dirs/fcn_r50-d8_512x512_20k_voc12aug by HardDiskBackend.
/opt/conda/envs/seg/lib/python3.8/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
File "tools/train.py", line 234, in
main()
File "tools/train.py", line 223, in main
train_segmentor(
File "/data/mmsegmentation/mmseg/apis/train.py", line 174, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/data/mmcv/mmcv/runner/iter_based_runner.py", line 134, in run
iter_runner(iter_loaders[i], **kwargs)
File "/data/mmcv/mmcv/runner/iter_based_runner.py", line 61, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/data/mmcv/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/data/mmsegmentation/mmseg/models/segmentors/base.py", line 138, in train_step
losses = self(**data_batch)
File "/opt/conda/envs/seg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/data/mmcv/mmcv/runner/fp16_utils.py", line 109, in new_func
return old_func(*args, **kwargs)
File "/data/mmsegmentation/mmseg/models/segmentors/base.py", line 108, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/data/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 143, in forward_train
loss_decode = self._decode_head_forward_train(x, img_metas,
File "/data/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 86, in _decode_head_forward_train
loss_decode = self.decode_head.forward_train(x, img_metas,
File "/data/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 204, in forward_train
losses = self.losses(seg_logits, gt_semantic_seg)
File "/data/mmcv/mmcv/runner/fp16_utils.py", line 197, in new_func
return old_func(*args, *kwargs)
File "/data/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 264, in losses
loss['acc_seg'] = accuracy(
File "/data/mmsegmentation/mmseg/models/losses/accuracy.py", line 47, in accuracy
correct = correct[:, target != ignore_index]
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f57296eea22 in /opt/conda/envs/seg/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: + 0x10983 (0x7f572994f983 in /opt/conda/envs/seg/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void
) + 0x1a7 (0x7f5729951027 in /opt/conda/envs/seg/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f57296d85a4 in /opt/conda/envs/seg/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: + 0xa27e1a (0x7f56d4a16e1a in /opt/conda/envs/seg/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0xa27eb1 (0x7f56d4a16eb1 in /opt/conda/envs/seg/lib/python3.8/site-packages/torch/lib/libtorch_python.so)

frame #23: __libc_start_main + 0xe7 (0x7f573a7ffb97 in /lib/x86_64-linux-gnu/libc.so.6)

使用mmseg工程进行VOC数据集训练的时候,发现报了上述错误

RuntimeError: CUDA error: an illegal memory access was encountered

correct = correct[:, target != ignore_index]

因此,我在外部使用python进行的算法上的测试,重新生成了true和false的tensor,然后添加判断条件,发现并没有什么问题,说明不是cpu和gpu方面的问题,之后回到github的链接下,发现有人出现了同样的问题,是由于num_class没有修改,因此需要修改num_class的数值,可以正常开始算法模型的训练。

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
YOLO系列是基于深度学习的端到端实时目标检测方法。 PyTorch版的YOLOv5轻量而性能高,更加灵活和易用,当前非常流行。 本课程将手把手地教大家使用labelImg标注和使用YOLOv5训练自己的数据集。课程实战分为两个项目:单目标检测(足球目标检测)和多目标检测(足球和梅西同时检测)。 本课程的YOLOv5使用ultralytics/yolov5,在Windows系统上做项目演示。包括:安装YOLOv5、标注自己的数据集、准备自己的数据集、修改配置文件、使用wandb训练可视化工具、训练自己的数据集、测试训练出的网络模型和性能统计。 希望学习Ubuntu上演示的同学,请前往 《YOLOv5(PyTorch)实战:训练自己的数据集(Ubuntu)》课程链接:https://edu.csdn.net/course/detail/30793  本人推出了有关YOLOv5目标检测的系列课程。请持续关注该系列的其它视频课程,包括:《YOLOv5(PyTorch)目标检测实战:训练自己的数据集》Ubuntu系统 https://edu.csdn.net/course/detail/30793Windows系统 https://edu.csdn.net/course/detail/30923《YOLOv5(PyTorch)目标检测:原理与源码解析》课程链接:https://edu.csdn.net/course/detail/31428《YOLOv5目标检测实战:Flask Web部署》课程链接:https://edu.csdn.net/course/detail/31087《YOLOv5(PyTorch)目标检测实战:TensorRT加速部署》课程链接:https://edu.csdn.net/course/detail/32303《YOLOv5目标检测实战:Jetson Nano部署》课程链接:https://edu.csdn.net/course/detail/32451《YOLOv5+DeepSORT多目标跟踪与计数精讲》课程链接:https://edu.csdn.net/course/detail/32669《YOLOv5实战口罩佩戴检测》课程链接:https://edu.csdn.net/course/detail/32744《YOLOv5实战中国交通标志识别》课程链接:https://edu.csdn.net/course/detail/35209《YOLOv5实战垃圾分类目标检测》课程链接:https://edu.csdn.net/course/detail/35284       
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值