The mmsegmentation Cultivation Path: Bugs (Part 3)

石头变钻石？

Last modified: 2023-06-14 16:32:56

First published: 2023-06-13 11:22:35

Article link: https://blog.csdn.net/stone_tigerli/article/details/130806118

ValueError: expected 4D input (got 3D input)

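I ran into this while training with a Swin-T backbone. The cause was that BatchNorm was being applied in the backbone.
The fix is to leave 'norm_cfg' out of the model's backbone config.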
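
A minimal config sketch of this fix. It assumes an UPerNet-style setup in which a shared `norm_cfg` is meant for the convolutional decode head only. The Swin backbone works on 3D token tensors of shape (batch, tokens, channels), which is presumably why a 2D BatchNorm layer complains about the input. All values other than the norm settings are illustrative placeholders, not taken from the original config.

```python
# Hypothetical excerpt of a model config. Only the placement of norm_cfg
# matters here; the other values are illustrative placeholders.
norm_cfg = dict(type='SyncBN', requires_grad=True)  # intended for conv layers

model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        # No norm_cfg here, so the backbone keeps its own default
        # (assumed to be LayerNorm).
    ),
    decode_head=dict(
        type='UPerHead',
        in_channels=[96, 192, 384, 768],
        in_index=[0, 1, 2, 3],
        channels=512,
        num_classes=150,
        norm_cfg=norm_cfg,  # BatchNorm-style norm stays in the conv head
    ),
)
```

If the backbone inherits `norm_cfg` from a `_base_` file, overriding it with a LayerNorm setting such as `dict(type='LN', requires_grad=True)` should have the same effect as removing it.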

AttributeError: class 'EncoderDecoder' in mmseg/models/segmentors/encoder_decoder.py: class 'Mask2FormerHead' in mmseg/models/decode_heads/mask2former_head.py: 'ConfigDict' object has no attribute 'transformerlayers'

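Reference: https://github.com/open-mmlab/mmsegmentation/issues/2619
Fix:
This looks like a version issue. Downloading the mmdet dev-3.x code and installing it from source solves it; installing directly with mim install mmdet >=3.0.0rc5 does not work.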
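
Since the problem comes down to which mmdet build is actually installed, it can help to print the installed versions before and after reinstalling. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
for name in ('mmdet', 'mmsegmentation'):
    print(name, installed_version(name))
```

A value of None means the package is not visible to the current Python environment at all, which points to an environment mix-up rather than a version mismatch.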

RuntimeError: DataLoader worker (pid 449) is killed by signal: Killed.

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1011, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/root/miniconda3/lib/python3.8/multiprocessing/queues.py", line 107, in get
    if not self._poll(timeout):
  File "/root/miniconda3/lib/python3.8/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/root/miniconda3/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/root/miniconda3/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/root/miniconda3/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 449) is killed by signal: Killed.

Problem: the code was calling the GPU on a machine that had no GPU.
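
One way to rule this out before launching a training run is to check whether PyTorch can see a CUDA device. This is only a sketch: the helper name `pick_device` is hypothetical, and it falls back to the CPU if PyTorch is not installed.

```python
def pick_device(prefer='cuda'):
    """Return 'cuda' only if PyTorch reports a usable GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on the GPU anyway.
        return 'cpu'
    if prefer == 'cuda' and torch.cuda.is_available():
        return 'cuda'
    return 'cpu'


print('device to use:', pick_device())
```

If this prints cpu on a machine that is supposed to have a GPU, the driver or the CUDA build of PyTorch is worth checking before touching the training config.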

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 76 but got size 75 for tensor number 1 in the list.

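Root cause:
The error reports a size mismatch. Further up, the traceback mentions torch.cat([upsample_feat, feat_low], 1), so the spatial sizes of upsample_feat and feat_low do not match. Many current networks fuse multi-scale features, and rounding can behave differently on the downsampling and upsampling paths, so high-level and low-level features end up with different sizes when they are fused.
Fix:
1. Resize the training images so that height and width are multiples of 32. No rounding occurs, so the problem cannot arise. (Recommended)
2. Resize the features passed to cat to the same size.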
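
The arithmetic behind the 76 versus 75 mismatch can be reproduced in plain Python. The sketch assumes each backbone stage halves the size with ceiling rounding (as a stride-2, kernel-3, padding-1 convolution does) and that upsampling doubles it exactly. The input size 600 is a made-up example that happens to reproduce the numbers in the error; the real input size is not given in the original.

```python
import math


def feature_sizes(size, num_stages=5):
    """Sizes after each stride-2 stage, assuming ceiling rounding."""
    sizes = []
    for _ in range(num_stages):
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes


def round_up(size, divisor=32):
    """Smallest multiple of `divisor` that is >= size."""
    return math.ceil(size / divisor) * divisor


bad = feature_sizes(600)              # [300, 150, 75, 38, 19]
print(bad[3] * 2, 'vs', bad[2])       # upsampled 38 -> 76, skip feature is 75

good = feature_sizes(round_up(600))   # input 608: [304, 152, 76, 38, 19]
print(good[3] * 2, 'vs', good[2])     # both 76, so cat succeeds
```

For option 2, the usual approach in PyTorch is to interpolate one feature to the other's spatial size before concatenating, for example with torch.nn.functional.interpolate(upsample_feat, size=feat_low.shape[2:]).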

AssertionError: Initialize dataset with `reduce_zero_label` as False but when load annotation the `reduce_zero_label` is True

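Root cause:
In my case I was on the new version and, out of laziness, did not define my own dataset class; the config used the BaseSegDataset base class directly. When data is read, the reduce_zero_label value is first taken from this base class, and the data then goes through the pipeline. If the pipeline sets reduce_zero_label=True, the error is raised, because the parameter defaults to False in BaseSegDataset. Here is the __init__ function of the BaseSegDataset class.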

    def __init__(self,
                 ann_file: str = '',
                 img_suffix='.jpg',
                 # ... (other parameters omitted)
                 ignore_index: int = 255,
                 reduce_zero_label: bool = False,
                 backend_args: Optional[dict] = None) -> None:

During loading, the following check is performed (from the source):

        if self.reduce_zero_label is None:
            self.reduce_zero_label = results['reduce_zero_label']
        assert self.reduce_zero_label == results['reduce_zero_label'], \
            'Initialize dataset with `reduce_zero_label` as ' \
            f'{results["reduce_zero_label"]} but when load annotation ' \
            f'the `reduce_zero_label` is {self.reduce_zero_label}'
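
The check can be imitated outside mmsegmentation to see when it fires. The class below is a hypothetical stand-in that copies only the comparison logic from the snippet above; it is not the library's loader.

```python
class LoadAnnotationsSketch:
    """Hypothetical stand-in that mimics only the consistency check."""

    def __init__(self, reduce_zero_label=None):
        # Value configured on the pipeline transform (None means "inherit").
        self.reduce_zero_label = reduce_zero_label

    def check(self, results):
        # `results` carries the value the dataset was initialized with.
        if self.reduce_zero_label is None:
            self.reduce_zero_label = results['reduce_zero_label']
        if self.reduce_zero_label != results['reduce_zero_label']:
            raise AssertionError(
                'dataset has reduce_zero_label='
                f'{results["reduce_zero_label"]} but the pipeline has '
                f'{self.reduce_zero_label}')
        return self.reduce_zero_label


# Dataset left at its default (False), pipeline set to True: mismatch.
try:
    LoadAnnotationsSketch(reduce_zero_label=True).check(
        {'reduce_zero_label': False})
except AssertionError as err:
    print('AssertionError:', err)

# Pipeline value left unset: it inherits the dataset value, no error.
print(LoadAnnotationsSketch().check({'reduce_zero_label': False}))
```

So the error only appears when both sides set the value explicitly and disagree, which is why aligning the dataset setting with the pipeline removes it.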

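Fix:
There are three options; pick any one.
1. Don't be lazy: define and register your own dataset class, and pass reduce_zero_label=True in its __init__ function.
2. Don't be lazy: count the background in the decode head's num_classes, i.e. add 1.
3. Add reduce_zero_label=True to the dataset under train_dataloader and val_dataloader in the config file.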

train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        metainfo=metainfo,
        reduce_zero_label=True,  # add it here; val_dataloader gets it in the same place
        data_prefix=dict(
            img_path='img_dir/train', seg_map_path='ann_dir/train'),
        pipeline=train_pipeline))
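
To see why option 2 and the reduce_zero_label options are alternatives, it helps to look at what the flag does to the label values. As far as I understand mmsegmentation's behavior, label 0 is mapped to the ignore index and every other label is shifted down by one. The plain-Python function below is an illustrative re-implementation under that assumption, not the library code.

```python
def reduce_zero_label(labels, ignore_index=255):
    """Map label 0 to ignore_index and shift all other labels down by 1."""
    reduced = []
    for value in labels:
        if value == 0 or value == ignore_index:
            # Background (0) is ignored; already-ignored pixels stay ignored.
            reduced.append(ignore_index)
        else:
            reduced.append(value - 1)
    return reduced


# Annotation with background 0, two foreground classes, one ignored pixel.
print(reduce_zero_label([0, 1, 2, 255]))
```

With the flag enabled, two foreground classes need num_classes=2. Without it, the background stays as class 0 and the head needs num_classes=3, which is what option 2 amounts to.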