MMSegmentation: Environment Setup + Building Your Own Dataset + Training + Prediction

I. Creating the Environment on Windows

1. Create the Python environment

conda create --name openmmlab2 python=3.8 -y
conda activate openmmlab2  # activate the environment

2. Install PyTorch

Visit the official PyTorch website and pick the build that matches your machine. My CUDA version is 11.6, so I install the versions below; if the versions don't match, you will hit all kinds of errors later.
Older builds are listed on the PyTorch previous-versions page.

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
If the download feels slow, you can fetch the wheels offline through the Tsinghua mirror and install them locally. Download matching versions of torch, torchvision, and torchaudio:
download.pytorch.org/whl/torch_stable.html
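As a sketch, installing the downloaded wheels might look like the following; the exact filenames depend on the wheels you grab (these placeholders assume the cu116 / Python 3.8 / Windows builds):

pip install torch-1.13.1+cu116-cp38-cp38-win_amd64.whl
pip install torchvision-0.14.1+cu116-cp38-cp38-win_amd64.whl
pip install torchaudio-0.13.1+cu116-cp38-cp38-win_amd64.whl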

3. Install MMCV

pip install -U openmim -i https://pypi.tuna.tsinghua.edu.cn/simple
mim install mmengine -i https://pypi.tuna.tsinghua.edu.cn/simple
mim install "mmcv>=2.0.0"

4. Install MMSegmentation

D:
mkdir mmseg
cd mmseg
conda install git 
git clone -b main https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
pip install -v -e .
# '-v' means verbose, or more output
# '-e' means installing a project in editable mode,
# thus any local modifications made to the code will take effect without reinstallation

If you don't want to build from source, you can install it directly instead:

pip install "mmsegmentation>=1.0.0"

5. Verify the installation

mim download mmsegmentation --config pspnet_r50-d8_4xb2-40k_cityscapes-512x1024 --dest .
python demo/image_demo.py demo/demo.png configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cuda:0 --out-file result.jpg

If you get an error saying your NumPy version is too high, downgrade NumPy:

pip install numpy==1.26.4

If you hit the error "AssertionError: MMCV==2.2.0 is used but incompatible. Please install mmcv>=2.0.0rc4.", install the version named in the message:

mim install "mmcv==2.0.0rc4"

You can look up the right mmcv version for your environment on the official MMCV installation page.


Install the remaining dependencies:

pip install ftfy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install regex -i https://pypi.tuna.tsinghua.edu.cn/simple

Then run the demo command again; this time it should succeed:

python demo/image_demo.py demo/demo.png configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cuda:0 --out-file result.jpg

While running it you may see the following error:

OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll
already initialized.

Search your installation directories; there will be several copies of libiomp5md.dll, and deleting the extra ones resolves it. Once the demo produces the segmentation result, you are done; the remaining warning it prints is an input-related issue and can be ignored for now.
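As an alternative to deleting DLLs (my own suggestion, not part of the original workflow), the duplicate-OpenMP-runtime check can also be silenced for the current session with an environment variable:

:: Windows cmd: tolerate duplicate OpenMP runtimes (use with care)
set KMP_DUPLICATE_LIB_OK=TRUE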

Test script:

from mmseg.apis import inference_model, init_model, show_result_pyplot
import mmcv

config_file = 'pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py'
checkpoint_file = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

# build the model from a config file and a checkpoint file
model = init_model(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = 'demo/demo.png'  # or img = mmcv.imread(img), which will only load it once
result = inference_model(model, img)
# visualize the results in a new window
show_result_pyplot(model, img, result, show=True)
# or save the visualization results to image files
# you can change the opacity of the painted segmentation map in (0, 1].
show_result_pyplot(model, img, result, show=True, out_file='result.jpg', opacity=0.5)


At this point the environment is fully set up.

II. Building Your Own Dataset

The dataset needs to be in Pascal VOC format.
Code link

1. Annotation

I use the X-AnyLabeling tool here because it ships with the latest SAM 2.1, which makes annotation much more convenient.
Installation:

conda create --name x-anylabeling python=3.9 -y
conda activate x-anylabeling
pip install onnxruntime-gpu==1.14 -i https://pypi.tuna.tsinghua.edu.cn/simple
D:
mkdir LabelTool
cd LabelTool
conda install git 
git clone https://github.com/CVHub520/X-AnyLabeling.git
cd X-AnyLabeling
pip install -r requirements-gpu-dev.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
pyrcc5 -o anylabeling/resources/resources.py anylabeling/resources/resources.qrc
pip uninstall anylabeling -y
set PYTHONPATH=D:\LabelTool\X-AnyLabeling
python anylabeling/app.py
## models can be downloaded from the project website
Put the AI models into the corresponding directory: C:\Users\Administrator\xanylabeling_data\models\hyper_yolos-r20241216

Usage:
conda activate x-anylabeling
D:
cd D:\LabelTool\X-AnyLabeling
python anylabeling/app.py

After annotation, each image ends up with a corresponding JSON label file.

2. Format conversion: JSON to PNG

Run the conversion script labelme2voc.py, then split the dataset with split_train_test_sets.py.
Things to modify first:

  1. json_to_dataset.py: add your own classes.
  2. label.py: the RGB colors, matched one-to-one with the labels.
  3. labelme2voc.py: the input and output paths.
  4. Create a labels.txt listing the classes; _background_ is the background class and can be omitted if you have none (see the sketch after this list).
  5. Running labelme2voc.py produces output in the standard VOC layout (JPEGImages, SegmentationClass, ImageSets, ...).
  6. Split the dataset into training and validation sets.
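For reference, a minimal labels.txt for the example classes used in the rest of this guide (A, B, and C are placeholder names) could look like:

_background_
A
B
C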

III. MMSegmentation Model Configuration

Taking VOC as the example, the configuration files that need changes are:

mmsegmentation\mmseg\datasets\myvoc.py
mmsegmentation\mmseg\datasets\__init__.py
mmsegmentation\configs\_base_\datasets\myvoc12.py   # create this yourself!!!
mmsegmentation\configs\_base_\models\deeplabv3plus_r50-d8.py
mmsegmentation\configs\deeplabv3plus\my_deeplabv3plus_r50-d8_4xb4-20k_voc12-512x512.py

1. mmsegmentation\mmseg\datasets\myvoc.py

# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmengine.fileio as fileio
from mmseg.registry import DATASETS
from .basesegdataset import BaseSegDataset
@DATASETS.register_module()
class MyVOCDataset(BaseSegDataset):  # rename the class
    """Pascal VOC dataset.
    Args:
        split (str): Split txt file for Pascal VOC.
    """
    METAINFO = dict(
        classes=('_background_', "A", "B", "C"),  # change these to your own classes
        palette=[[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0]])  # must match the colors used when building the VOC data
    def __init__(self,
                 ann_file,
                 img_suffix='.jpg',
                 seg_map_suffix='.png',
                 **kwargs) -> None:
        super().__init__(
            img_suffix=img_suffix,
            seg_map_suffix=seg_map_suffix,
            ann_file=ann_file,
            **kwargs)
        assert fileio.exists(self.data_prefix['img_path'],
                             self.backend_args) and osp.isfile(self.ann_file)

2. mmsegmentation\mmseg\datasets\__init__.py

# Copyright (c) OpenMMLab. All rights reserved.
# yapf: disable
from .ade import ADE20KDataset
from .basesegdataset import BaseCDDataset, BaseSegDataset
from .bdd100k import BDD100KDataset
from .chase_db1 import ChaseDB1Dataset
from .cityscapes import CityscapesDataset
from .coco_stuff import COCOStuffDataset
from .dark_zurich import DarkZurichDataset
from .dataset_wrappers import MultiImageMixDataset
from .decathlon import DecathlonDataset
from .drive import DRIVEDataset
from .dsdl import DSDLSegDataset
from .hrf import HRFDataset
from .hsi_drive import HSIDrive20Dataset
from .isaid import iSAIDDataset
from .isprs import ISPRSDataset
from .levir import LEVIRCDDataset
from .lip import LIPDataset
from .loveda import LoveDADataset
from .mapillary import MapillaryDataset_v1, MapillaryDataset_v2
from .night_driving import NightDrivingDataset
from .nyu import NYUDataset
from .pascal_context import PascalContextDataset, PascalContextDataset59
from .potsdam import PotsdamDataset
from .refuge import REFUGEDataset
from .stare import STAREDataset
from .synapse import SynapseDataset
from .myvoc import MyVOCDataset   # added!!!
# yapf: disable
from .transforms import (CLAHE, AdjustGamma, Albu, BioMedical3DPad,
                         BioMedical3DRandomCrop, BioMedical3DRandomFlip,
                         BioMedicalGaussianBlur, BioMedicalGaussianNoise,
                         BioMedicalRandomGamma, ConcatCDInput, GenerateEdge,
                         LoadAnnotations, LoadBiomedicalAnnotation,
                         LoadBiomedicalData, LoadBiomedicalImageFromFile,
                         LoadImageFromNDArray, LoadMultipleRSImageFromFile,
                         LoadSingleRSImageFromFile, PackSegInputs,
                         PhotoMetricDistortion, RandomCrop, RandomCutOut,
                         RandomMosaic, RandomRotate, RandomRotFlip, Rerange,
                         ResizeShortestEdge, ResizeToMultiple, RGB2Gray,
                         SegRescale)


# yapf: enable
__all__ = [
    'BaseSegDataset', 'BioMedical3DRandomCrop', 'BioMedical3DRandomFlip',
    'CityscapesDataset', 'MyVOCDataset', 'ADE20KDataset',  # MyVOCDataset added
    'PascalContextDataset', 'PascalContextDataset59', 'ChaseDB1Dataset',
    'DRIVEDataset', 'HRFDataset', 'STAREDataset', 'DarkZurichDataset',
    'NightDrivingDataset', 'COCOStuffDataset', 'LoveDADataset',
    'MultiImageMixDataset', 'iSAIDDataset', 'ISPRSDataset', 'PotsdamDataset',
    'LoadAnnotations', 'RandomCrop', 'SegRescale', 'PhotoMetricDistortion',
    'RandomRotate', 'AdjustGamma', 'CLAHE', 'Rerange', 'RGB2Gray',
    'RandomCutOut', 'RandomMosaic', 'PackSegInputs', 'ResizeToMultiple',
    'LoadImageFromNDArray', 'LoadBiomedicalImageFromFile',
    'LoadBiomedicalAnnotation', 'LoadBiomedicalData', 'GenerateEdge',
    'DecathlonDataset', 'LIPDataset', 'ResizeShortestEdge',
    'BioMedicalGaussianNoise', 'BioMedicalGaussianBlur',
    'BioMedicalRandomGamma', 'BioMedical3DPad', 'RandomRotFlip',
    'SynapseDataset', 'REFUGEDataset', 'MapillaryDataset_v1',
    'MapillaryDataset_v2', 'Albu', 'LEVIRCDDataset',
    'LoadMultipleRSImageFromFile', 'LoadSingleRSImageFromFile',
    'ConcatCDInput', 'BaseCDDataset', 'DSDLSegDataset', 'BDD100KDataset',
    'NYUDataset', 'HSIDrive20Dataset'
]
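After editing __init__.py, a quick check (my own sketch; run it from the mmsegmentation root with the editable install active) confirms the new dataset class is actually registered:

from mmseg.registry import DATASETS

print(DATASETS.get('MyVOCDataset'))  # should print the class, not None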

3. mmsegmentation\configs\_base_\datasets\myvoc12.py

# dataset settings
dataset_type = 'MyVOCDataset'   # the dataset class you registered in step 1
data_root = r'path\..\to\Voc\data_dataset_voc'   # path to your own dataset
crop_size = (512, 512)   # crop size; you may pick another size, but keep it consistent everywhere it appears
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(
        type='RandomResize',
        scale=(1024, 256),   # 2 * 512, 0.5 * 512
        ratio_range=(0.5, 2.0),  
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(1024, 256), keep_ratio=True),   # keep consistent with the training scale
    # add loading annotation after ``Resize`` because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]
img_ratios = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
tta_pipeline = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(
        type='TestTimeAug',
        transforms=[
            [
                dict(type='Resize', scale_factor=r, keep_ratio=True)
                for r in img_ratios
            ],
            [
                dict(type='RandomFlip', prob=0., direction='horizontal'),
                dict(type='RandomFlip', prob=1., direction='horizontal')
            ], [dict(type='LoadAnnotations')], [dict(type='PackSegInputs')]
        ])
]
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='JPEGImages', seg_map_path='SegmentationClass'),
        ann_file='ImageSets/Segmentation/train.txt',
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='JPEGImages', seg_map_path='SegmentationClass'),
        ann_file='ImageSets/Segmentation/val.txt',
        pipeline=test_pipeline))
test_dataloader = val_dataloader

val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator
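Before wiring this into training, a quick sanity check (my own sketch, not from the original post; run it from the mmsegmentation root) builds the training dataset straight from this config and prints its size and class names:

from mmengine.config import Config
from mmengine.registry import init_default_scope
from mmseg.registry import DATASETS

cfg = Config.fromfile('configs/_base_/datasets/myvoc12.py')
init_default_scope('mmseg')  # resolve dataset/transform types in the mmseg scope
dataset = DATASETS.build(cfg.train_dataloader.dataset)
print(len(dataset), dataset.metainfo['classes'])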

4. mmsegmentation\configs\_base_\models\deeplabv3plus_r50-d8.py

This file holds the model settings; adjust them to your needs. Note that num_classes is 19 here (the Cityscapes default) and gets overridden in the final config in step 5.

# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
data_preprocessor = dict(
    type='SegDataPreProcessor',
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    bgr_to_rgb=True,
    pad_val=0,
    seg_pad_val=255)
model = dict(
    type='EncoderDecoder',
    data_preprocessor=data_preprocessor,
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='DepthwiseSeparableASPPHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        dilations=(1, 12, 24, 36),
        c1_in_channels=256,
        c1_channels=48,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))

5. mmsegmentation\configs\deeplabv3plus\my_deeplabv3plus_r50-d8_4xb4-20k_voc12-512x512.py

# adapt the paths below to your own files
_base_ = [
    '../_base_/models/deeplabv3plus_r50-d8.py',
    '../_base_/datasets/myvoc12.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_20k.py'    # training-schedule settings
]
crop_size = (512, 512)  # keep consistent with the dataset config
data_preprocessor = dict(size=crop_size)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=4),
    auxiliary_head=dict(num_classes=4))  # set to your own number of classes (here background + A + B + C = 4)
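As a quick check (again my own sketch) that the _base_ files merge the way you expect, load the final config and print the fields that matter:

from mmengine.config import Config

cfg = Config.fromfile(
    'configs/deeplabv3plus/my_deeplabv3plus_r50-d8_4xb4-20k_voc12-512x512.py')
print(cfg.model.decode_head.num_classes)   # expect 4
print(cfg.train_dataloader.dataset.type)   # expect 'MyVOCDataset'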

IV. MMSegmentation Training

tools\train.py

# Copyright (c) OpenMMLab. All rights reserved.
import argparse
import logging
import os
import os.path as osp

from mmengine.config import Config, DictAction
from mmengine.logging import print_log
from mmengine.runner import Runner

from mmseg.registry import RUNNERS


def parse_args():
    parser = argparse.ArgumentParser(description='Train a segmentor')
    # modified: default config file path
    parser.add_argument('--config',
                        default='../configs/deeplabv3plus/my_deeplabv3plus_r50-d8_4xb4-20k_voc12-512x512.py',
                        help='train config file path')
    # modified: default work dir
    parser.add_argument('--work-dir', default='../WorkDirs', help='the dir to save logs and models')
    parser.add_argument(
        '--resume',
        action='store_true',
        default=False,
        help='resume from the latest checkpoint in the work_dir automatically')
    parser.add_argument(
        '--amp',
        action='store_true',
        default=False,
        help='enable automatic-mixed-precision training')
    parser.add_argument(
        '--cfg-options',
        nargs='+',
        action=DictAction,
        help='override some settings in the used config, the key-value pair '
        'in xxx=yyy format will be merged into config file. If the value to '
        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
        'Note that the quotation marks are necessary and that no white space '
        'is allowed.')
    parser.add_argument(
        '--launcher',
        choices=['none', 'pytorch', 'slurm', 'mpi'],
        default='none',
        help='job launcher')
    # When using PyTorch version >= 2.0.0, the `torch.distributed.launch`
    # will pass the `--local-rank` parameter to `tools/train.py` instead
    # of `--local_rank`.
    parser.add_argument('--local_rank', '--local-rank', type=int, default=0)
    args = parser.parse_args()
    if 'LOCAL_RANK' not in os.environ:
        os.environ['LOCAL_RANK'] = str(args.local_rank)

    return args


def main():
    args = parse_args()

    # load config
    cfg = Config.fromfile(args.config)
    cfg.launcher = args.launcher
    if args.cfg_options is not None:
        cfg.merge_from_dict(args.cfg_options)

    # work_dir is determined in this priority: CLI > segment in file > filename
    if args.work_dir is not None:
        # update configs according to CLI args if args.work_dir is not None
        cfg.work_dir = args.work_dir
    elif cfg.get('work_dir', None) is None:
        # use config filename as default work_dir if cfg.work_dir is None
        cfg.work_dir = osp.join('./work_dirs',
                                osp.splitext(osp.basename(args.config))[0])

    # enable automatic-mixed-precision training
    if args.amp is True:
        optim_wrapper = cfg.optim_wrapper.type
        if optim_wrapper == 'AmpOptimWrapper':
            print_log(
                'AMP training is already enabled in your config.',
                logger='current',
                level=logging.WARNING)
        else:
            assert optim_wrapper == 'OptimWrapper', (
                '`--amp` is only supported when the optimizer wrapper type is '
                f'`OptimWrapper` but got {optim_wrapper}.')
            cfg.optim_wrapper.type = 'AmpOptimWrapper'
            cfg.optim_wrapper.loss_scale = 'dynamic'

    # resume training
    cfg.resume = args.resume

    # build the runner from config
    if 'runner_type' not in cfg:
        # build the default runner
        runner = Runner.from_cfg(cfg)
    else:
        # build customized runner from the registry
        # if 'runner_type' is set in the cfg
        runner = RUNNERS.build(cfg)

    # start training
    runner.train()


if __name__ == '__main__':
    main()

Training starts and runs successfully.
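Since the config and work-dir defaults are baked into train.py above, training can be launched without arguments; the --cfg-options flag from the parser above also lets you override config values on the command line without editing any file. For example:

cd mmsegmentation\tools
python train.py
python train.py --cfg-options train_dataloader.batch_size=2   # example override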

V. MMSegmentation Testing

The test script is as follows:

from mmseg.apis import init_model, inference_model, show_result_pyplot

config_path = 'configs/deeplabv3plus/my_deeplabv3plus_r50-d8_4xb4-20k_voc12-512x512.py'
checkpoint_path = 'WorkDirs/iter_2000.pth'
img_path = 'demo/5.tiff'

model = init_model(config_path, checkpoint_path)
result = inference_model(model, img_path)
# display the segmentation result
vis_image = show_result_pyplot(model, img_path, result)
# save the visualization; the output image will be written to work_dirs/result.png
vis_image = show_result_pyplot(model, img_path, result, out_file='work_dirs/result.png')
# # change how long the result window is displayed; note that 0 is a special value meaning "indefinitely"
# vis_image = show_result_pyplot(model, img_path, result, wait_time=5)

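To run the trained model over a whole folder instead of a single image, a minimal batch-inference sketch (my own addition; img_dir and out_dir are placeholder paths) could look like this:

import os

from mmseg.apis import init_model, inference_model, show_result_pyplot

config_path = 'configs/deeplabv3plus/my_deeplabv3plus_r50-d8_4xb4-20k_voc12-512x512.py'
checkpoint_path = 'WorkDirs/iter_2000.pth'
img_dir = 'demo/images'       # folder of test images (placeholder)
out_dir = 'work_dirs/results'
os.makedirs(out_dir, exist_ok=True)

model = init_model(config_path, checkpoint_path, device='cuda:0')
for name in os.listdir(img_dir):
    if not name.lower().endswith(('.jpg', '.png', '.tiff')):
        continue
    img_path = os.path.join(img_dir, name)
    result = inference_model(model, img_path)
    # show=False suppresses the pop-up window; out_file saves the visualization
    show_result_pyplot(model, img_path, result, show=False,
                       out_file=os.path.join(out_dir, name))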
