MMSegmentation Usage Notes (Part 2): Distributed Training

Author: Duktig

Last modified: 2022-11-22 11:35:20

Category: MMSegmentation usage guide
Tags: distributed, server, python, deep learning, linux

First published: 2022-11-22 11:21:00

Article link: https://blog.csdn.net/weixin_50646615/article/details/127978989

When resources allow, many readers want to run distributed training with MMSegmentation, so this post walks through how to do it.

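**MMSegmentation does not support distributed training through DataParallel; it has to be launched from the command line with the script that ships with the repository.** Many readers also run into errors when using MMDataParallel, so the approach I use below may be a useful reference.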

To run distributed training on a Linux server, first make dist_train.sh executable, then open it in vi and convert it to Unix line endings:

chmod 777 ./mmsegmentation/tools/dist_train.sh
vi ./mmsegmentation/tools/dist_train.sh
# Inside vi: switch the file to Unix line endings, then save and quit
:set ff=unix
:wq
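If you would rather not open an editor, the same two steps (Unix line endings plus execute permission) can be sketched in Python. This is only an illustration of what the commands above achieve, not something MMSegmentation provides; the helper name `make_unix_executable` is made up for this example, and it adds execute bits only rather than the full 777 used above.

```python
import stat
from pathlib import Path


def make_unix_executable(script_path):
    """Convert CRLF/CR line endings to LF and add execute permission.

    Returns True if the file content had to be rewritten.
    """
    path = Path(script_path)
    data = path.read_bytes()
    # Replace Windows CRLF first, then any leftover lone CR, with Unix LF.
    fixed = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    changed = fixed != data
    if changed:
        path.write_bytes(fixed)
    # Add execute bits for user, group and others (similar to `chmod +x`).
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return changed
```

For the script in this post, the call would be `make_unix_executable("./mmsegmentation/tools/dist_train.sh")`.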

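After that, distributed training can be started with the command below. I use my own config file (a Swin one in this example); adjust the config to your needs. The trailing 4 is the number of GPUs.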

nohup ./mmsegmentation/tools/dist_train.sh ./mine/myconfig_swin.py 4 > hehe.log 2>&1 &

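(nohup keeps training running if the network connection drops, so the local terminal does not have to stay open. It is handy whenever you work on a remote server.)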
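For readers who start training from a Python script instead of a shell, a rough equivalent of the `nohup ... > hehe.log 2>&1 &` pattern can be sketched with the standard `subprocess` module. This is a minimal sketch under a few assumptions: it targets POSIX systems, it detaches the process by starting a new session rather than ignoring SIGHUP as nohup does, and the function name `launch_detached` is hypothetical.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd in its own session, with stdout and stderr sent to log_path."""
    # "wb" truncates the log, like `> hehe.log` in the shell command.
    log_file = open(log_path, "wb")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no terminal input needed
            stdout=log_file,            # like `> log`
            stderr=subprocess.STDOUT,   # like `2>&1`
            start_new_session=True,     # detach from the terminal's session
        )
    finally:
        # The child keeps its own copy of the file descriptor.
        log_file.close()
    return proc
```

With the paths from this post, the call would look like `launch_detached(["./mmsegmentation/tools/dist_train.sh", "./mine/myconfig_swin.py", "4"], "hehe.log")`, and the returned object exposes the process ID as `proc.pid`.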

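Below is my custom config file for reference. It can also serve as a starting point for trying Swin with single-GPU training, although depending on your PyTorch version the SyncBN entries may need to be changed to BN when you are not using the distributed launcher.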

norm_cfg = dict(type='SyncBN', requires_grad=True)
backbone_norm_cfg = dict(type='LN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    # pretrained='pretrain/swin_base_patch4_window12_384_22k.pth',
    backbone=dict(
        type='SwinTransformer',
        pretrain_img_size=384,
        embed_dims=128,
        patch_size=4,
        window_size=12,
        mlp_ratio=4,
        depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32],
        strides=(4, 2, 2, 2),
        out_indices=(0, 1, 2, 3),
        qkv_bias=True,
        qk_scale=None,
        patch_norm=True,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.3,
        use_abs_pos_embed=False,
        act_cfg=dict(type='GELU'),
        norm_cfg=dict(type='LN', requires_grad=True)),
    decode_head=dict(
        type='UPerHead',
        in_channels=[128, 256, 512, 1024],
        in_index=[0, 1, 2, 3],
        pool_scales=(1, 2, 3, 6),
        channels=512,
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)