Audio and Video Development Journey (76) - Image Matting and Background Replacement - MODNet

Article link: https://blog.csdn.net/u011570979/article/details/136741634

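1. Results

2. How MODNet works

3. Separating foreground from background and replacing the background

4. Problems encountered and solutions

5. Recommended online tool

6. References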

I. Results

The images above were generated with Stable Diffusion (SD). The corresponding prompts are as follows.

Half-length ID photo: 1girl,face,curly hair,red hair,white background,(Body facing the camera :1.3),face in the horizontal center of the picture,ID photo,body and face facing the camera,wearing a shirt,suit,identification photo,<lora:identification photo_v3.0:0.7>,

Half-length / full-body image: 1girl,face,curly hair,red hair,(Body facing the camera :1.3),face in the horizontal center of the picture,ID photo,body and face facing the camera,(full body:1.3),smile,in summer,park,landscape,full_shot,

Robot cat: masterpiece, high quality, a robot cat, sleep on the sofa <lora:J_sci-fi-000014:0.8> j_sci-fi

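II. How MODNet works

2.1 MODNet architecture

The network predicts the salient region of an image (roughly speaking the portrait, although it is not limited to people; an animal in the foreground works too) so that the foreground can be separated from the background.

It consists of the following parts: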

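1. Low-resolution branch (Semantic Estimation): convolutions shrink the image, and the branch predicts coarse semantic information, i.e. which regions of the image belong to the person. This provides context for the later edge-detail prediction and fusion. The branch also has an auxiliary module, e-ASPP, an atrous spatial pyramid pooling structure that captures semantic information at multiple scales and helps handle scale variation across different parts of the person.

2. High-resolution branch (Detail Prediction): focuses on fine details, especially edge regions, in order to produce a more accurate segmentation. It takes the output of the low-resolution branch, upsamples it (i.e. increases its resolution), and combines it with the downsampled original image to obtain sharper edges. This branch also has an auxiliary mechanism, Skip Link, which passes features from early layers of the network to later layers; early layers usually retain more of the original information, which helps recover detail.

3. Fusion branch (Semantic-Detail Fusion): combines the low-resolution and high-resolution branches, fusing semantic information with edge detail to improve the accuracy of the result.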

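4. Outputs: the three branches above output, respectively, the semantics s_p (the portrait region in the image), the details d_p (the fine edges of the portrait), and the fused alpha matte alpha_p (a clean separation of the person from the background).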

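5. Transition region m_d: the band between definite foreground and definite background where the segmentation is uncertain, for example around hair edges. In the paper it acts as a mask that makes the detail prediction concentrate on these boundary pixels.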
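To make the idea of a transition region concrete, here is a minimal NumPy sketch. The paper describes generating m_d through dilation and erosion of the matte; the helper names, the 0.5 threshold and the window radius below are illustrative assumptions, not values taken from the MODNet code.

```python
import numpy as np


def _neighborhood_stack(mask, radius):
    """Stack every shifted copy of `mask` inside a (2r+1) x (2r+1) window."""
    padded = np.pad(mask, radius, mode="edge")
    h, w = mask.shape
    size = 2 * radius + 1
    return np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in range(size)
        for dx in range(size)
    ])


def transition_region(alpha, radius=1, threshold=0.5):
    """Return a boolean mask of pixels near the foreground boundary.

    alpha: float array of shape (H, W) with values in [0, 1].
    The mask is dilation(foreground) AND NOT erosion(foreground).
    """
    foreground = alpha > threshold
    windows = _neighborhood_stack(foreground, radius)
    dilated = windows.any(axis=0)   # a pixel is set if any neighbor is foreground
    eroded = windows.all(axis=0)    # a pixel is set only if all neighbors are foreground
    return dilated & ~eroded


# Hypothetical toy matte: left half background, right half foreground.
alpha = np.zeros((5, 6), dtype=np.float32)
alpha[:, 3:] = 1.0
m_d = transition_region(alpha)
```

With this toy matte, only the two columns that straddle the boundary are marked; a larger radius widens the band.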

2.2 Network structure in code

import torch.nn as nn

# SUPPORTED_BACKBONES, LRBranch, HRBranch and FusionBranch are defined in the MODNet repository.
# _init_conv and _init_norm are methods of this class that are omitted from this excerpt.

class MODNet(nn.Module):
    """ MODNet architecture
    """

    # Model initialization
    def __init__(self, in_channels=3, hr_channels=32, backbone_arch='mobilenetv2', backbone_pretrained=True):
        super(MODNet, self).__init__()

        self.in_channels = in_channels
        self.hr_channels = hr_channels
        self.backbone_arch = backbone_arch
        self.backbone_pretrained = backbone_pretrained

        self.backbone = SUPPORTED_BACKBONES[self.backbone_arch](self.in_channels)

        # Initialize the low-resolution branch
        self.lr_branch = LRBranch(self.backbone)
        # Initialize the high-resolution branch
        self.hr_branch = HRBranch(self.hr_channels, self.backbone.enc_channels)
        # Initialize the fusion branch
        self.f_branch = FusionBranch(self.hr_channels, self.backbone.enc_channels)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                self._init_conv(m)
            elif isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.InstanceNorm2d):
                self._init_norm(m)

        # Load the pretrained backbone weights
        if self.backbone_pretrained:
            self.backbone.load_pretrained_ckpt()

    # Forward pass
    def forward(self, img, inference):
        pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(img, inference)
        pred_detail, hr2x = self.hr_branch(img, enc2x, enc4x, lr8x, inference)
        pred_matte = self.f_branch(img, lr8x, hr2x)

        # What we need is the output of the fusion branch, i.e. pred_matte
        return pred_semantic, pred_detail, pred_matte
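The suffixes in names such as enc2x, enc4x and lr8x appear to denote how many times smaller than the input each feature map is. Assuming that reading, and assuming the deepest backbone feature is downsampled 32 times (as in MobileNetV2), the hypothetical helper below lists the spatial size at each stage. It is only a sketch for building intuition and is not part of the MODNet code.

```python
def stage_sizes(height, width, factors=(2, 4, 8, 16, 32)):
    """Return {factor: (h, w)} for each assumed downsampling factor.

    Raises ValueError when a side is not a multiple of the largest factor,
    because the feature maps would then no longer divide evenly.
    """
    largest = max(factors)
    if height % largest or width % largest:
        raise ValueError(
            f"{height}x{width} is not a multiple of {largest} on both sides"
        )
    return {f: (height // f, width // f) for f in factors}


# Hypothetical input of 512 x 672 pixels.
sizes = stage_sizes(512, 672)
for factor, (h, w) in sizes.items():
    print(f"1/{factor} resolution: {h} x {w}")
```

When both sides are multiples of 32, every stage divides evenly, so upsampled features line up exactly with the earlier features they are combined with.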

III. Separating foreground from background and replacing the background

3.1 Matting

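import numpy as np
import torch
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image

# Both snippets below belong to the author's wrapper class; self.modnet holds the loaded MODNet model.

# In __init__: define the image-to-tensor transform
self.im_transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ]
)

def interface(self, img_input, output_path):
    # Read the image; convert('RGB') guarantees three color channels
    image = Image.open(img_input).convert('RGB')

    # Preprocess with the transform pipeline (convert to a tensor, then normalize with the given mean and std)
    im = self.im_transform(image)

    # An image tensor is 3-D (channels, height, width), but MODNet expects a 4-D batch,
    # so add a batch dimension of size 1
    im = im[None, :, :, :]

    # Resize the image to dimensions the model accepts (multiples of 32, rounded down)
    im_b, im_c, im_h, im_w = im.shape
    im_rh = im_h - im_h % 32
    im_rw = im_w - im_w % 32
    im = F.interpolate(im, size=(im_rh, im_rw), mode='area')

    # Run inference: call the model's forward pass, which returns the matte
    _, _, matte = self.modnet(im.cuda() if torch.cuda.is_available() else im, True)

    # Resize the matte back to the size of the input
    matte = F.interpolate(matte, size=(im_h, im_w), mode='area')

    # Convert to a NumPy array
    predict = matte.squeeze()
    predict_np = predict.cpu().data.numpy()
    imo = Image.fromarray(predict_np * 255).convert('RGB')
    matte = np.array(imo)

    image = np.array(image)

    # Take any one channel of the matte (all three are identical, with values from 0 to 255)
    alpha = matte[:, :, [0]]
    # Concatenate the original image with the alpha channel to get an RGBA cutout with transparency, saved as PNG
    res = np.concatenate((image, alpha), -1)
    Image.fromarray(res.astype('uint8')).save(output_path)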
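The arithmetic behind the pre- and post-processing steps can be checked without PyTorch. ToTensor scales uint8 pixels to [0, 1] (and reorders the axes to channels-first, which is skipped here), and Normalize with mean 0.5 and std 0.5 then maps them to [-1, 1]. The function names below are hypothetical stand-ins written for this sketch.

```python
import numpy as np


def normalize_pixels(pixels_uint8):
    """Mimic ToTensor + Normalize(0.5, 0.5): uint8 [0, 255] -> float32 [-1, 1]."""
    scaled = pixels_uint8.astype(np.float32) / 255.0
    return (scaled - 0.5) / 0.5


def floor_to_multiple(value, base=32):
    """Round `value` down to the nearest multiple of `base`."""
    return value - value % base


def matte_to_alpha(matte):
    """Turn a float matte in [0, 1] of shape (H, W) into a uint8 alpha channel (H, W, 1)."""
    return (np.clip(matte, 0.0, 1.0) * 255).astype(np.uint8)[:, :, None]


normalized = normalize_pixels(np.array([0, 255], dtype=np.uint8))
resized_height = floor_to_multiple(1080)   # 1080 -> 1056
resized_width = floor_to_multiple(1920)    # 1920 is already a multiple of 32
alpha = matte_to_alpha(np.array([[0.0, 1.0], [1.0, 0.0]]))
```

Rounding down means a 1080-pixel side is fed to the network as 1056 pixels; the matte is then interpolated back to the original size, as in the code above.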

3.2 Replacing the background

Simply composite the foreground PNG produced by the matting step onto the target background image.

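from PIL import Image

def mergebgandfg(bgfile, fgfile, outfile):
    bg = Image.open(bgfile)
    # alpha_composite requires both images to be RGBA
    fg = Image.open(fgfile).convert('RGBA')

    # Resize the background to the foreground's size.
    # Image.LANCZOS replaces Image.ANTIALIAS, which was removed in Pillow 10.
    bg = bg.resize(fg.size, Image.LANCZOS)

    # Composite the foreground over the background using the foreground's alpha channel
    combined = Image.alpha_composite(bg.convert('RGBA'), fg)
    combined.save(outfile)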
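For an opaque background, compositing reduces to the per-pixel blend out = alpha * foreground + (1 - alpha) * background. The NumPy sketch below spells that formula out; the function name is hypothetical, and Pillow's own implementation also handles semi-transparent backgrounds and may round slightly differently.

```python
import numpy as np


def blend_over_opaque(fg_rgb, alpha, bg_rgb):
    """Blend a foreground over an opaque background.

    fg_rgb, bg_rgb: uint8 arrays of shape (H, W, 3).
    alpha: uint8 array of shape (H, W) or (H, W, 1), 0 = transparent, 255 = opaque.
    """
    a = alpha.astype(np.float32) / 255.0
    if a.ndim == 2:
        a = a[:, :, None]           # broadcast over the color channels
    out = a * fg_rgb.astype(np.float32) + (1.0 - a) * bg_rgb.astype(np.float32)
    return np.rint(out).astype(np.uint8)


# Hypothetical 1 x 3 image: transparent, half-transparent and opaque pixels.
fg = np.full((1, 3, 3), 200, dtype=np.uint8)
bg = np.full((1, 3, 3), 100, dtype=np.uint8)
alpha = np.array([[0, 128, 255]], dtype=np.uint8)
blended = blend_over_opaque(fg, alpha, bg)
```

Pixels with alpha 0 keep the background color, pixels with alpha 255 keep the foreground color, and intermediate values (typical around hair) mix the two.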

IV. Problems encountered and solutions

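For some images, which the author describes as using a stronger compression algorithm, the array obtained after Image.open has no color-channel dimension (its shape contains only height and width, as happens with single-channel images such as grayscale ones), so the later concatenate call raises an error.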

In that case, calling convert('RGB') right after Image.open solves the problem.
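The failure can be reproduced with plain NumPy arrays. The sketch below uses a hypothetical helper that expands a single-channel array to three identical channels, which mimics what convert('RGB') does for a grayscale image; the real convert('RGB') also handles other modes such as palette images.

```python
import numpy as np


def ensure_three_channels(image):
    """Expand a 2-D (H, W) array to (H, W, 3) by repeating the single channel."""
    if image.ndim == 2:
        return np.stack([image] * 3, axis=-1)
    return image


gray = np.zeros((4, 6), dtype=np.uint8)            # what a grayscale image looks like as an array
alpha = np.full((4, 6, 1), 255, dtype=np.uint8)    # alpha channel from the matte

# Concatenating a 2-D image with a 3-D alpha channel fails.
try:
    np.concatenate((gray, alpha), -1)
    failed = False
except ValueError:
    failed = True

# After expanding to three channels the concatenation yields an RGBA array.
rgba = np.concatenate((ensure_three_channels(gray), alpha), -1)
```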

V. Recommended online tool

A free online matting tool worth recommending:

https://www.remove.bg

VI. References

1. GitHub: https://github.com/ZHKKKe/MODNet

2. Paper: https://arxiv.org/pdf/2011.11961.pdf

3. remove.bg: https://www.remove.bg

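Thanks for reading.

Next we will keep studying and publishing AIGC-related content. You are welcome to follow the WeChat official account "音视频开发之旅" (Audio and Video Development Journey) so we can learn and grow together.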