BiSeNetV1 Face Segmentation

HySmiley

Category: Deep Learning. Tags: deep learning, computer vision, artificial intelligence

Article link: https://blog.csdn.net/m0_37264397/article/details/124794741

1. Paper

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

https://arxiv.org/abs/1808.00897.pdf

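The paper notes that achieving real-time inference speed by lowering spatial resolution leads to poor accuracy. To address this, it proposes the Bilateral Segmentation Network (BiSeNet), which consists of two parts: a Spatial Path and a Context Path.

Spatial Path: preserves spatial information and generates high-resolution features.

Context Path: uses a fast downsampling strategy to obtain a sufficiently large receptive field.

The authors summarize three approaches used to speed up models for real-time semantic segmentation:

① Restricting the input size by cropping or resizing to reduce computational complexity. Although simple and effective, the loss of spatial detail corrupts the prediction, especially near boundaries, which lowers accuracy in both metrics and visualization.

② Pruning the network's channels to speed up inference, especially in the early stages of the base model. However, this weakens spatial capacity.

③ ENet proposes dropping the last stage of the model in pursuit of a very compact framework. The drawback is obvious: because ENet abandons the downsampling operations of the last stage, the model's receptive field is not large enough to cover large objects, which results in poor discriminative ability.

For details, see: https://blog.csdn.net/sinat_17456165/article/details/106152907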

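2. Network Architecture

Modules

① Spatial Path: consists of several Conv + BN + ReLU groups, with stride 2 in each of the three downsampling convolutions (the final 1×1 convolution in the code keeps stride 1).

Characteristics: shallow network, wide channels. Purpose: preserve rich spatial information and generate high-resolution features.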

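import torch.nn as nn

# ConvBNRelu(in_channels, out_channels, kernel_size, stride, padding) is assumed
# to be the project's Conv2d + BatchNorm2d + ReLU wrapper module, defined elsewhere.
class SpatialPath(nn.Module):
    def __init__(self):
        super(SpatialPath, self).__init__()
        self.cbnr1 = ConvBNRelu(3, 64, 7, 2, 3)    # 1/2 resolution
        self.cbnr2 = ConvBNRelu(64, 64, 3, 2, 1)   # 1/4 resolution
        self.cbnr3 = ConvBNRelu(64, 64, 3, 2, 1)   # 1/8 resolution
        self.cbnr4 = ConvBNRelu(64, 128, 1, 1, 0)  # 1x1 conv, widens to 128 channels
        self.init_weight()

    def init_weight(self):
        # Walk all nested modules: the direct children are ConvBNRelu wrappers,
        # so iterating self.children() would not reach the nn.Conv2d layers inside.
        for ly in self.modules():
            if isinstance(ly, nn.Conv2d):
                nn.init.kaiming_normal_(ly.weight, a=1)
                if ly.bias is not None:
                    nn.init.constant_(ly.bias, 0)

    def forward(self, x):
        x = self.cbnr1(x)
        x = self.cbnr2(x)
        x = self.cbnr3(x)
        x = self.cbnr4(x)
        return x

    def get_params(self):
        # Split parameters into a weight-decay group and a no-weight-decay group.
        wd_params, nowd_params = [], []
        for name, module in self.named_modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                wd_params.append(module.weight)
                if module.bias is not None:
                    nowd_params.append(module.bias)
            elif isinstance(module, nn.BatchNorm2d):
                nowd_params += list(module.parameters())
        return wd_params, nowd_params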
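
To see why the Spatial Path output is still high-resolution, the short sketch below traces the feature-map size through the four layers using the standard convolution size formula. The kernel, stride, and padding values are copied from the SpatialPath code; the 512-pixel input side is an assumption chosen to match the 512×512 masks used later in this post.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial size after a convolution (standard floor formula, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of cbnr1..cbnr4, copied from the SpatialPath code.
LAYERS = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (1, 1, 0)]


def spatial_path_sizes(size):
    """Return the side length of the feature map before and after every layer."""
    sizes = [size]
    for kernel, stride, padding in LAYERS:
        sizes.append(conv_out_size(sizes[-1], kernel, stride, padding))
    return sizes


print(spatial_path_sizes(512))  # [512, 256, 128, 64, 64]
```

The three stride-2 convolutions reduce the resolution to 1/8 of the input, and the final 1×1 convolution only changes the channel count.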

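② Context Path: consists of ARM plus a lightweight backbone network (Res18, Xception39, etc.)

Characteristics: deep network. Purpose: obtain a sufficiently large receptive field.

Taking Res18 as an example:

If you do not use the models library in torchvision, you can rewrite the Res18 network yourself and still use its pretrained weights.

The parameter names in your network may differ, but the layers must match in number and order, mainly so that the parameters can be assigned one by one.

Initialization: loading the pretrained parameters.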

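import torch
from torchvision.models import resnet18

    def init_weight(self):
        # Build torchvision's ResNet18 without its classifier head and load the checkpoint.
        model = resnet18(pretrained=False)
        model.fc = None
        model.load_state_dict(torch.load('resnet18-5c106cde.pth'))

        # Without this temporary variable the parameter values are not updated,
        # because self.state_dict() builds a new dict on every call.
        self_state_dict = self.state_dict()

        # Copy values by position; this relies on both networks having the
        # same layers in the same order.
        pretrained_values = [v for k, v in model.state_dict().items()]
        for i, k in enumerate(list(self_state_dict.keys())):
            self_state_dict[k] = pretrained_values[i]
        self.load_state_dict(self_state_dict)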
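
Copying weights by position fails silently if the two networks do not line up, so it can help to check counts and shapes while copying. The following is a minimal sketch of that idea using plain NumPy arrays in place of tensors; the helper name `copy_by_position` and the toy key names are hypothetical and are not part of PyTorch or of the original project.

```python
from collections import OrderedDict

import numpy as np


def copy_by_position(src_state, dst_state):
    """Return dst_state's keys filled with src_state's values, matched by order."""
    src_values = list(src_state.values())
    dst_keys = list(dst_state.keys())
    if len(src_values) != len(dst_keys):
        raise ValueError("entry count mismatch: %d vs %d" % (len(src_values), len(dst_keys)))
    out = OrderedDict()
    for key, value in zip(dst_keys, src_values):
        if dst_state[key].shape != value.shape:
            raise ValueError("shape mismatch at %s" % key)
        out[key] = value.copy()
    return out


# Toy stand-ins for a pretrained state dict and a re-implemented network's state dict.
pretrained = OrderedDict([("conv1.weight", np.ones((4, 3))), ("bn1.bias", np.full((4,), 2.0))])
mine = OrderedDict([("stem.conv.weight", np.zeros((4, 3))), ("stem.bn.bias", np.zeros((4,)))])

merged = copy_by_position(pretrained, mine)
print(list(merged.keys()))  # ['stem.conv.weight', 'stem.bn.bias']
```

The keys come from the destination network while the values come from the pretrained one, which is the same pairing the loop in init_weight performs.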

ARM (Attention Refinement Module):

Refines the features. Characteristic: negligible computational cost.
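
As a rough illustration of how ARM refines features, here is a NumPy sketch of the attention step described in the BiSeNet paper: global average pooling, a 1×1 convolution, a sigmoid, and a channel-wise rescale. Batch normalization is left out for brevity, and the function name and shapes are assumptions made for this example rather than the post's actual implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_refine(feat, weight, bias):
    """feat: (C, H, W); weight: (C, C) for the 1x1 conv; bias: (C,)."""
    pooled = feat.mean(axis=(1, 2))               # global average pooling -> (C,)
    attention = sigmoid(weight @ pooled + bias)   # a 1x1 conv on a 1x1 map is a matrix-vector product
    return feat * attention[:, None, None]        # rescale every channel by its attention weight


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))
weight = rng.normal(size=(8, 8))
bias = np.zeros(8)

refined = attention_refine(feat, weight, bias)
print(refined.shape)  # (8, 4, 4)
```

The attention vector has only one value per channel, which is why the module adds very little computation on top of the backbone.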

③ FFM (Feature Fusion Module)

Mainly fuses the feature maps from the two paths.
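
The fusion can be sketched in NumPy as follows, following the structure described in the BiSeNet paper: concatenate the two feature maps, apply a convolution with ReLU, compute a channel attention vector from the pooled result, then reweight and add. This is only a sketch under assumptions: batch normalization is omitted, the fusing convolution is taken to be 1×1, and all names and shapes are made up for the example.

```python
import numpy as np


def conv1x1(feat, weight):
    """1x1 convolution without bias: feat (C_in, H, W), weight (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feat)


def feature_fusion(sp_feat, cp_feat, w_fuse, w_down, w_up):
    fused = np.concatenate([sp_feat, cp_feat], axis=0)   # stack the two paths along channels
    fused = np.maximum(conv1x1(fused, w_fuse), 0.0)       # conv + ReLU (BN omitted)
    pooled = fused.mean(axis=(1, 2))                      # global average pooling
    hidden = np.maximum(w_down @ pooled, 0.0)             # 1x1 conv + ReLU
    attention = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))    # 1x1 conv + sigmoid
    return fused * attention[:, None, None] + fused       # reweight, then residual add


rng = np.random.default_rng(0)
sp_feat = rng.normal(size=(4, 8, 8))   # Spatial Path output (toy size)
cp_feat = rng.normal(size=(6, 8, 8))   # Context Path output, same spatial size
out = feature_fusion(
    sp_feat,
    cp_feat,
    w_fuse=rng.normal(size=(5, 10)),
    w_down=rng.normal(size=(2, 5)),
    w_up=rng.normal(size=(5, 2)),
)
print(out.shape)  # (5, 8, 8)
```

Both inputs must share the same spatial size before concatenation, so the Context Path features are upsampled to the Spatial Path resolution first.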

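3. Dataset

The face segmentation dataset CelebAMask-HQ contains 30,000 face images, together with segmentation masks for each part of the face.

The dataset has 19 segmentation labels (including background): 'skin', 'l_brow', 'r_brow', 'l_eye', 'r_eye', 'eye_g', 'l_ear', 'r_ear', 'ear_r', 'nose', 'mouth', 'u_lip', 'l_lip', 'neck', 'neck_l', 'cloth', 'hair', 'hat'.

The mask images are 24-bit PNG files, and each class label is stored in its own file. They need to be quantized and merged into a single image, then saved as an 8-bit PNG.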

#!/usr/bin/python
# -*- encoding: utf-8 -*-

import os
import os.path as osp

import cv2
import numpy as np
from PIL import Image

face_data = '/data/CelebAMask-HQ/CelebA-HQ-img'
face_sep_mask = '/data/CelebAMask-HQ/CelebAMask-HQ-mask-anno'
mask_path = '/data/CelebAMask-HQ/mask'
os.makedirs(mask_path, exist_ok=True)

# Label ids follow this order, starting at 1; 0 is background.
atts = ['skin', 'l_brow', 'r_brow', 'l_eye', 'r_eye', 'eye_g', 'l_ear', 'r_ear', 'ear_r',
        'nose', 'mouth', 'u_lip', 'l_lip', 'neck', 'neck_l', 'cloth', 'hair', 'hat']

counter = 0  # per-part mask files found
total = 0    # per-part mask files looked up

# The annotations are split into 15 folders of 2000 images each.
for i in range(15):
    for j in range(i * 2000, (i + 1) * 2000):
        # One 8-bit label map per image.
        mask = np.zeros((512, 512), dtype=np.uint8)

        for l, att in enumerate(atts, 1):
            total += 1
            file_name = ''.join([str(j).rjust(5, '0'), '_', att, '.png'])
            path = osp.join(face_sep_mask, str(i), file_name)

            if os.path.exists(path):
                counter += 1
                sep_mask = np.array(Image.open(path).convert('P'))
                # print(np.unique(sep_mask))

                # 225 is the value this script matches for foreground pixels after
                # convert('P'); use the np.unique check above to confirm it on your data.
                mask[sep_mask == 225] = l
        cv2.imwrite('{}/{}.png'.format(mask_path, j), mask)
        print(j)

print(counter, total)

The merged mask image:

Dataset split: train : test = 9 : 1
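
One simple way to produce a 9:1 split is to shuffle the image indices with a fixed seed and cut the list. The post does not say how the split was actually made, so the shuffling, the seed, and the function name below are assumptions for illustration only.

```python
import random


def split_indices(num_images, train_ratio=0.9, seed=0):
    """Shuffle 0..num_images-1 reproducibly and cut into train and test lists."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)
    cut = round(num_images * train_ratio)
    return indices[:cut], indices[cut:]


train_ids, test_ids = split_indices(30000)
print(len(train_ids), len(test_ids))  # 27000 3000
```

Using a fixed seed keeps the split stable across runs, so the test images never leak into training when the script is rerun.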

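4. Loading the Dataset

① Data augmentation

The augmentation methods include random cropping, mirroring, scaling, and color-space augmentation.

Random cropping: the original image and the mask are processed identically.

Mirroring: the original image and the mask are both mirrored, and the left/right labels in the mask are swapped: eyes, eyebrows, ears.

Scaling: the original image and the mask are processed identically.

Color space: saturation, contrast, and brightness adjustments are applied to the original image.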
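
Mirroring is the one augmentation where the mask needs more than the same geometric transform, because a left eyebrow becomes a right eyebrow after the flip. The sketch below shows this on a toy array. The label ids follow the order of the atts list in the merging script (l_brow=2, r_brow=3, l_eye=4, r_eye=5, l_ear=7, r_ear=8); the function name is made up for this example.

```python
import numpy as np

# Left/right label pairs, using ids from enumerate(atts, 1) in the merging script.
SWAP_PAIRS = [(2, 3), (4, 5), (7, 8)]


def hflip_with_label_swap(image, mask):
    """Flip an H x W x C image and its H x W mask horizontally, swapping left/right labels."""
    image = image[:, ::-1].copy()
    flipped = mask[:, ::-1].copy()
    swapped = flipped.copy()
    for left, right in SWAP_PAIRS:
        # Read from `flipped` and write to `swapped` so a pixel is never swapped twice.
        swapped[flipped == left] = right
        swapped[flipped == right] = left
    return image, swapped


image = np.arange(2 * 3 * 3).reshape(2, 3, 3)   # toy 2 x 3 image with 3 channels
mask = np.array([[2, 0, 3],
                 [4, 1, 8]])
new_image, new_mask = hflip_with_label_swap(image, mask)
print(new_mask)
# [[2 0 3]
#  [7 1 5]]
```

Labels without a left/right counterpart, such as skin (1), are only moved by the flip and keep their id.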

② Loading

DataLoader is used together with Dataset

transform conversion

Iterating over the images

Batching the images

5. Loss Function

Li: per-sample log-softmax loss

lp: principal (main) loss

li: auxiliary loss (on the Context Path outputs)
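
The way these terms combine can be sketched in NumPy: the joint loss is the principal loss plus a weighted sum of the auxiliary losses, each of them a softmax loss. The weight `alpha` and its default of 1.0 are assumptions here, as are the function names and the toy shapes.

```python
import numpy as np


def log_softmax(logits):
    """logits: (N, C). Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def softmax_loss(logits, labels):
    """Mean negative log-probability of the true class; labels: (N,) ints."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()


def joint_loss(main_logits, aux_logits_list, labels, alpha=1.0):
    lp = softmax_loss(main_logits, labels)                            # principal loss
    li = sum(softmax_loss(aux, labels) for aux in aux_logits_list)    # auxiliary losses
    return lp + alpha * li


labels = np.array([0, 3, 18, 7])
uniform = np.zeros((4, 19))                 # 4 pixels, 19 classes, all logits equal
loss = joint_loss(uniform, [uniform, uniform], labels)
print(np.isclose(loss, 3 * np.log(19)))  # True
```

With uniform logits every class gets probability 1/19, so each of the three terms equals log(19), which makes the result easy to check by hand.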

6. Optimizer

Stochastic gradient descent (SGD), with hyperparameter settings and updates.
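
The post does not list the hyperparameters, so the sketch below only illustrates what "settings and updates" can look like: a polynomial learning-rate decay and one SGD step with momentum and weight decay. The schedule, the momentum of 0.9, the weight decay of 5e-4, and the base learning rate of 0.01 are all assumptions; only the 80,000-iteration count comes from the results section.

```python
def poly_lr(base_lr, it, max_iter, power=0.9):
    """Polynomial decay: starts at base_lr and reaches 0 at max_iter."""
    return base_lr * (1.0 - it / max_iter) ** power


def sgd_momentum_step(param, grad, velocity, lr, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum; weight decay is added to the gradient."""
    grad = grad + weight_decay * param
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


max_iter = 80000
param, velocity = 1.0, 0.0
for it in range(3):
    lr = poly_lr(0.01, it, max_iter)
    param, velocity = sgd_momentum_step(param, grad=0.5, velocity=velocity, lr=lr)
print(poly_lr(0.01, 0, max_iter), poly_lr(0.01, max_iter, max_iter))  # 0.01 0.0
```

Updating the learning rate every iteration, rather than every epoch, gives a smooth decay over the whole run.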

7. Logging

A logger library is used to record data during training.
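
A minimal setup with Python's standard logging module might look like the sketch below. Whether the original project uses this module is an assumption, and the logger name, message format, and logged numbers are placeholders, not real training output.

```python
import io
import logging


def make_logger(stream):
    """Create a logger that writes 'LEVEL message' lines to the given stream."""
    logger = logging.getLogger("bisenet_train")   # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()                        # avoid duplicate handlers on re-creation
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


log_stream = io.StringIO()   # in real training, use sys.stdout or a logging.FileHandler
logger = make_logger(log_stream)

# Placeholder values, only to show the message layout.
logger.info("it: %d/%d, lr: %.6f, loss: %.4f", 100, 80000, 0.009990, 1.2345)
print(log_stream.getvalue(), end="")  # INFO it: 100/80000, lr: 0.009990, loss: 1.2345
```

Logging the iteration, learning rate, and loss on one line makes it easy to grep or plot the training curve afterwards.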

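8. Evaluation Metrics

Form of the confusion matrix:

Actual \ Predicted	Predicted positive	Predicted negative
Actually positive	True positive (TP)	False negative (FN)
Actually negative	False positive (FP)	True negative (TN)

Building it in code: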

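def confusion_matrix(self, pre, lab):
    # Requires: import numpy as np
    P_pre = pre.flatten()
    L_lab = lab.flatten()

    # Keep only pixels whose label is a valid class id.
    mask = (L_lab >= 0) & (L_lab < self.num_class)
    # Each (label, prediction) pair maps to the flat index n * L + P,
    # so rows are ground-truth classes and columns are predicted classes.
    confusion = np.bincount(
        self.num_class * L_lab[mask].astype(int) + P_pre[mask].astype(int),
        minlength=self.num_class ** 2,
    ).reshape(self.num_class, self.num_class)
    return confusion.astype(np.float64)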
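
The n * L + P trick is easier to follow on a tiny example. The sketch below uses three classes and six made-up pixels; each pair (label, prediction) becomes a single integer, and np.bincount counts how often each integer occurs.

```python
import numpy as np

num_class = 3
lab = np.array([0, 0, 1, 1, 2, 2])   # ground-truth class per pixel (toy data)
pre = np.array([0, 1, 1, 1, 2, 0])   # predicted class per pixel (toy data)

flat_index = num_class * lab + pre   # [0, 1, 4, 4, 8, 6]
confusion = np.bincount(flat_index, minlength=num_class ** 2).reshape(num_class, num_class)
print(confusion)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```

Row 0 reads: of the two pixels whose true class is 0, one was predicted as class 0 and one as class 1.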

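Evaluation metrics computed from the confusion matrix:

Pixel accuracy:

def pixel_acc(self, confusion):
    return np.diag(confusion).sum() / confusion.sum()

Per-class accuracy:

def class_acc(self, confusion):
    # Correct pixels of each class divided by that class's ground-truth pixel count;
    # np.maximum(..., 1) guards against division by zero for absent classes.
    return np.diag(confusion) / np.maximum(confusion.sum(axis=1), 1)  # vector of length num_class

Mean per-class accuracy:

def mpa(self, cls_acc):
    return np.nanmean(cls_acc)

IoU (intersection over union):

def iou(self, confusion):
    # Intersection is the diagonal; union is row sum + column sum - diagonal.
    return np.diag(confusion) / np.maximum(
        np.sum(confusion, axis=1) + np.sum(confusion, axis=0) - np.diag(confusion), 1)

mIoU (mean intersection over union):

def miou(self, iou_):
    return np.nanmean(iou_)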
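
As a worked check of these formulas, the sketch below applies them to a small hand-made 3×3 confusion matrix (rows are ground truth, columns are predictions). The matrix values are invented for the example.

```python
import numpy as np

confusion = np.array([[1, 1, 0],
                      [0, 2, 0],
                      [1, 0, 1]], dtype=np.float64)

diag = np.diag(confusion)
pixel_acc = diag.sum() / confusion.sum()                   # 4 / 6
class_acc = diag / np.maximum(confusion.sum(axis=1), 1)    # [0.5, 1.0, 0.5]
mpa = np.nanmean(class_acc)                                # 2 / 3
union = confusion.sum(axis=1) + confusion.sum(axis=0) - diag   # [3, 3, 2]
iou = diag / np.maximum(union, 1)                          # [1/3, 2/3, 1/2]
miou = np.nanmean(iou)                                     # 0.5

print(round(pixel_acc, 4), round(mpa, 4), round(miou, 4))  # 0.6667 0.6667 0.5
```

mIoU is lower than the pixel accuracy here because IoU also penalizes false positives, which is the same pattern visible in the results reported in this post.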

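9. Results Analysis

After 80,000 training iterations:
acc=94.95%, macc=57.41%, mIoU=52.40%

Test:

References:

GitHub - zllrunning/face-makeup.PyTorch: Lip and hair color editor using face parsing maps.

Implementations of various semantic segmentation evaluation metrics (络小绎's blog, CSDN)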
