Reading the HAttMatting Code by Hand (3): dataset

宇智波瞎眼猫

Category: 手撕代码 (code walkthroughs). Tags: deep learning, pytorch, machine learning, artificial intelligence, python

Article link: https://blog.csdn.net/weixin_38344753/article/details/123734360


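I never got hold of the original dataset, but while reading the rest of the code I noticed one thing: unless the dataset processing is understood, a lot of things can only be guessed at. So here I try the same approach I used earlier when studying DIM.

Searching GitHub for HAttMatting code turns up two repositories from the same author: one is the large body of code found earlier, and the other exists purely to showcase results. Let's look at its screenshots of the dataset and infer what the dataset should look like.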

https://github.com/yuhaoliu7456/CVPR2020-HAttMatting

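For training, the original image and the alpha matte are mandatory, which matches DIM's approach. HAttMatting, however, does not rely on a trimap for fitting, which has also been the trend in matting recently. A trimap exists only to add constraints to the fitting of the alpha values and keep the fitting efficient, and even DIM's dataset does not ship with trimaps: they are generated in code from the alpha matte. Now look at the input/output comparison the author provides.

The input is a composite of a foreground and a background, and the output is the alpha matte, which is the standard input/output pattern for a matting task. How this dataset was built is mentioned in the paper:

That is: 59,600 training images and 1,000 test images, built from 646 distinct foreground alpha mattes in total. To make things clearer, open gen_dataset in that full code repository.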


# -*- coding: utf-8 -*-
# File   : gen_dataset.py
# Author : Yuhao Liu
# Email  : yuhaoLiu7456@gmail.com
# Date   : 07/04/2020
# 
# This file is part of HAttMatting.
# https://github.com/wukaoliu/CVPR2020-HAttMatting
# Distributed under MIT License.

"""
The following is an example of generating test images.
While for the composition of training dataset, you need do some modification.
1. for the Adobe composition-1K: modify the num_bgs from 20 to 100; 
   for our Distinctions-646: modify the num_bgs from 20 to 80;
2. load fg_train.txt and bg_train.txt to replace the test files.
3. modify the original_path and saving_path
   For example(for Adobe datasets):
        fg_path = './Train_ori/Fg_images/'            
        a_path = './Train_ori/Alpha_matte/'
        bg_path = './COCO/'
        
        out_img_path = './Train/Image/'
        out_gt_path  = './Train/GT/'
        out_fg_path  = './Train/FG/' 
        out_bg_path  = './Train/BG/'
        
4. The most important thing is to create the corresponding data set in advance.

"""


import os 
import math
import cv2
import torch
import numpy as np
from PIL import Image

# for configuring original fg, mattes, bg
fg_path = './Test_ori/Fg_images/'            
a_path = './Test_ori/Alpha_matte/'
bg_path = './VOCdevkit/VOC2012/JPEGImages/'

# out_* for saving various images
out_img_path = './Test/Image/'
out_gt_path  = './Test/GT/'
out_fg_path  = './Test/FG/'
out_bg_path  = './Test/BG/'


def composite(fg, bg, a, w, h):
    bg = bg[0:w, 0:h] 
    bg = torch.from_numpy(bg).transpose(0, 2).double()
    fg = torch.from_numpy(fg).transpose(0, 2).double()
    alpha = torch.from_numpy(a).transpose(0, 1).double() /255
    composite_img = alpha * fg + (1 - alpha) * bg
    composite_img = composite_img.int()
    composite_img = composite_img.transpose(0, 2).numpy()

    return composite_img


num_bgs = 20
fg_files = [line.rstrip('\n') for line in open('./fg_test.txt')]
bg_files = [line.rstrip('\n') for line in open('./bg_test.txt')]

bg_iter = iter(bg_files)
index = 0
for im_name in fg_files:
    im = cv2.imread(os.path.join(fg_path, im_name))
    a = cv2.imread(os.path.join(a_path , im_name), cv2.IMREAD_GRAYSCALE)

    bbox = im.shape
    w = bbox[0]  # number of rows, i.e. the image height despite the name
    h = bbox[1]  # number of columns, i.e. the image width
    
    bcount = 0 
    for i in range(num_bgs):

        bg_name = next(bg_iter)        
        bg = cv2.imread(os.path.join(bg_path , bg_name))
        bg_bbox = bg.shape
        bw = bg_bbox[0]
        bh = bg_bbox[1]
        wratio = w / bw
        hratio = h / bh
        ratio = wratio if wratio > hratio else hratio     
        if ratio > 1:     
            # cv2--->PIL--->cv2 for keep the same. Since the resize of PIL and the resize of cv2 is different
            bg = Image.fromarray(cv2.cvtColor(bg,cv2.COLOR_BGR2RGB))
            bg = bg.resize((math.ceil(bh*ratio),math.ceil(bw*ratio)), Image.BICUBIC)
            bg = cv2.cvtColor(np.asarray(bg),cv2.COLOR_RGB2BGR)            
        
        out = composite(im, bg, a, w, h)
        cv2.imwrite(out_img_path  +im_name.split('_')[0] + '_' + str(index) + '.png', out) # +'train_img_'
        gt = a
        cv2.imwrite(out_gt_path + im_name.split('_')[0] + '_' + str(index) + '.png', gt) #  +'train_gt_'
        
        bf_for_save = bg[0:w, 0:h]
        cv2.imwrite(out_bg_path  +im_name.split('_')[0] + '_' + str(index) + '.png', bf_for_save)
        cv2.imwrite(out_fg_path  +im_name.split('_')[0] + '_' + str(index) + '.png', im)
        
        print(out_gt_path +im_name.split('_')[0] + '_' + str(index) + '.png' + '-----%d' %index)
        index += 1

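Without further ado, let's go straight to the experiment, because this really does look a lot like DIM's dataset processing, and is even simpler.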
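The composite() function above boils down to the usual compositing equation I = alpha * F + (1 - alpha) * B. As a minimal sketch of that step, here is the same blend written with plain NumPy broadcasting instead of the torch transposes in the original; the array sizes and pixel values are made up for illustration.

```python
import numpy as np


def composite(fg, bg, alpha):
    """Blend fg over bg with an 8-bit matte: I = alpha * F + (1 - alpha) * B."""
    h, w = alpha.shape
    # Crop the background's top-left corner to the foreground size.
    bg = bg[:h, :w].astype(np.float64)
    # (H, W) -> (H, W, 1) so the matte broadcasts over the colour channels.
    a = alpha.astype(np.float64)[..., None] / 255.0
    out = a * fg.astype(np.float64) + (1.0 - a) * bg
    # Truncate to integers, like composite_img.int() in the original script.
    return out.astype(np.uint8)


# Illustrative data: a 2x2 foreground over a larger 3x4 background.
fg = np.full((2, 2, 3), 200, dtype=np.uint8)
bg = np.full((3, 4, 3), 100, dtype=np.uint8)
alpha = np.array([[255, 0], [128, 64]], dtype=np.uint8)
img = composite(fg, bg, alpha)
```

Where alpha is 255 the result is pure foreground, where it is 0 it is pure background, and values in between mix the two.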

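In the end these four folders hold the following data: Image, the composite of foreground and background; BG, the background; FG, the foreground; GT, the alpha matte used for compositing. Since the paper says nothing about random cropping or resizing, the compositing here is simple and blunt: if the foreground is larger, the background is scaled up until it just covers the foreground; if the foreground is smaller, the background is simply cropped from its top-left corner to the foreground size before compositing. From what I have seen, though, the foreground is rarely the smaller one. First, to make the matting result fine-grained, foreground mattes are usually detailed down to individual hairs, so the resolution is very high and the images are necessarily large. Second, the backgrounds come from the VOC dataset, whose latest version dates from 2012, about a decade before this was written; hardware then was nowhere near today's, and the images are generally small. Still, to check the code rigorously, I deliberately took an enormous background photo and tried it.

But sorting all this out only reveals how the dataset is pre-processed; for the rest we still have to look at the processing code in the model.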
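The background-fitting rule can be isolated into a few lines. The sketch below only computes the size the background ends up with before it is cropped; fit_background_size is a hypothetical helper name, not something from the repository, and the sizes are invented.

```python
import math


def fit_background_size(fg_h, fg_w, bg_h, bg_w):
    """Return the (height, width) of the background before it is cropped."""
    ratio = max(fg_h / bg_h, fg_w / bg_w)
    if ratio > 1:
        # The background is too small on at least one side: scale both sides
        # by the same factor so the aspect ratio is kept and the foreground fits.
        return math.ceil(bg_h * ratio), math.ceil(bg_w * ratio)
    # The background already covers the foreground: keep it and crop later.
    return bg_h, bg_w


small_bg = fit_background_size(1200, 900, 400, 600)   # background gets enlarged
large_bg = fit_background_size(300, 300, 375, 500)    # background is left alone
```

In both cases the script then takes bg[0:fg_h, 0:fg_w], so the part of the background that survives is always its top-left corner.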

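import os
import os.path as osp
import random
from functools import partial

import cv2
import numpy as np
from torch.utils.data import Dataset


class MattingDataset(Dataset):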
    def __init__(self, data_root, set_type='train'):
        super().__init__()
        self.data_root = data_root
        self.set_type = set_type
        self.images_dir = 'clip_img'
        self.labels_dir = 'matting'
        self.images_root = osp.join(self.data_root, self.images_dir)
        self.labels_root = osp.join(self.data_root, self.labels_dir)
        self.transformer = partial(_transform, set_type=self.set_type)
        self.color_transformer = partial(_color_transform, set_type=self.set_type)
        self.load_annotations()
        split_index = -1024
        if self.set_type == 'train':
            self.images_path = self.images_path[:split_index]
            self.labels_path = self.labels_path[:split_index]
        elif self.set_type == 'val':
            self.images_path = self.images_path[split_index:]
            self.labels_path = self.labels_path[split_index:]

    def load_annotations(self):
        self.images_path = [os.path.join(r, f) for r, _, fs in os.walk(self.images_root) for f in fs if osp.splitext(f)[1] == '.jpg']
        self.images_path.sort()
        self.labels_path = [image_path.replace(self.images_dir, self.labels_dir).replace('jpg', 'png').replace('clip', 'matting') for image_path in self.images_path]

    def __getitem__(self, idx):
        image_path = self.images_path[idx]
        label_path = self.labels_path[idx]
        image = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
        label = cv2.imread(label_path, cv2.IMREAD_UNCHANGED)
        if image is None or label is None:
            return self.__getitem__(random.randint(0, self.__len__()-1))
        label = label[:,:,3:4]
        image = self.color_transformer(image)
        image_rgba = np.concatenate([image, label], axis=-1)
        image_rgba= self.transformer(image_rgba)
        return image_rgba[:3], image_rgba[3:4]
    
    def __len__(self):
        return len(self.images_path)
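
One hint about the expected folder layout comes from load_annotations and the split in __init__: every label path is derived from its image path by plain string replacement, and the last 1024 sorted files are held out for validation. A minimal sketch, with invented file and folder names:

```python
def label_path_for(image_path, images_dir="clip_img", labels_dir="matting"):
    """Mirror the three chained replace() calls used in load_annotations."""
    return (image_path.replace(images_dir, labels_dir)
                      .replace("jpg", "png")
                      .replace("clip", "matting"))


# Hypothetical image path; only the clip_img / clip_ naming pattern matters.
image_path = "data_root/clip_img/0001/clip_0000/0001-00001.jpg"
label_path = label_path_for(image_path)

# The train/val split keeps the last 1024 sorted paths for validation.
paths = sorted(f"img_{i:05d}.jpg" for i in range(3000))
split_index = -1024
train_paths, val_paths = paths[:split_index], paths[split_index:]
```

Because str.replace rewrites every occurrence, a data_root that itself contains "clip" or "jpg" would be changed too, so the root path needs to avoid those substrings.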

In actual use it looks like this:

data_root = '../datasets/matting_human_half/'
train_dataset = MattingDataset(data_root, set_type='train')

At this point it is hard to guess exactly what path gets passed in, so we have to look directly at what the __getitem__ method returns.

return image_rgba[:3], image_rgba[3:4]

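We already know that when the model was used earlier, the dataset returned image and label, i.e. the picture and the matte. Working backwards, image_rgba[:3], the first three channels, is the composite image, and image_rgba[3:4], the fourth channel, is the alpha matte. Before the transformer is applied, image and label are joined by concatenate into image_rgba; axis=-1 means the arrays are extended along the last axis. An image read by imread has dimensions (height, width, channels), as shown here:

So for label to be appended as the fourth channel, the concatenation has to run along that last (channel) axis. The transformer applied afterwards is built with the ready-made tool partial, so we need to see what that does.

This is how transformer, and color_transformer as well, is defined in __init__, so the next step is to find out what partial does.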
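The shape bookkeeping is easy to check with dummy arrays. The sketch below assumes the matting PNG has four channels with alpha last, which is what label[:, :, 3:4] implies, and imitates the height-width-channel to channel-height-width layout change that ToTensorV2 performs with a plain transpose; the sizes are arbitrary.

```python
import numpy as np

h, w = 4, 6
image = np.zeros((h, w, 3), dtype=np.uint8)             # colour image, (H, W, 3)
matting_png = np.full((h, w, 4), 255, dtype=np.uint8)   # 4-channel label file

label = matting_png[:, :, 3:4]                          # keep only alpha, (H, W, 1)
image_rgba = np.concatenate([image, label], axis=-1)    # (H, W, 4)

# After the channel axis moves to the front, slicing the first axis
# separates the picture from the matte.
chw = image_rgba.transpose(2, 0, 1)                     # (4, H, W)
image_part, alpha_part = chw[:3], chw[3:4]
```

Slicing with 3:4 rather than 3 keeps the channel axis, so the matte stays a three-dimensional array that can be concatenated and later returned as a one-channel tensor.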

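Reference: "Python notes: functools.partial changes a method's default parameters" (Dean0Winchester's blog, CSDN)

In other words, whenever the dataset's transformer or color_transformer is used, _transform and _color_transform are called with set_type defaulting to the set_type that was passed in. Two values of set_type are used here, train and val, corresponding to the training set and the validation set; the default is train. Since partial exists to turn one of a function's arguments into a default, in plain terms to pin that argument down, we need to look at the code of the two transform functions.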
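A minimal sketch of the pattern, using only the standard library. The two lambdas are stand-ins for the real augmentation pipelines, and the string "augmented(...)" is purely illustrative.

```python
from functools import partial

# Stand-in pipelines: each takes image=... and returns a dict with an 'image' key.
transformer = {
    "train": lambda image: dict(image=f"augmented({image})"),
    "val": lambda image: dict(image=image),
}


def _transform(image, set_type="train"):
    return transformer[set_type](image=image)["image"]


# partial pins set_type, so callers only need to pass the image.
train_transform = partial(_transform, set_type="train")
val_transform = partial(_transform, set_type="val")

train_result = train_transform("img")
val_result = val_transform("img")
```

The 'val' entry returns its input wrapped in a dict so that both entries can be called and indexed in exactly the same way.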

def _transform(image, set_type='train'):
    image = transformer[set_type](image=image)['image']
    return image

def _color_transform(image, set_type='train'):
    image = color_transformer[set_type](image=image)['image']
    return image

Yet another function call, so we keep tracing.

color_transformer = {
    'train': A.ColorJitter(brightness=0.35, contrast=0.5, saturation=0.5, hue=0.2, always_apply=False, p=0.7),
    'val': lambda image: dict(image=image)
}

transformer = {
    'train': A.Compose(
    [
        A.HorizontalFlip(p=0.5),  ## Becareful when using that, because the keypoint is flipped but the index is flipped too
        A.Affine(scale=(-0.25, 0.25), translate_percent=(-0.125, 0.125), rotate=(-40, 40), mode=4, always_apply=False, p=0.5),
        A.RandomSizedCrop(min_max_height=[320, 600], width=320, height=320, p=0.5),
        A.Resize(320, 320),
        A.Normalize(mean=mean, std=std),
        AP.ToTensorV2()
    ]),
    'val': A.Compose(
    [
        A.Resize(320, 320),
        A.Normalize(mean=mean, std=std),
        AP.ToTensorV2()
    ]),
}

This introduces a tool I had not come across before:

import albumentations as A
import albumentations.pytorch as AP

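To find out what these two do, I looked up a column article: albumentations is an image augmentation tool that works on OpenCV-style images.

Reference: "Using the albumentations data augmentation tool" (Zhihu)

Both transformers are split into two categories, train and val, for the training-set and validation-set transforms respectively. They are explained in that order below.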

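When the set_type passed in is train:
- color_transformer adjusts brightness, contrast, saturation and hue with probability p=0.7. always_apply is left at False; setting it to True forces the transform to run regardless of p.
- transformer performs the following steps:
  - HorizontalFlip flips the image horizontally (the code comment warns that when keypoints are flipped their indices are flipped as well),
  - Affine bundles the following parameters: the scale factor scale is picked from (-0.25, 0.25); the translation fraction translate_percent is sampled uniformly from (-0.125, 0.125); the rotation angle rotate is picked from (-40, 40) degrees; mode=4 is the OpenCV border flag (cv2.BORDER_REFLECT_101); p is the probability of applying the transform, 0.5 here
  - RandomSizedCrop takes a random crop whose height is drawn from min_max_height=[320, 600] and resizes it to width 320 and height 320, applied with probability p=0.5
  - Resize resizes to (320, 320)
  - Normalize standardizes with mean=(0.485, 0.456, 0.406, 0) and std=(0.229, 0.224, 0.225, 1)
  - ToTensorV2 converts the result to a tensor
When the set_type passed in is val:
- color_transformer leaves image unchanged and only wraps it in a dict
- transformer only resizes to (320, 320), normalizes, and converts to a tensor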
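
The fourth entries of mean and std (0 and 1) are worth a closer look. Assuming A.Normalize computes (img - mean * max_pixel_value) / (std * max_pixel_value) with max_pixel_value=255, which is my understanding of its default behaviour, the colour channels get the usual ImageNet standardization while the alpha channel is merely rescaled to [0, 1]. A NumPy sketch of that arithmetic, with one invented pixel:

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406, 0.0])
std = np.array([0.229, 0.224, 0.225, 1.0])


def normalize(image_rgba, max_pixel_value=255.0):
    """Per-channel standardization of an (H, W, 4) uint8 array."""
    x = image_rgba.astype(np.float32)
    return (x - mean * max_pixel_value) / (std * max_pixel_value)


# One pixel: three colour values plus a fully opaque alpha value.
pixel = np.array([[[255, 128, 0, 255]]], dtype=np.uint8)
out = normalize(pixel)
alpha_out = float(out[0, 0, 3])    # alpha only gets divided by 255
first_out = float(out[0, 0, 0])    # colour channel is shifted and scaled
```

This would explain why image and matte can travel through the same pipeline as one four-channel array: the geometric transforms hit both identically, and the normalization leaves the matte as a plain 0 to 1 value.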

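That is about as far as we can go here: without the actual dataset many details stay unknown, and all we can do is look at what is returned and work backwards to what the data must look like. Next up are the loss function and the flow of the model.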