Image Matting

studyeboy

Category: Deep Learning. Tags: deep learning, neural networks

Article link: https://blog.csdn.net/studyeboy/article/details/105840091

Algorithms

Fast Multi-Level Foreground Estimation (2020)

Paper
pymatting/pymatting

Problem the paper addresses
Foreground estimation in the closed-form method

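The foreground estimation step of the closed-form method can be accelerated by preconditioning conjugate gradient descent with a thresholded incomplete Cholesky decomposition. Even so, solving the resulting $2n \times 2n$ linear system on current hardware, with $n = 0.4$ M pixels and the error converged to $10^{-6}$, still takes about 30 seconds per color channel. That is too slow for interactive image editing. The foreground estimation method proposed in the paper processes images of several megapixels in a few seconds on commodity hardware.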
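Multi-level foreground estimation
A natural way to speed up closed-form foreground estimation is to replace the minimizer of the global cost function with local solutions computed from cost functions over small regions. On its own this does not work: the local solutions do not propagate foreground and background colors into regions of constant alpha, even after many iterations. A multi-level scheme mitigates this weakness and yields an efficient way to approximate the foreground and background colors.
The paper therefore modifies the cost function of closed-form foreground estimation as follows. For a fixed color channel c, pixel i is the center of a local image region, and the color gradient is expressed as a sum over the pixels adjacent to that center. A regularization term is added so that the problem is well defined in regions of constant alpha; otherwise, where alpha is 0 or 1, the foreground or background color respectively would be unconstrained. A constant is also introduced to control the influence of the alpha gradient.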
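One way to write this local cost is shown below. It is reconstructed from the pymatting code listed in the implementation section of this post rather than copied from the paper, so treat it as a sketch. The symbol names are assumptions: $\epsilon_r$ stands for the `regularization` argument, $\omega$ for `gradient_weight`, and $N_i$ for the four axis-aligned neighbors of pixel $i$.

```latex
\begin{aligned}
w_{ij} &= \epsilon_r + \omega\,\lvert \alpha_i - \alpha_j \rvert, \qquad W_i = \sum_{j \in N_i} w_{ij} \\
\mathrm{cost}_i(F_i^c, B_i^c) &= \bigl(\alpha_i F_i^c + (1-\alpha_i) B_i^c - I_i^c\bigr)^2
  + \sum_{j \in N_i} w_{ij}\Bigl[(F_i^c - F_j^c)^2 + (B_i^c - B_j^c)^2\Bigr] \\
\begin{pmatrix}
\alpha_i^2 + W_i & \alpha_i(1-\alpha_i) \\
\alpha_i(1-\alpha_i) & (1-\alpha_i)^2 + W_i
\end{pmatrix}
\begin{pmatrix} F_i^c \\ B_i^c \end{pmatrix}
&=
\begin{pmatrix}
\alpha_i I_i^c + \sum_{j \in N_i} w_{ij} F_j^c \\
(1-\alpha_i) I_i^c + \sum_{j \in N_i} w_{ij} B_j^c
\end{pmatrix}
\end{aligned}
```

The last line follows from setting the derivatives of the cost with respect to $F_i^c$ and $B_i^c$ to zero. It is a 2x2 system per pixel, so it can be inverted in closed form, which is what the inner loop of the code does with `a00`, `a01`, `a11` and the right-hand side `b`.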

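To overcome the slow propagation, a multi-level approach is used. It starts from a low-resolution foreground image, where slow spatial propagation is not an issue, and iteratively minimizes the local cost function. The result is then upsampled and used to initialize the same iteration at the next, higher resolution. This is repeated until the original size of the input image is reached.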
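The coarse-to-fine schedule can be sketched in a few lines. The helper name `level_sizes` is hypothetical; the size formula mirrors the one in the pymatting code listed in the implementation section, with one added guard (an assumption of this sketch) for 1x1 inputs.

```python
import math


def level_sizes(w0, h0):
    """Return the (width, height) of every level, coarsest first."""
    # Guard against n_levels == 0 for a 1x1 image (not in the original code).
    n_levels = max(1, math.ceil(math.log2(max(w0, h0))))
    sizes = []
    for i_level in range(n_levels + 1):
        # Sizes grow geometrically from 1x1 up to the input resolution.
        w = round(w0 ** (i_level / n_levels))
        h = round(h0 ** (i_level / n_levels))
        sizes.append((w, h))
    return sizes


print(level_sizes(8, 4))  # [(1, 1), (2, 2), (4, 3), (8, 4)]
```

Because the number of levels grows only logarithmically with the image size, colors found at the 1x1 level need just a handful of upsampling steps to reach every pixel of the full-resolution image.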
Implementation
Algorithm steps

import numpy as np
from numba import njit


@njit("void(f4[:, :, :], f4[:, :, :])")
def _resize_nearest_multichannel(dst, src):
    """
    Internal method.

    Resize image src to dst using nearest neighbors filtering.
    Images must have multiple color channels, i.e. :code:`len(shape) == 3`.

    Parameters
    ----------
    dst: numpy.ndarray of type np.float32
        output image
    src: numpy.ndarray of type np.float32
        input image
    """
    h_src, w_src, depth = src.shape
    h_dst, w_dst, depth = dst.shape

    for y_dst in range(h_dst):
        for x_dst in range(w_dst):
            x_src = max(0, min(w_src - 1, x_dst * w_src // w_dst))
            y_src = max(0, min(h_src - 1, y_dst * h_src // h_dst))

            for c in range(depth):
                dst[y_dst, x_dst, c] = src[y_src, x_src, c]


@njit("void(f4[:, :], f4[:, :])")
def _resize_nearest(dst, src):
    """
    Internal method.

    Resize image src to dst using nearest neighbors filtering.
    Images must be grayscale, i.e. :code:`len(shape) == 2`.

    Parameters
    ----------
    dst: numpy.ndarray of type np.float32
        output image
    src: numpy.ndarray of type np.float32
        input image
    """
    h_src, w_src = src.shape
    h_dst, w_dst = dst.shape

    for y_dst in range(h_dst):
        for x_dst in range(w_dst):
            x_src = max(0, min(w_src - 1, x_dst * w_src // w_dst))
            y_src = max(0, min(h_src - 1, y_dst * h_src // h_dst))

            dst[y_dst, x_dst] = src[y_src, x_src]


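# No @njit decorator here: the signature registered in `exports` below
# suggests this function is compiled elsewhere from that table. Called as
# plain Python, the per-pixel loops are very slow.
def _estimate_fb_ml(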
    input_image,
    input_alpha,
    regularization,
    n_small_iterations,
    n_big_iterations,
    small_size,
    gradient_weight,
):
    h0, w0, depth = input_image.shape

    dtype = np.float32

    w_prev = 1
    h_prev = 1

    F_prev = np.empty((h_prev, w_prev, depth), dtype=dtype)
    B_prev = np.empty((h_prev, w_prev, depth), dtype=dtype)

    n_levels = int(np.ceil(np.log2(max(w0, h0))))

    for i_level in range(n_levels + 1):
        w = round(w0 ** (i_level / n_levels))
        h = round(h0 ** (i_level / n_levels))

        image = np.empty((h, w, depth), dtype=dtype)
        alpha = np.empty((h, w), dtype=dtype)

        _resize_nearest_multichannel(image, input_image)
        _resize_nearest(alpha, input_alpha)

        F = np.empty((h, w, depth), dtype=dtype)
        B = np.empty((h, w, depth), dtype=dtype)

        _resize_nearest_multichannel(F, F_prev)
        _resize_nearest_multichannel(B, B_prev)

        if w <= small_size and h <= small_size:
            n_iter = n_small_iterations
        else:
            n_iter = n_big_iterations

        b = np.zeros((2, depth), dtype=dtype)

        dx = [-1, 1, 0, 0]
        dy = [0, 0, -1, 1]

        for i_iter in range(n_iter):
            for y in range(h):
                for x in range(w):
                    a0 = alpha[y, x]
                    a1 = 1.0 - a0

                    a00 = a0 * a0
                    a01 = a0 * a1
                    # a10 = a01 can be omitted due to symmetry of matrix
                    a11 = a1 * a1

                    for c in range(depth):
                        b[0, c] = a0 * image[y, x, c]
                        b[1, c] = a1 * image[y, x, c]

                    for d in range(4):
                        x2 = max(0, min(w - 1, x + dx[d]))
                        y2 = max(0, min(h - 1, y + dy[d]))

                        gradient = abs(a0 - alpha[y2, x2])

                        da = regularization + gradient_weight * gradient

                        a00 += da
                        a11 += da

                        for c in range(depth):
                            b[0, c] += da * F[y2, x2, c]
                            b[1, c] += da * B[y2, x2, c]

                    determinant = a00 * a11 - a01 * a01

                    inv_det = 1.0 / determinant

                    b00 = inv_det * a11
                    b01 = inv_det * -a01
                    b11 = inv_det * a00

                    for c in range(depth):
                        F_c = b00 * b[0, c] + b01 * b[1, c]
                        B_c = b01 * b[0, c] + b11 * b[1, c]

                        F_c = max(0.0, min(1.0, F_c))
                        B_c = max(0.0, min(1.0, B_c))

                        F[y, x, c] = F_c
                        B[y, x, c] = B_c

        F_prev = F
        B_prev = B

        w_prev = w
        h_prev = h

    return F, B


exports = {
    "_resize_nearest_multichannel": (
        _resize_nearest_multichannel,
        "void(f4[:, :, :], f4[:, :, :])",
    ),
    "_resize_nearest": (_resize_nearest, "void(f4[:, :], f4[:, :])"),
    "_estimate_fb_ml": (
        _estimate_fb_ml,
        "Tuple((f4[:, :, :], f4[:, :, :]))(f4[:, :, :], f4[:, :], f4, i4, i4, i4, f4)",
    ),
}

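Evaluation method
Results
Limitations
- With KNN alpha mattes as input, the estimated foreground colors are usually too dark because the alpha matte is almost binary. This can be observed for all tested foreground estimation methods, as shown in Figure 7.
- With IndexNet alpha mattes as input, artifacts in the alpha matte let the green and blue background colors shine through. Because the foreground colors propagate strongly into background regions, the multi-level foreground method performs considerably worse, as shown in the last row of Figure 8.
- The alpha matte produced by the information-flow method slightly overestimates the matte of the wire mesh image (Figure 9, column 3), which leads to a green grid for closed-form foreground estimation and dark spots for the other methods. Otherwise, all methods produce acceptable results.
Runtime
Memory usage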

Is a Green Screen Really Necessary for Real-Time Portrait Matting? (2020)

Paper
ZHKKKe/MODNet
MODNet-Image Matting Demo.ipynb

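Portrait matting framework

MODNet is a trimap-free matting method. Its results are not as good as those of trimap-based methods, but it is fast. It cannot handle unusual clothing or strong motion blur that the training set does not cover.
- SOC (Sub-Objectives Consistency): essential for generalizing MODNet to real-world data.
- OFD (One-Frame Delay): removes flicker along boundaries in video.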
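As a rough illustration of the OFD idea: the MODNet paper describes it as smoothing a frame's alpha with its two neighboring frames wherever the neighbors agree with each other but the current frame differs from both. The sketch below follows that description only; the function name `one_frame_delay` and the threshold `xi` (with its default value) are assumptions, not taken from the MODNet repository.

```python
import numpy as np


def one_frame_delay(prev_alpha, cur_alpha, next_alpha, xi=0.1):
    """Replace flickering pixels of cur_alpha by the mean of its neighbors."""
    # The previous and next frames agree with each other ...
    neighbors_agree = np.abs(prev_alpha - next_alpha) <= xi
    # ... but the current frame differs from both of them.
    differs_prev = np.abs(cur_alpha - prev_alpha) > xi
    differs_next = np.abs(cur_alpha - next_alpha) > xi
    flicker = neighbors_agree & differs_prev & differs_next

    smoothed = cur_alpha.copy()
    smoothed[flicker] = 0.5 * (prev_alpha[flicker] + next_alpha[flicker])
    return smoothed


prev_a = np.array([[1.0, 0.0]])
cur_a = np.array([[0.2, 0.0]])
next_a = np.array([[1.0, 0.5]])
fixed = one_frame_delay(prev_a, cur_a, next_a)
```

The price of this smoothing is in the name: frame t can only be emitted once frame t+1 has been processed, so the output lags the input by one frame.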
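MODNet architecture
- Semantic Estimation: outputs a coarse foreground mask S.
- Detail Prediction: produces fine foreground boundaries D.
- Semantic-Detail Fusion: fuses the semantic estimate and the detail prediction into the final result F.
MODNet-Portrait Image Matting Demo
- Preparation
  Download the model into the ./MODNet/pretrained/ directory.
- Upload Images
  Upload the images to process into ./demo/image_matting/colab/input/ and create the output directory ./demo/image_matting/colab/output/.
- Inference
  Run the script ./demo/image_matting/colab/inference.py.
- Visualization
  Visualize the results.
- Download Results
  Download the results.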

MODNet-Portrait Image Matting Demo Code

import os
import sys
import argparse
import numpy as np
from PIL import Image

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms

from src.models.modnet import MODNet
from pymatting import estimate_foreground_ml


def combined_display(image, matte):
    #calculate display resolution
    w, h = image.width, image.height
    rw, rh = 800, int(h * 800 / (3 * w))
    #obtain predicted foreground
    image = np.asarray(image)
    if len(image.shape) == 2:
        image = image[:, :, None]
    if image.shape[2] == 1:
        image = np.repeat(image, 3, axis=2)
    elif image.shape[2] == 4:
        image = image[:, :, 0:3]
    fg_im = estimate_foreground_ml(image / 255., matte / 255)
    matte = np.repeat(np.asarray(matte)[:, :, None], 3, axis=2) / 255
    foreground = fg_im * 255 * matte + np.full(image.shape, 255) * (1 - matte)
    #combine image, foreground, and alpha into one line
    combined = np.concatenate((image, foreground, matte * 255), axis=1)
    combined = Image.fromarray(np.uint8(combined)).resize((rw, rh))
    return combined


if __name__ == '__main__':
    # define cmd arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_path', type=str, help='path of input images', default='./input/')
    parser.add_argument('--output_path', type=str, help='path of output images', default='./output/')
    parser.add_argument('--ckpt_path', type=str, help='path of pre-trained MODNet', \
                        default='../../../pretrained/modnet_photographic_portrait_matting.ckpt')
    args = parser.parse_args()

    # check input arguments
    if not os.path.exists(args.input_path):
        print('Cannot find input path: {0}'.format(args.input_path))
        exit()
    if not os.path.exists(args.output_path):
        print('Cannot find output path: {0}'.format(args.output_path))
        exit()
    if not os.path.exists(args.ckpt_path):
        print('Cannot find ckpt path: {0}'.format(args.ckpt_path))
        exit()

    # define hyper-parameters
    ref_size = 512

    # define image to tensor transform
    im_transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ]
    )

    # create MODNet and load the pre-trained ckpt
    modnet = MODNet(backbone_pretrained=False)
    modnet = nn.DataParallel(modnet).cuda()
    modnet.load_state_dict(torch.load(args.ckpt_path))
    modnet.eval()

    # inference images
    im_names = os.listdir(args.input_path)
    for im_name in im_names:
        print('Process image: {0}'.format(im_name))

        # read image
        im = Image.open(os.path.join(args.input_path, im_name))
        im_src = im.copy()
        # unify image channels to 3
        im = np.asarray(im)
        if len(im.shape) == 2:
            im = im[:, :, None]
        if im.shape[2] == 1:
            im = np.repeat(im, 3, axis=2)
        elif im.shape[2] == 4:
            im = im[:, :, 0:3]

        # convert image to PyTorch tensor
        im = Image.fromarray(im)
        im = im_transform(im)

        # add mini-batch dim
        im = im[None, :, :, :]

        # resize image for input
        im_b, im_c, im_h, im_w = im.shape
        if max(im_h, im_w) < ref_size or min(im_h, im_w) > ref_size:
            if im_w >= im_h:
                im_rh = ref_size
                im_rw = int(im_w / im_h * ref_size)
            elif im_w < im_h:
                im_rw = ref_size
                im_rh = int(im_h / im_w * ref_size)
        else:
            im_rh = im_h
            im_rw = im_w
        
        im_rw = im_rw - im_rw % 32
        im_rh = im_rh - im_rh % 32
        im = F.interpolate(im, size=(im_rh, im_rw), mode='area')

        # inference (no gradients are needed at test time)
        with torch.no_grad():
            _, _, matte = modnet(im.cuda(), True)

        # resize and save matte
        matte = F.interpolate(matte, size=(im_h, im_w), mode='area')
        matte = matte[0][0].data.cpu().numpy() #alpha matte
        # splitext keeps dots inside the base name (e.g. "a.b.jpg" -> "a.b.png")
        matte_name = os.path.splitext(im_name)[0] + '.png'
        # Image.fromarray(((matte * 255).astype('uint8')), mode='L').save(os.path.join(args.output_path, matte_name))
        (combined_display(im_src, (matte * 255).astype('uint8'))).save(os.path.join(args.output_path, matte_name))
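The resize rule applied before inference is easy to isolate and check on its own. The sketch below restates it as a pure function; the name `modnet_input_size` is hypothetical, while the logic follows the demo script above.

```python
def modnet_input_size(im_h, im_w, ref_size=512):
    """Return the (height, width) fed to the network for an im_h x im_w image."""
    if max(im_h, im_w) < ref_size or min(im_h, im_w) > ref_size:
        # Rescale so that the shorter side equals ref_size.
        if im_w >= im_h:
            im_rh = ref_size
            im_rw = int(im_w / im_h * ref_size)
        else:
            im_rw = ref_size
            im_rh = int(im_h / im_w * ref_size)
    else:
        # One side is already on each side of ref_size: keep the size.
        im_rh, im_rw = im_h, im_w
    # Round both sides down to a multiple of 32.
    return im_rh - im_rh % 32, im_rw - im_rw % 32


print(modnet_input_size(1080, 1920))  # (512, 896)
print(modnet_input_size(600, 400))    # (576, 384)
```

The predicted matte is interpolated back to the original `(im_h, im_w)` afterwards, so this rounding only affects the resolution the network sees, not the size of the saved result.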

Real-Time High-Resolution Background Matting(2020)

Paper
BackgroundMattingV2
Project
BackgroundMattingV2 Image Matting Example

Background Matting: The World is Your Green Screen (2020)

[Background-Matting]
[Paper]
[Project]

Categories
- Traditional approaches
  - sampling-based techniques
  - propagation-based techniques
- Learning-based approaches
  - trimap based methods
    - Context aware matting (CAM)
    - Index Matting (IM)
    - …
  - automatic matting algorithm
    - Late Fusion Matting (LFM)
    - …
- Matting with known natural background
- Video Matting
History
- 2019
  - Disentangled image matting
  - Context-aware image matting for simultaneous foreground and alpha estimation
  - Learning to index for deep image matting
  - A late fusion cnn for digital matting
- 2018
  - Semantic soft segmentation
  - Encoder-decoder with atrous separable convolution for semantic image segmentation
  - Semantic human matting
  - Alpha-gan: Generative adversarial networks for natural image matting
- 2017
  - Designing effective inter-pixel information flow for natural image matting
  - Deep image matting
  - Fast deep matting for portrait animation on mobile phone
- 2016
  - Natural image matting using deep convolutional neural networks
  - Deep automatic portrait matting
- 2013
  - KNN matting
- 2011
  - A global sampling method for alpha matting
  - Nonlocal matting
- 2010
  - Shared sampling for real-time alpha matting
  - Fast matting using large kernel matting laplacian matrices
- 2008
  - Spectral matting
- 2007
  - A closed-form solution to natural image matting
- 2004
  - A bayesian approach to digital matting
Network architecture
At the core of our approach is a deep matting network G that extracts foreground color and alpha for a given input frame, augmented with background, soft segmentation, and (optionally nearby video frames), and a discriminator network D that guides the training to generate realistic results.

Loss function

$U^2$-Net: Going Deeper with Nested U-Structure for Salient Object Detection

paper
NathanUA/U-2-Net

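Mainstream approaches to salient object detection:

- Multi-level deep feature integration: these methods focus on developing better strategies for aggregating multi-level features.
- Multi-scale feature extraction: these methods aim to design new modules that extract local and global information at the same time from the features produced by the backbone.

Both lines of work try to make better use of the feature maps generated by existing image-classification backbones. The authors take a different route and propose a novel yet simple architecture that extracts multi-scale features directly, stage by stage, for salient object detection, instead of developing and adding ever more complex modules and strategies on top of backbone features.

The authors first introduce the proposed Residual U-blocks and then the nested U-shaped network built from them.

Speed
For a 320x320x3 input image, the network runs at 30 FPS on a 1080 Ti GPU.
Network architecture

Comparison with the U-Net architecture:

Every block of U^2-Net is itself a U-Net-like module, namely a Residual U-block. The nesting could in principle go deeper: each sub-block of the U-Net inside a block could again be a U-Net structure.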

Residual U-blocks

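The figure above compares a plain convolution block, a Res-like block, an Inception-like block, a Dense-like block, and the Residual U-block (RSU), which is inspired by U-Net.
An RSU consists of three parts:
- An input convolution layer that transforms the input feature map x (HxWxC_in) into an intermediate feature map F_1(x) with C_out channels. This is a plain convolution layer for local feature extraction.
- A U-Net-like symmetric encoder-decoder structure of height L that takes the intermediate feature map F_1(x) as input and learns to extract and encode multi-scale contextual information U(F_1(x)), where U denotes the U-Net-like structure. A larger L gives a deeper RSU with more pooling operations, a larger receptive field, and richer local and global features. Configuring this parameter makes it possible to extract multi-scale features from input feature maps of arbitrary spatial resolution. The multi-scale features are extracted from progressively downsampled feature maps and encoded into high-resolution feature maps by progressive upsampling, concatenation, and convolution. This mitigates the loss of fine detail caused by direct upsampling at large scales.
- A residual connection that fuses local and multi-scale features by summation: F_1(x)+U(F_1(x)).
  
  The main design difference between an RSU and a Res block is that the RSU replaces the plain single-stream convolution with a U-Net-like structure, and replaces the original feature with the local feature produced by a weight layer. This change lets the network extract features at multiple scales directly from each residual block. Notably, the U-structure adds little computational overhead, because most operations are applied to downsampled feature maps.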
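Written as equations, the contrast between the two blocks is compact. The notation follows the list above; the only added symbol is $F_2$, the second weight layer of a plain residual block.

```latex
\begin{aligned}
H_{\text{Res}}(x) &= F_2\bigl(F_1(x)\bigr) + x \\
H_{\text{RSU}}(x) &= U\bigl(F_1(x)\bigr) + F_1(x)
\end{aligned}
```

The skip path of the RSU carries $F_1(x)$ rather than $x$, which is what "replaces the original feature with the local feature produced by a weight layer" refers to, and $U$ takes the place of the single-stream $F_2$.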
Loss function
Results

Attention-Guided Hierarchical Structure Aggregation for Image Matting (2020)

[CVPR2020-HAttMatting]
[Attention-Guided Hierarchical Structure Aggregation for Image Matting]

A Late Fusion CNN for Digital Matting(2019)

[FusionMatting]

[A Late Fusion CNN for Digital Matting]

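[Paper notes on "A Late Fusion CNN for Digital Matting"]

[[Question][CVPR2019][A Late Fusion… Matting]]

[Clarifying misunderstandings: a reply to the questions raised about the CVPR 2019 LFM paper]

[Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies]

LFM is an end-to-end neural network: the input is an image containing a foreground object and the output is the alpha matte of that foreground. The network predicts three maps: a foreground probability map, a background probability map, and a blending weight map. The alpha matte is obtained by fusing the foreground and background probability maps according to the blending weight map. Training proceeds in three stages: pre-training the segmentation network, pre-training the fusion network, and end-to-end joint training, with the training loss applied to the output alpha matte.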
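A minimal numpy sketch of the fusion step, assuming the blend takes the form alpha = beta * F + (1 - beta) * (1 - B), where F and B are the per-pixel foreground and background probabilities and beta is the blending weight. This form is a reading of the LFM paper rather than something stated in these notes, and the function name `fuse_alpha` is hypothetical.

```python
import numpy as np


def fuse_alpha(fg_prob, bg_prob, blend_weight):
    """Fuse foreground/background probability maps into an alpha matte."""
    # Where blend_weight is 1 the foreground branch decides alpha;
    # where it is 0 the background branch decides it (as 1 - bg_prob).
    alpha = blend_weight * fg_prob + (1.0 - blend_weight) * (1.0 - bg_prob)
    return np.clip(alpha, 0.0, 1.0)


fg = np.array([0.9, 0.2])
bg = np.array([0.1, 0.6])
beta = np.array([1.0, 0.0])
alpha = fuse_alpha(fg, bg, beta)  # first pixel follows fg, second follows 1 - bg
```

When the two branches agree (F close to 1 - B) the blending weight hardly matters; it only carries information in the transition region where they disagree.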

Natural Image Matting via Guided Contextual Attention(2020)

[GCA-Matting]
[Natural Image Matting via Guided Contextual Attention]

Network architecture
- GCA
Loss function

Deep Image Matting (2017)

[pytorch-deep-image-matting]

[Deep Image Matting]

[Project]

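[[Paper reading] Deep Image Matting (with a discussion of implementation details)]

Datasets

[An unreliable roundup of portrait segmentation [1]]
Matting is a soft segmentation of foreground and background: the goal is to recover the foreground, the background, and the degree to which they blend.

Note: a trimap is usually generated by dilating the matte.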
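The note above on generating a trimap by dilation can be made concrete. One common recipe, sketched here with plain numpy, marks every pixel within `radius` of a fractional-alpha pixel or of a foreground/background border as unknown. The function names and the default `radius` are hypothetical, and real pipelines often use a library morphology routine instead of this loop.

```python
import numpy as np


def binary_dilate(mask, radius):
    """Dilate a boolean mask with a (2*radius+1) x (2*radius+1) square."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def trimap_from_alpha(alpha, radius=2):
    """Return a trimap with values 0 (background), 0.5 (unknown), 1 (foreground)."""
    fg = alpha >= 1.0
    bg = alpha <= 0.0
    # Unknown band: fractional alpha, grown by `radius` pixels ...
    unknown = binary_dilate(~(fg | bg), radius)
    # ... plus hard foreground/background borders of (almost) binary mattes.
    unknown |= binary_dilate(fg, radius) & binary_dilate(bg, radius)

    trimap = np.full(alpha.shape, 0.5, dtype=np.float32)
    trimap[fg & ~unknown] = 1.0
    trimap[bg & ~unknown] = 0.0
    return trimap


alpha = np.zeros((5, 9), dtype=np.float32)
alpha[:, 5:] = 1.0
alpha[:, 4] = 0.5
trimap = trimap_from_alpha(alpha, radius=1)
```

A larger `radius` gives a wider unknown band, which makes the matting problem harder but tolerates less accurate masks.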

Roundups

[An unreliable roundup of portrait segmentation [1]]

Challenges

[Alpha Matting Evaluation Website]

Evaluation
- SAD (sum of absolute differences)
- MSE (mean squared error)
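Both metrics are short enough to write out. The sketch below assumes alpha mattes stored as arrays with values in [0, 1]; the optional `mask` argument (for example the unknown region of a trimap) is an assumption of this sketch, since benchmarks differ in which pixels they score and how they scale the numbers.

```python
import numpy as np


def sad(pred, gt, mask=None):
    """Sum of absolute differences between predicted and ground-truth alpha."""
    diff = np.abs(pred.astype(np.float64) - gt.astype(np.float64))
    if mask is not None:
        diff = diff[mask]  # score only the selected pixels
    return float(diff.sum())


def mse(pred, gt, mask=None):
    """Mean squared error between predicted and ground-truth alpha."""
    diff = (pred.astype(np.float64) - gt.astype(np.float64)) ** 2
    if mask is not None:
        diff = diff[mask]
    return float(diff.mean())


pred = np.array([[0.0, 0.5], [1.0, 1.0]])
gt = np.array([[0.0, 1.0], [1.0, 0.0]])
print(sad(pred, gt), mse(pred, gt))  # 1.5 0.3125
```

SAD grows with the number of scored pixels while MSE is normalized by it, so the two can rank methods differently on images with large unknown regions.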
