Algorithms
Fast Multi-Level Foreground Estimation(2020)
- Problem the paper addresses
- Foreground estimation in the closed-form method
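Foreground estimation in the closed-form method can be accelerated in a preprocessing step by combining a thresholded incomplete Cholesky decomposition with the conjugate gradient method. Even so, solving the resulting $2n \times 2n$ linear system for $n = 0.4\text{M}$ pixels until the error converges to $10^{-6}$ still takes about 30 seconds per color channel on current hardware, which is too slow for interactive image editing. The foreground estimation method proposed in the paper can process images with several million pixels within a few seconds on commodity hardware.
- Multi-level foreground estimation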
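One might try to improve closed-form foreground estimation by replacing the solution of the global cost function with local solutions computed from the cost function on small regions. On its own this does not work: local solutions do not propagate the foreground and background colors into the corresponding alpha regions, even after many iterations. A multi-level approach mitigates this shortcoming and yields an efficient way to approximate the foreground and background colors.
The cost function of closed-form foreground estimation is therefore modified as follows. For a fixed color channel c, pixel i is the center of a local image region, and the color gradient is expressed as a sum over the pixels adjacent to that center. In addition, a regularization term keeps the problem well defined in regions of constant alpha. Without it, the foreground and background colors would be unconstrained in regions where alpha is 0 and 1, respectively. A further constant controls the influence of the alpha gradient.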
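Written out, the local cost appears to take the following form. This is reconstructed from the update loop of the `_estimate_fb_ml` listing further down, not copied from the paper; the symbols $\epsilon_r$ (for `regularization`), $\omega$ (for `gradient_weight`) and $w_{ij}$ are labels chosen here for illustration, and $N_i$ denotes the four direct neighbors of pixel $i$.

```latex
\operatorname{cost}_i^c(F_i^c, B_i^c)
  = \left(\alpha_i F_i^c + (1-\alpha_i) B_i^c - I_i^c\right)^2
  + \sum_{j \in N_i} w_{ij}\left[(F_i^c - F_j^c)^2 + (B_i^c - B_j^c)^2\right],
\qquad w_{ij} = \epsilon_r + \omega\,\lvert\alpha_i - \alpha_j\rvert

% Setting both partial derivatives to zero gives a 2x2 system per pixel and channel:
\begin{pmatrix}
  \alpha_i^2 + \sum_j w_{ij} & \alpha_i(1-\alpha_i) \\
  \alpha_i(1-\alpha_i) & (1-\alpha_i)^2 + \sum_j w_{ij}
\end{pmatrix}
\begin{pmatrix} F_i^c \\ B_i^c \end{pmatrix}
=
\begin{pmatrix}
  \alpha_i I_i^c + \sum_j w_{ij} F_j^c \\
  (1-\alpha_i) I_i^c + \sum_j w_{ij} B_j^c
\end{pmatrix}
```

The matrix entries correspond to `a00`, `a01` and `a11` in the listing, and the right-hand side to `b[0, c]` and `b[1, c]`.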
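To address the slow propagation, a multi-level approach is used. Starting from a low-resolution foreground image, which does not suffer from slow spatial propagation, the local cost function is minimized iteratively. The result then initializes the minimization at the next larger image size, and this is repeated until the original size of the input image is reached.
- Implementation
- Algorithm steps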
import numpy as np
from numba import njit


@njit("void(f4[:, :, :], f4[:, :, :])")
def _resize_nearest_multichannel(dst, src):
    """
    Internal method.

    Resize image src to dst using nearest neighbors filtering.
    Images must have multiple color channels, i.e. :code:`len(shape) == 3`.

    Parameters
    ----------
    dst: numpy.ndarray of type np.float32
        output image
    src: numpy.ndarray of type np.float32
        input image
    """
    h_src, w_src, depth = src.shape
    h_dst, w_dst, depth = dst.shape
    for y_dst in range(h_dst):
        for x_dst in range(w_dst):
            x_src = max(0, min(w_src - 1, x_dst * w_src // w_dst))
            y_src = max(0, min(h_src - 1, y_dst * h_src // h_dst))
            for c in range(depth):
                dst[y_dst, x_dst, c] = src[y_src, x_src, c]
@njit("void(f4[:, :], f4[:, :])")
def _resize_nearest(dst, src):
    """
    Internal method.

    Resize image src to dst using nearest neighbors filtering.
    Images must be grayscale, i.e. :code:`len(shape) == 2`.

    Parameters
    ----------
    dst: numpy.ndarray of type np.float32
        output image
    src: numpy.ndarray of type np.float32
        input image
    """
    h_src, w_src = src.shape
    h_dst, w_dst = dst.shape
    for y_dst in range(h_dst):
        for x_dst in range(w_dst):
            x_src = max(0, min(w_src - 1, x_dst * w_src // w_dst))
            y_src = max(0, min(h_src - 1, y_dst * h_src // h_dst))
            dst[y_dst, x_dst] = src[y_src, x_src]
def _estimate_fb_ml(
    input_image,
    input_alpha,
    regularization,
    n_small_iterations,
    n_big_iterations,
    small_size,
    gradient_weight,
):
    h0, w0, depth = input_image.shape

    dtype = np.float32

    # Start from a 1x1 foreground/background estimate.
    w_prev = 1
    h_prev = 1

    F_prev = np.empty((h_prev, w_prev, depth), dtype=dtype)
    B_prev = np.empty((h_prev, w_prev, depth), dtype=dtype)

    n_levels = int(np.ceil(np.log2(max(w0, h0))))

    for i_level in range(n_levels + 1):
        # Image size grows geometrically from 1x1 up to w0 x h0.
        w = round(w0 ** (i_level / n_levels))
        h = round(h0 ** (i_level / n_levels))

        image = np.empty((h, w, depth), dtype=dtype)
        alpha = np.empty((h, w), dtype=dtype)

        _resize_nearest_multichannel(image, input_image)
        _resize_nearest(alpha, input_alpha)

        # Initialize this level with the upsampled result of the previous one.
        F = np.empty((h, w, depth), dtype=dtype)
        B = np.empty((h, w, depth), dtype=dtype)

        _resize_nearest_multichannel(F, F_prev)
        _resize_nearest_multichannel(B, B_prev)

        if w <= small_size and h <= small_size:
            n_iter = n_small_iterations
        else:
            n_iter = n_big_iterations

        b = np.zeros((2, depth), dtype=dtype)

        # Offsets of the four direct neighbors.
        dx = [-1, 1, 0, 0]
        dy = [0, 0, -1, 1]

        for i_iter in range(n_iter):
            for y in range(h):
                for x in range(w):
                    a0 = alpha[y, x]
                    a1 = 1.0 - a0

                    # Entries of the symmetric 2x2 system matrix.
                    a00 = a0 * a0
                    a01 = a0 * a1
                    # a10 = a01 can be omitted due to symmetry of matrix
                    a11 = a1 * a1

                    for c in range(depth):
                        b[0, c] = a0 * image[y, x, c]
                        b[1, c] = a1 * image[y, x, c]

                    for d in range(4):
                        # Clamp neighbor coordinates at the image border.
                        x2 = max(0, min(w - 1, x + dx[d]))
                        y2 = max(0, min(h - 1, y + dy[d]))

                        gradient = abs(a0 - alpha[y2, x2])

                        da = regularization + gradient_weight * gradient

                        a00 += da
                        a11 += da

                        for c in range(depth):
                            b[0, c] += da * F[y2, x2, c]
                            b[1, c] += da * B[y2, x2, c]

                    # Solve the 2x2 system with the explicit inverse.
                    determinant = a00 * a11 - a01 * a01

                    inv_det = 1.0 / determinant

                    b00 = inv_det * a11
                    b01 = inv_det * -a01
                    b11 = inv_det * a00

                    for c in range(depth):
                        F_c = b00 * b[0, c] + b01 * b[1, c]
                        B_c = b01 * b[0, c] + b11 * b[1, c]

                        # Keep colors in the valid range [0, 1].
                        F_c = max(0.0, min(1.0, F_c))
                        B_c = max(0.0, min(1.0, B_c))

                        F[y, x, c] = F_c
                        B[y, x, c] = B_c

        F_prev = F
        B_prev = B
        w_prev = w
        h_prev = h

    return F, B
exports = {
    "_resize_nearest_multichannel": (
        _resize_nearest_multichannel,
        "void(f4[:, :, :], f4[:, :, :])",
    ),
    "_resize_nearest": (_resize_nearest, "void(f4[:, :], f4[:, :])"),
    "_estimate_fb_ml": (
        _estimate_fb_ml,
        "Tuple((f4[:, :, :], f4[:, :, :]))(f4[:, :, :], f4[:, :], f4, i4, i4, i4, f4)",
    ),
}
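To see how the resolution grows from level to level, the size schedule used by `_estimate_fb_ml` above can be reproduced on its own. This is a small sketch: the helper name `level_sizes` is made up for illustration, and like the listing it assumes the image is larger than 1x1.

```python
import math


def level_sizes(w0, h0):
    """Return (width, height) for every pyramid level, coarsest first.

    Mirrors the schedule in the listing above; assumes max(w0, h0) > 1,
    otherwise n_levels would be zero.
    """
    n_levels = int(math.ceil(math.log2(max(w0, h0))))
    return [
        (round(w0 ** (i / n_levels)), round(h0 ** (i / n_levels)))
        for i in range(n_levels + 1)
    ]


sizes = level_sizes(640, 480)
print(sizes)
```

The first level is always 1x1 and the last one is the input size. Because the exponent grows linearly, width and height grow geometrically, so each level is only moderately larger than the one before it.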
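- Evaluation method
- Results
- Drawbacks and limitations
- With KNN alpha mattes as input, the estimated foreground colors are usually too dark because the mattes are almost binary. This can be observed for all tested foreground estimation methods, as shown in Figure 7.
- With IndexNet alpha mattes as input, the green and blue background colors still shine through because of artifacts in the alpha matte. This effect is greatly reduced with the multi-level foreground method, since it propagates foreground colors strongly into background regions, as shown in the last row of Figure 8.
- The information-flow alpha matting method slightly overestimates the matte of the wire-mesh image (Figure 9, column 3), which leads to a green grid for closed-form foreground estimation and dark spots for the other methods. Apart from that, all methods produce acceptable results.
- Runtime
- Memory usage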
Is a Green Screen Really Necessary for Real-Time Portrait Matting(2020)
Paper
ZHKKKe/MODNet
MODNet-Image Matting Demo.ipynb
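- Portrait matting framework
MODNet is a trimap-free matting method. Its results are not as good as those of trimap-based methods, but it is fast. It cannot handle unusual clothing or strong motion blur that the training set does not cover.
- SOC (Sub-Objectives Consistency): essential for generalizing MODNet to real-world data.
- OFD (One-Frame Delay): removes flickering along boundaries in video.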
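The OFD idea can be sketched in a few lines. As I understand the MODNet paper, a pixel of the current matte is treated as flicker when the previous and next frames agree with each other but both disagree with the current one, and it is then replaced by their average. The function name `one_frame_delay`, the `tol` threshold and its default are illustrative assumptions, not values taken from the repository.

```python
import numpy as np


def one_frame_delay(prev, cur, nxt, tol=0.1):
    """Smooth the matte `cur` using its temporal neighbors `prev` and `nxt`.

    All inputs are alpha mattes in [0, 1] with the same shape.
    `tol` is a hypothetical similarity threshold.
    """
    prev, cur, nxt = (np.asarray(a, dtype=np.float32) for a in (prev, cur, nxt))
    neighbors_agree = np.abs(prev - nxt) <= tol
    cur_differs = (np.abs(cur - prev) > tol) & (np.abs(cur - nxt) > tol)
    flicker = neighbors_agree & cur_differs
    # Flickering pixels take the average of the two neighboring frames.
    return np.where(flicker, 0.5 * (prev + nxt), cur)


prev = np.array([[1.0, 0.5]])
cur = np.array([[0.0, 0.5]])   # first pixel flickers, second is stable
nxt = np.array([[1.0, 0.5]])
smoothed = one_frame_delay(prev, cur, nxt)
```

Because the next frame is needed, the output lags the input by one frame, which is where the name comes from.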
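- MODNet architecture
- Semantic Estimation: outputs a coarse foreground mask S.
- Detail Prediction: produces fine foreground boundaries D.
- Semantic-Detail Fusion: fuses the semantic estimation and the detail prediction into the final result F.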
- MODNet-Portrait Image Matting Demo
- Preparation
Download the model into the ./MODNet/pretrained/ directory.
- Upload Images
Upload the images to be processed to ./demo/image_matting/colab/input/ and create the output directory ./demo/image_matting/colab/output/.
- Inference
Run the ./demo/image_matting/colab/inference.py script.
- Visualization
Visualize the results.
- Download Results
Download the results.
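The directory setup and the inference call from the steps above can be scripted. This is a minimal sketch: the helper `prepare_demo` is hypothetical, the flags are those defined by the `argparse` section of the demo script listed in this post, and the checkpoint file name is that script's default. The command is only assembled here, not executed; the script imports `src.models.modnet`, so the repository root would need to be importable when it is run.

```python
import tempfile
from pathlib import Path


def prepare_demo(repo_root):
    """Create the demo input/output folders and build the inference command."""
    root = Path(repo_root)
    colab_dir = root / "demo" / "image_matting" / "colab"
    input_dir = colab_dir / "input"
    output_dir = colab_dir / "output"
    for directory in (input_dir, output_dir):
        directory.mkdir(parents=True, exist_ok=True)
    ckpt = root / "pretrained" / "modnet_photographic_portrait_matting.ckpt"
    return [
        "python", str(colab_dir / "inference.py"),
        "--input_path", str(input_dir),
        "--output_path", str(output_dir),
        "--ckpt_path", str(ckpt),
    ]


# Demonstrated in a temporary folder so that nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    command = prepare_demo(tmp)
    dirs_exist = (Path(command[3]).is_dir(), Path(command[5]).is_dir())
print(" ".join(command))
```

For a real checkout, pass the path of the cloned repository (for example `./MODNet`) instead of the temporary folder.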
- MODNet-Portrait Image Matting Demo Code
import os
import sys
import argparse
import numpy as np
from PIL import Image

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms

from src.models.modnet import MODNet
from pymatting import *


def combined_display(image, matte):
    # calculate display resolution
    w, h = image.width, image.height
    rw, rh = 800, int(h * 800 / (3 * w))

    # obtain predicted foreground
    image = np.asarray(image)
    if len(image.shape) == 2:
        image = image[:, :, None]
    if image.shape[2] == 1:
        image = np.repeat(image, 3, axis=2)
    elif image.shape[2] == 4:
        image = image[:, :, 0:3]
    fg_im = estimate_foreground_ml(image / 255., matte / 255)
    matte = np.repeat(np.asarray(matte)[:, :, None], 3, axis=2) / 255
    foreground = fg_im * 255 * matte + np.full(image.shape, 255) * (1 - matte)

    # combine image, foreground, and alpha into one line
    combined = np.concatenate((image, foreground, matte * 255), axis=1)
    combined = Image.fromarray(np.uint8(combined)).resize((rw, rh))
    return combined


if __name__ == '__main__':
    # define cmd arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_path', type=str, help='path of input images', default='./input/')
    parser.add_argument('--output_path', type=str, help='path of output images', default='./output/')
    parser.add_argument('--ckpt_path', type=str, help='path of pre-trained MODNet', \
                        default='../../../pretrained/modnet_photographic_portrait_matting.ckpt')
    args = parser.parse_args()

    # check input arguments
    if not os.path.exists(args.input_path):
        print('Cannot find input path: {0}'.format(args.input_path))
        exit()
    if not os.path.exists(args.output_path):
        print('Cannot find output path: {0}'.format(args.output_path))
        exit()
    if not os.path.exists(args.ckpt_path):
        print('Cannot find ckpt path: {0}'.format(args.ckpt_path))
        exit()

    # define hyper-parameters
    ref_size = 512

    # define image to tensor transform
    im_transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ]
    )

    # create MODNet and load the pre-trained ckpt
    modnet = MODNet(backbone_pretrained=False)
    modnet = nn.DataParallel(modnet).cuda()
    modnet.load_state_dict(torch.load(args.ckpt_path))
    modnet.eval()

    # inference images
    im_names = os.listdir(args.input_path)
    for im_name in im_names:
        print('Process image: {0}'.format(im_name))

        # read image
        im = Image.open(os.path.join(args.input_path, im_name))
        im_src = im.copy()

        # unify image channels to 3
        im = np.asarray(im)
        if len(im.shape) == 2:
            im = im[:, :, None]
        if im.shape[2] == 1:
            im = np.repeat(im, 3, axis=2)
        elif im.shape[2] == 4:
            im = im[:, :, 0:3]

        # convert image to PyTorch tensor
        im = Image.fromarray(im)
        im = im_transform(im)

        # add mini-batch dim
        im = im[None, :, :, :]

        # resize image for input
        im_b, im_c, im_h, im_w = im.shape
        if max(im_h, im_w) < ref_size or min(im_h, im_w) > ref_size:
            if im_w >= im_h:
                im_rh = ref_size
                im_rw = int(im_w / im_h * ref_size)
            elif im_w < im_h:
                im_rw = ref_size
                im_rh = int(im_h / im_w * ref_size)
        else:
            im_rh = im_h
            im_rw = im_w

        im_rw = im_rw - im_rw % 32
        im_rh = im_rh - im_rh % 32
        im = F.interpolate(im, size=(im_rh, im_rw), mode='area')

        # inference
        _, _, matte = modnet(im.cuda(), True)

        # resize and save matte
        matte = F.interpolate(matte, size=(im_h, im_w), mode='area')
        matte = matte[0][0].data.cpu().numpy()  # alpha matte
        matte_name = im_name.split('.')[0] + '.png'
        # Image.fromarray(((matte * 255).astype('uint8')), mode='L').save(os.path.join(args.output_path, matte_name))
        (combined_display(im_src, (matte * 255).astype('uint8'))).save(os.path.join(args.output_path, matte_name))
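The resizing rule inside the loop above is easier to follow when pulled out into a function. The sketch below copies that logic unchanged; only the function name `modnet_input_size` is made up here.

```python
def modnet_input_size(im_h, im_w, ref_size=512):
    """Return the (height, width) the demo script feeds to the network."""
    if max(im_h, im_w) < ref_size or min(im_h, im_w) > ref_size:
        # Entirely smaller or entirely larger than ref_size:
        # scale so that the shorter side becomes ref_size.
        if im_w >= im_h:
            im_rh = ref_size
            im_rw = int(im_w / im_h * ref_size)
        else:
            im_rw = ref_size
            im_rh = int(im_h / im_w * ref_size)
    else:
        # ref_size lies between the two sides: keep the original size.
        im_rh, im_rw = im_h, im_w
    # Round both sides down to a multiple of 32.
    return im_rh - im_rh % 32, im_rw - im_rw % 32


sizes = [
    modnet_input_size(1080, 1920),  # larger than ref_size on both sides
    modnet_input_size(400, 300),    # smaller than ref_size on both sides
    modnet_input_size(600, 400),    # ref_size between the two sides
]
print(sizes)  # [(512, 896), (672, 512), (576, 384)]
```

The predicted matte is afterwards interpolated back to the original `im_h` x `im_w`, so this resizing only affects what the network sees.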
Real-Time High-Resolution Background Matting(2020)
Paper
BackgroundMattingV2
Project
BackgroundMattingV2 Image Matting Example
Background-Matting:The World is Your Green Screen(2020)
[Background-Matting]
[Paper]
[Project]
- Taxonomy
  - Traditional approaches
    - sampling-based techniques
    - propagation-based techniques
  - Learning-based approaches
    - trimap based methods
      - Context aware matting(CAM)
      - Index Matting(IM)
      - …
    - automatic matting algorithm
      - Late Fusion Matting(LFM)
      - …
  - Matting with known natural background
  - Video Matting
- History
  - 2019
    - Disentangled image matting
    - Context-aware image matting for simultaneous foreground and alpha estimation
    - Learning to index for deep image matting
    - A late fusion cnn for digital matting
  - 2018
    - Semantic soft segmentation
    - Encoder-decoder with atrous separable convolution for semantic image segmentation
    - Semantic human matting
    - Alpha-gan: Generative adversarial networks for natural image matting
  - 2017
    - Designing effective inter-pixel information flow for natural image matting
    - Deep image matting
    - Fast deep matting for portrait animation on mobile phone
  - 2016
    - Natural image matting using deep convolutional neural networks
    - Deep automatic portrait matting
  - 2013
    - KNN matting
  - 2011
    - A global sampling method for alpha matting
    - Nonlocal matting
  - 2010
    - Shared sampling for real-time alpha matting
    - Fast matting using large kernel matting laplacian matrices
  - 2008
    - Spectral matting
  - 2007
    - A closed-form solution to natural image matting
  - 2004
    - A bayesian approach to digital matting
- Network architecture
At the core of our approach is a deep matting network G that extracts foreground color and alpha for a given input frame, augmented with background, soft segmentation, and (optionally nearby video frames), and a discriminator network D that guides the training to generate realistic results.
- Loss function
U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection
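Mainstream approaches to salient object detection:
- Multi-level deep feature integration
Multi-level deep feature integration methods focus on developing better strategies for aggregating multi-level features.
- Multi-scale feature extraction
Multi-scale feature extraction aims to design new modules that extract both local and global information from the features produced by the backbone.
Both lines of work try to make better use of the feature maps generated by existing image-classification backbones. The authors take a different route and propose a novel yet simple architecture that extracts multi-scale features directly, stage by stage, for salient object detection, instead of building more complex modules and strategies on top of backbone features.
The authors first introduce the proposed Residual U-blocks and then the nested U-shaped network built from them.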
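- Running speed
For a 320x320x3 input image, the network runs at 30 FPS on a 1080Ti GPU.
- Network architecture
Compared with the U-Net architecture:
Each block of U^2-Net is itself a U-Net-like module, i.e. a Residual U-block. The nesting could go deeper still: the sub-blocks of the U-Net inside each block could again be U-Net structures.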
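- Residual U-blocks
The figure compares a plain convolution block, a Res-like block, an Inception-like block, a Dense-like block and the Residual U-block (RSU). The RSU is inspired by U-Net.
A Residual U-block consists of three parts:
- An input convolution layer that transforms the input feature map x (HxWxC_in) into an intermediate feature map F_1(x) with C_out channels. This is a plain convolution layer for local feature extraction.
- A U-Net-like symmetric encoder-decoder structure of height L that takes the intermediate feature map F_1(x) as input and learns to extract and encode multi-scale contextual information U(F_1(x)), where U denotes the U-Net-like structure. A larger L gives a deeper RSU, more pooling operations, a larger receptive field and richer local and global features. Configuring this parameter allows multi-scale features to be extracted from input feature maps of arbitrary spatial resolution. Multi-scale features are extracted from progressively downsampled feature maps and encoded into high-resolution feature maps through progressive upsampling, concatenation and convolution. This mitigates the loss of fine detail caused by direct upsampling with a large scale factor.
- A residual connection that fuses local and multi-scale features by summation: F_1(x)+U(F_1(x)).
The main design difference between RSU and a Res block is that RSU replaces the plain single-stream convolution with a U-Net-like structure and replaces the original feature with the local feature produced by a weight layer. This change lets the network extract features from multiple scales directly within each residual block. Notably, the computational overhead of the U-structure is small, because most operations are applied to downsampled feature maps.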
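The data flow F_1(x)+U(F_1(x)) can be illustrated with a toy, single-channel NumPy version. This is not the U^2-Net implementation: a fixed 3x3 mean filter stands in for every learned convolution, skip connections are merged by averaging instead of channel concatenation, and `height` only loosely plays the role of L. All function names are made up for this sketch.

```python
import numpy as np


def box_filter(x):
    """Stand-in for a learned 3x3 convolution: 3x3 mean with edge padding."""
    h, w = x.shape
    p = np.pad(x, 1, mode="edge")
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def downsample(x):
    """2x2 average pooling; odd sizes are edge-padded to even first."""
    h, w = x.shape
    p = np.pad(x, ((0, h % 2), (0, w % 2)), mode="edge")
    return p.reshape(p.shape[0] // 2, 2, p.shape[1] // 2, 2).mean(axis=(1, 3))


def upsample_like(x, ref):
    """Nearest-neighbor upsampling of x to the shape of ref."""
    ys = np.arange(ref.shape[0]) * x.shape[0] // ref.shape[0]
    xs = np.arange(ref.shape[1]) * x.shape[1] // ref.shape[1]
    return x[np.ix_(ys, xs)]


def toy_rsu(x, height):
    f1 = box_filter(x)                    # F_1(x): local features
    skips, cur = [], f1
    for _ in range(height - 1):           # encoder: filter, keep skip, pool
        cur = box_filter(cur)
        skips.append(cur)
        cur = downsample(cur)
    cur = box_filter(cur)                 # coarsest stage
    for skip in reversed(skips):          # decoder: upsample, merge, filter
        cur = box_filter(0.5 * (upsample_like(cur, skip) + skip))
    return f1 + cur                       # residual sum F_1(x) + U(F_1(x))


x = np.arange(130, dtype=np.float64).reshape(13, 10)
y = toy_rsu(x, height=4)
```

The output keeps the input resolution for any input size, and most of the filtering happens on the pooled, smaller maps, which matches the remark about low computational overhead.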
- Loss function
- Results
Attention-Guided Hierarchical Structure Aggregation for Image Matting(2020)
[CVPR2020-HAttMatting]
[Attention-Guided Hierarchical Structure Aggregation for Image Matting]
A Late Fusion CNN for Digital Matting(2019)
[A Late Fusion CNN for Digital Matting]
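[Paper reading: A Late Fusion CNN for Digital Matting]
[[Doubts][CVPR2019][A Late Fusion… Matting]]
LFM is an end-to-end neural network: the input is an image containing a foreground, and the output is the alpha matte of that foreground. The network predicts three maps: a foreground probability map, a background probability map and a blending weight map. The foreground and background probability maps are fused according to the blending weight map to obtain the alpha matte. Training proceeds in three stages: pre-training of the segmentation network, pre-training of the fusion network, and end-to-end joint training, with the training loss applied to the output alpha matte.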
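The fusion step can be written down directly. As far as I recall from the paper, the alpha matte is a per-pixel blend of the foreground probability and the complement of the background probability; the function and argument names below are illustrative only.

```python
import numpy as np


def late_fusion(fg_prob, bg_prob, blend):
    """Fuse the probability maps: alpha = blend * fg + (1 - blend) * (1 - bg)."""
    fg_prob, bg_prob, blend = (
        np.asarray(a, dtype=np.float64) for a in (fg_prob, bg_prob, blend)
    )
    return blend * fg_prob + (1.0 - blend) * (1.0 - bg_prob)


alpha = late_fusion(fg_prob=0.8, bg_prob=0.1, blend=0.5)
print(alpha)
```

Where the two maps are consistent (fg_prob + bg_prob = 1), the result equals the foreground probability regardless of the blending weight. The weight only matters where the two predictions disagree, which is typically in the transition region.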
Natural Image Matting via Guided Contextual Attention(2020)
[GCA-Matting]
[Natural Image Matting via Guided Contextual Attention]
- Network architecture
- GCA
- Loss function
Deep image matting(2017)
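[[Paper reading] Deep Image Matting (with a discussion of implementation details)]
Datasets
[A not-so-reliable roundup of portrait segmentation [1]]
Matting is a soft segmentation of foreground and background: the goal is to recover the foreground, the background and how strongly the two are blended at each pixel.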
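The blending is usually modeled by the compositing equation I = alpha * F + (1 - alpha) * B, applied per pixel. A minimal NumPy sketch (the function name is chosen here for illustration):

```python
import numpy as np


def composite(foreground, background, alpha):
    """Blend HxWx3 foreground and background images with an HxW alpha matte."""
    foreground = np.asarray(foreground, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    alpha = np.asarray(alpha, dtype=np.float64)[..., None]  # broadcast over channels
    return alpha * foreground + (1.0 - alpha) * background


red = np.tile([1.0, 0.0, 0.0], (1, 2, 1))    # 1x2 red foreground
blue = np.tile([0.0, 0.0, 1.0], (1, 2, 1))   # 1x2 blue background
image = composite(red, blue, alpha=[[1.0, 0.25]])
```

Matting is the inverse problem: given only the composite image, estimate alpha (and often the foreground as well), which is under-determined without extra hints such as a trimap.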
Note: a trimap is usually generated by dilating the matte.
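One common way to do this is sketched below in plain NumPy: pixels that survive an erosion of the fully opaque area are marked as foreground, pixels outside a dilation of the non-transparent area as background, and the band in between as unknown. The function names, the 3x3 structuring element and the 0 / 0.5 / 1 encoding are assumptions made for this example.

```python
import numpy as np


def dilate(mask, iterations):
    """Binary dilation with a 3x3 square, repeated `iterations` times."""
    for _ in range(iterations):
        h, w = mask.shape
        p = np.pad(mask, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(mask)
        for dy in range(3):
            for dx in range(3):
                grown |= p[dy:dy + h, dx:dx + w]
        mask = grown
    return mask


def trimap_from_matte(alpha, radius=1):
    """Return a trimap with 0 = background, 0.5 = unknown, 1 = foreground."""
    alpha = np.asarray(alpha, dtype=np.float64)
    maybe_fg = dilate(alpha > 0.0, radius)       # grown non-transparent area
    sure_fg = ~dilate(~(alpha >= 1.0), radius)   # eroded fully opaque area
    trimap = np.zeros(alpha.shape, dtype=np.float64)
    trimap[maybe_fg] = 0.5
    trimap[sure_fg] = 1.0
    return trimap


alpha = np.zeros((7, 7))
alpha[2:5, 2:5] = 1.0                            # 3x3 opaque square
trimap = trimap_from_matte(alpha, radius=1)
```

A larger `radius` widens the unknown band, which makes the matting problem harder but is more forgiving of an inaccurate input matte.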
Summary
Challenges
[Alpha Matting Evaluation Website]
- Evaluation
- SAD(sum of absolute differences)
- MSE(mean squared error)
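Both metrics compare a predicted alpha matte with the ground truth. The sketch below is a plain reading of the two names; the optional `mask` argument (for example, to restrict the evaluation to the unknown trimap region) is an assumption of this example, and benchmark sites may apply additional scaling.

```python
import numpy as np


def _abs_diff(pred, gt, mask=None):
    """Absolute per-pixel differences, optionally restricted to a boolean mask."""
    diff = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64))
    if mask is not None:
        diff = diff[np.asarray(mask, dtype=bool)]
    return diff


def sad(pred, gt, mask=None):
    """Sum of absolute differences."""
    return float(_abs_diff(pred, gt, mask).sum())


def mse(pred, gt, mask=None):
    """Mean squared error."""
    return float((_abs_diff(pred, gt, mask) ** 2).mean())


pred = [[0.0, 0.5], [1.0, 1.0]]
gt = [[0.0, 1.0], [1.0, 0.0]]
unknown = [[False, True], [False, True]]
scores = (sad(pred, gt), mse(pred, gt), mse(pred, gt, unknown))
print(scores)  # (1.5, 0.3125, 0.625)
```

SAD grows with image size, while MSE is normalized by the number of evaluated pixels, so the two are not directly comparable across images of different resolution.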
- References
References
One-click intelligent matting: principles and implementation