Introduction
As one of the most talked-about computer vision (CV) projects of recent years, DeepNude drew enormous attention from technology enthusiasts and the general public alike. The project uses mask-based techniques to remove specific occlusions from photos of people and to synthesize the completed image, and it brings together generative adversarial networks (GANs) and several other CV techniques worth studying. This article therefore walks through reproducing DeepNude on an ordinary Windows PC.
Step 1: Obtain the Project Code and Models
I have already prepared the project's code and models; anyone who needs them can obtain them by joining 最简化机器学习.
Although the DeepNude application itself has been taken down, the original algorithm is still public on GitHub and is worth studying. DeepNude's core technology builds on Conditional GAN (CGAN) and pix2pixHD, so let us first review the key points of these two techniques:
- Conditional GAN (CGAN): A plain GAN is trained to produce realistic images, but offers no control over what it generates, which greatly limits its practical use. To control a GAN's output, Mirza proposed the Conditional GAN (CGAN). The modification is simple and intuitive: concatenate a control variable (the label) with the latent variable. The CGAN's input thereby gains human-interpretable meaning, since labels are defined by people; in face generation, for example, the label can encode control variables such as age, gender, and expression. This design lets people steer the GAN's output much more directly (a minimal sketch of the label-concatenation idea appears right after this list).
- pix2pixHD: pix2pixHD is NVIDIA's high-resolution image-to-image translation algorithm; it solves the problem of generating high-definition images. Its generator is split into two parts, G1 and G2. G1 is a global generator network that performs the translation at half the original resolution; G2 is a local enhancer network that brings G1's output back up to the original resolution while preserving detail. This design keeps the computational cost manageable: most of the work happens in the lower-resolution G1, sparing the high-resolution G2 a large share of the computation.
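To make the label-concatenation idea concrete, here is a minimal TensorFlow sketch of CGAN-style conditioning. This is an illustration of the mechanism only, not code from the DeepNude project; the function name and the sizes LATENT_DIM and NUM_CLASSES are placeholders:

import tensorflow as tf

LATENT_DIM = 100  # placeholder latent-vector size
NUM_CLASSES = 10  # placeholder number of label categories

def cgan_generator_input(z, labels):
    """Concatenate a one-hot label with the latent vector -- the core CGAN idea."""
    label_onehot = tf.one_hot(labels, NUM_CLASSES)
    return tf.concat([z, label_onehot], axis=-1)

z = tf.random.normal([4, LATENT_DIM])
labels = tf.constant([0, 1, 2, 3])
cond = cgan_generator_input(z, labels)
print(cond.shape)  # (4, 110): the generator now sees both the noise and the label

The same trick is applied on the discriminator side, so that both networks are conditioned on the label.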
The relationship between the two: DeepNude's algorithm uses CGAN as its core concept, but CGAN leaves some problems unsolved, such as generating high-resolution images; that problem is addressed by pix2pixHD, whose coarse-to-fine split is sketched below.
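The following sketch shows only the resolution split, and simplifies deliberately: the real pix2pixHD feeds G1's feature maps (not its output image) into G2, and g1/g2 here are stand-ins for the two generator networks:

import tensorflow as tf

def coarse_to_fine(image, g1, g2):
    """Run the global generator g1 at half resolution, then let the local
    enhancer g2 refine the result at full resolution."""
    h = tf.shape(image)[1]
    w = tf.shape(image)[2]
    small = tf.image.resize(image, [h // 2, w // 2])  # G1 works on the half-size image
    coarse = g1(small)                                # global structure
    coarse_up = tf.image.resize(coarse, [h, w])       # upsample back to full size
    # g2 sees the original image plus g1's (upsampled) result and adds local detail.
    return g2(tf.concat([image, coarse_up], axis=-1))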
DeepNude's actual approach decomposes the problem into three stages: first generate a coarse label map (Mask), then a refined label map (MaskDet), and finally the nude image (Nude). Each stage consists of two steps: OpenCV preprocessing followed by GAN generation.
DeepNude's workflow:
- Input: the user supplies a portrait photo.
- OpenCV preprocessing: the input photo is preprocessed with OpenCV, including cropping and resizing, to prepare it for the later generation steps (a minimal sketch of this kind of preprocessing follows this list).
- Generate the coarse label map (Mask): a Conditional GAN (CGAN) produces a coarse label map, a rough outline of the human figure.
- Generate the refined label map (MaskDet): building on the coarse label map, a CGAN produces a more detailed label map of the figure.
- Generate the nude image (Nude): finally, the pix2pixHD algorithm renders the final image from the refined label map.
- Output: the generated image is returned to the user.
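As a concrete stand-in for the preprocessing step, here is a minimal OpenCV crop-and-resize sketch. The center-crop strategy and the 512-pixel target size are assumptions for illustration, not DeepNude's exact preprocessing:

import cv2

def preprocess(path, size=512):
    """Center-crop to a square, then resize to the network's input size."""
    img = cv2.imread(path)  # BGR, uint8
    h, w = img.shape[:2]
    s = min(h, w)
    y, x = (h - s) // 2, (w - s) // 2
    img = img[y:y + s, x:x + s]  # center square crop
    return cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)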
Following this recipe, we reran trial training and inference; the results were hard to describe in a few words... We have put the complete code, models, and test logs on 知识星球.
Part of the code is shown below:
import tensorflow as tf
# Weight of the cycle-consistency / identity loss terms
LAMBDA = 10
class InstanceNormalization(tf.keras.layers.Layer):
    """Instance Normalization Layer (https://arxiv.org/abs/1607.08022)."""

    def __init__(self, epsilon=1e-5):
        super(InstanceNormalization, self).__init__()
        self.epsilon = epsilon

    def build(self, input_shape):
        self.scale = self.add_weight(
            name='scale',
            shape=input_shape[-1:],
            initializer=tf.random_normal_initializer(0., 0.02),
            trainable=True)
        self.offset = self.add_weight(
            name='offset',
            shape=input_shape[-1:],
            initializer='zeros',
            trainable=True)

    def call(self, x):
        mean, variance = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        inv = tf.math.rsqrt(variance + self.epsilon)
        normalized = (x - mean) * inv
        return self.scale * normalized + self.offset
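Note the axes=[1, 2] in call: instance normalization normalizes each sample over its spatial dimensions, per channel, rather than across the batch. For image-to-image translation with small batch sizes this tends to be more stable than batch normalization, which is why the models below are built with norm_type='instancenorm'.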
def downsample(filters, size, norm_type='batchnorm', apply_norm=True):
    """Downsamples an input.

    Conv2D => Batchnorm => LeakyRelu

    Args:
      filters: number of filters
      size: filter size
      norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
      apply_norm: If True, adds the batchnorm layer

    Returns:
      Downsample Sequential Model
    """
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                               kernel_initializer=initializer, use_bias=False))

    if apply_norm:
        if norm_type.lower() == 'batchnorm':
            result.add(tf.keras.layers.BatchNormalization())
        elif norm_type.lower() == 'instancenorm':
            result.add(InstanceNormalization())

    result.add(tf.keras.layers.LeakyReLU())

    return result
def upsample(filters, size, norm_type='batchnorm', apply_dropout=False):
    """Upsamples an input.

    Conv2DTranspose => Batchnorm => Dropout => Relu

    Args:
      filters: number of filters
      size: filter size
      norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
      apply_dropout: If True, adds the dropout layer

    Returns:
      Upsample Sequential Model
    """
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                        padding='same',
                                        kernel_initializer=initializer,
                                        use_bias=False))

    if norm_type.lower() == 'batchnorm':
        result.add(tf.keras.layers.BatchNormalization())
    elif norm_type.lower() == 'instancenorm':
        result.add(InstanceNormalization())

    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))

    result.add(tf.keras.layers.ReLU())

    return result
def unet_generator(output_channels, norm_type='batchnorm'):
    """Modified u-net generator model (https://arxiv.org/abs/1611.07004).

    Args:
      output_channels: Output channels
      norm_type: Type of normalization. Either 'batchnorm' or 'instancenorm'.

    Returns:
      Generator model
    """
    down_stack = [
        downsample(64, 4, norm_type, apply_norm=False),  # (bs, 128, 128, 64)
        downsample(128, 4, norm_type),  # (bs, 64, 64, 128)
        downsample(256, 4, norm_type),  # (bs, 32, 32, 256)
        downsample(512, 4, norm_type),  # (bs, 16, 16, 512)
        downsample(512, 4, norm_type),  # (bs, 8, 8, 512)
        downsample(512, 4, norm_type),  # (bs, 4, 4, 512)
        downsample(512, 4, norm_type),  # (bs, 2, 2, 512)
        downsample(512, 4, norm_type),  # (bs, 1, 1, 512)
    ]

    up_stack = [
        upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 2, 2, 1024)
        upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 4, 4, 1024)
        upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 8, 8, 1024)
        upsample(512, 4, norm_type),  # (bs, 16, 16, 1024)
        upsample(256, 4, norm_type),  # (bs, 32, 32, 512)
        upsample(128, 4, norm_type),  # (bs, 64, 64, 256)
        upsample(64, 4, norm_type),  # (bs, 128, 128, 128)
    ]

    initializer = tf.random_normal_initializer(0., 0.02)
    last = tf.keras.layers.Conv2DTranspose(
        output_channels, 4, strides=2,
        padding='same', kernel_initializer=initializer,
        activation='tanh')  # (bs, 256, 256, 3)

    concat = tf.keras.layers.Concatenate()

    inputs = tf.keras.layers.Input(shape=[None, None, 3])
    x = inputs

    # Downsampling through the model
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)
    skips = reversed(skips[:-1])

    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = concat([x, skip])

    x = last(x)

    return tf.keras.Model(inputs=inputs, outputs=x)
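Note how the skip connections work: each decoder stage's output is concatenated with the encoder feature map of the same resolution, which is why the up_stack shape comments show doubled channel counts, and why spatial detail survives the 1x1 bottleneck.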
def discriminator(norm_type='batchnorm', target=True):
    """PatchGan discriminator model (https://arxiv.org/abs/1611.07004).

    Args:
      norm_type: Type of normalization. Either 'batchnorm' or 'instancenorm'.
      target: Bool, indicating whether target image is an input or not.

    Returns:
      Discriminator model
    """
    initializer = tf.random_normal_initializer(0., 0.02)

    inp = tf.keras.layers.Input(shape=[None, None, 3], name='input_image')
    x = inp

    if target:
        tar = tf.keras.layers.Input(shape=[None, None, 3], name='target_image')
        x = tf.keras.layers.concatenate([inp, tar])  # (bs, 256, 256, channels*2)

    down1 = downsample(64, 4, norm_type, False)(x)  # (bs, 128, 128, 64)
    down2 = downsample(128, 4, norm_type)(down1)  # (bs, 64, 64, 128)
    down3 = downsample(256, 4, norm_type)(down2)  # (bs, 32, 32, 256)

    zero_pad1 = tf.keras.layers.ZeroPadding2D()(down3)  # (bs, 34, 34, 256)
    conv = tf.keras.layers.Conv2D(
        512, 4, strides=1, kernel_initializer=initializer,
        use_bias=False)(zero_pad1)  # (bs, 31, 31, 512)

    if norm_type.lower() == 'batchnorm':
        norm1 = tf.keras.layers.BatchNormalization()(conv)
    elif norm_type.lower() == 'instancenorm':
        norm1 = InstanceNormalization()(conv)

    leaky_relu = tf.keras.layers.LeakyReLU()(norm1)

    zero_pad2 = tf.keras.layers.ZeroPadding2D()(leaky_relu)  # (bs, 33, 33, 512)

    last = tf.keras.layers.Conv2D(
        1, 4, strides=1,
        kernel_initializer=initializer)(zero_pad2)  # (bs, 30, 30, 1)

    if target:
        return tf.keras.Model(inputs=[inp, tar], outputs=last)
    else:
        return tf.keras.Model(inputs=inp, outputs=last)
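Rather than emitting a single real/fake score, this PatchGAN discriminator outputs a grid of scores (30x30 for a 256x256 input), each judging the realism of one local patch. With target=False it scores a single image, which is the variant used in the smoke test below.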
loss_obj = tf.keras.losses.BinaryCrossentropy(from_logits=True)
def discriminator_loss(real, generated):
    # The discriminator should score real images as 1 and generated images as 0.
    real_loss = loss_obj(tf.ones_like(real), real)
    generated_loss = loss_obj(tf.zeros_like(generated), generated)
    total_disc_loss = real_loss + generated_loss
    return total_disc_loss * 0.5

def generator_loss(generated):
    # The generator is rewarded when the discriminator scores its output as real.
    return loss_obj(tf.ones_like(generated), generated)

def calc_cycle_loss(real_image, cycled_image):
    # Cycle consistency: translating to the other domain and back should
    # reproduce the original image (L1 distance, weighted by LAMBDA).
    loss1 = tf.reduce_mean(tf.abs(real_image - cycled_image))
    return LAMBDA * loss1

def identity_loss(real_image, same_image):
    # Identity loss: feeding the generator an image already in its target
    # domain should leave it (nearly) unchanged.
    loss = tf.reduce_mean(tf.abs(real_image - same_image))
    return LAMBDA * 0.5 * loss
if __name__ == "__main__":
    # Smoke test: build the models and run random tensors through them to
    # check that input/output shapes line up.
    BATCH_SIZE = 10
    IMG_WIDTH = 256
    IMG_HEIGHT = 256
    INPUT_CHANNELS = 3
    OUTPUT_CHANNELS = 3

    generator_g = unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')
    generator_f = unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')
    discriminator_x = discriminator(norm_type='instancenorm', target=False)
    discriminator_y = discriminator(norm_type='instancenorm', target=False)

    sample_apple = tf.random.normal([BATCH_SIZE, IMG_HEIGHT, IMG_WIDTH, INPUT_CHANNELS])
    sample_orange = tf.random.normal([BATCH_SIZE, IMG_HEIGHT, IMG_WIDTH, INPUT_CHANNELS])
    print(f"Inputs sample_apple.shape {sample_apple.shape}")
    print(f"Inputs sample_orange.shape {sample_orange.shape}")
    print("Forward pass: generator_g (apple -> orange), generator_f (orange -> apple)")
    to_orange = generator_g(sample_apple)
    to_apple = generator_f(sample_orange)
    print(f"Outputs to_orange.shape {to_orange.shape}")
    print(f"Outputs to_apple.shape {to_apple.shape}")

    print("*" * 100)

    print(f"Inputs sample_apple.shape {sample_apple.shape}")
    print(f"Inputs sample_orange.shape {sample_orange.shape}")
    print("Forward pass: discriminator_y (orange), discriminator_x (apple)")
    disc_real_orange = discriminator_y(sample_orange)
    disc_real_apple = discriminator_x(sample_apple)
    print(f"Outputs disc_real_orange.shape {disc_real_orange.shape}")
    print(f"Outputs disc_real_apple.shape {disc_real_apple.shape}")