Introduction
As one of the most talked-about computer vision (CV) projects of recent years, DeepNude drew enormous attention from technology enthusiasts and the general public alike. The project uses mask-based techniques to remove specific occlusions from photos of people and to synthesize the completed image, and it brings together generative adversarial networks (GANs) and several other CV techniques worth studying. This article therefore walks through reproducing DeepNude on an ordinary Windows PC.
Step 1: Obtain the Project Code and Models
I have already prepared the project's code and models; anyone who needs them can obtain them by joining 最简化机器学习.
Although the DeepNude application itself has been taken down, the original algorithm is still public on GitHub and is worth studying. DeepNude's core technology builds on Conditional GAN (CGAN) and pix2pixHD, so let us first review the key points of these two techniques:
- Conditional GAN (CGAN): A plain GAN is trained to produce realistic images, but offers no control over what it generates, which greatly limits its practical use. To control a GAN's output, Mirza proposed the Conditional GAN (CGAN). The modification is simple and intuitive: concatenate a control variable (the label) with the latent variable. The CGAN's input thereby gains human-interpretable meaning, since labels are defined by people; in face generation, for example, the label can encode control variables such as age, gender, and expression. This design lets people steer the GAN's output much more directly (a minimal sketch of the label-concatenation idea appears right after this list).
- pix2pixHD: pix2pixHD is NVIDIA's high-resolution image-to-image translation algorithm; it solves the problem of generating high-definition images. Its generator is split into two parts, G1 and G2. G1 is a global generator network that performs the translation at half the original resolution; G2 is a local enhancer network that brings G1's output back up to the original resolution while preserving detail. This design keeps the computational cost manageable: most of the work happens in the lower-resolution G1, sparing the high-resolution G2 a large share of the computation.
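To make the label-concatenation idea concrete, here is a minimal TensorFlow sketch of CGAN-style conditioning. This is an illustration of the mechanism only, not code from the DeepNude project; the function name and the sizes LATENT_DIM and NUM_CLASSES are placeholders:

import tensorflow as tf

LATENT_DIM = 100  # placeholder latent-vector size
NUM_CLASSES = 10  # placeholder number of label categories

def cgan_generator_input(z, labels):
    """Concatenate a one-hot label with the latent vector -- the core CGAN idea."""
    label_onehot = tf.one_hot(labels, NUM_CLASSES)
    return tf.concat([z, label_onehot], axis=-1)

z = tf.random.normal([4, LATENT_DIM])
labels = tf.constant([0, 1, 2, 3])
cond = cgan_generator_input(z, labels)
print(cond.shape)  # (4, 110): the generator now sees both the noise and the label

The same trick is applied on the discriminator side, so that both networks are conditioned on the label.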
The relationship between the two: DeepNude's algorithm uses CGAN as its core concept, but CGAN leaves some problems unsolved, such as generating high-resolution images; that problem is addressed by pix2pixHD, whose coarse-to-fine split is sketched below.
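The following sketch shows only the resolution split, and simplifies deliberately: the real pix2pixHD feeds G1's feature maps (not its output image) into G2, and g1/g2 here are stand-ins for the two generator networks:

import tensorflow as tf

def coarse_to_fine(image, g1, g2):
    """Run the global generator g1 at half resolution, then let the local
    enhancer g2 refine the result at full resolution."""
    h = tf.shape(image)[1]
    w = tf.shape(image)[2]
    small = tf.image.resize(image, [h // 2, w // 2])  # G1 works on the half-size image
    coarse = g1(small)                                # global structure
    coarse_up = tf.image.resize(coarse, [h, w])       # upsample back to full size
    # g2 sees the original image plus g1's (upsampled) result and adds local detail.
    return g2(tf.concat([image, coarse_up], axis=-1))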
DeepNude's actual approach decomposes the problem into three stages: first generate a coarse label map (Mask), then a refined label map (MaskDet), and finally the nude image (Nude). Each stage consists of two steps: OpenCV preprocessing followed by GAN generation.
DeepNude's workflow:
- Input: the user supplies a portrait photo.
- OpenCV preprocessing: the input photo is preprocessed with OpenCV, including cropping and resizing, to prepare it for the later generation steps (a minimal sketch of this kind of preprocessing follows this list).
- Generate the coarse label map (Mask): a Conditional GAN (CGAN) produces a coarse label map, a rough outline of the human figure.
- Generate the refined label map (MaskDet): building on the coarse label map, a CGAN produces a more detailed label map of the figure.
- Generate the nude image (Nude): finally, the pix2pixHD algorithm renders the final image from the refined label map.
- Output: the generated image is returned to the user.
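As a concrete stand-in for the preprocessing step, here is a minimal OpenCV crop-and-resize sketch. The center-crop strategy and the 512-pixel target size are assumptions for illustration, not DeepNude's exact preprocessing:

import cv2

def preprocess(path, size=512):
    """Center-crop to a square, then resize to the network's input size."""
    img = cv2.imread(path)  # BGR, uint8
    h, w = img.shape[:2]
    s = min(h, w)
    y, x = (h - s) // 2, (w - s) // 2
    img = img[y:y + s, x:x + s]  # center square crop
    return cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)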
Following this recipe, we reran trial training and inference; the results were hard to describe in a few words... We have put the complete code, models, and test logs on 知识星球.
Part of the code is shown below:
import tensorflow as tf
# Weight of the cycle-consistency / identity loss terms
LAMBDA = 10
class InstanceNormalization(tf.keras.layers.Layer):
    """Instance Normalization Layer (https://arxiv.org/abs/1607.08022)."""

    def __init__(self, epsilon=1e-5):
        super(InstanceNormalization, self).__init__()
        self.epsilon = epsilon

    def build(self, input_shape):
        self.scale = self.add_weight(
            name='scale',
            shape=input_shape[-1:],
            initializer=tf.random_normal_initializer(0., 0.02),
            trainable=True)
        self.offset = self.add_weight(
            name='offset',
            shape=input_shape[-1:],
            initializer='zeros',
            trainable=True)

    def call(self, x):
        mean, variance = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        inv = tf.math.rsqrt(variance + self.epsilon)
        normalized = (x - mean) * inv
        return self.scale * normalized + self.offset
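Note the axes=[1, 2] in call: instance normalization normalizes each sample over its spatial dimensions, per channel, rather than across the batch. For image-to-image translation with small batch sizes this tends to be more stable than batch normalization, which is why the models below are built with norm_type='instancenorm'.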
def downsample(filters, size, norm_type='batchnorm', apply_norm=True):
    """Downsamples an input.

    Conv2D => Batchnorm => LeakyRelu

    Args:
      filters: number of filters
      size: filter size
      norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
      apply_norm: If True, adds the batchnorm layer

    Returns:
      Downsample Sequential Model
    """
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                               kernel_initializer=initializer, use_bias=False))

    if apply_norm:
        if norm_type.lower() == 'batchnorm':
            result.add(tf.keras.layers.BatchNormalization())
        elif norm_type.lower() == 'instancenorm':
            result.add(InstanceNormalization())

    result.add(tf.keras.layers.LeakyReLU())

    return result
def upsample(filters, size, norm_type='batchnorm', apply_dropout=False):
    """Upsamples an input.

    Conv2DTranspose => Batchnorm => Dropout => Relu

    Args:
      filters: number of filters
      size: filter size
      norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
      apply_dropout: If True, adds the dropout layer

    Returns:
      Upsample Sequential Model
    """
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                        padding='same',
                                        kernel_initializer=initializer,
                                        use_bias=False))

    if norm_type.lower() == 'batchnorm':
        result.add(tf.keras.layers.BatchNormalization())
    elif norm_type.lower() == 'instancenorm':
        result.add(InstanceNormalization())

    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))

    result.add(tf.keras.layers.ReLU())

    return result
def unet_generator(output_channels, norm_type='batchnorm'):
    """Modified u-net generator model (https://arxiv.org/abs/1611.07004).

    Args:
      output_channels: Output channels
      norm_type: Type of normalization. Either 'batchnorm' or 'instancenorm'.

    Returns:
      Generator model
    """
    down_stack = [
        downsample(64, 4, norm_type, apply_norm=False),  # (bs, 128, 128, 64)
        downsample(128, 4, norm_type),  # (bs, 64, 64, 128)
        downsample(256, 4, norm_type),  # (bs, 32, 32, 256)
        downsample(512, 4, norm_type),  # (bs, 16, 16, 512)
        downsample(512, 4, norm_type),  # (bs, 8, 8, 512)
        downsample(512, 4, norm_type),  # (bs, 4, 4, 512)
        downsample(512, 4, norm_type),  # (bs, 2, 2, 512)
        downsample(512, 4, norm_type),  # (bs, 1, 1, 512)
    ]

    up_stack = [
        upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 2, 2, 1024)
        upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 4, 4, 1024)
        upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 8, 8, 1024)
        upsample(512, 4, norm_type),  # (bs, 16, 16, 1024)
        upsample(256, 4, norm_type),  # (bs, 32, 32, 512)
        upsample(128, 4, norm_type),  # (bs, 64, 64, 256)
        upsample(64, 4, norm_type),  # (bs, 128, 128, 128)
    ]

    initializer = tf.random_normal_initializer(0., 0.02)
    last = tf.keras.layers.Conv2DTranspose(
        output_channels, 4, strides=2,
        padding='same', kernel_initializer=initializer,
        activation='tanh')  # (bs, 256, 256, 3)

    concat = tf.keras.layers.Concatenate()

    inputs = tf.keras.layers.Input(shape=[None, None, 3])
    x = inputs

    # Downsampling through the model
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)
    skips = reversed(skips[:-1])

    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = concat([x, skip])

    x = last(x)

    return tf.keras.Model(inputs=inputs, outputs=x)
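Note how the skip connections work: each decoder stage's output is concatenated with the encoder feature map of the same resolution, which is why the up_stack shape comments show doubled channel counts, and why spatial detail survives the 1x1 bottleneck.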
def discriminator(norm_type='batchnorm', target=True):
    """PatchGan discriminator model (https://arxiv.org/abs/1611.07004).

    Args:
      norm_type: Type of normalization. Either 'batchnorm' or 'instancenorm'.
      target: Bool, indicating whether target image is an input or not.

    Returns:
      Discriminator model
    """
    initializer = tf.random_normal_initializer(0., 0.02)

    inp = tf.keras.layers.Input(shape=[None, None, 3], name='input_image')
    x = inp

    if target:
        tar = tf.keras.layers.Input(shape=[None, None, 3], name='target_image')
        x = tf.keras.layers.concatenate([inp, tar])  # (bs, 256, 256, channels*2)

    down1 = downsample(64, 4, norm_type, False)(x)  # (bs, 128, 128, 64)
    down2 = downsample(128, 4, norm_type)(down1)  # (bs, 64, 64, 128)
    down3 = downsample(256, 4, norm_type)(down2)  # (bs, 32, 32, 256)

    zero_pad1 = tf.keras.layers.ZeroPadding2D()(down3)  # (bs, 34, 34, 256)
    conv = tf.keras.layers.Conv2D(
        512, 4, strides=1, kernel_initializer=initializer,
        use_bias=False)(zero_pad1)  # (bs, 31, 31, 512)

    if norm_type.lower() == 'batchnorm':
        norm1 = tf.keras.layers.BatchNormalization()(conv)
    elif norm_type.lower() == 'instancenorm':
        norm1 = InstanceNormalization()(conv)

    leaky_relu = tf.keras.layers.LeakyReLU()(norm1)

    zero_pad2 = tf.keras.layers.ZeroPadding2D()(leaky_relu)  # (bs, 33, 33, 512)

    last = tf.keras.layers.Conv2D(
        1, 4, strides=1,
        kernel_initializer=initializer)(zero_pad2)  # (bs, 30, 30, 1)

    if target:
        return tf.keras.Model(inputs=[inp, tar], outputs=last)
    else:
        return tf.keras.Model(inputs=inp, outputs=last)
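Rather than emitting a single real/fake score, this PatchGAN discriminator outputs a grid of scores (30x30 for a 256x256 input), each judging the realism of one local patch. With target=False it scores a single image, which is the variant used in the smoke test below.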
loss_obj = tf.keras.losses.BinaryCrossentropy(from_logits=True)
def discriminator_loss(real, generated):
    # The discriminator should score real images as 1 and generated images as 0.
    real_loss = loss_obj(tf.ones_like(real), real)
    generated_loss = loss_obj(tf.zeros_like(generated), generated)
    total_disc_loss = real_loss + generated_loss
    return total_disc_loss * 0.5

def generator_loss(generated):
    # The generator is rewarded when the discriminator scores its output as real.
    return loss_obj(tf.ones_like(generated), generated)

def calc_cycle_loss(real_image, cycled_image):
    # Cycle consistency: translating to the other domain and back should
    # reproduce the original image (L1 distance, weighted by LAMBDA).
    loss1 = tf.reduce_mean(tf.abs(real_image - cycled_image))
    return LAMBDA * loss1

def identity_loss(real_image, same_image):
    # Identity loss: feeding the generator an image already in its target
    # domain should leave it (nearly) unchanged.
    loss = tf.reduce_mean(tf.abs(real_image - same_image))
    return LAMBDA * 0.5 * loss
if __name__ == "__main__":
    # Smoke test: build the models and run random tensors through them to
    # check that input/output shapes line up.
    BATCH_SIZE = 10
    IMG_WIDTH = 256
    IMG_HEIGHT = 256
    INPUT_CHANNELS = 3
    OUTPUT_CHANNELS = 3

    generator_g = unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')
    generator_f = unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')
    discriminator_x = discriminator(norm_type='instancenorm', target=False)
    discriminator_y = discriminator(norm_type='instancenorm', target=False)

    sample_apple = tf.random.normal([BATCH_SIZE, IMG_HEIGHT, IMG_WIDTH, INPUT_CHANNELS])
    sample_orange = tf.random.normal([BATCH_SIZE, IMG_HEIGHT, IMG_WIDTH, INPUT_CHANNELS])
    print(f"Inputs sample_apple.shape {sample_apple.shape}")
    print(f"Inputs sample_orange.shape {sample_orange.shape}")
    print("Forward pass: generator_g (apple -> orange), generator_f (orange -> apple)")
    to_orange = generator_g(sample_apple)
    to_apple = generator_f(sample_orange)
    print(f"Outputs to_orange.shape {to_orange.shape}")
    print(f"Outputs to_apple.shape {to_apple.shape}")

    print("*" * 100)

    print(f"Inputs sample_apple.shape {sample_apple.shape}")
    print(f"Inputs sample_orange.shape {sample_orange.shape}")
    print("Forward pass: discriminator_y (orange), discriminator_x (apple)")
    disc_real_orange = discriminator_y(sample_orange)
    disc_real_apple = discriminator_x(sample_apple)
    print(f"Outputs disc_real_orange.shape {disc_real_orange.shape}")
    print(f"Outputs disc_real_apple.shape {disc_real_apple.shape}")