Source: 新智元
This article introduces BigBiGAN, the pretrained representation learning model recently released by DeepMind.
[ Overview ] DeepMind has released pretrained BigBiGAN representation learning models. BigBiGAN builds on DeepMind's state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator.
DeepMind's representation learning model BigBiGAN is finally open source!
DeepMind recently released its pretrained BigBiGAN representation learning models; the open-source code is available on TF Hub.
BigBiGAN builds on DeepMind's state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator. The accompanying paper, "Large Scale Adversarial Representation Learning", attracted considerable attention when it was published.
BigBiGAN code released: a TensorFlow implementation
import tensorflow as tf
import tensorflow_hub as hub

# Load the BigBiGAN module (ResNet-50 encoder variant).
module = hub.Module('https://tfhub.dev/deepmind/bigbigan-resnet50/1')
# Sample a batch of 8 random latent vectors (z) from the Gaussian prior. Then
# call the generator on the latent samples to generate a batch of images with
# shape [8, 128, 128, 3] and range [-1, 1].
z = tf.random.normal([8, 120]) # latent samples
gen_samples = module(z, signature='generate')
# Given a batch of 256x256 RGB images in range [-1, 1], call the encoder to
# compute predicted latents z and other features (e.g. for use in downstream
# recognition tasks).
images = tf.placeholder(tf.float32, shape=[None, 256, 256, 3])
features = module(images, signature='encode', as_dict=True)
# Get the predicted latent sample `z_sample` from the dict of features.
# Other available features include `avepool_feat` and `bn_crelu_feat`, used in
# the representation learning results.
z_sample = features['z_sample'] # shape [?, 120]
# Compute reconstructions of the input `images` by passing the encoder's output
# `z_sample` back through the generator. Note that raw generator outputs are
# half the resolution of encoder inputs (128x128). To get upsampled generator
# outputs matching the encoder input resolution (256x256), instead use:
# recons = module(z_sample, signature='generate', as_dict=True)['upsampled']
recons = module(z_sample, signature='generate') # shape [?, 128, 128, 3]
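As the comments above note, the encoder expects float RGB images in the range [-1, 1], so raw uint8 images must be rescaled before being fed to the `images` placeholder. A minimal NumPy sketch of that preprocessing step (the `preprocess` helper is our own illustration, not part of the TF Hub module):

```python
import numpy as np

def preprocess(images_uint8):
    """Convert a batch of uint8 RGB images (values 0-255) to float32
    in the [-1, 1] range that the BigBiGAN encoder expects.

    images_uint8: array of shape [batch, 256, 256, 3], dtype uint8.
    """
    images = images_uint8.astype(np.float32)
    return images / 127.5 - 1.0  # maps 0 -> -1.0 and 255 -> 1.0

# Tiny smoke test: an all-zero batch with the red channel at maximum.
batch = np.zeros([8, 256, 256, 3], dtype=np.uint8)
batch[..., 0] = 255
out = preprocess(batch)
print(out.min(), out.max())  # -1.0 1.0
```

In a real pipeline, the result of `preprocess` would be passed as the feed value for the `images` placeholder when running the `encode` signature.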
# Load the larger BigBiGAN module (RevNet-50 x4 encoder variant).
module = hub.Module('https://tfhub.dev/deepmind/bigbigan-revnet50x4/1')
# Usage is identical to the ResNet-50 module above: the same 'generate' and
# 'encode' signatures apply, with the same input/output shapes and ranges.
Building BigBiGAN on BigGAN: learning high-level semantics rather than pixel-level details
- We show that BigBiGAN (BiGAN with a BigGAN generator) matches the state of the art in unsupervised representation learning on ImageNet.
- We propose a more stable joint discriminator for BigBiGAN.
- We perform a thorough empirical analysis and ablation study of model design choices.
- We show that the representation learning objective also helps unconditional image generation, and demonstrate state-of-the-art results on unconditional ImageNet generation.
The structure of the BigBiGAN framework
Table 1: Performance of several BigBiGAN variants, measured by the Inception Score (IS) and Fréchet Inception Distance (FID) of generated images, and by the ImageNet top-1 accuracy (Cls.) of a supervised logistic regression classifier trained on encoder features. Accuracy is computed on a split of 10K images randomly sampled from the training set, which we call the "train-val" split.
Table 2: Comparison of BigBiGAN models against other recent methods based on supervised logistic regression classifiers, evaluated on the official ImageNet validation set.
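The linear-evaluation protocol behind these classification numbers is simple: freeze the encoder and train a logistic regression classifier on its features. A minimal NumPy sketch under stated assumptions (the features and labels below are fabricated stand-ins; in practice `feats` would hold encoder outputs such as `avepool_feat`, and `labels` the ImageNet class ids):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated stand-ins for frozen encoder outputs: 200 feature vectors of
# dimension 16, with two toy classes defined by the sign of feature 0 so
# the problem is linearly separable.
feats = rng.normal(size=(200, 16)).astype(np.float32)
labels = (feats[:, 0] > 0).astype(np.int64)
num_classes = 2

# Multinomial logistic regression trained with plain gradient descent.
# Only W and b are learned; the "encoder" features stay frozen.
W = np.zeros((16, num_classes), dtype=np.float32)
b = np.zeros(num_classes, dtype=np.float32)
onehot = np.eye(num_classes, dtype=np.float32)[labels]

for _ in range(500):
    logits = feats @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = probs - onehot                        # d(cross-entropy)/d(logits)
    W -= 0.1 * feats.T @ grad / len(feats)
    b -= 0.1 * grad.mean(axis=0)

acc = float((np.argmax(feats @ W + b, axis=1) == labels).mean())
print(f"train accuracy: {acc:.2f}")
```

The real evaluation additionally measures accuracy on held-out images (the "train-val" split or the official validation set), but the key design choice is the same: the classifier is linear, so accuracy directly reflects how linearly separable the frozen features are.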
Table 3: Comparison of our BigBiGAN against unsupervised (unconditional) generation methods, as well as previously reported results for unsupervised BigGAN.
Figure 2: Selected image reconstructions from an unsupervised BigBiGAN model. The top row shows real images (x ~ P_x); the bottom row shows their reconstructions, computed as G(E(x)). Unlike most explicit reconstruction costs (e.g., pixel-wise error), the reconstruction cost implicitly minimized by (Big)BiGAN tends to emphasize the semantic and other high-level content of an image.
Paper:
https://arxiv.org/pdf/1907.02544.pdf
Pretrained models:
https://tfhub.dev/s?publisher=deepmind&q=bigbigan
Editor: WW
Proofreader: 林亦霖