DeepMind one-shot learning paper notes: One-Shot Generalization in Deep Generative Models

Original post, 2016-08-30 15:19:35

One-Shot Generalization in Deep Generative Models

Danilo J. Rezende, Shakir Mohamed, Ivo Danihelka, Karol Gregor, Daan Wierstra

Google DeepMind, London

2. Varieties of Attention

Spatially-transformed attention

    A more powerful approach is to use a mechanism that provides invariance to the shape and size of objects in the images (general affine transformations).

Spatial transformers (ST) process an input image x, using parameters h, and generate an output y(x, h):

y(x, h) = [κh(h) ⊗ κw(h)] ∗ x

where κh and κw are 1-dimensional kernels, ⊗ indicates the tensor outer-product of the two kernels and ∗ indicates a convolution.
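
The separable filtering described above can be sketched in NumPy. This is only an illustration under stated assumptions: the kernels are taken to be banks of Gaussian filters whose centre, stride and width stand in for the parameters h, and the names `gaussian_kernel` and `st_read` are hypothetical, not taken from the paper. Applying the outer product of two 1-D kernels to an image reduces to two matrix products.

```python
import numpy as np

def gaussian_kernel(n_out, n_in, center, stride, sigma):
    """Bank of n_out 1-D Gaussian filters over n_in input positions (assumed form)."""
    # Filter means are spread around `center` with spacing `stride`.
    mu = center + (np.arange(n_out) - (n_out - 1) / 2.0) * stride
    pos = np.arange(n_in)
    k = np.exp(-0.5 * ((pos[None, :] - mu[:, None]) / sigma) ** 2)
    # Normalise each filter so that its weights sum to one.
    return k / (k.sum(axis=1, keepdims=True) + 1e-8)

def st_read(x, k_h, k_w):
    """Apply the separable kernel: k_h filters along rows, k_w along columns."""
    return k_h @ x @ k_w.T

x = np.random.default_rng(0).random((28, 28))
k_h = gaussian_kernel(12, 28, center=13.5, stride=2.0, sigma=1.0)
k_w = gaussian_kernel(12, 28, center=13.5, stride=2.0, sigma=1.0)
y = st_read(x, k_h, k_w)  # a 12x12 glimpse of the 28x28 image
```

Changing `center`, `stride` and `sigma` moves and rescales the window, which is where the invariance to position and size comes from in this simplified form.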

Inference, reading attention: spatial transformers allow the model to observe the input image in a canonical form, providing the desired invariance.

Generation, writing attention: handles the position, scale and rotation of parts of the generated image.

3. Iterative and Attentive Generative Models

3.1. Latent Variable Models and Variational Inference



Generative models with latent variables:

Deep generative models: a hierarchy of L layers of latent variables, where each layer depends on the layer above in a non-linear way.


We specify this non-linear dependency using deep neural networks.

To compute the marginal probability of the data, we must integrate over any unobserved variables:
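
For a model with a prior p(z) over the latent variables and a likelihood p(x|z) with parameters θ, this integral takes the standard form below. The subscript on the likelihood is a notational assumption made here for consistency with the parameters θ used later in these notes.

```latex
p(x) = \int p_\theta(x \mid z)\, p(z)\, dz
```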


Deep latent Gaussian models

Prior distribution p(z): the prior over the latent variables z, a Gaussian (a known distribution, combined non-linearly).

Likelihood function p(x|z): appropriate for the observed data, such as a Gaussian, Bernoulli, categorical or other distribution, and dependent in a non-linear way on the latent variables.

Latent variables z; data points x.


Variational inference: transforms the difficult integration into an optimization problem that is typically more scalable and easier to solve.
Approximate the marginal likelihood by a lower bound.

Negative free energy
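
In the notation used in these notes (approximate posterior q(z|x) with parameters φ, model parameters θ), the bound is the standard variational lower bound. It is written out here as a reference form, not copied from the post:

```latex
\log p(x) \;\ge\; \mathcal{F}
  = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\left[\, q_\phi(z \mid x) \,\|\, p(z) \,\right]
```

The first term rewards reconstructions of x from sampled latent variables; the second keeps the approximate posterior close to the prior.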






Optimization parameters θ and φ:

Amortized inference approach

Represent the distribution q(z|x) as a recognition or inference model.

This amortizes the cost of posterior inference.


Generative model: a decoder of the latent variables.

Inference network: an encoder that maps the data to a latent description.


Combination:

A deep latent variable model (typically latent Gaussian) with variational inference that is implemented using an inference network is referred to as a variational auto-encoder (VAE).
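
The pieces above (Gaussian prior, Bernoulli likelihood, inference network, lower bound) can be put together in a small NumPy sketch. It is a toy under stated assumptions: the encoder and decoder are single linear layers with random weights, and the names `negative_free_energy`, `enc_w` and `dec_w` are hypothetical. It computes a single-sample estimate of the bound for one binary data point.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def negative_free_energy(x, enc_w, dec_w, rng):
    """Single-sample estimate of E_q[log p(x|z)] - KL[q(z|x) || p(z)]."""
    k = dec_w.shape[0]
    # Inference network (encoder): linear map to the mean and log-variance of q(z|x).
    stats = x @ enc_w
    mu, log_var = stats[:k], stats[k:]
    # Reparameterised sample z = mu + sigma * eps, with eps ~ N(0, I).
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(k)
    # Generative model (decoder): Bernoulli mean for each pixel.
    p = np.clip(sigmoid(z @ dec_w), 1e-7, 1 - 1e-7)
    log_lik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    # Closed-form KL between a diagonal Gaussian and the standard normal prior.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return log_lik - kl

rng = np.random.default_rng(0)
d, k = 16, 4                                   # toy data and latent dimensions
x = (rng.random(d) > 0.5).astype(float)        # one binary data point
enc_w = 0.1 * rng.standard_normal((d, 2 * k))  # stand-in for phi
dec_w = 0.1 * rng.standard_normal((k, d))      # stand-in for theta
bound = negative_free_energy(x, enc_w, dec_w, rng)
```

In training, this quantity would be maximised with respect to both weight matrices by stochastic gradient ascent; the sketch only evaluates it.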

3.2. Sequential Generative Models

The generative models described so far are single-step models.


A sequential generative model:


Combines stochastic and deterministic computation

to form a multi-step generative process that uses recursive transformations of the latent variables.



Describe the observed data over T timesteps using a set of latent variables zt at each step.


Each step:

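1.     Generate an independent set of K-dimensional latent variables zt (stochastic).

2.     The function fh links the dependencies between the latent variables of adjacent steps (similar to an LSTM). This part is deterministic.

The transition function fh is an LSTM network.

3.     Hidden canvas: its input is the LSTM state. The canvas function fc allows for many different transformations, and it is here that generative (writing) attention is used.

4.     The observation is conditioned on the output of the observation function fo(c; θo).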

All parameters of this generative model are denoted θ = {θh, θc, θo}.
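
The four steps above can be sketched as a short NumPy loop. Several simplifications are assumptions made for brevity and are not the paper's architecture: a plain tanh recurrence replaces the LSTM, a fully-connected write replaces writing attention, the canvas has a single channel and is updated additively, and the observation function is a pixel-wise sigmoid. The function name `generate` and the weight names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generate(T, K, H, shape, rng):
    """Run T steps of the sequential generative process and sample a binary image."""
    n_pix = shape[0] * shape[1]
    # Randomly initialised stand-ins for the parameters theta_h and theta_c.
    w_hh = 0.1 * rng.standard_normal((H, H))
    w_zh = 0.1 * rng.standard_normal((K, H))
    w_hc = 0.1 * rng.standard_normal((H, n_pix))
    h = np.zeros(H)
    c = np.zeros(n_pix)
    for _ in range(T):
        z = rng.standard_normal(K)        # 1. stochastic: z_t ~ N(0, I)
        h = np.tanh(h @ w_hh + z @ w_zh)  # 2. deterministic: h_t = f_h(h_{t-1}, z_t)
        c = c + h @ w_hc                  # 3. canvas: c_t = f_c(c_{t-1}, h_t)
    probs = sigmoid(c)                    # 4. observation function f_o(c_T)
    x = (rng.random(n_pix) < probs).astype(float)
    return x.reshape(shape), probs.reshape(shape)

x, probs = generate(T=20, K=4, H=32, shape=(8, 8), rng=np.random.default_rng(0))
```

Only the latent samples are random; given z_1, ..., z_T the hidden states and the canvas are fully determined, which is the stochastic/deterministic split the notes describe.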

3.2.2. Free Energy Objective

Objective function for inference and parameter learning.

Optimize this objective function for the variational parameters φ and the model parameters θ, by stochastic gradient descent using a mini-batch of data.

As with other VAEs, we use a single sample of the latent variables generated from qφ(z|x) when computing the Monte Carlo gradient.



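The canvas transition function fc(ct−1, ht; θc) updates the state of the hidden canvas:

It transforms the current hidden state ht with a non-linear transformation fw, then fuses the result with the existing canvas ct−1.

Hidden canvas: has the same size as the original image, but with multiple channels.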

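Two ways to update the hidden canvas:

1.      Additive Canvas

Adds the transformed hidden state, produced by fw, to the existing canvas ct−1.

2.      Gated Recurrent Canvas

Uses a convolutional gated recurrent unit (CGRU), which provides a non-linear, recursive update mechanism similar to convolutional LSTMs.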

The function fw(ht; θw) is a writing function that is used by the canvas function to transform the LSTM hidden state into the coordinate system of the hidden canvas.


This mapping can be fully or locally connected; the paper uses a writing (generative) attention function.

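The final phase of the generative process transforms the hidden canvas cT, through the output function fo(c; θo), into the parameters of the likelihood function.


Output function fo(c; θo): implemented as a 1x1 convolution. When the hidden canvas has a different size from the image, a convolutional network is used instead.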
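
A 1x1 convolution is the same linear map over channels applied at every pixel, which the NumPy sketch below makes explicit. The 4-channel canvas and Bernoulli likelihood match the settings listed in these notes; the sigmoid squashing, the 28x28 size and the name `output_function` are assumptions made for the example.

```python
import numpy as np

def output_function(canvas, weights, bias):
    """1x1 convolution over a (channels, height, width) canvas, then a sigmoid."""
    # Contract the channel axis: each pixel gets the same weighted sum of its channels.
    logits = np.tensordot(weights, canvas, axes=([0], [0])) + bias
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
canvas = rng.standard_normal((4, 28, 28))  # hidden canvas with 4 channels
weights = rng.standard_normal(4)           # one weight per canvas channel
means = output_function(canvas, weights, bias=0.0)  # Bernoulli mean per pixel
```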

Transform the LSTM hidden state into the coordinate system of the hidden canvas.



The inference network implements this distribution, the approximate posterior q.

Each step:

1.      Use a non-linear transformation fr to produce a low-dimensional representation rt of the input image and the hidden state at step t−1.

This is the reading function (the counterpart of the writing attention function).

Reading function: allows the input image to be transformed into a new coordinate space that makes inference computations easier.

It can be implemented as a fully- or locally-connected network, but better inference is obtained using reading (recognition) attention.





(1) Choice of hidden canvas function: a pre-image is produced in the hidden space and finally transformed back into image space.

One of the largest deviations is the introduction of the hidden canvas into the generative model. It adds important richness to the model, since it allows a pre-image to be constructed in a hidden space before a final corrective transformation, using the function fo, is applied.


Inference network

Shares the parameters of the LSTM from the prior; the removal of this additional recursive function has no effect on performance.

4. Image Generation and Analysis

Data:                            binary images

Pixel probability model:         Bernoulli likelihood

Units:                           400 LSTM hidden units

Spatial transformer:             12x12 kernels, used for recognition or generative attention

Latent variables zt:             4-D Gaussian distributions

Timesteps:                       20-80

Hidden canvas:                   size of the image, with 4 channels

Training iterations:             approximately 800K

Mini-batch size:                 24

Training-set likelihood bound:   average over the last 1K training iterations

Test-set likelihood bound:       average of the bound over 24,000 random samples

4.1. MNIST and Multi-MNIST

Data Set

1.      The binarized MNIST data set of Salakhutdinov & Murray (2008)

28x28 binary images with 50,000 training and 10,000 test images.

2.      Multi-MNIST data set

64x64 images with two MNIST digits placed at random.
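
A Multi-MNIST-style image can be sketched as follows. The notes only say that two digits are placed at random on a 64x64 image, so the details here are assumptions: positions are drawn uniformly so that each digit fits inside the canvas, overlapping pixels are merged with a pixel-wise maximum, and random binary patches stand in for real MNIST digits. The name `make_multi_digit` is hypothetical.

```python
import numpy as np

def make_multi_digit(digit_a, digit_b, rng, size=64):
    """Place two binary digit images at random positions on a size x size canvas."""
    canvas = np.zeros((size, size), dtype=np.uint8)
    for digit in (digit_a, digit_b):
        h, w = digit.shape
        # Uniform top-left corner such that the digit stays inside the canvas.
        top = rng.integers(0, size - h + 1)
        left = rng.integers(0, size - w + 1)
        # Assumption: overlapping digits are merged with a pixel-wise maximum.
        region = canvas[top:top + h, left:left + w]
        canvas[top:top + h, left:left + w] = np.maximum(region, digit)
    return canvas

rng = np.random.default_rng(0)
# Random binary patches stand in for real 28x28 MNIST digits.
digit_a = (rng.random((28, 28)) > 0.5).astype(np.uint8)
digit_b = (rng.random((28, 28)) > 0.5).astype(np.uint8)
image = make_multi_digit(digit_a, digit_b, rng)
```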

Importance of each step

These results also indicate that longer sequences can lead to better performance.

The latent variables zt have a diminishing contribution to the model as the number of steps grows.

Efficiently allocate and decide on the number of latent variables to use.

4.2. Omniglot

The Omniglot data set

105x105 binary images across 1628 classes, with just 20 images per class.

4.3. Multi-PIE

Multi-PIE data set:

48x48 RGB face images from various viewpoints,

converted to grayscale.

The model was trained on a subset comprising all 15 viewpoints but only 3 of the 19 illumination conditions.

93,130 training samples and 10,000 test samples.

5. One-Shot Generalization

Three tasks to evaluate one-shot generalization:

(1) unconditional (free) generation,

(2) generation of novel variations of a given exemplar,

(3) generation of representative samples from a novel alphabet.


Weak one-shot generalization test:


The training data consists of all available alphabets, but 3 character types from each alphabet were removed to form the test set.

Strong one-shot generalization test



