Abstract
- The standard RNN language model does not work from an explicit global sentence representation;
- The work of this paper: an RNN-based variational autoencoder generative model;
- The model incorporates distributed latent representations of entire sentences, which makes it good at modeling holistic properties;
- The model can generate coherent novel sentences that interpolate between known sentences;
Introduction
- RNNs work well for unsupervised generative modeling of natural language sentences, and RNN decoders are also used in tasks such as machine translation and image captioning;
- Shortcoming of the RNN: it cannot capture global information such as topic or high-level syntactic properties.
- What this paper does: an extension of the RNN that captures global features in a continuous latent variable;
- Inspiration: the architecture of a variational autoencoder, taking advantage of recent advances in variational inference (Kingma and Welling, 2015; Rezende et al., 2014); these papers introduce generative models with latent variables;
- The contributions of the paper:
- a VAE architecture for text and the challenges that arise while training it;
- the model performs on par with existing RNN language models when the global latent variable is not explicitly needed;
- task of imputing missing words: the paper introduces a novel evaluation strategy using an adversarial classifier, sidestepping the issue of intractable likelihood computations by drawing inspiration from work on non-parametric two-sample tests and adversarial training;
- in this setting, the global latent variable allows the model to do well;
- introduces several qualitative techniques for evaluating the model's ability to learn high-level features of sentences;
- the model can produce diverse, coherent sentences through purely deterministic decoding and can interpolate smoothly between sentences; a minimal interpolation sketch follows.
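As a concrete illustration of the interpolation idea, here is a minimal sketch. It assumes a trained model exposed through two hypothetical helpers, `encode` (sentence to latent vector) and `greedy_decode` (latent vector to sentence); it simply walks a straight line between the two codes and decodes each point deterministically.

```python
import numpy as np

def interpolate_sentences(encode, greedy_decode, s1, s2, steps=5):
    """Linearly interpolate between the latent codes of two sentences
    and decode each intermediate code with deterministic (greedy) decoding.

    `encode` and `greedy_decode` are assumed to come from a trained VAE."""
    z1, z2 = encode(s1), encode(s2)
    sentences = []
    for t in np.linspace(0.0, 1.0, steps):
        z_t = (1.0 - t) * z1 + t * z2   # straight-line path in latent space
        sentences.append(greedy_decode(z_t))
    return sentences
```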
Background
Unsupervised sentence encoding
- A standard RNN language model does not learn a vector representation of the full sentence;
- In order to incorporate a continuous latent sentence representation, we need a method to map between sentences and distributed representations that can be trained in an unsupervised setting;
- Sequence autoencoders consist of an encoder function $\phi_{enc}$ and a probabilistic decoder $p(x \mid \vec z = \phi_{enc}(x))$, where $\vec z$ is the learned code and $x$ is a given example; the decoder maximizes the likelihood of an example $x$ conditioned on $\vec z$; both encoder and decoder are RNNs and examples are token sequences (a minimal sketch follows this list);
- But standard autoencoders are not effective at extracting global semantic features; they cannot learn smooth, interpretable features for sentence encoding, and the model does not incorporate a prior over $\vec z$;
- Skip-thought models: same structure as the autoencoder, but generate text conditioned on a neighboring sentence from the target text instead of the target sentence itself;
- Paragraph vector models: non-recurrent sentence representation models.
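To make the encoder/decoder roles above concrete, here is a minimal PyTorch sketch of a plain sequence autoencoder. The module layout, layer names, and sizes are illustrative assumptions, not the paper's exact configuration: $\phi_{enc}$ is taken to be the final hidden state of an encoder RNN, and the decoder RNN is initialized from that code and trained to reconstruct the same token sequence.

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """Plain sequence autoencoder: encode a token sequence into a single
    vector z, then reconstruct the same sequence conditioned on z."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        emb = self.embed(tokens)
        _, (h, c) = self.encoder(emb)           # h acts as the code z = phi_enc(x)
        dec_out, _ = self.decoder(emb, (h, c))  # decoder conditioned on z via its initial state
        return self.out(dec_out)                # per-token logits for reconstructing x

# Training maximizes log p(x | z), e.g. with cross-entropy against the input tokens:
# loss = nn.functional.cross_entropy(logits.transpose(1, 2), tokens)
```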
The variational autoencoder
- Based on a regularized version of the standard autoencoder: a prior distribution is imposed on the hidden code, which enforces a regular geometry over codes and makes it possible to draw proper samples from the model;
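For reference, the variational lower bound (ELBO) that the VAE maximizes, written with the notation used above; it trades off reconstruction quality against keeping the approximate posterior $q_\theta(\vec z \mid x)$ close to the prior $p(\vec z)$:

$$
\mathcal{L}(\theta; x) \;=\; \mathbb{E}_{q_\theta(\vec z \mid x)}\!\left[\log p_\theta(x \mid \vec z)\right] \;-\; \mathrm{KL}\!\left(q_\theta(\vec z \mid x)\,\|\,p(\vec z)\right) \;\le\; \log p(x).
$$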
A VAE for sentences
![Model architecture](https://i.loli.net/2020/11/17/C1HLey89agIG4hl.png)
- The hidden code: the Gaussian prior acts as a regularizer; if the code collapses to the prior, it incorporates no useful information (it is just noise);
- The decoder: a special RNN language model conditioned on the hidden code;
- Several variations on the architecture are explored (see the sketch after this list):
    - concatenating the sampled $\vec z$ to the decoder input at every time step;
    - using a soft-plus parametrization for the variance;
    - using deep feedforward networks between the encoder and latent variable and between the decoder and latent variable;
- Result: little difference in the model's performance.
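A minimal PyTorch sketch of the pieces just listed (layer names and dimensions are my own illustrative choices): the encoder produces a mean and a soft-plus-parametrized variance for $\vec z$, a reparameterized sample is drawn, and that sample is concatenated to the decoder input at every time step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)         # posterior mean
        self.to_var = nn.Linear(hid_dim, z_dim)        # pre-activation for the variance
        # decoder input = word embedding concatenated with z at every step
        self.decoder = nn.LSTM(emb_dim + z_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):                         # tokens: (batch, seq_len)
        emb = self.embed(tokens)
        _, (h, _) = self.encoder(emb)
        mu = self.to_mu(h[-1])
        var = F.softplus(self.to_var(h[-1]))           # soft-plus keeps the variance positive
        z = mu + var.sqrt() * torch.randn_like(mu)     # reparameterized sample of z
        z_steps = z.unsqueeze(1).expand(-1, emb.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([emb, z_steps], dim=-1))
        logits = self.out(dec_out)
        kl = 0.5 * (mu.pow(2) + var - var.log() - 1).sum(dim=-1)  # KL to N(0, I)
        return logits, kl
```

The returned KL term is the quantity to watch during training: if it falls to zero, the code carries no information beyond the prior.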
- Some work on VAEs for discrete sequences has been done:
    - the VRAE (Variational Recurrent Autoencoder) for modeling music;
- Some work on including continuous latent variables in RNN-style models:
    - modeling speech, handwriting, and music;
    - these models include separate latent variables per timestep, making them unsuitable for our task;
- Work similar to this paper:
    - a VAE-based document-level language model that models texts as bags of words, rather than sequences;
Optimization challenges
The goal of this paper is to learn global latent representations of sentence content.
Q: How can we tell what global features are learned?
A: By examining the variational lower bound objective.
- The objec