Abstract
- The standard RNN language model does not work from an explicit global sentence representation;
- The work of this paper: an RNN-based variational autoencoder generative model;
- The model incorporates distributed latent representations of entire sentences, which makes it good at modeling holistic properties;
- The model can generate coherent novel sentences that interpolate between known sentences;
Introduction
- RNNs work well for unsupervised generative modeling of natural language sentences, and RNN decoders are also used in tasks such as machine translation and image captioning;
- Shortcoming of the RNN: it cannot capture global information such as topic or high-level syntactic properties.
- What this paper does: an extension of the RNN that captures global features in a continuous latent variable;
- Inspiration: the architecture of a variational autoencoder, taking advantage of recent advances in variational inference (Kingma and Welling, 2015; Rezende et al., 2014); these papers introduce generative models with latent variables;
- The contributions of the paper:
- a VAE architecture for text and the challenges that arise while training it;
- the model performs on par with existing RNN language models when the global latent variable is not explicitly needed;
- task of imputing missing words: the paper introduces a novel evaluation strategy using an adversarial classifier, sidestepping the issue of intractable likelihood computations by drawing inspiration from work on non-parametric two-sample tests and adversarial training;
- in this setting, the global latent variable allows the model to do well;
- introduces several qualitative techniques for evaluating the model's ability to learn high-level features of sentences;
- the model can produce diverse, coherent sentences through purely deterministic decoding and can interpolate smoothly between sentences; a minimal interpolation sketch follows.
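As a concrete illustration of the interpolation idea, here is a minimal sketch. It assumes a trained model exposed through two hypothetical helpers, `encode` (sentence to latent vector) and `greedy_decode` (latent vector to sentence); it simply walks a straight line between the two codes and decodes each point deterministically.

```python
import numpy as np

def interpolate_sentences(encode, greedy_decode, s1, s2, steps=5):
    """Linearly interpolate between the latent codes of two sentences
    and decode each intermediate code with deterministic (greedy) decoding.

    `encode` and `greedy_decode` are assumed to come from a trained VAE."""
    z1, z2 = encode(s1), encode(s2)
    sentences = []
    for t in np.linspace(0.0, 1.0, steps):
        z_t = (1.0 - t) * z1 + t * z2   # straight-line path in latent space
        sentences.append(greedy_decode(z_t))
    return sentences
```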
Background
Unsupervised sentence encoding
- A standard RNN language model does not learn a vector representation of the full sentence;
- In order to incorporate a continuous latent sentence representation, we need a method to map between sentences and distributed representations that can be trained in an unsupervised setting;
- Sequence autoencoders consist of an encoder function $\phi_{enc}$ and a probabilistic decoder $p(x \mid \vec z = \phi_{enc}(x))$, where $\vec z$ is the learned code and $x$ is a given example; the decoder maximizes the likelihood of an example $x$ conditioned on $\vec z$; both encoder and decoder are RNNs and examples are token sequences (a minimal sketch follows this list);
- But standard autoencoders are not effective at extracting global semantic features; they cannot learn smooth, interpretable features for sentence encoding, and the model does not incorporate a prior over $\vec z$;
- Skip-thought models: same structure as the autoencoder, but generate text conditioned on a neighboring sentence from the target text instead of the target sentence itself;
- Paragraph vector models: non-recurrent sentence representation models.
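To make the encoder/decoder roles above concrete, here is a minimal PyTorch sketch of a plain sequence autoencoder. The module layout, layer names, and sizes are illustrative assumptions, not the paper's exact configuration: $\phi_{enc}$ is taken to be the final hidden state of an encoder RNN, and the decoder RNN is initialized from that code and trained to reconstruct the same token sequence.

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """Plain sequence autoencoder: encode a token sequence into a single
    vector z, then reconstruct the same sequence conditioned on z."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        emb = self.embed(tokens)
        _, (h, c) = self.encoder(emb)           # h acts as the code z = phi_enc(x)
        dec_out, _ = self.decoder(emb, (h, c))  # decoder conditioned on z via its initial state
        return self.out(dec_out)                # per-token logits for reconstructing x

# Training maximizes log p(x | z), e.g. with cross-entropy against the input tokens:
# loss = nn.functional.cross_entropy(logits.transpose(1, 2), tokens)
```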
The variational autoencoder
- Based on a regularized version of the standard autoencoder: a prior distribution is imposed on the hidden code, which enforces a regular geometry over codes and makes it possible to draw proper samples from the model;
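For reference, the variational lower bound (ELBO) that the VAE maximizes, written with the notation used above; it trades off reconstruction quality against keeping the approximate posterior $q_\theta(\vec z \mid x)$ close to the prior $p(\vec z)$:

$$
\mathcal{L}(\theta; x) \;=\; \mathbb{E}_{q_\theta(\vec z \mid x)}\!\left[\log p_\theta(x \mid \vec z)\right] \;-\; \mathrm{KL}\!\left(q_\theta(\vec z \mid x)\,\|\,p(\vec z)\right) \;\le\; \log p(x).
$$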
A VAE for sentences
![Model architecture](https://i.loli.net/2020/11/17/C1HLey89agIG4hl.png)
- The hidden code: the Gaussian prior acts as a regularizer; if the code collapses to the prior, it incorporates no useful information (it is just noise);
- The decoder: a special RNN language model conditioned on the hidden code;
- Several variations on the architecture are explored (see the sketch after this list):
    - concatenating the sampled $\vec z$ to the decoder input at every time step;
    - using a soft-plus parametrization for the variance;
    - using deep feedforward networks between the encoder and latent variable and between the decoder and latent variable;
- Result: little difference in the model's performance.
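A minimal PyTorch sketch of the pieces just listed (layer names and dimensions are my own illustrative choices): the encoder produces a mean and a soft-plus-parametrized variance for $\vec z$, a reparameterized sample is drawn, and that sample is concatenated to the decoder input at every time step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)         # posterior mean
        self.to_var = nn.Linear(hid_dim, z_dim)        # pre-activation for the variance
        # decoder input = word embedding concatenated with z at every step
        self.decoder = nn.LSTM(emb_dim + z_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):                         # tokens: (batch, seq_len)
        emb = self.embed(tokens)
        _, (h, _) = self.encoder(emb)
        mu = self.to_mu(h[-1])
        var = F.softplus(self.to_var(h[-1]))           # soft-plus keeps the variance positive
        z = mu + var.sqrt() * torch.randn_like(mu)     # reparameterized sample of z
        z_steps = z.unsqueeze(1).expand(-1, emb.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([emb, z_steps], dim=-1))
        logits = self.out(dec_out)
        kl = 0.5 * (mu.pow(2) + var - var.log() - 1).sum(dim=-1)  # KL to N(0, I)
        return logits, kl
```

The returned KL term is the quantity to watch during training: if it falls to zero, the code carries no information beyond the prior.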
- Some work on VAEs for discrete sequences has been done:
    - the VRAE (Variational Recurrent Autoencoder) for modeling music;
- Some work on including continuous latent variables in RNN-style models:
    - modeling speech, handwriting, and music;
    - these models include separate latent variables per timestep, making them unsuitable for our task;
- Work similar to this paper:
    - a VAE-based document-level language model that models texts as bags of words, rather than sequences;
Optimization challenges
The goal of this paper is to learn global latent representations of sentence content.
Q: How can we tell what global features are learned?
A: By examining the variational lower bound objective.
- The objec