[Paper Notes] Generating Sentences from a Continuous Space: a VAE over a continuous latent space

Abstract

  • The standard RNN language model: generates one word at a time and does not work from an explicit global sentence representation;

  • The work of this paper: RNN-based variational autoencoder generative model;

  • The model: incorporates distributed latent representations of entire sentences; good at modeling holistic properties;

  • the model can generate coherent novel sentences that interpolate between known sentences;

Introduction

  • RNN-based language models work well as unsupervised generative models of sentences and as decoders in tasks such as machine translation and image captioning;
  • Shortcoming of the RNN: generating word by word, it cannot capture global information such as topic or high-level syntactic properties.
  • What this paper does: an extension of the RNN language model that captures such global features in a continuous latent variable;
  • Inspiration: the architecture of the variational autoencoder and recent advances in variational inference (Kingma and Welling, 2015; Rezende et al., 2014), which introduce generative models with latent variables;
  • The contributions of the paper:
    • a VAE architecture for text, and the problems that arise when training it;
    • The model yields performance similar to existing RNN language models when the global latent variable is not explicitly needed;
    • Task of imputing missing words: the paper introduces a novel evaluation strategy based on an adversarial classifier, sidestepping intractable likelihood computations by drawing on work on non-parametric two-sample tests and adversarial training;
    • In this setting, the global latent variable allows the model to do well;
    • Introduces several qualitative techniques for evaluating the model's ability to learn high-level features of sentences;
    • The model can produce diverse, coherent sentences through purely deterministic decoding and can interpolate smoothly between sentences (a minimal sketch of such interpolation is given below).
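
To make the interpolation idea concrete, here is a minimal sketch of decoding along the line between the codes of two sentences. The helpers encode_mean(sentence) (returning the posterior mean code for a sentence) and greedy_decode(z) (deterministically decoding a sentence from a code) are hypothetical placeholders, not functions from the paper's released code.

```python
import numpy as np

def interpolate_sentences(s1, s2, encode_mean, greedy_decode, steps=5):
    """Decode sentences along the straight line between the codes of s1 and s2."""
    z1, z2 = encode_mean(s1), encode_mean(s2)
    decoded = []
    for t in np.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z1 + t * z2       # linear interpolation in latent space
        decoded.append(greedy_decode(z))  # purely deterministic decoding
    return decoded
```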

Background

Unsupervised sentence encoding
  • Standard RNN doesn’t learn a vector representation of the full sentence;

  • In order to incorporate a continuous latent sentence representation, we need a method to map between sentences and distributed representations, which can be trained in an unsupervised setting;

  • Sequence autoencoders consist of an encoder function $\phi_{enc}$ and a probabilistic decoder $p(x \mid \vec z = \phi_{enc}(x))$, where $\vec z$ is the learned code and $x$ is a given example; the decoder maximizes the likelihood of an example $x$ conditioned on $\vec z$; both encoder and decoder are RNNs, and examples are token sequences (a sketch is given at the end of this list);

  • But standard autoencoders are not effective at extracting global semantic features; they do not learn smooth, interpretable features for sentence encoding, and they do not incorporate a prior over $\vec z$;

  • Skip-thought models: same structure as the sequence autoencoder, but generate text conditioned on a neighboring sentence from the target text instead of on the target sentence itself;

  • Paragraph vector models: non-recurrent sentence representation models

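For concreteness, a minimal PyTorch sketch of the sequence-autoencoder setup described above; the use of LSTMs and the layer sizes are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        emb = self.embed(tokens)               # (batch, seq, emb_dim)
        _, state = self.encoder(emb)           # final state: code for the whole sentence
        dec_out, _ = self.decoder(emb, state)  # reconstruct, conditioned on the code
        return self.out(dec_out)               # per-step logits over the vocabulary
```

In practice the decoder input would be the target sequence shifted right (teacher forcing), and training would maximize the token-level log-likelihood via cross-entropy.
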
The variational autoencoder
  • Based on a regularized version of the standard autoencoder: the encoder outputs a posterior distribution $q(\vec z \mid x)$ over codes rather than a single code, and a KL-divergence term pushes this posterior toward a prior $p(\vec z)$ (the objective is written out below)
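
This is the standard variational lower bound, written in the notation above, with approximate posterior $q(\vec z \mid x)$ and a standard Gaussian prior $p(\vec z)$:

$\mathcal{L}(\theta; x) = \mathbb{E}_{q(\vec z \mid x)}\big[\log p(x \mid \vec z)\big] - \mathrm{KL}\big(q(\vec z \mid x)\,\|\,p(\vec z)\big) \le \log p(x)$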

A VAE for sentences

(Figure: the model architecture; original image: https://i.loli.net/2020/11/17/C1HLey89agIG4hl.png)

  • the hidden code: the Gaussian prior acts as a regularizer on the code; if the code carried no useful information (pure noise), the model would reduce to a standard RNN language model;

  • the decoder: a special RNN language model conditioned on the hidden code;

  • Several variations on the architecture are explored (combined in the sketch after the related-work bullets below):

    • concatenating the sampled $\vec z$ to the decoder input at every time step

    • using a soft-plus parametrization for the variance

    • using deep feedforward networks between the encoder and the latent variable, and between the latent variable and the decoder

    • Result: these variations made little difference to the model's performance;

  • Some work about VAE on discrete sequences has been done:

    • VRAE (Variational Recurrent Autoencoder) for modeling music;
  • Some work on including continuous latent variables in RNN-style models:

    • Modeling speech, handwriting and music;
    • these models include a separate latent variable per timestep, which is unsuitable for our task;
  • Similar work to this paper:

    • VAE-based document-level language model that models texts as bags of words, rather than sequences;
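
To make the model and the variations above concrete, here is a minimal PyTorch sketch: the encoder produces a mean and a soft-plus parametrized standard deviation, $\vec z$ is drawn with the reparameterization trick, and the sample is concatenated to the decoder input at every time step. Layer sizes, the use of LSTMs, and the omission of the feedforward networks between encoder/decoder and latent variable are illustrative simplifications, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_sigma = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.LSTM(emb_dim + latent_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        emb = self.embed(tokens)                      # (batch, seq, emb_dim)
        _, (h, _) = self.encoder(emb)
        h = h[-1]                                     # final encoder hidden state
        mu = self.to_mu(h)
        sigma = F.softplus(self.to_sigma(h))          # soft-plus keeps sigma positive
        z = mu + sigma * torch.randn_like(sigma)      # reparameterized sample of z
        z_rep = z.unsqueeze(1).expand(-1, emb.size(1), -1)
        dec_in = torch.cat([emb, z_rep], dim=-1)      # concatenate z at every time step
        dec_out, _ = self.decoder(dec_in)
        logits = self.out(dec_out)
        # closed-form KL between the diagonal Gaussian posterior and the N(0, I) prior
        kl = 0.5 * torch.sum(mu**2 + sigma**2 - 2 * torch.log(sigma) - 1, dim=-1)
        return logits, kl
```

The returned kl is the KL term of the lower bound above; the training loss would be the token-level cross-entropy of the logits plus this KL term.
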
Optimization challenges

The goal of this paper is to learn global latent representations of sentence content.

Q: How can we estimate the global features learned?

A: By looking at the variational lower bound objective.

  • The objective is the variational lower bound: the expected reconstruction log-likelihood minus the KL divergence between the posterior $q(\vec z \mid x)$ and the prior $p(\vec z)$; if the KL term is near zero, the posterior has collapsed onto the prior and $\vec z$ carries essentially no information about the sentence, so the model behaves like a plain RNN language model.