【P12】Semantic Parsing with Semi-Supervised Sequential Autoencoders


二叉树不是树_ZJY



Categories: Natural Language Processing, paper. Tags: machine learning, deep learning, unsupervised learning

Article link: https://blog.csdn.net/qq_42341984/article/details/114005174



1 Introduction
2 Model
3 Tasks and Data Generation
4 Experiments
5 Discussion

1 Introduction

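Background: supervised learning for sequence transduction learns a mapping from an input sequence $x$ to an output sequence $y$. Often $y$ alone is easy to obtain in large quantities (unlabeled data), whereas supervised data in the form of $(x, y)$ pairs is scarce or expensive.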

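This paper proposes a new architecture for semi-supervised training on sequence transduction tasks: the transduction objective $(x \to y)$ is augmented with an autoencoding objective $(y \to x \to y)$, in which the input sequence $x$ is treated as a latent variable.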

2 Model

[Figure: SEQ4 model architecture]
The model consists of four LSTMs, hence the name SEQ4:

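a bidirectional LSTM encodes the sequence $y$;
an LSTM with stochastic output draws a sequence $\tilde{x}$ of distributions over the vocabulary $\Sigma_x$;
a bidirectional LSTM encodes these distributions;
an LSTM reconstructs $y$ as $\hat{y}$.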

2.1 Encoding $y$

$h_{t}^{y}=\left(f_{y}^{\rightarrow}\left(y_{t}, h_{t-1}^{y, \rightarrow}\right) ; f_{y}^{\leftarrow}\left(y_{t}, h_{t+1}^{y, \leftarrow}\right)\right)$
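The bidirectional encoding above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a toy tanh RNN cell (`rnn_step`, hypothetical) stands in for the LSTM cells $f_y^{\rightarrow}$ and $f_y^{\leftarrow}$, tokens are one-hot vectors, and the weights are made-up values. What it shows is the wiring: the forward pass reads left to right from $h_{t-1}$, the backward pass reads right to left from $h_{t+1}$, and $h_t^y$ is their concatenation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def rnn_step(x_t: Vector, h_prev: Vector, w_in: Matrix, w_rec: Matrix) -> Vector:
    """Toy tanh RNN cell standing in for an LSTM cell (an assumption of this sketch)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(w_in[i], x_t))
            + sum(w * h for w, h in zip(w_rec[i], h_prev))
        )
        for i in range(len(h_prev))
    ]


def bidirectional_encode(seq: List[Vector], fwd: Tuple[Matrix, Matrix],
                         bwd: Tuple[Matrix, Matrix], hidden_size: int) -> List[Vector]:
    """Return h_t = (forward state at t ; backward state at t) for every position t."""
    n = len(seq)
    fwd_states: List[Vector] = [[] for _ in range(n)]
    bwd_states: List[Vector] = [[] for _ in range(n)]
    h = [0.0] * hidden_size
    for t in range(n):              # left to right: uses h_{t-1}
        h = rnn_step(seq[t], h, *fwd)
        fwd_states[t] = h
    h = [0.0] * hidden_size
    for t in reversed(range(n)):    # right to left: uses h_{t+1}
        h = rnn_step(seq[t], h, *bwd)
        bwd_states[t] = h
    return [fwd_states[t] + bwd_states[t] for t in range(n)]  # list "+" is concatenation


# Usage with made-up weights: 2-symbol one-hot inputs, hidden size 2.
FWD = ([[0.5, -0.2], [0.1, 0.3]], [[0.2, 0.0], [0.0, 0.2]])
BWD = ([[-0.3, 0.4], [0.2, 0.1]], [[0.1, 0.0], [0.0, 0.1]])
y = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
h_y = bidirectional_encode(y, FWD, BWD, hidden_size=2)
```

Each $h_t^y$ has twice the hidden size, so positions near either end still see context from both directions.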

2.2 Predicting a Latent Sequence $\tilde{x}$

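Predicting a sequence of discrete symbols by sampling from multinomial distributions over the vocabulary $\Sigma_x$ would block backpropagation, since sampling discrete symbols is not differentiable. Instead, the model predicts a sequence $\tilde{x}$ of distributions over $\Sigma_x$:
$\tilde{x}=q(x \mid y)=\prod_{t=1}^{L_{x}} q\left(\tilde{x}_{t} \mid\left\{\tilde{x}_{1}, \cdots, \tilde{x}_{t-1}\right\}, h^{y}\right)$
where each $\tilde{x}_{t}$ is a distribution over the vocabulary $\Sigma_x$, drawn from a logistic-normal distribution.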

The lower dashed box in Figure 2 computes:
$\begin{aligned} h_{t}^{\tilde{x}} &=f_{\tilde{x}}\left(\tilde{x}_{t-1}, h_{t-1}^{\tilde{x}}, h^{y}\right) \\ \mu_{t}, \log \left(\sigma_{t}^{2}\right) &=l\left(h_{t}^{\tilde{x}}\right) \\ \epsilon & \sim \mathcal{N}(0, I) \\ \gamma_{t} &=\mu_{t}+\sigma_{t} \epsilon \\ \tilde{x}_{t} &=\operatorname{softmax}\left(\gamma_{t}\right) \end{aligned}$
where $f_{\tilde{x}}$ is an LSTM and $l$ is a linear transformation.
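One time step of the lower dashed box can be sketched as below (the LSTM $f_{\tilde{x}}$ that produces $h_t^{\tilde{x}}$ is omitted). `linear_split` is a hypothetical stand-in for $l$: it outputs a vector of size $2|\Sigma_x|$ and splits it into $\mu_t$ and $\log\sigma_t^2$. Because the randomness is isolated in $\epsilon$, the sample $\gamma_t$ is a differentiable function of $\mu_t$ and $\sigma_t$ (the reparameterization trick), which is what lets gradients flow through $\tilde{x}_t$. Weights and sizes are made up for illustration.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def linear_split(h: Vector, W: List[Vector], b: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical linear layer l: h_t -> (mu_t, log sigma_t^2), each of size |Sigma_x|."""
    out = [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]
    k = len(out) // 2
    return out[:k], out[k:]


def softmax(v: Vector) -> Vector:
    m = max(v)                                   # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def sample_x_tilde(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """Logistic-normal sample: gamma = mu + sigma * eps, eps ~ N(0, I); x~_t = softmax(gamma)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    sigma = [math.exp(0.5 * lv) for lv in log_var]   # sigma = exp(log(sigma^2) / 2)
    gamma = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return softmax(gamma)


# Usage: hidden size 2, |Sigma_x| = 3, made-up weights.
W = [[0.4, -0.1], [0.2, 0.3], [-0.5, 0.1],      # rows producing mu_t
     [-1.0, 0.0], [0.0, -1.0], [-0.5, -0.5]]    # rows producing log sigma_t^2
b = [0.0] * 6
mu, log_var = linear_split([0.3, -0.7], W, b)
x_t = sample_x_tilde(mu, log_var, random.Random(0))
```

The result is always a valid distribution over $\Sigma_x$; as $\sigma_t \to 0$ it collapses to $\operatorname{softmax}(\mu_t)$.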

2.3 Encoding $x$

The upper dashed box in Figure 2 computes:
$h_{t}^{x}=\left(f_{x}^{\rightarrow}\left(\tilde{x}_{t}, h_{t-1}^{x, \rightarrow}\right) ; f_{x}^{\leftarrow}\left(\tilde{x}_{t}, h_{t+1}^{x, \leftarrow}\right)\right)$
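Unlike $y$, the input to $f_x$ at each step is a distribution $\tilde{x}_t$ rather than a symbol. A minimal sketch of why this still works, assuming for illustration only that the LSTM's input layer is a plain linear map $W$ with one column per symbol in $\Sigma_x$: multiplying $W$ by a one-hot vector selects that symbol's column, while multiplying by a distribution gives the probability-weighted average of the columns, which is differentiable in $\tilde{x}_t$. The function name `input_projection` is hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def input_projection(W: Matrix, x_dist: Vector) -> Vector:
    """Compute W @ x_dist, where W has one column per vocabulary symbol."""
    return [sum(w * p for w, p in zip(row, x_dist)) for row in W]


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]                               # 2-d input layer, |Sigma_x| = 3

one_hot = input_projection(W, [0.0, 1.0, 0.0])      # selects column 1
soft = input_projection(W, [0.25, 0.75, 0.0])       # 0.25 * column 0 + 0.75 * column 1
```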

2.4 Reconstructing $y$

The last LSTM decodes back to $y$, conditioned on the encoding $h^{x}$ of $\tilde{x}$:
$p(\hat{y} \mid \tilde{x})=\prod_{t=1}^{L_{y}} p\left(\hat{y}_{t} \mid\left\{\hat{y}_{1}, \cdots, \hat{y}_{t-1}\right\}, h^{x}\right)$
Concretely:
$\begin{array}{l} h_{t}^{\hat{y}}=f_{\hat{y}}\left(\hat{y}_{t-1}, h_{t-1}^{\hat{y}}, h^{x}\right) \\ \hat{y}_{t} \sim \operatorname{softmax}\left(l^{\prime}\left(h_{t}^{\hat{y}}\right)\right) \end{array}$
During training, the ground truth $y_{t-1}$ is fed in place of $\hat{y}_{t-1}$ (teacher forcing).
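The difference between training-time and inference-time inputs to $f_{\hat{y}}$ can be sketched with a toy decoder. Here `toy_step` is a hypothetical stand-in for one LSTM step plus the softmax (it greedily maps the previous token to a predicted token via a lookup table), and `<s>`/`</s>` are assumed start and end tokens. With teacher forcing the next input is the ground-truth $y_{t-1}$, so one wrong prediction does not derail later steps; without it the decoder consumes its own $\hat{y}_{t-1}$.

```python
from typing import Callable, Dict, List, Optional

Step = Callable[[str], str]


def run_decoder(step: Step, y_true: Optional[List[str]] = None,
                max_len: int = 20, bos: str = "<s>", eos: str = "</s>") -> List[str]:
    """Run the decoder; teacher forcing is used iff y_true is given."""
    prev, outputs = bos, []
    steps = len(y_true) if y_true is not None else max_len
    for t in range(steps):
        pred = step(prev)
        outputs.append(pred)
        if y_true is not None:
            prev = y_true[t]     # training: feed ground-truth y_{t-1}
        elif pred == eos:
            break
        else:
            prev = pred          # inference: feed the model's own y-hat_{t-1}
    return outputs


# Toy "model" that makes one mistake: after "a" it predicts "x" instead of "b".
TABLE: Dict[str, str] = {"<s>": "a", "a": "x", "b": "c", "c": "</s>", "x": "</s>"}


def toy_step(prev: str) -> str:
    return TABLE[prev]


y = ["a", "b", "c", "</s>"]
train_out = run_decoder(toy_step, y_true=y)   # inputs: <s>, a, b, c
infer_out = run_decoder(toy_step)             # inputs: <s>, a, x
```

Under teacher forcing the model still gets to predict "c" after the gold "b" despite its earlier error; in free-running mode the error at step 2 changes every later input.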

2.5 Loss function

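The full SEQ4 model yields a reconstruction $\hat{y}$ of $y$. A loss is defined on this reconstruction for two cases: the unsupervised case, where $x$ is not observed in the training data, and the supervised case, where $(x, y)$ pairs are available. Together they make up semi-supervised training, which the experiments show outperforms purely supervised training.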

Unsupervised case
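When $x$ is not observed, the loss minimized during training is the reconstruction loss on $y$, expressed as the negative log-likelihood $NLL(\hat{y}, y)$ of the true $y$ under the prediction $\hat{y}$, plus a KL term weighted by $\alpha$.
Unsupervised loss: $\mathcal{L}_{\text{unsup}}=NLL(\hat{y}, y)+\alpha KL[q(\gamma \mid y) \| p(\gamma)]$
where $KL[q(\gamma \mid y) \| p(\gamma)]=\sum_{i=1}^{L_{x}} KL\left[q\left(\gamma_{i} \mid y\right) \| p(\gamma)\right]$. This term smooths the logistic-normal distribution from which the distributions over the symbols of $x$ are drawn, guarding against overfitting the latent distributions over $x$ to the symbols seen in the supervised case discussed below.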
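Assuming $q(\gamma_i \mid y)$ is the diagonal Gaussian $\mathcal{N}(\mu_i, \operatorname{diag}(\sigma_i^2))$ produced in 2.2 and that the prior $p(\gamma)$ is a standard normal $\mathcal{N}(0, I)$ (the prior is not spelled out in this summary), each KL term has the closed form $\frac{1}{2}\sum_k(\sigma_{ik}^2 + \mu_{ik}^2 - 1 - \log\sigma_{ik}^2)$. A minimal sketch of $\mathcal{L}_{\text{unsup}}$ on that basis, taking the reconstruction NLL as an already-computed number; function names are illustrative:

```python
import math
from typing import List

Vector = List[float]


def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """KL[N(mu, diag(sigma^2)) || N(0, I)] = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def unsup_loss(nll_y: float, mus: List[Vector], log_vars: List[Vector], alpha: float) -> float:
    """L_unsup = NLL(y-hat, y) + alpha * sum_{i=1}^{L_x} KL[q(gamma_i | y) || p(gamma)]."""
    kl = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mus, log_vars))
    return nll_y + alpha * kl
```

The KL is zero exactly when $\mu_i = 0$ and $\sigma_i = 1$, so $\alpha$ controls how strongly the latent distributions are pulled toward the prior.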
Supervised case
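When $x$ is observed, we additionally minimize the prediction loss on $x$, expressed as the negative log-likelihood $NLL(\tilde{x}, x)$ of the true $x$ under the prediction $\tilde{x}$, and do not impose the KL loss.
Supervised loss: $\mathcal{L}_{\text{sup}}=NLL(\tilde{x}, x)+NLL(\hat{y}, y)$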
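Both NLL terms score a sequence of predicted distributions against a sequence of gold symbols. A minimal sketch, assuming each $\tilde{x}_t$ (and each decoder output distribution for $\hat{y}_t$) is given as a list of probabilities over the vocabulary and the gold sequences as lists of symbol indices; the function names are illustrative:

```python
import math
from typing import List

Dist = List[float]


def nll(pred: List[Dist], gold: List[int]) -> float:
    """NLL(pred, gold) = -sum_t log pred_t[gold_t]."""
    return -sum(math.log(p[k]) for p, k in zip(pred, gold))


def sup_loss(x_tilde: List[Dist], x: List[int], y_hat: List[Dist], y: List[int]) -> float:
    """L_sup = NLL(x~, x) + NLL(y-hat, y); no KL term in the supervised case."""
    return nll(x_tilde, x) + nll(y_hat, y)
```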
Semi-supervised training and inference
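Training uses a weighted combination of the supervised and unsupervised losses above. At inference time, only the $(x \to y)$ decoder part of the model is used. If this decoder is trained in a fully supervised way without the encoder, the model reduces to the Seq2Seq baseline.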
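The summary only says the two losses are combined with weights. A minimal sketch, assuming per-example losses are averaged separately over the labeled pairs and the unlabeled $y$ in a batch and then mixed with two hypothetical weights `w_sup` and `w_unsup` (not values from the paper):

```python
from typing import List


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def semi_supervised_loss(sup_losses: List[float], unsup_losses: List[float],
                         w_sup: float = 1.0, w_unsup: float = 1.0) -> float:
    """w_sup * mean(L_sup over labeled pairs) + w_unsup * mean(L_unsup over unlabeled y)."""
    return w_sup * mean(sup_losses) + w_unsup * mean(unsup_losses)
```

Averaging each part separately keeps the balance between the two objectives independent of how many labeled versus unlabeled examples land in a given batch; a batch with no examples of one kind simply contributes 0 for that term.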

3 Tasks and Data Generation

4 Experiments

5 Discussion

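Semi-supervised training:
Table 8 shows some examples of natural language generated from unsupervised logical forms, demonstrating that the model learns to handle unsupervised data sensibly: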
[Table 8: unsupervised logical forms and the natural language generated for them]
