[Paper Reading Notes 28] Deep Biaffine Attention for Neural Dependency Parsing

happyprince

Article link: https://blog.csdn.net/ld326/article/details/115012759

Title

Deep Biaffine Attention for Neural Dependency Parsing

Paper: https://arxiv.org/pdf/1611.01734.pdf

Code: https://github.com/tdozat/Parser-v1
https://github.com/bamtercelboo/PyTorch_Biaffine_Dependency_Parsing

Authors

Timothy Dozat
Stanford University
Christopher D. Manning
Stanford University

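Abstract

The paper studies methods for dependency parsing.

It addresses the two questions of graph-based dependency parsing:

1. Which pairs of nodes are connected by a dependency arc;

2. What the label of each arc is.

It proposes biaffine classifiers to predict the arcs and their labels.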
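To make the two questions concrete, here is a minimal sketch of how the output of a dependency parser is commonly stored: one head index and one label per word. The sentence, the label names, and the helper `arcs` are illustrative assumptions, not taken from the notes.

```python
# Hypothetical example sentence and analysis, for illustration only.
words = ["Casey", "hugged", "Kim"]
# heads[i] is the 1-based position of the head of word i; 0 denotes an artificial ROOT.
heads = [2, 0, 2]
# labels[i] is the label of the arc from that head to word i.
labels = ["nsubj", "root", "dobj"]


def arcs(words, heads, labels):
    """Return (head_word, label, dependent_word) triples for a parse."""
    tokens = ["ROOT"] + words
    return [(tokens[h], lab, w) for w, h, lab in zip(words, heads, labels)]


parse = arcs(words, heads, labels)
for head, label, dep in parse:
    print(f"{head} --{label}--> {dep}")
```

Question 1 corresponds to predicting `heads`, question 2 to predicting `labels`.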

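Model

The proposed model modifies the models of 【3】, 【4】, and 【5】.

First, it uses biaffine attention rather than bilinear or traditional MLP-based attention.

Second, it uses a biaffine dependency label classifier.

Third, before the biaffine transformation, it applies a dimension-reducing MLP to each recurrent output vector r_i.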

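As the paper's architecture figure shows, the input is the concatenation of each word's embedding and its POS-tag embedding. A bi-LSTM turns this input into r_i, and r_i is passed through two MLPs to obtain two hidden representations, h_arc-dep and h_arc-head. In the last layer, the dependent-side representation is additionally concatenated with a vector of ones, so that a single matrix U carries out an affine (not just bilinear) transformation. The result is S, the arc score matrix. The formulas are explained below.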

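Biaffine attention is neither a bilinear nor an MLP attention mechanism. A traditional affine classifier applies one affine transformation to a single LSTM output state r_i to predict the scores s_i over all classes:

$$s_i = W r_i + b \tag{1}$$

The proposed biaffine attention can be viewed as this classifier with the following change:

$$s_i^{(arc)} = \left(R\,U^{(1)}\right) r_i + R\,u^{(2)} \tag{2}$$

Here R stacks the LSTM outputs of all words in the sentence. The weight matrix W is replaced by R U^(1), where U^(1) is a (d x d) linear transformation, and the bias b is replaced by R u^(2), where u^(2) is a (d x 1) vector.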

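A sentence has a variable number of words, yet every word must receive a score as a candidate head, so arc prediction is a variable-class classification problem. Eq. (1), with its fixed number of classes, cannot handle this. The paper additionally passes each r_i through two MLPs, Eqs. (4) and (5); substituting them into Eq. (2) gives the final Eq. (6):

$$h_i^{(arc\text{-}dep)} = \mathrm{MLP}^{(arc\text{-}dep)}(r_i) \tag{4}$$

$$h_j^{(arc\text{-}head)} = \mathrm{MLP}^{(arc\text{-}head)}(r_j) \tag{5}$$

$$s_i^{(arc)} = H^{(arc\text{-}head)} U^{(1)} h_i^{(arc\text{-}dep)} + H^{(arc\text{-}head)} u^{(2)} \tag{6}$$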

The MLPs reduce the dimensionality of the LSTM outputs before they are fed to the biaffine layer, which helps avoid overfitting.
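The pipeline described above (bi-LSTM outputs, two dimension-reducing MLPs, then biaffine arc scoring as in Eq. (6)) can be sketched in NumPy. This is only an illustration under assumptions: the sizes are made up, random matrices stand in for learned parameters and for the bi-LSTM outputs, and the single-layer ReLU `mlp` helper is a hypothetical simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_lstm, d_arc = 4, 8, 3  # hypothetical sizes: words, bi-LSTM output dim, reduced dim


def mlp(x, W, b):
    """Single-layer MLP with ReLU (an assumption), used to reduce dimensionality."""
    return np.maximum(0.0, x @ W + b)


R = rng.normal(size=(n, d_lstm))  # stand-in for the bi-LSTM outputs, row i is r_i
W_dep, b_dep = rng.normal(size=(d_lstm, d_arc)), np.zeros(d_arc)
W_head, b_head = rng.normal(size=(d_lstm, d_arc)), np.zeros(d_arc)
U1 = rng.normal(size=(d_arc, d_arc))
u2 = rng.normal(size=(d_arc,))

H_dep = mlp(R, W_dep, b_dep)     # row i is h_i^(arc-dep)
H_head = mlp(R, W_head, b_head)  # row j is h_j^(arc-head)

# Eq. (6) for every dependent i at once:
# S[i, j] = h_j^(arc-head) . (U1 h_i^(arc-dep)) + h_j^(arc-head) . u2
S = H_dep @ U1.T @ H_head.T + (H_head @ u2)[None, :]

# Highest-scoring head per word; this alone does not guarantee a well-formed tree.
pred_heads = S.argmax(axis=1)
print(S.shape, pred_heads)
```

Row i of `S` holds one score per candidate head, so its width follows the sentence length rather than a fixed class count.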

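The biaffine classifier uses a bilinear layer, which is simpler than a traditional MLP attention network built from two linear layers and a nonlinear activation. The arc biaffine classifier also models two quantities directly:

H^(arc-head) u^(2): the prior probability that node j receives any dependent;

H^(arc-head) U^(1) h_i^(arc-dep): the probability that node j receives word i as its dependent.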
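The two quantities above can be computed separately and summed, or obtained in one product by appending a constant 1 to each dependent vector. The sketch below checks that both routes give the same scores. All sizes and the random stand-in values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 4  # hypothetical sizes: words, biaffine input dimension
H_dep = rng.normal(size=(n, d))   # stand-ins for h_i^(arc-dep)
H_head = rng.normal(size=(n, d))  # stand-ins for h_j^(arc-head)
U1 = rng.normal(size=(d, d))
u2 = rng.normal(size=(d,))

# pairwise[i, j]: how well candidate head j suits the specific dependent i
pairwise = H_dep @ U1.T @ H_head.T
# prior[j]: depends on head j only, i.e. how likely j is to take any dependent
prior = H_head @ u2
S = pairwise + prior[None, :]

# Same scores from a single matrix: append 1 to each dependent vector
# and stack U1^T on top of u2^T.
H_dep1 = np.concatenate([H_dep, np.ones((n, 1))], axis=1)  # (n, d + 1)
U = np.concatenate([U1.T, u2[None, :]], axis=0)            # (d + 1, d)
S_alt = H_dep1 @ U @ H_head.T

print(np.allclose(S, S_alt))
```

The appended 1 is what turns a purely bilinear product into an affine one: it selects the `u2` row of `U`, which produces the head-only term.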

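A second, label biaffine classifier predicts the dependency label between each word and its head.

Its weight U is a third-order tensor of shape m x d x d (m is the number of labels, d is the input dimension of the biaffine layer).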
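The formula for this classifier is not reproduced in these notes. One way to write a fixed-class biaffine label scorer is sketched below. The assumptions: h_i is the representation of word i, h_{y_i} is the representation of its head y_i, the tensor U above appears as U^(1), and U^(2) (a 2d x m matrix) and b (an m-dimensional bias) are extra parameters not mentioned in the text.

```latex
s_i^{(label)} = h_{y_i}^{\top} \mathbf{U}^{(1)} h_i
              + \left( h_{y_i} \oplus h_i \right)^{\top} U^{(2)}
              + b
```

The first term scores the head-dependent pair for each of the m labels, the second scores each word on its own, and b acts as a per-label prior.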

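The arc classifier is a variable-class classifier: its number of classes depends on the sequence length. The label classifier is a fixed-class classifier: its number of classes equals the number of possible dependency relations.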
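The shape difference between the two classifiers can be seen in a small NumPy sketch. It is an illustration under assumptions: the sizes, the head indices, and the random parameters are made up, and the extra linear term `U2` and bias `b` in the label scorer are not described in the notes above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 4, 3, 5  # hypothetical sizes: words, biaffine input dim, number of labels
H_dep = rng.normal(size=(n, d))
H_head = rng.normal(size=(n, d))
U1 = rng.normal(size=(m, d, d))   # the m x d x d tensor
U2 = rng.normal(size=(2 * d, m))  # assumed extra linear term
b = np.zeros(m)                   # assumed per-label bias
heads = np.array([1, 1, 1, 2])    # hypothetical head index y_i for each word i

Hy = H_head[heads]  # (n, d), row i is the representation of word i's head

# bilinear[i, k] = h_{y_i}^T U1[k] h_i, one score per label k
bilinear = np.einsum("nd,mde,ne->nm", Hy, U1, H_dep)
linear = np.concatenate([Hy, H_dep], axis=1) @ U2
S_label = bilinear + linear + b   # (n, m): fixed number of classes
pred_labels = S_label.argmax(axis=1)

# Arc scores for comparison (bilinear part only): one column per word in the sentence.
S_arc = H_dep @ rng.normal(size=(d, d)) @ H_head.T  # (n, n): variable number of classes

print(S_label.shape, S_arc.shape)
```

`S_label` always has m columns, whereas `S_arc` has as many columns as the sentence has words.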

Experiments

Summary

References

【1】Deep Biaffine Attention for Neural Dependency Parsing, http://www.hankcs.com/nlp/parsing/deep-biaffine-attention-for-neural-dependency-parsing.html

【2】Deep Biaffine Attention for Dependency Parsing, https://zhuanlan.zhihu.com/p/71553871

【3】Eliyahu Kiperwasser and Yoav Goldberg. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the Association for Computational Linguistics, 4:313–327, 2016.

【4】Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher. A joint many-task model: Growing a neural network for multiple NLP tasks. arXiv preprint arXiv:1611.01587, 2016.

【5】Hao Cheng, Hao Fang, Xiaodong He, Jianfeng Gao, and Li Deng. Bi-directional attention with agreement for dependency parsing. arXiv preprint arXiv:1608.02076, 2016.
