Reading Notes: Attention Guided Graph Convolutional Networks for Relation Extraction

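I read this paper carefully from start to finish. It may not turn out to be very useful, but I still want to write it down (otherwise a week of work would be wasted...). These notes are fairly brief and rough, haha; they mainly lay out the paper's line of reasoning. There may be plenty of typos and basic mistakes, so corrections are welcome.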

Most existing relation extraction models can be categorized into two classes: sequence-based and dependency-based.

Experiments

6987 ternary relation instances
6087 binary relation instances

 (N=2, M=2, L1=2, L2=4, dhidden=340)

 (N=3, M=2, L1=2, L2=4, dhidden=300)

Background

1. Sequence-based models: operate only on the word sequences

2. Dependency-based models: incorporate dependency trees into the models

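(1) Dependency tree

Take the sentence "I like eating apple" as an example. Tools such as stanfordnlp, nltk, or ltp can parse it syntactically. Once the dependency tree is available, basic data-structure knowledge is enough to convert the tree into an adjacency matrix.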
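The tree-to-matrix conversion can be sketched in a few lines. The head indices below are written by hand for illustration (an assumption, not the output of any of the parsers mentioned above), and `tree_to_adj` is a hypothetical helper name:

```python
def tree_to_adj(heads, directed=False, self_loop=False):
    """Turn a dependency tree into an adjacency matrix.

    heads[i] is the 1-based index of the head of token i; 0 marks the root.
    """
    n = len(heads)
    adj = [[0] * n for _ in range(n)]
    for child, head in enumerate(heads):
        if head == 0:
            continue  # the root has no incoming edge
        adj[head - 1][child] = 1
        if not directed:
            adj[child][head - 1] = 1
    if self_loop:
        for i in range(n):
            adj[i][i] = 1
    return adj


tokens = ["I", "like", "eating", "apple"]
# Hand-written parse (assumption): "like" is the root,
# "I" and "eating" attach to "like", "apple" attaches to "eating".
heads = [2, 0, 2, 3]
adj = tree_to_adj(heads)
for tok, row in zip(tokens, adj):
    print(f"{tok:>7} {row}")
```

With `directed=False` the matrix is symmetric, which is the usual choice when the tree is fed to a GCN.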

(2) Pruning strategies

Pruning strategies distill the dependency information in order to further improve performance.

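a. Solution 1: rule-based

k=0: keep only a path that starts from the root node (of the LCA subtree, where LCA is the lowest common ancestor of the entities) and connects the entities. (In this example, the entities are the three tokens EGFR, L858E, and gefitinib.)

k=1: on top of k=0, also add the nodes whose distance from the path in the LCA subtree is 1.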
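The rule can be sketched as below. This is a minimal sketch under my own assumptions: the tree is an abstract toy tree rather than the paper's example sentence, and `chain_to_root` and `prune_tree` are hypothetical helper names.

```python
def chain_to_root(node, heads):
    """Return [node, parent, ..., root] as 0-based indices.

    heads[i] is the 1-based head of token i; 0 marks the root.
    """
    chain = [node]
    while heads[chain[-1]] != 0:
        chain.append(heads[chain[-1]] - 1)
    return chain


def prune_tree(heads, entities, k):
    """Keep tokens within distance k of the entity path inside the LCA subtree."""
    chains = [chain_to_root(e, heads) for e in entities]
    common = set(chains[0]).intersection(*chains[1:])
    # Walking up from an entity, the first shared ancestor is the LCA.
    lca = next(a for a in chains[0] if a in common)
    # The path joins every entity to the LCA.
    path = set()
    for chain in chains:
        path.update(chain[: chain.index(lca) + 1])
    kept = set()
    for node in range(len(heads)):
        chain = chain_to_root(node, heads)
        if lca not in chain:
            continue  # token lies outside the LCA subtree
        # Number of steps upward until the path is reached.
        dist = next(i for i, a in enumerate(chain) if a in path)
        if dist <= k:
            kept.add(node)
    return sorted(kept)


# Toy tree (assumption): token 1 is the root, tokens 3 and 5 are the entities.
heads = [2, 0, 2, 3, 3, 5, 2, 5, 8]
entities = [3, 5]
k0 = prune_tree(heads, entities, k=0)
k1 = prune_tree(heads, entities, k=1)
print("k=0 keeps", k0)
print("k=1 keeps", k1)
```

Token 8 sits two steps away from the path, so it is dropped for both k=0 and k=1, which is the kind of information loss discussed next.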

Rule-based pruning strategies might eliminate some important information in the full tree. As the example shows, a rule-based approach can drop information that is relevant to the prediction, such as "partial response" in the example sentence.

b. Solution 2: attention mechanism

The attention mechanism transforms the original dependency tree into a fully connected edge-weighted graph. These weights can be viewed as the strength of relatedness between nodes. This is exactly what the paper does: it uses attention to update the edge weights and obtains a fully connected adjacency matrix.

Model Part

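The Whole Structure

The whole model consists of M blocks, and each block has three layers. The input is the adjacency matrix together with the concatenation of the word embedding, POS embedding, and NER embedding.

According to the authors' description, each densely connected layer in turn consists of two sub-layers followed by a linear layer.

Next, each layer of the model in turn: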

1. Attention Guided Layer

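Q and K can be viewed as the hidden features h produced by the previous block. After consulting a few blogs, I think the subscript of W in the formula should be t rather than i. (If it were i, every head would produce the same A, and multi-head attention would be pointless.)

The N heads produce N matrices A, which are then passed to N densely connected layers.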
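A minimal pure-Python sketch of this step, assuming scaled dot-product attention with one pair of projection matrices per head t. The shapes, the random weights, and the scaling by the per-head dimension are my assumptions for illustration, not the authors' code:

```python
import math
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_adjacency(h, w_q, w_k):
    """One head: A = softmax((h W_q)(h W_k)^T / sqrt(d_head))."""
    q = matmul(h, w_q)
    k = matmul(h, w_k)
    d_head = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[v / math.sqrt(d_head) for v in row] for row in scores]
    return softmax_rows(scaled)


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_tokens, d_model, n_heads = 4, 6, 2
d_head = d_model // n_heads
h = rand_matrix(n_tokens, d_model)  # hidden features from the previous block
# A separate (W_q, W_k) pair per head, i.e. the subscript t discussed above.
head_weights = [(rand_matrix(d_model, d_head), rand_matrix(d_model, d_head))
                for _ in range(n_heads)]
adjs = [attention_adjacency(h, w_q, w_k) for w_q, w_k in head_weights]
```

Every entry of each A is strictly positive and every row sums to 1, so each head yields a fully connected, edge-weighted graph over the tokens.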

Note: the attention guided layer is included starting from the second block; the first block works on the original dependency adjacency matrix.

2. Densely Connected Layer

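Dense connections allow rich local and non-local information to be captured for learning a better graph representation.

(1) Original GCN

My modest understanding of this formula: the layer-l feature of node i is computed from the layer-(l-1) features of its neighbors j. W and b are the parameters the model has to learn.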
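For reference, the standard GCN update that this paragraph describes can be written as follows. It is reconstructed from the description above, so the exact symbols (ρ for the activation function, A for the n×n adjacency matrix) should be read as assumptions:

```latex
h_i^{(l)} = \rho\left( \sum_{j=1}^{n} A_{ij}\, W^{(l)} h_j^{(l-1)} + b^{(l)} \right)
```

Because A_{ij} is zero for tokens that are not connected, only the neighbors of node i contribute to the sum.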

(2) GCN in this paper

Note: d_hidden = d_input / L, where L is the number of sub-layers; this improves the parameter efficiency, similar to DenseNets.
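The dimension bookkeeping is easier to see in code. The sketch below assumes that each sub-layer reads the concatenation of the input and all earlier sub-layer outputs and emits d_hidden = d_input / L features; the bias is omitted, and the names (`dense_gcn`, the toy sizes) are hypothetical:

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def dense_gcn(adj, x, weights):
    """weights[l] has shape (d_input + l * d_hidden, d_hidden)."""
    g = [row[:] for row in x]  # running concatenation [x; h1; h2; ...]
    outputs = []
    for w in weights:
        h = relu(matmul(adj, matmul(g, w)))  # A (g W), bias omitted
        outputs.append(h)
        g = [g_row + h_row for g_row, h_row in zip(g, h)]  # dense connection
    # Concatenating the L sub-layer outputs restores the input width.
    return [sum(parts, []) for parts in zip(*outputs)]


random.seed(0)
n_tokens, d_input, n_sublayers = 3, 4, 2
d_hidden = d_input // n_sublayers
adj = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]]  # e.g. one attention head
x = [[random.uniform(-1, 1) for _ in range(d_input)] for _ in range(n_tokens)]
weights = [
    [[random.uniform(-1, 1) for _ in range(d_hidden)]
     for _ in range(d_input + l * d_hidden)]
    for l in range(n_sublayers)
]
out = dense_gcn(adj, x, weights)
print(len(out), len(out[0]))
```

Each sub-layer only produces d_hidden columns, so the weight matrices stay small even though later sub-layers see a wider input.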

3. Linear Combination Layer

h_out: the output obtained by concatenating the outputs from the N separate densely connected layers.
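As a sketch of what the layer name suggests, the concatenated output is then mapped back by a single linear transformation. The symbols W_comb and b_comb are assumed names for the learned weight matrix and bias:

```latex
h_{out} = [\,h^{(1)}; \dots; h^{(N)}\,], \qquad h_{comb} = W_{comb}\, h_{out} + b_{comb}
```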

4. AGGCNs for Relation Extraction

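After obtaining the final hidden features, the authors apply masking and max-pooling separately to the sentence, the subject entity (subj_entity), and the object entity (obj_entity).

The masking can be understood as follows:

02. Likewise, for an entity, the feature values at non-entity positions must be replaced before max-pooling.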
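A minimal sketch of masked max-pooling, assuming the masked positions are overwritten with a very large negative number so that they can never win the max. The fill value and the toy features are assumptions:

```python
NEG_INF = -1e12  # assumed fill value for masked positions


def masked_max_pool(h, keep):
    """h: n_tokens x d features; keep[i] is True if token i takes part in pooling."""
    d = len(h[0])
    masked = [row if k else [NEG_INF] * d for row, k in zip(h, keep)]
    return [max(col) for col in zip(*masked)]


h = [[0.1, -0.5], [0.9, 0.2], [-0.3, 0.7], [0.4, 0.0]]
sent_mask = [True, True, True, False]    # last position is padding
subj_mask = [True, False, False, False]  # subject entity is token 0
obj_mask = [False, False, True, False]   # object entity is token 2

h_sent = masked_max_pool(h, sent_mask)
h_subj = masked_max_pool(h, subj_mask)
h_obj = masked_max_pool(h, obj_mask)
features = h_sent + h_subj + h_obj  # concatenation fed to the classifier
print(features)
```

Filling with zeros instead would be wrong here: the subject's second feature is -0.5, and a zero at any masked position would win the max.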

Finally, h_sent and h_e are concatenated and passed through a linear layer that maps them to the label space.

For the final objective, the authors use cross-entropy.
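The last two steps can be sketched as a linear map followed by softmax cross-entropy. The weights, the feature vector, and the three-class label space below are made up for illustration:

```python
import math


def linear(x, w, b):
    """x: d_in vector, w: d_in x n_classes matrix, b: n_classes vector."""
    return [sum(xi * wi for xi, wi in zip(x, col)) + bj
            for col, bj in zip(zip(*w), b)]


def cross_entropy(logits, gold):
    """Negative log-probability of the gold class under softmax(logits)."""
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return log_z - logits[gold]


features = [0.9, 0.7, 0.1, -0.5]  # stands in for the concatenated [h_sent; h_e]
w = [[0.5, -0.2, 0.1],
     [0.3, 0.4, -0.6],
     [-0.1, 0.2, 0.7],
     [0.0, -0.3, 0.2]]
b = [0.0, 0.1, -0.1]
logits = linear(features, w, b)
loss = cross_entropy(logits, gold=0)
print(logits, loss)
```

The loss shrinks as the logit of the gold relation grows relative to the others, which is what training pushes toward.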

Experiments

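1. Cross-sentence n-ary relation extraction

Here, n-ary covers ternary and binary relations among entities.

(1) Dataset: PubMed, with 6987 ternary relation instances and 6087 binary relation instances

(2) Best parameters: (N=2, M=2, L1=2, L2=4, d_hidden=340)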

2. Sentence-level relation extraction

(1) Dataset 1: TACRED, 106K instances, 41 relation types

(2) Dataset 2: SemEval-2010 Task 8, 10717 instances, 9 relation types

(3) Best parameters: (N=3, M=2, L1=2, L2=4, d_hidden=300)

3. Ablation study

The authors also ran an ablation study, which shows that the attention mechanism contributes substantially to relation extraction.

[1]: Attention Guided Graph Convolutional Networks for Relation Extraction

[2]: Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning
