Paper: https://www.aclweb.org/anthology/P15-1061 (mirror: https://pan.baidu.com/s/1qFGhrMIO31s0pVvv0eJkMQ)
Institution: IBM
Model: CR-CNN
Dataset: SemEval-2010 Task 8
Result: F1 = 84.1%
Note: uses only pretrained word embeddings as input features
Input: a sentence with two marked entities, e.g. The [car] left the [plant]
Output: a vector with one score per relation class
Process:
- Word Embeddings
The standard pipeline for obtaining word vectors:
The pretrained embedding matrix is $W^{wrd} \in \mathbb{R}^{d_w \times |V|}$, where $d_w$ is the dimension of the word vectors and $|V|$ is the vocabulary size.
Column $i$, i.e. $W^{wrd}_i \in \mathbb{R}^{d_w}$, is the vector of the $i$-th word. The vector $r_w$ of a word $w$ is obtained as $r_w = W^{wrd} v^w$, where $v^w$ is a $|V|$-dimensional one-hot vector with a 1 at position $w$ and 0 elsewhere.
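The lookup above is just column selection: multiplying by a one-hot vector picks out one column of the matrix. A minimal sketch, with made-up sizes ($d_w = 4$, $|V| = 6$) that are not from the paper:

```python
import numpy as np

# Hypothetical sizes, for illustration only.
d_w, V = 4, 6
rng = np.random.default_rng(0)
W_wrd = rng.standard_normal((d_w, V))  # pretrained embedding matrix W^{wrd}

w = 2                  # vocabulary index of the word
v_w = np.zeros(V)      # one-hot vector v^w
v_w[w] = 1.0

r_w = W_wrd @ v_w      # r_w = W^{wrd} v^w

# The matrix-vector product selects column w of W^{wrd}:
assert np.allclose(r_w, W_wrd[:, w])
```

In practice libraries skip the one-hot product entirely and index the column directly, which is equivalent.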
- Word Position Embeddings
Obtained the same way as the position features (PF) in the (COLING 2014) paper Relation Classification via Convolutional Deep Neural Network.
$wpe^w = [wp_1, wp_2]$, where $wp_1$ and $wp_2$ are $d_{wpe}$-dimensional vectors encoding the relative distances from word $w$ to the two target entities.
The sentence $x$ is thus turned into the vector representation $emb_x = \{[r^{w_1}, wpe^{w_1}], [r^{w_2}, wpe^{w_2}], \ldots, [r^{w_N}, wpe^{w_N}]\}$.
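The per-token vectors can be built by concatenating the word vector with the two position embeddings. A sketch with hypothetical sizes; the distance-clipping range (`max_dist`) and the index-shift scheme are my assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical sizes (not from the paper): d_w = 4, d_wpe = 2, |V| = 6.
d_w, d_wpe, V = 4, 2, 6
max_dist = 10                    # assumed clip range for relative distances
rng = np.random.default_rng(1)
W_wrd = rng.standard_normal((d_w, V))
W_wpe = rng.standard_normal((d_wpe, 2 * max_dist + 1))  # one column per distance

tokens = [0, 3, 1, 5, 2]         # word indices of "The car left the plant"
e1, e2 = 1, 4                    # token positions of the two marked entities

def wpe(i, e):
    """Position embedding for the relative distance of token i to entity e."""
    d = int(np.clip(i - e, -max_dist, max_dist))
    return W_wpe[:, d + max_dist]  # shift so distances index columns 0..2*max_dist

# emb_x: one (d_w + 2*d_wpe)-dim row per token: [r^{w_i}, wp_1, wp_2]
emb_x = np.stack([
    np.concatenate([W_wrd[:, w], wpe(i, e1), wpe(i, e2)])
    for i, w in enumerate(tokens)
])
print(emb_x.shape)  # (5, 8)
```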
(Note: in the paper's example figure each word has dimension $d_w$; position embeddings are not included.)
- Sentence Representation
A CNN is used to extract the feature vector $r_x$ of sentence $x$.
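The convolutional layer can be sketched as a sliding window over the token vectors followed by a max over positions. Window size, filter count, and the tanh nonlinearity below are illustrative choices, not necessarily the paper's exact configuration:

```python
import numpy as np

# Illustrative sizes: N tokens, d_in input dim, d_c filters, window k.
rng = np.random.default_rng(2)
N, d_in, d_c, k = 5, 8, 10, 3
emb_x = rng.standard_normal((N, d_in))

W1 = rng.standard_normal((d_c, k * d_in))  # convolution filters
b1 = np.zeros(d_c)

# Zero-pad so every token has a full window, slide the window, then max-pool.
pad = (k - 1) // 2
padded = np.vstack([np.zeros((pad, d_in)), emb_x, np.zeros((pad, d_in))])
windows = np.stack([padded[i:i + k].ravel() for i in range(N)])  # (N, k*d_in)
conv = np.tanh(windows @ W1.T + b1)                              # (N, d_c)
r_x = conv.max(axis=0)   # element-wise max over positions -> fixed-size r_x
print(r_x.shape)  # (10,)
```

The max-pooling step is what makes $r_x$ independent of the sentence length $N$.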
- Class Embeddings and Scoring [the paper's main novelty]
The common approach feeds the feature vector into a softmax classifier to obtain the final prediction.
This paper's novelty: the model learns a vector representation for each relation class.
$W^{classes}$ is the embedding matrix of the relation classes; each column is the vector representation of one relation.
The vector for relation $c$ is $[W^{classes}]_c$, which has the same dimension as the sentence feature vector $r_x$.
The dot product $r_x^T [W^{classes}]_c$ then yields a scalar.
So the model assigns a score to each pair of sentence $x$ and relation $c$: $s_\theta(x)_c = r_x^T [W^{classes}]_c$
Here $\theta$ denotes all model parameters. During training, each sentence $x$ is paired with one positive class $y^+$ and one negative class $c^-$.
$y^+$ is the true relation class of the sentence; $c^-$ is one of the other relation classes.
The positive score is $s_\theta(x)_{y^+}$ and the negative score is $s_\theta(x)_{c^-}$.
The loss function is $L = \log(1 + \exp(\gamma(m^+ - s_\theta(x)_{y^+}))) + \log(1 + \exp(\gamma(m^- + s_\theta(x)_{c^-})))$
$m^+$ and $m^-$ are margin values and $\gamma$ is a scaling factor.
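The scoring and ranking loss can be sketched as follows. The hyperparameters $m^+ = 2.5$, $m^- = 0.5$, $\gamma = 2$ are the values reported in the paper; the sizes and the choice of the highest-scoring wrong class as $c^-$ follow the paper's training procedure, while the random vectors are placeholders:

```python
import numpy as np

# Placeholder feature vector and class embedding matrix (19 SemEval classes).
rng = np.random.default_rng(3)
d_c, n_classes = 10, 19
r_x = rng.standard_normal(d_c)
W_classes = rng.standard_normal((d_c, n_classes))

scores = r_x @ W_classes                 # s_theta(x)_c for every class c

y_pos = 4                                # true class y+
others = np.delete(np.arange(n_classes), y_pos)
c_neg = others[np.argmax(scores[others])]  # highest-scoring wrong class as c-

m_pos, m_neg, gamma = 2.5, 0.5, 2.0      # paper's hyperparameters
L = (np.log1p(np.exp(gamma * (m_pos - scores[y_pos])))
     + np.log1p(np.exp(gamma * (m_neg + scores[c_neg]))))
print(float(L))
```

Both terms are of the form $\log(1 + e^z)$, so the loss is always positive and shrinks as the positive score rises above $m^+$ and the negative score falls below $-m^-$.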
Training gradually pushes $s_\theta(x)_{y^+}$ above $m^+$ and $s_\theta(x)_{c^-}$ below $-m^-$.
Quotes from the paper:
- The proposed network learns a distributed vector representation for each relation class.
- Given an input text segment, the network uses a convolutional layer to produce a distributed vector representation of the text and compares it to the class representations in order to produce a score for each class.
- We propose a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes.
BibTeX:
@inproceedings{dos-santos-etal-2015-classifying,
title = "Classifying Relations by Ranking with Convolutional Neural Networks",
author = "dos Santos, C{\'\i}cero and
Xiang, Bing and
Zhou, Bowen",
booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = jul,
year = "2015",
address = "Beijing, China",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P15-1061",
doi = "10.3115/v1/P15-1061",
pages = "626--634",
}