Key points
Contribution
- We present multimodal social relation datasets, which can facilitate future research on multimodal SRE (social relation extraction). (In short: new datasets.)
- To leverage both texts and face images, we propose a novel approach, FL-MSRE, for SRE. (In short: a new method, FL-MSRE, that uses text and faces jointly for social relation extraction.)
- Extensive experiments demonstrate that FL-MSRE is effective in SRE from texts and face images. (Note: the experiments look strong.)
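Dataset construction
- Adding text to an image dataset is hard, so the authors instead add images to a text dataset.
- Sentences: extract sentences that mention at least two people and are supported by at least two people.
- Images: extract images that contain at least two people.
- Keep only fine-grained relations, e.g., family (dropped), father (kept).
- Finally, the data is split into three datasets: DRC-TF (15 relations), OM-TF (9 relations), and FC-TF (24 relations).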
- Every social relation is supported by multiple triples; every triple is supported by multiple pairs of face images about the two entities in the triple as well as by multiple sentences mentioning both the entities.
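The support hierarchy above (relation → triples → face-image pairs and sentences) can be pictured as a small data structure. This is a minimal sketch, not the authors' data format: the names `Triple`, `is_well_supported`, and `group_by_relation` are hypothetical, image paths are placeholders, and reading "multiple" as "at least 2" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Triple:
    """One (head, relation, tail) fact plus its multimodal evidence (hypothetical layout)."""
    head: str
    relation: str
    tail: str
    # Each face pair is (image of head entity, image of tail entity); paths are placeholders.
    face_pairs: list[tuple[str, str]] = field(default_factory=list)
    # Sentences that mention both the head and the tail entity.
    sentences: list[str] = field(default_factory=list)

    def is_well_supported(self, min_pairs: int = 2, min_sentences: int = 2) -> bool:
        # "Multiple" is assumed to mean at least 2; the real thresholds are not given in these notes.
        return len(self.face_pairs) >= min_pairs and len(self.sentences) >= min_sentences


def group_by_relation(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Index triples by relation, so each relation maps to the triples that support it."""
    index: dict[str, list[Triple]] = {}
    for t in triples:
        index.setdefault(t.relation, []).append(t)
    return index
```

Grouping by relation is what later makes N-way sampling easy: picking N relations is just picking N keys of this index.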
Model
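- N-way (relations), K-shot (tuples) setting: given K support tuples for each of N social relations, predict another N tuples on the same N social relations.
- Network architecture: the front end is simple multimodal feature extraction and concatenation (text features from BERT, face features from FaceNet, and the two feature vectors concatenated; this part seems to be the authors' own design rather than borrowed). The back end reuses an existing network, the prototypical network (Snell, Swersky, and Zemel 2017b), and the code is identical to it.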
- Prototypical Network
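First compute the prototype representation of each relation r_m (in the standard prototypical network, the mean of its K support embeddings).
At prediction time, compute the Euclidean distance between each query and each prototype r_m, and assign the query to the nearest one.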
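The fusion step and the prototypical-network steps above can be sketched in plain Python. This is an illustration under assumptions, not the FL-MSRE code: real BERT and FaceNet features are high-dimensional vectors, while here tiny hand-written lists stand in for them, and the helper names `fuse`, `prototype`, `euclidean`, and `classify` are hypothetical.

```python
import math


def fuse(text_feat: list[float], face_feat: list[float]) -> list[float]:
    """Multimodal fusion by plain concatenation: [text ; face]."""
    return list(text_feat) + list(face_feat)


def prototype(support: list[list[float]]) -> list[float]:
    """Prototype of one relation r_m: element-wise mean of its K support embeddings."""
    k = len(support)
    return [sum(col) / k for col in zip(*support)]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query: list[float], prototypes: list[list[float]]) -> int:
    """Return the index m of the prototype r_m nearest to the query."""
    dists = [euclidean(query, p) for p in prototypes]
    return min(range(len(dists)), key=dists.__getitem__)


# Toy 2-way 2-shot example: 2-dim "text" features plus 1-dim "face" features.
support_rel0 = [fuse([0.0, 0.0], [0.0]), fuse([0.0, 1.0], [0.0])]
support_rel1 = [fuse([5.0, 5.0], [5.0]), fuse([5.0, 6.0], [5.0])]
protos = [prototype(support_rel0), prototype(support_rel1)]
print(classify(fuse([1.0, 0.0], [0.0]), protos))  # nearest prototype is relation 0
```

Snell et al. use the squared Euclidean distance; since the square root is monotonic, the nearest prototype (the argmin) is the same either way, and only the softmax probabilities used in training differ.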
Experiments
- Only one baseline (BERT); evaluated separately on each of the three datasets under four N-way K-shot settings.
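Each of these N-way K-shot evaluations is built from sampled episodes. Below is a minimal episode sampler, assuming the data is a dict from relation name to a list of instances (e.g. entity-pair tuples). The function `sample_episode` and the `q_query` parameter are hypothetical; these notes do not say which four settings were used, so any N and K here are just examples.

```python
from __future__ import annotations

import random


def sample_episode(data: dict[str, list], n_way: int, k_shot: int, q_query: int,
                   rng: random.Random):
    """Sample one N-way K-shot episode.

    data maps relation name -> list of instances. Returns (relations, support, query):
    support[i] holds K instances of relations[i]; query is a shuffled list of
    (instance, label_index) pairs, disjoint from the support set.
    """
    relations = rng.sample(sorted(data), n_way)  # pick N distinct relations
    support, query = [], []
    for label, rel in enumerate(relations):
        # Draw K + Q distinct instances so support and query never overlap.
        picked = rng.sample(data[rel], k_shot + q_query)
        support.append(picked[:k_shot])
        query.extend((inst, label) for inst in picked[k_shot:])
    rng.shuffle(query)
    return relations, support, query
```

For example, `sample_episode(data, n_way=5, k_shot=1, q_query=2, rng=random.Random(0))` yields 5 relations with one support instance each and 10 labelled queries. Passing an explicit `random.Random` keeps episodes reproducible across runs.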
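GitHub code
- The loss and accuracy computation follows the paper "FewRel 2.0: Towards More Challenging Few-Shot Relation Classification", and the Prototypical Network code is identical to that paper's code. The key thing to understand is NOTA (none-of-the-above).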
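To make the loss, accuracy, and NOTA ideas concrete, here is a minimal sketch. It treats negative distances to the N prototypes as logits, trains with softmax cross-entropy, and measures accuracy as argmax agreement. For NOTA it adds an (N+1)-th class whose logit is a fixed threshold `nota_logit`; this threshold (and every function name here) is a hypothetical choice for illustration, not necessarily how the FewRel 2.0 or FL-MSRE code handles NOTA.

```python
import math
from typing import Optional


def logits_from_distances(dists: list[float], nota_logit: Optional[float] = None) -> list[float]:
    """Logits = negative distances to the N prototypes.

    If nota_logit is given, append it as class N ("none of the above"):
    a query is predicted NOTA when every prototype logit falls below this threshold.
    """
    logits = [-d for d in dists]
    if nota_logit is not None:
        logits.append(nota_logit)
    return logits


def cross_entropy(logits: list[float], label: int) -> float:
    """Numerically stable softmax cross-entropy: log(sum(exp(logits))) - logits[label]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def episode_loss_and_accuracy(batch_dists: list[list[float]], labels: list[int],
                              nota_logit: Optional[float] = None) -> tuple[float, float]:
    """Mean cross-entropy and accuracy over the queries of one episode."""
    losses, correct = [], 0
    for dists, y in zip(batch_dists, labels):
        logits = logits_from_distances(dists, nota_logit)
        losses.append(cross_entropy(logits, y))
        pred = max(range(len(logits)), key=logits.__getitem__)
        correct += int(pred == y)
    return sum(losses) / len(losses), correct / len(labels)


# 2-way episode with NOTA: query 0 is close to prototype 0; query 1 is far from
# both prototypes, so its gold label is the NOTA class (index N = 2).
loss, acc = episode_loss_and_accuracy([[0.5, 4.0], [6.0, 7.0]], [0, 2], nota_logit=-3.0)
print(loss, acc)
```

A fixed threshold is the simplest option; a learned threshold or a separate NOTA score would slot into `logits_from_distances` without changing the loss or accuracy code.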