Paper notes: Visual Relationship Detection with Deep Structural Ranking (AAAI-18)

chenhch8

Article link: https://blog.csdn.net/deepinC/article/details/86418488

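The paper "Visual Relationship Detection with Deep Structural Ranking" (AAAI-18) proposes a method called Deep Structural Ranking that targets two problems in visual relationship detection: the large relationship search space and incomplete annotation. The method combines visual appearance, spatial location, and semantic embedding cues, and uses a structural ranking loss to handle multi-relationship detection and ease the incompleteness problem. Although it ignores global context, it improves experimental results by allowing multiple relationships per object pair and by introducing a new loss function.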

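(These are only notes on parts of the paper, together with my own rough understanding; I have not yet reproduced the experiments. I am new to this area, so corrections and suggestions are very welcome.)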

Summary

Two main challenges:
1. Unlike individual object learning tasks, the number of possible relationships is much larger, which makes them hard to explore based only on the visual appearance of objects.
  - If an image contains $N$ objects and there are $K$ relationship types, the number of candidate triples is $O(N^2K)$, so the search space is large.
2. Visual relationship annotations are usually incomplete, which makes both training and evaluating a model difficult.
  - Only some of the objects in an image are annotated.
  - Some pairs of objects are not annotated with any predicate even though they do have a relationship.
  - In most cases only one predicate is defined for an annotated object pair, even though co-occurrence of predicates is very common.
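To get a feel for the $O(N^2K)$ count in the first challenge, here is a small sketch that counts candidate triples for one image. It assumes ordered subject-object pairs of distinct objects, so the exact count is $N(N-1)K$. The function name `count_candidate_triples` and the example values are illustrative only and are not taken from the paper.

```python
def count_candidate_triples(num_objects: int, num_predicates: int) -> int:
    """Count (subject, predicate, object) candidates for one image.

    Assumes ordered pairs of distinct objects, giving N * (N - 1) pairs,
    each of which may take any of the K predicates.
    """
    if num_objects < 0 or num_predicates < 0:
        raise ValueError("counts must be non-negative")
    return num_objects * (num_objects - 1) * num_predicates


# Illustrative values only: 10 detected objects, 70 predicate types.
print(count_candidate_triples(10, 70))  # 6300
```

Even a modest number of detections therefore yields thousands of candidates per image, which is why appearance alone is a weak basis for searching this space.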
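This paper proposes a method called "Deep Structural Ranking". Unlike traditional methods that consider only a single visual relationship per object pair, it can handle multi-relationship detection, which helps it exploit the co-occurrence of relationships and mitigate the incompleteness problem. The method takes several cues as input to cope with the variability of predicates: a visual appearance cue, a spatial location cue, and a semantic embedding cue.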
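Contributions:
1. A "structural ranking loss" is proposed to solve the multi-relationship visual detection problem.
2. A conditional-probability approach is incorporated to reduce the impact of incomplete annotation.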
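Drawbacks:
1. The proposed method ignores the global context of relationship triples, even though contextual cues can reduce relationship ambiguity and generalize better to new relationships.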
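Advantages:
1. It drops the assumption that at most one relationship exists between each entity pair. By changing the loss function, that is, by introducing the structural ranking loss, the task becomes multi-relationship detection, which improves the experimental results.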
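As a rough illustration of how a ranking loss can allow several predicates per object pair, the sketch below implements a generic multi-label margin ranking loss: every annotated predicate should score higher than every non-annotated one by a margin. This is not the paper's exact structural ranking loss. The function name `multilabel_ranking_loss`, the constant `margin`, and the example scores are all assumptions made for illustration.

```python
from typing import Sequence, Set


def multilabel_ranking_loss(
    scores: Sequence[float],
    positives: Set[int],
    margin: float = 1.0,
) -> float:
    """Generic pairwise hinge ranking loss for one object pair.

    scores[k] is the model's score for predicate k, and `positives`
    holds the indices of all annotated predicates (possibly several).
    Each (positive, negative) pair contributes
    max(0, margin + score_negative - score_positive).
    """
    negatives = [k for k in range(len(scores)) if k not in positives]
    loss = 0.0
    for p in positives:
        for n in negatives:
            loss += max(0.0, margin + scores[n] - scores[p])
    return loss


# Hypothetical scores for 4 predicates; predicates 0 and 2 are both annotated.
scores = [2.0, 0.5, 3.0, 2.5]
print(multilabel_ranking_loss(scores, positives={0, 2}))  # 2.0
```

Because the loss only compares annotated predicates against non-annotated ones, a pair labelled with both predicate 0 and predicate 2 is not penalised for scoring both highly, which is the behaviour a single-label formulation cannot express.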

Model framework

[Figure: model framework]

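Three cues:
- Visual Appearance Cue: VGG16 turns the image into feature maps, and each object's features are then cropped from the feature maps using the object's position in the original image. These are the so-called RoI (Region of Interest) pooling features. Extracting visual features this way reduces computation, because VGG16 only needs to run once per image. For a relationship instance $(s, p, o)$, the visual features of $s$, $p$, and $o$ are each cropped out. Note that the visual feature of $p$ is the feature of the common region where $s$ and $o$ intersect; once obtained, it is concatenated directly onto the visual features of the subject and of the object.
- Spatial Location Cue:
  - spatial masks: a binary image in which elements inside the bounding box are 0 and all others are 1.
  - relative location feature: a scale-invariant feature $(l_x,l_y,l_w,l_h) \in \mathbb{R}^4$. The subject and the object of each relationship instance each have their own relative location feature. For the subject, for example, it is computed as:
$$l_x=\frac{x_s-x_o}{x_o},\quad l_y=\frac{y_s-y_o}{y_o},\quad l_w=\log\frac{w_s}{w_o},\quad l_h=\log\frac{h_s}{h_o} \tag{1}$$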
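Equation (1) can be sketched in a few lines of Python. The code follows the formula exactly as written in these notes. It assumes each box is given as `(x, y, w, h)`, with `(x, y)` a reference point of the box and `(w, h)` its width and height, which the notes do not specify. The `Box` type and the function name `relative_location` are hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box; treating (x, y) as a reference point is an assumption."""
    x: float
    y: float
    w: float
    h: float


def relative_location(s: Box, o: Box) -> Tuple[float, float, float, float]:
    """Relative location (l_x, l_y, l_w, l_h) of subject s with respect to object o.

    Follows Eq. (1) as written in the notes, so o.x and o.y must be
    non-zero and all widths and heights must be positive.
    """
    l_x = (s.x - o.x) / o.x
    l_y = (s.y - o.y) / o.y
    l_w = math.log(s.w / o.w)
    l_h = math.log(s.h / o.h)
    return (l_x, l_y, l_w, l_h)


# Hypothetical boxes, chosen only to exercise the formula.
subject = Box(x=20.0, y=30.0, w=40.0, h=80.0)
obj = Box(x=10.0, y=15.0, w=20.0, h=40.0)
print(relative_location(subject, obj))
```

Multiplying every coordinate of both boxes by the same factor leaves all four components unchanged, which is the scale invariance mentioned above.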