Paper reading: Visual Relationship Detection with Language Priors

Kivee123

Tags: scene understand

Article link: https://blog.csdn.net/qq_37014750/article/details/84287301

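This post discusses a method that uses language priors for visual relationship detection. Object and relationship (predicate) models are trained independently and then combined with the semantic associations captured by a language model, which addresses the long-tail problem and improves detection performance. By optimizing the objective function, the model predicts both common and rare relationships more accurately. Experiments show that the method effectively improves the recall of relationship detection.

Summary generated automatically by CSDN.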

Visual Relationship Detection with Language Priors (ECCV 2016)

Paper
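Although most relationships are rare, their objects and predicates occur much more frequently on their own. The paper exploits this insight by training separate models for objects and for predicates, then combining them to predict relationships. A fundamental challenge in visual relationship detection is learning from very few examples.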
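To make the "train separately, then combine" idea concrete, here is a minimal sketch. It assumes we already have class probabilities for the two detected boxes (from the object model) and a score per predicate for their union region (from the predicate model), and it scores every <subject, predicate, object> triple as P(subject) x score(predicate) x P(object). The class names, numbers and the function `score_triples` are hypothetical; the product form is in the spirit of the paper's visual module but this is an illustration, not the authors' implementation.

```python
from itertools import product

# Hypothetical outputs of two independently trained models.
# Object model: class probabilities for the subject box and the object box.
subj_probs = {"person": 0.9, "horse": 0.05, "elephant": 0.05}
obj_probs = {"person": 0.1, "horse": 0.2, "elephant": 0.7}
# Predicate model: scores for the union region of the two boxes.
pred_scores = {"ride": 0.8, "next to": 0.15, "feed": 0.05}


def score_triples(subj_probs, pred_scores, obj_probs):
    """Score every <subject, predicate, object> triple by multiplying the
    independent object and predicate scores (illustrative assumption)."""
    scores = {}
    for s, p, o in product(subj_probs, pred_scores, obj_probs):
        scores[(s, p, o)] = subj_probs[s] * pred_scores[p] * obj_probs[o]
    return scores


scores = score_triples(subj_probs, pred_scores, obj_probs)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # ('person', 'ride', 'elephant') 0.504
```

The triple "person ride elephant" never has to be seen as a unit: it gets a high score as long as each component is recognized. The number of learned classes grows with the number of object categories plus predicates, rather than with the number of possible triples.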
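Another observation in the paper is that relationships are semantically related. For example, person riding a horse and person riding an elephant are semantically similar, since horses and elephants are both animals; even if the model has seen few examples of person riding an elephant, it can infer them from person riding a horse.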
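The horse/elephant argument rests on word vectors that place semantically similar words close together. A toy sketch: represent a relationship as the concatenation of the subject, predicate and object word vectors, and compare relationships by cosine similarity. The 3-d vectors are made-up placeholders (a real system would use pretrained embeddings such as word2vec), and the helpers `embed` and `cosine` are hypothetical; this only illustrates the intuition and is not how the paper trains its language module.

```python
import math

# Made-up 3-d "word vectors" (hypothetical); the two animals point in similar directions.
word_vec = {
    "person": [1.0, 0.0, 0.0],
    "ride": [0.0, 1.0, 0.0],
    "horse": [0.0, 0.2, 1.0],
    "elephant": [0.1, 0.2, 0.9],
    "table": [0.9, 0.1, -0.3],
}


def embed(triple):
    """Concatenate subject, predicate and object vectors (hypothetical scheme)."""
    return [x for word in triple for x in word_vec[word]]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


seen = ("person", "ride", "horse")
for unseen in [("person", "ride", "elephant"), ("person", "ride", "table")]:
    print(unseen, round(cosine(embed(seen), embed(unseen)), 3))
```

With these vectors, "person ride elephant" lands much closer to the frequently seen "person ride horse" than "person ride table" does, which is what lets a rare relationship borrow evidence from a common one.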
On the one hand, the method learns appearance models for objects and predicates; on the other hand, it uses a relationship embedding space learned from language.
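One simple way to picture how the two parts fit together: a language score computed per predicate as a linear function w_k^T [v(subject); v(object)] + b_k of the concatenated subject/object word vectors, multiplied by the visual appearance score. This is a minimal sketch under stated assumptions: the 2-d vectors and the weights in `lang_params` are hand-set placeholders rather than learned values, and `language_score` / `combined_score` are hypothetical names.

```python
# Hypothetical 2-d word vectors for object categories.
word_vec = {
    "person": [1.0, 0.0],
    "horse": [0.0, 1.0],
    "elephant": [0.1, 0.9],
}

# Hypothetical per-predicate projection (w_k, b_k) over [subj_vec ; obj_vec].
lang_params = {
    "ride": ([0.5, 0.0, 0.0, 0.5], 0.0),
    "wear": ([0.5, 0.0, 0.5, 0.0], 0.0),
}


def language_score(subj, pred, obj):
    """Linear score of a predicate given concatenated subject/object vectors."""
    w, b = lang_params[pred]
    x = word_vec[subj] + word_vec[obj]  # list concatenation -> 4-d input
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def combined_score(visual_score, subj, pred, obj):
    """Final score = visual appearance score x language prior (assumed product)."""
    return visual_score * language_score(subj, pred, obj)


# The appearance model alone cannot separate the two predicates (a tie) ...
visual = {"ride": 0.6, "wear": 0.6}
# ... but the language prior favours "person ride elephant".
ranked = sorted(visual, key=lambda p: combined_score(visual[p], "person", p, "elephant"),
                reverse=True)
print(ranked[0])  # ride
```

The design point is that the language part depends only on category names, so it can rank predicates for object pairs whose visual examples are scarce.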
Visual relationship detection exhibits a long-tail phenomenon: only a small fraction of relationships occur frequently, while the many infrequent relationships form the long tail.
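The long tail, and why splitting triples into parts helps, can be seen by simple counting. The annotations below are invented toy data, not the paper's dataset: most distinct triples occur only once, yet their predicates and object categories recur across many triples.

```python
from collections import Counter

# Invented toy annotations: one <subject, predicate, object> triple per instance.
annotations = [
    ("person", "on", "street"), ("person", "on", "street"), ("person", "on", "street"),
    ("person", "on", "street"), ("person", "ride", "horse"), ("person", "ride", "horse"),
    ("person", "ride", "elephant"), ("dog", "ride", "skateboard"),
    ("person", "feed", "elephant"), ("cat", "on", "sofa"),
]

triple_counts = Counter(annotations)
pred_counts = Counter(p for _, p, _ in annotations)
# Count each category whether it appears as subject or as object.
category_counts = Counter(x for s, _, o in annotations for x in (s, o))

# Triples seen only once form the "tail".
rare = [t for t, c in triple_counts.items() if c == 1]
print(f"{len(rare)} of {len(triple_counts)} distinct triples appear only once")
print("ride as a predicate:", pred_counts["ride"], "times")
print("elephant as a category:", category_counts["elephant"], "times")
```

Here "person ride elephant" is seen once, but "ride" occurs 4 times and "elephant" twice, so separately trained predicate and object models get more training signal than a model of the whole triple would.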
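Learning visual phrase models also helps detect individual objects: for example, detecting a person riding a horse helps improve the detection and localization of both the person and the horse.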
The overall framework is as follows: