[1] Li Y, Ouyang W, Zhou B, et al. Factorizable Net: An Efficient Subgraph-Based Framework for Scene Graph Generation[J]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, 11205 LNCS: 346–363.
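Abstract
- Previous scene graph generation methods are complex, computationally expensive, and dependent on external data, which limits their application.
- This paper focuses on improving efficiency.
- It proposes a subgraph-based connection graph that concisely represents the scene graph during inference.
Method overview
- Subgraph clustering: pick representative subgraphs so that a small number of subgraphs covers the whole image.
- SMP (Spatial-weighted Message Passing): exploits the spatial information maintained by the subgraphs.
- SRI (Spatial-sensitive Relation Inference): improves relationship recognition.
- Achieves state-of-the-art (SOTA) results.
Code: code link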
Introduction
Recent methods for modeling the relationships between objects in an image:
[6] Dai, B., Zhang, Y., Lin, D.: Detecting visual relationships with deep relational networks. In: CVPR (2017)
[28] Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.J., Shamma, D.A., et al.: Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV (2017)
[34] Li, Y., Ouyang, W., Wang, X., Tang, X.: Vip-cnn: Visual phrase guided convolutional neural network. In: CVPR (2017)
[35] Li, Y., Ouyang, W., Zhou, B., Wang, K., Wang, X.: Scene graph generation from objects, phrases and region captions. In: ICCV (2017)
[37] Lu, C., Krishna, R., Bernstein, M., Fei-Fei, L.: Visual relationship detection with language priors. In: ECCV (2016)
[58] Xu, D., Zhu, Y., Choy, C.B., Fei-Fei, L.: Scene graph generation by iterative message passing. In: CVPR (2017)
[64] Zhuang, B., Liu, L., Shen, C., Reid, I.: Towards context-aware interaction recognition. In: ICCV (2017)
Scene graphs used for image retrieval:
[26] Johnson, J., Krishna, R., Stark, M., Li, L.J., Shamma, D.A., Bernstein, M.S., Fei-Fei, L.: Image retrieval using scene graphs. In: CVPR (2015)
[45] Ramanathan, V., Li, C., Deng, J., Han, W., Li, Z., Gu, K., Song, Y., Bengio, S., Rosenberg, C., Fei-Fei, L.: Learning semantic relationships for better action retrieval in images. In: CVPR (2015)
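Two approaches to scene graph generation
1. Two-stage approach
Detect the objects first, then establish the relationships between them.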
[6] Dai, B., Zhang, Y., Lin, D.: Detecting visual relationships with deep relational networks. In: CVPR (2017)
[36] Liao, W., Shuai, L., Rosenhahn, B., Yang, M.Y.: Natural language guided visual relationship detection. arXiv preprint arXiv:1711.06032 (2017)
[37] Lu, C., Krishna, R., Bernstein, M., Fei-Fei, L.: Visual relationship detection with language priors. In: ECCV (2016)
[58] Xu, D., Zhu, Y., Choy, C.B., Fei-Fei, L.: Scene graph generation by iterative message passing. In: CVPR (2017)
[62] Yu, R., Li, A., Morariu, V.I., Davis, L.S.: Visual relationship detection with internal and external linguistic knowledge distillation. In: ICCV (2017)
2. Joint inference
Jointly infer the objects and their relationships [34,35,58] based on the object region proposals.
[34] Li, Y., Ouyang, W., Wang, X., Tang, X.: Vip-cnn: Visual phrase guided convolutional neural network. In: CVPR (2017)
[35] Li, Y., Ouyang, W., Zhou, B., Wang, K., Wang, X.: Scene graph generation from objects, phrases and region captions. In: ICCV (2017)
[58] Xu, D., Zhu, Y., Choy, C.B., Fei-Fei, L.: Scene graph generation by iterative message passing. In: CVPR (2017)
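To generate a complete scene graph, both approaches need to be considered, and both rely on the features of the union region shared by an object pair.
However, as the amount of data (the number of candidate object pairs) grows, the problem becomes hard to handle. Using fewer objects, or filtering out some pairs with a simple heuristic, is one option, but both degrade model performance.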
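A small sketch of why the pair count becomes the bottleneck, assuming relationships are directed (subject, object) pairs over N object proposals. The function name `num_candidate_pairs` and the proposal counts are hypothetical and chosen only for illustration.

```python
def num_candidate_pairs(num_objects: int) -> int:
    """Count ordered (subject, object) pairs, excluding self-pairs."""
    return num_objects * (num_objects - 1)


# The number of pairs grows quadratically with the number of proposals.
for n in (10, 100, 300):
    print(f"{n} objects -> {num_candidate_pairs(n)} candidate pairs")
```

Each candidate pair needs its own union-region feature in a naive pipeline, so the cost of relation inference follows this count rather than the number of objects.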
Key point: finding a more concise intermediate representation of the scene graph should be the key to solving this problem.
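Core idea of the method
Subgraph clustering: the object pairs grouped into one subgraph share a single phrase representation, which significantly improves efficiency.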
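The clustering step can be sketched as follows, assuming each object pair is represented by the union of its two boxes and that pairs whose union boxes overlap strongly are merged into one subgraph. The greedy grouping, the `threshold` value, and all function names here are illustrative assumptions, not the paper's exact procedure.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that encloses both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def cluster_pairs(boxes: List[Box], threshold: float = 0.5):
    """Greedily group object pairs whose union boxes overlap.

    Returns a list of (representative_box, member_pairs) tuples.
    The first pair that opens a subgraph supplies its representative box.
    """
    subgraphs: List[Tuple[Box, List[Tuple[int, int]]]] = []
    for i, j in combinations(range(len(boxes)), 2):
        u = union_box(boxes[i], boxes[j])
        for rep, members in subgraphs:
            if iou(rep, u) >= threshold:
                members.append((i, j))  # reuse the existing subgraph
                break
        else:
            subgraphs.append((u, [(i, j)]))  # open a new subgraph
    return subgraphs


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
subgraphs = cluster_pairs(boxes)
print(len(subgraphs), "subgraphs for 3 object pairs")
```

In this toy input the pairs (0, 2) and (1, 2) cover almost the same region, so they end up in one subgraph and only two phrase features would need to be computed instead of three.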
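Spatial-weighted Message Passing (SMP) preserves the spatial information of the subgraphs.
Spatial information has already been shown to be useful for predicate recognition: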
- Dai, B., Zhang, Y., Lin, D.: Detecting visual relationships with deep relational networks. In: CVPR (2017)
- Liao, W., Shuai, L., Rosenhahn, B., Yang, M.Y.: Natural language guided visual relationship detection. arXiv preprint arXiv:1711.06032 (2017)
- Yu, R., Li, A., Morariu, V.I., Davis, L.S.: Visual relationship detection with internal and external linguistic knowledge distillation. In: ICCV (2017)
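Returning to SMP, a rough sketch of spatially weighted message passing from objects to a subgraph. It assumes object features are vectors and the subgraph keeps a 2-D feature map (an H x W grid of vectors), so that each location can weight the objects differently. The name `objects_to_subgraph`, the dot-product scoring, and the omission of all learned layers are simplifications for illustration, not the paper's exact formulation.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional vectors


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def objects_to_subgraph(objects: List[Vector], fmap: FeatureMap) -> FeatureMap:
    """Add a location-specific mixture of object vectors to every cell.

    At each location the objects are weighted by a softmax over their
    similarity to the local feature, so different regions of the map
    listen to different objects. The input map is not modified.
    """
    out: FeatureMap = []
    for row in fmap:
        new_row = []
        for cell in row:
            weights = softmax([dot(o, cell) for o in objects])
            message = [
                sum(w * o[c] for w, o in zip(weights, objects))
                for c in range(len(cell))
            ]
            new_row.append([s + m for s, m in zip(cell, message)])
        out.append(new_row)
    return out


objects = [[1.0, 0.0], [0.0, 1.0]]
fmap = [[[5.0, 0.0], [0.0, 5.0]]]  # 1 x 2 map with 2 channels
updated = objects_to_subgraph(objects, fmap)
```

Because the weights are computed per location, the left cell is updated mostly by the first object and the right cell mostly by the second, which is the spatial structure a pooled vector would lose.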
To exploit this spatial information, the Spatial-sensitive Relation Inference (SRI) module was designed. It fuses object feature pairs with subgraph features for the final relation inference.
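A hedged sketch of the fusion idea behind SRI, assuming the subject and object features are vectors and the subgraph feature is a 2-D map. Here each vector gates the map channel by channel and the results are concatenated per location. The gating operation and the name `fuse_pair_with_subgraph` are assumptions made for illustration; a real model would use learned layers and a predicate classifier on top.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional vectors


def fuse_pair_with_subgraph(
    subject: Vector, obj: Vector, fmap: FeatureMap
) -> FeatureMap:
    """Fuse a (subject, object) feature pair with a subgraph feature map.

    Every output cell is the concatenation of:
      subject-gated features, the raw subgraph features, object-gated features.
    The spatial layout of the map is kept, so a downstream classifier can
    still see where each response occurs.
    """
    fused: FeatureMap = []
    for row in fmap:
        fused_row = []
        for cell in row:
            subj_resp = [s * x for s, x in zip(subject, cell)]
            obj_resp = [o * x for o, x in zip(obj, cell)]
            fused_row.append(subj_resp + list(cell) + obj_resp)
        fused.append(fused_row)
    return fused


subject = [1.0, 0.0]
obj = [0.0, 2.0]
fmap = [[[2.0, 4.0], [4.0, 8.0]]]  # 1 x 2 map with 2 channels
fused = fuse_pair_with_subgraph(subject, obj, fmap)
print(fused[0][0])
```

The same subgraph map can be reused for every object pair assigned to that subgraph; only the cheap gating step is repeated per pair.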
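Summary
- Proposes an efficient subgraph-based method for scene graph generation.
Its contributions are:
- A bottom-up clustering method that speeds up inference.
- SMP preserves the spatial structure, and SRI uses the spatial information for predicate inference.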