The More You Know: Using Knowledge Graphs for Image Classification 论文总结

往期文章链接目录

Overview

Note: This previous post I wrote might be helpful for reading this paper summary:

This paper investigates the use of structured prior knowledge in the form of knowledge graphs and shows that using this knowledge improves performance on image classification.

It introduce the Graph Search Neural Network (GSNN) as a way of efficiently incorporating large knowledge graphs into a vision classification pipeline, which outperforms standard neural network baselines for multi-label classification.

Intuition

While modern learning-based approaches can recognize some categories with high accuracy, it usually requires thousands of labeled examples for each of these categories. This approach of building large datasets for every concept is unscalable. One way to solve this problem is to use structured knowledge and reasoning (prior knowledge, this is what human usually do but current approaches do not).

For example, when people try to identify the animal shown in the figure, they will first recognize the animal, then recall relevant knowledge, and finally reason about it. With this information, even if we have only seen one or two pictures of this animal, we would be able to classify it. So we hope a model could also have similar reasoning process.

Previous Work

There has been a lot of work in end-to-end learning on graphs or neural network trained on graphs. Most of these approaches either extract features from the graph or they learn a propagation model that transfers evidence between nodes conditional on the type of edge. An example of this is the Gated Graph Neural Network which takes an arbitrary graph as input. Given some initialization specific to the task, it learns how to propagate information and predict the output for every node in the graph.

Previous works are focusing on building and then querying knowledge bases rather than using existing knowledge bases as side information for some vision task.

This work not only uses attribute relationships that appear in our knowledge graphs, but also uses relationships between objects and reasons directly on graphs rather than using object-attribute pairs directly.

Major Contribution

  1. The introduction of the GSNN as a way of incorporating potentially large knowledge graphs into an end-to-end learning system that is computationally feasible for large graphs;

  2. Provide a framework for using noisy knowledge graphs for image classification (In vision problems, graphs encode contextual and common-sense relationships and are significantly larger and noisier);

  3. The ability to explain image classifications by using the propagation model (Interpretability).

Graph Search Neural Network (GSNN)

GSNN Explanation

The idea is that rather than performing the recurrent update over all of the nodes of the graph at once, it starts with some initial nodes based on the input and only choose to expand nodes which are useful for the final output. Thus, the model only compute the update steps over a subset of the graph.

Steps in GSNN:

  • Determine Initial Nodes

They determine initial nodes in the graph based on likelihood of the visual concept being present as determined by an object detector or classifier. For their experiments, they use Faster R-CNN for each of the 80 COCO categories. For scores over some chosen threshold, they choose the corresponding nodes in the graph as our initial set of active nodes. Once they have initial nodes, they also add the nodes adjacent to the initial nodes to the active set.

  • Propagation

Given the initial nodes, they want to first propagate the beliefs about the initial nodes to all of the adjacent nodes (propagation network). This process is similar to GGNN.

  • Decide which nodes to expand next

After the first time step, they need a way of deciding which nodes to expand next. Therefore, a per-node scoring function is learned to estimates how “important” that node is. After each propagation step, for every node in the current graph, the model predict an importance score

i v ( t ) = g i ( h v , x v ) i_{v}^{(t)}=g_{i}\left(h_{v}, x_{v}\right) iv(t)=gi(hv,x

  • 18
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值