Paper Reading Notes: A Survey on Graph Neural Networks and Graph Transformers in Computer Vision (GNN Survey)

lzl2040

Last modified: 2022-10-11 20:13:41


First published: 2022-10-07 14:28:01

Article link: https://blog.csdn.net/qq_41234663/article/details/127193962


Paper Reading Notes: GNN Survey

These notes summarize how the survey presents GNNs and their applications across computer vision tasks.

2D NATURAL IMAGES

Image Classification

Multi-Label Classification

ML-GCN: builds a directed graph over the label space, where each node stands for an object label (represented by its word embedding) and the connections model the inter-dependencies between labels.

Attention-driven GCN and hypergraph neural networks: model the label dependencies via more elaborate GNN architectures.
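To make the directed label graph of ML-GCN more concrete, here is a minimal sketch of one way such a graph could be built from training annotations. The notes above only say that the graph is directed and models label inter-dependencies; the conditional-probability rule, the threshold `tau`, and the toy label sets below are assumptions for illustration, not details taken from these notes.

```python
def label_graph(label_sets, num_labels, tau=0.4):
    """Build a directed 0/1 adjacency over labels.

    adj[i][j] = 1 when P(label j present | label i present) >= tau.
    The rule and the threshold are illustrative assumptions.
    """
    count = [0] * num_labels                                # images containing label i
    co = [[0] * num_labels for _ in range(num_labels)]      # co-occurrence counts
    for labels in label_sets:
        for i in labels:
            count[i] += 1
            for j in labels:
                if i != j:
                    co[i][j] += 1
    adj = [[0] * num_labels for _ in range(num_labels)]
    for i in range(num_labels):
        for j in range(num_labels):
            if count[i] and co[i][j] / count[i] >= tau:
                adj[i][j] = 1
    return adj


# Toy annotations (hypothetical): 0 = person, 1 = surfboard, 2 = car.
toy = [{0, 1}, {0}, {0, 2}, {0}, {0, 1}]
adj = label_graph(toy, num_labels=3, tau=0.5)
for row in adj:
    print(row)
```

In this toy data a surfboard always appears with a person, but a person appears with a surfboard in only 2 of 5 images, so the edge exists in one direction only. That asymmetry is why the graph is directed.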

Few-Shot Learning

Paper	Venue	Main idea
Few-shot learning with graph neural networks	ICLR, 2018	formulate FSL as a supervised interpolation problem on a densely connected graph, where the vertices stand for images in the collection and the adjacency is learnable with trainable similarity kernels.
Learning to propagate labels: Transductive propagation network for few-shot learning	ICLR, 2019	constructs graphs on top of the embedding space to fully exploit the manifold structure of the novel classes. Label information is propagated from the support set to the query set based on the constructed graphs.
Edge-labeling graph neural network for few-shot learning	CVPR, 2019	propose an edge-labeling GNN framework that learns to predict edge labels, explicitly constraining the intra- and inter-class similarities.
Learning from the past: Continual meta-learning via bayesian graph modeling	AAAI, 2020	formulate meta-learning-based FSL as continual learning of a sequence of tasks and resort to a Bayesian GNN to capture the intra- and inter-task correlations.
DPGN: Distribution propagation graph network for few-shot learning	CVPR, 2020	devise a dual complete graph network to model both distribution- and instance-level relations.
Hierarchical graph neural networks for few-shot learning	TCSVT, 2021	exploit the hierarchical relationships among graph nodes via bottom-up and top-down reasoning modules.
Hybrid graph neural networks for few-shot learning	AAAI, 2022	introduce an instance GNN and a prototype GNN as feature embedding task adaptation modules for quickly adapting learned features to new tasks.
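The transductive propagation idea in the table (labels flowing from the support set to the query set over a graph built in embedding space) can be sketched as below. This is not the TPN implementation: the Gaussian similarity, the symmetric normalization, the iterative update, and the parameters `alpha`, `sigma`, and `steps` are assumptions chosen to keep the example small.

```python
import math


def propagate_labels(embeddings, support_labels, num_classes,
                     alpha=0.9, sigma=1.0, steps=50):
    """Classic iterative label propagation (an illustrative stand-in).

    The first len(support_labels) embeddings are the labeled support set;
    the remaining ones are unlabeled queries. Returns predicted query classes.
    """
    n = len(embeddings)
    # Gaussian similarity graph without self-loops.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    deg = [sum(row) for row in W]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # One-hot labels for support nodes, zeros for queries.
    Y = [[0.0] * num_classes for _ in range(n)]
    for i, c in enumerate(support_labels):
        Y[i][c] = 1.0
    F = [row[:] for row in Y]
    for _ in range(steps):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(num_classes)] for i in range(n)]
    first_query = len(support_labels)
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(first_query, n)]


# Two support points (classes 0 and 1) followed by two queries.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (4.8, 5.1)]
preds = propagate_labels(points, support_labels=[0, 1], num_classes=2)
print(preds)
```

Each query ends up with the label of the support point it is close to in embedding space, which is the manifold intuition the table row refers to.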

Zero-Shot Learning (ZSL)

Paper	Venue	Main idea
Rethinking knowledge graph propagation for zero-shot learning	CVPR, 2019	propose a Dense Graph Propagation (DGP) module to exploit the hierarchical structure of the knowledge graph. It consists of two phases that iteratively propagate knowledge between a node and its ancestors and descendants.
Region graph embedding network for zero-shot learning	ECCV, 2020	represent each input image as a region graph, where each node stands for an attended region in the image and the edges are appearance similarities among these region nodes.
Attribute propagation network for graph zero-shot learning	AAAI, 2020	generates and updates attribute vectors with an attribute propagation network to optimize the attribute space.
Isometric propagation network for generalized zero-shot learning	ICLR, 2021	introduce visual and semantic prototype propagation on auto-generated graphs to enhance the inter-class relations and align the corresponding class-wise dependencies in the visual and semantic spaces.
Learning graph embeddings for open world compositional zero-shot learning	IEEE TPAMI, 2022	introduce a Compositional Cosine Graph Embedding (Co-CGE) model to learn the relationship between primitives and compositions through a GCN. They quantitatively measure the feasibility scores of a state-object composition and incorporate the computed scores into Co-CGE in two ways.
GNDAN: Graph navigated dual attention network for zero-shot learning	IEEE TNNLS, 2022	resort to GAT for exploiting the appearance relations between local regions and the cooperation between local and global features.
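The DGP row describes propagation between a node and all of its ancestors and descendants in a class hierarchy. A minimal sketch of how those dense connections could be derived from a parent map is shown below; the `parent` dictionary and the class names are hypothetical, and only the graph construction is sketched, not the propagation phases or any learned weights.

```python
def ancestors(parent, node):
    """Walk up the hierarchy and collect every ancestor of `node`, nearest first."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def dense_connections(parent):
    """Return (ancestors, descendants) dictionaries covering every node."""
    nodes = set(parent) | set(parent.values())
    anc = {n: ancestors(parent, n) for n in nodes}
    desc = {n: [] for n in nodes}
    for n, ups in anc.items():
        for a in ups:
            desc[a].append(n)
    return anc, desc


# Hypothetical toy hierarchy: child -> parent.
parent = {"tabby": "cat", "cat": "animal", "dog": "animal"}
anc, desc = dense_connections(parent)
print(anc["tabby"])            # nearest ancestor first
print(sorted(desc["animal"]))  # all descendants of the root
```

Connecting a node directly to distant ancestors and descendants lets information travel in one hop instead of being diluted across many intermediate layers.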

Transfer Learning

Paper	Venue	Main idea
GCAN: Graph convolutional adversarial network for unsupervised domain adaptation	CVPR, 2019	propose a Graph Convolutional Adversarial Network (GCAN) for DA, where a GCN is developed on top of densely connected instance graphs to encode data structure information.
Heterogeneous graph attention network for unsupervised multiple-target domain adaptation	IEEE TPAMI, 2020	build a heterogeneous relation graph and introduce GAT to propagate the semantic information and generate reliable pseudo-labels.
Curriculum graph co-teaching for multi-target domain adaptation	CVPR, 2021	introduce a GCN to aggregate information from different domains along with a co-teaching and curriculum learning strategy to achieve progressive adaptation.
Progressive graph learning for open-set domain adaptation	ICML, 2020	study the problem of open-set DA via a progressive graph learning framework to select pseudo-labels and thus avoid negative transfer.
Prototype-matching graph network for heterogeneous domain adaptation	ACM MM, 2020	attain cross-domain prototype alignment based on features learned from different stages of GNNs.
Learning to combine: Knowledge aggregation for multi-source domain adaptation	ECCV, 2020	introduce a knowledge graph based on the prototypes of different domains to perform information propagation among semantically adjacent representations.
Compound domain generalization via meta-knowledge encoding	CVPR, 2022	build global prototypical relation graphs and introduce a graph self-attention mechanism.

Current focus

Current work focuses on extracting ad hoc knowledge graphs from the data for a specific task, which is heuristic and relies on human priors.

Future directions

(1) develop general and automatic graph construction procedures;

(2) enhance the interactions between abstract graph structures and task-specific classifiers;

(3) excavate more fine-grained building blocks (nodes and edges) to increase the capability of the constructed graphs.

Object Detection

Paper	Venue	Main idea
Reasoning-RCNN: Unifying adaptive global reasoning into large-scale object detection	CVPR, 2019	presents an adaptive global reasoning network for large-scale object detection by incorporating commonsense knowledge (a category-wise knowledge graph) and propagating visual information globally.
Spatial-aware graph relation network for large-scale object detection	CVPR, 2019	adaptively discover semantic and spatial relationships without requiring prior handcrafted linguistic knowledge.
Relation networks for object detection	CVPR, 2018	introduces an adapted attention module to detection head networks, explicitly learning information between objects by encoding the long-range dependencies.
RelationNet++: Bridging visual representations for object detection via transformer decoder	NeurIPS, 2020	presents a self-attention-based decoder module to embrace the strengths of different object/part representations within a single detection framework.
GAR: Graph assisted reasoning for object detection	WACV, 2020	introduce a heterogeneous graph to jointly model object-object and object-scene relations.
GraphFPN: Graph feature pyramid network for object detection	ICCV, 2021	propose a graph feature pyramid network (GraphFPN), which explores the contextual and hierarchical structures of an input image based on a superpixel hierarchy.
Relation matters: Foreground-aware graph-based relational reasoning for domain adaptive object detection	IEEE TPAMI, 2022	first builds intra- and inter-domain relation graphs by virtue of cyclic between-domain consistency, without any prior knowledge about the target distribution.
SIGMA: Semantic-complete graph matching for domain adaptive object detection	CVPR, 2022	formulates DAOD as a graph matching problem by establishing cross-image graphs to model class-conditional distributions on both domains.
Semantic relation reasoning for shot-stable few-shot object detection	CVPR, 2021	introduces a semantic relation reasoning module to integrate semantic information between base and novel classes for novel object detection.

Note: DAOD stands for domain adaptive object detection.

Current focus

Exploit between-object, cross-scale, or cross-domain relationships, as well as relationships between base and novel classes.

Future directions

(1) design better region-to-node feature mapping methods;

(2) incorporate Transformer (or pure GNN) encoders to improve the expressive power of the initial node features;

(3) directly perform reasoning in the original feature space to better preserve the intrinsic structure of images.

Image Segmentation

General segmentation

Paper	Venue	Main idea
Dual graph convolutional network for semantic segmentation	BMVC, 2019	targets modeling the global context of input features via a dual GCN framework, where a coordinate-space GCN models spatial relationships between pixels in the image and a feature-space GCN models dependencies along the channel dimensions of the network's feature map.
Graph-based global reasoning networks	CVPR, 2019	design the global reasoning unit by projecting features that are globally aggregated in coordinate space to the node domain and performing relational reasoning in a fully connected graph.
Dynamic graph message passing networks	CVPR, 2020	dynamically samples the neighborhood of a node and then predicts the node dependencies, filter weights, and affinity matrix to attain information propagation.
Representative graph neural network	ECCV, 2020	propose to dynamically sample some representative nodes for relational modeling.
Spatial pyramid based graph reasoning for semantic segmentation	CVPR, 2020	propose an improved Laplacian formulation that enables graph reasoning in the original feature space, fully exploiting the contextual relations at different feature scales.
Class-wise dynamic graph convolution for semantic segmentation	ECCV, 2020	introduce a class-wise dynamic graph convolution module to conduct graph reasoning over the pixels that belong to the same class.
Bidirectional graph reasoning network for panoptic segmentation	CVPR, 2020	design a bidirectional graph reasoning network to bridge the things branch and the stuff branch for panoptic segmentation.

One-Shot Semantic Segmentation

Paper	Venue	Main idea
Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation	ICCV, 2019	introduce a pyramid graph attention module to model the connection between query and support feature maps.

Few-Shot Semantic Segmentation

Paper	Venue	Main idea
Scale-aware graph neural network for few-shot semantic segmentation	CVPR, 2021	propose a scale-aware GNN to perform cross-scale relational reasoning among support-query images. A self-node collaboration mechanism is introduced to perceive different resolutions of the same object.

Weakly Supervised Semantic Segmentation

Paper	Venue	Main idea
Affinity attention graph neural network for weakly supervised semantic segmentation	IEEE TPAMI, 2021	an image is first converted to a weighted graph via an affinity CNN, and then an affinity attention layer is devised to obtain long-range interactions from the constructed graph and propagate semantic information to the unlabeled pixels.

Current focus

Explore contextual information at the local or global level with pyramid pooling, dilated convolutions, or the self-attention mechanism.

Scene Graph Generation (SGG)

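Task overview: SGG detects pairs of objects in an image together with their relationships in order to generate a visual scene graph. It provides a high-level understanding of the visual scene rather than treating individual objects in isolation.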

Paper	Venue	Main idea
Factorizable net: an efficient subgraph-based framework for scene graph generation	ECCV, 2018	a subgraph-based approach (each subgraph is regarded as a node) with a spatially weighted message passing structure that refines the features of objects and subgraphs by passing messages among them with attention-like schemes.
Graph R-CNN for scene graph generation	ECCV, 2018	first obtain a sparse candidate graph by pruning the densely connected graph generated from the RPN via a relation proposal network; then an attentional GCN is introduced to aggregate contextual information and update node features and edge relationships.
Attentive relational networks for mapping images to scene graphs	CVPR, 2019	propose attentive relational networks, which first transform label word embeddings and visual features into a shared semantic space and then rely on GAT to perform feature aggregation for final relation inference.
Bipartite graph network with adaptive message passing for unbiased scene graph generation	CVPR, 2021	introduce a bipartite GNN to estimate and propagate relation confidence in a multi-stage manner.
Energy-based learning for scene graph generation	CVPR, 2021	propose an energy-based framework, which depends on a graph message passing algorithm for computing the energy of configurations.
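The pruning step described for Graph R-CNN (turning a densely connected candidate graph into a sparse one) can be illustrated with a small sketch. The `relatedness` scoring function, the score table, and `top_k` below are hypothetical placeholders; in the actual method the scores come from a learned relation proposal network, which is not reproduced here.

```python
from itertools import permutations


def prune_candidate_graph(objects, relatedness, top_k):
    """Keep the top_k highest-scoring ordered (subject, object) index pairs."""
    pairs = list(permutations(range(len(objects)), 2))   # dense directed graph
    pairs.sort(key=lambda p: relatedness(objects[p[0]], objects[p[1]]), reverse=True)
    return pairs[:top_k]


# Hypothetical detections and a hand-written score table for illustration.
objects = ["person", "horse", "tree"]
scores = {("person", "horse"): 0.9, ("horse", "tree"): 0.3, ("person", "tree"): 0.2}


def relatedness(subject, obj):
    return scores.get((subject, obj), 0.0)


kept = prune_candidate_graph(objects, relatedness, top_k=2)
print([(objects[s], objects[o]) for s, o in kept])
```

With n detected objects the dense graph has n(n-1) ordered pairs, so pruning before message passing keeps the later GCN stage tractable.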

VIDEO UNDERSTANDING

Video Action Recognition

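Task description: Human action recognition in video is one of the fundamental tasks in video processing and understanding. Its goal is to recognize and classify human actions in RGB/depth videos or skeleton data.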

Action Recognition

Paper	Venue	Main idea
		propose to capture long-range temporal contexts via graph-based reasoning over human-object and object-object relationships.
		construct an actor-centric object-level graph and apply GCNs to capture the contexts among objects in an actor-centric way. A relation-level graph is built to infer the contexts among relation nodes.
		propose multi-scale reasoning in the temporal graph of a video, in which each node is a frame in the video and the pairwise relations between nodes are represented as a learnable adjacency matrix.
		extend GCN-based relation modeling to zero-shot action recognition and leverage knowledge graphs to jointly model the relations among actions and attributes.
		introduce a graph-based high-order relation modeling method for long-term action recognition.
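One row in the table above treats each video frame as a graph node and represents pairwise relations as a learnable adjacency matrix. The sketch below shows a single aggregation step over such a temporal graph. The row-wise softmax normalization and the fixed `logits` values are assumptions for illustration; in a real model the logits would be trainable parameters and the step would be followed by a learned transform.

```python
import math


def temporal_graph_step(frame_feats, logits):
    """Aggregate frame features with a row-softmax-normalized adjacency.

    frame_feats: list of per-frame feature vectors (one node per frame).
    logits: square matrix of unnormalized edge weights (would be learned).
    """
    n = len(frame_feats)
    dim = len(frame_feats[0])
    out = []
    for i in range(n):
        m = max(logits[i])                                 # for numerical stability
        w = [math.exp(x - m) for x in logits[i]]
        z = sum(w)
        out.append([sum(w[j] / z * frame_feats[j][d] for j in range(n))
                    for d in range(dim)])
    return out


# Two frames with 2-D features; zero logits give uniform attention over frames.
feats = [[0.0, 2.0], [2.0, 4.0]]
mixed = temporal_graph_step(feats, logits=[[0.0, 0.0], [0.0, 0.0]])
print(mixed)
```

With uniform weights every node receives the mean of all frame features; training the logits lets the model decide which frames should exchange information.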

Skeleton-Based Action Recognition

Paper	Venue	Main idea
		propose the ST-GCN network, which first connects joints in a frame according to the natural connectivity of the human body and then connects the same joints in two consecutive frames to maintain temporal information.
		introduce a fully connected graph with learnable edge weights between joints and a data-dependent graph learned from the input skeleton.
		connect physically apart skeleton joints to capture the patterns of collaboratively moving joints.
		improves the joints' connections in a single frame by adding edges between the limbs and the head. It uses GCNs to capture joint relations in single frames and adopts an LSTM to capture the temporal dynamics.
		maintain edge features and learn both node and edge feature representations via directed graph convolution.
		first construct multiple dilated windows over the temporal dimension, then separately apply GCNs on multiple graphs with different scales, and finally aggregate the results of the GCNs on all the graphs in the multiple windows to capture multi-scale and long-range dependencies.
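The ST-GCN graph construction in the first row (spatial edges from body connectivity, temporal edges linking the same joint in consecutive frames) can be sketched as a plain edge list. The node indexing scheme `frame * num_joints + joint` and the three-joint toy skeleton are assumptions made for this example.

```python
def build_st_graph(bones, num_joints, num_frames):
    """Return undirected edges of a spatio-temporal skeleton graph.

    Nodes are indexed as frame * num_joints + joint (an illustrative choice).
    bones: list of (joint_a, joint_b) pairs describing body connectivity.
    """
    def node(t, j):
        return t * num_joints + j

    edges = []
    for t in range(num_frames):
        # Spatial edges: natural connectivity of the body within one frame.
        for a, b in bones:
            edges.append((node(t, a), node(t, b)))
        # Temporal edges: the same joint in two consecutive frames.
        if t + 1 < num_frames:
            for j in range(num_joints):
                edges.append((node(t, j), node(t + 1, j)))
    return edges


# Toy skeleton (hypothetical): joint 0 - joint 1 - joint 2, observed for 2 frames.
edges = build_st_graph(bones=[(0, 1), (1, 2)], num_joints=3, num_frames=2)
print(edges)
```

For T frames, J joints, and B bones this yields T*B spatial edges plus (T-1)*J temporal edges, so the graph grows linearly with sequence length.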