图卷积 节点分类
This article goes through the implementation of Graph Convolution Networks (GCN) using Spektral API, which is a Python library for graph deep learning based on Tensorflow 2. We are going to perform Semi-Supervised Node Classification using CORA dataset, similar to the work presented in the original GCN paper by Thomas Kipf and Max Welling (2017).
本文介绍了使用 Spektral API 实现图卷积网络(GCN)的情况 ,这是一个基于Tensorflow 2的用于图深度学习的Python库。我们将使用CORA数据集执行半监督节点分类,与所介绍的工作类似在 Thomas Kipf和Max Welling(2017) 的原始GCN论文中 。
If you want to get basic understanding on Graph Convolutional Networks, it is recommended to read the first and the second parts of this series beforehand.
数据集概述 (Dataset Overview)
CORA citation network dataset consists of 2708 nodes, where each node represents a document or a technical paper. The node features are bag-of-words representation that indicates the presence of a word in the document. The vocabulary — hence, also the node features — contains 1433 words.
CORA引用网络数据集由2708个节点组成,其中每个节点代表一个文档或技术论文。 节点特征是词袋表示,指示文档中单词的存在。 词汇表-因此,还有节点特征-包含1433个单词。
We will treat the dataset as an undirected graph where the edge represents whether one document cites the other or vice versa. There is no edge feature in this dataset. The goal of this task is to classify the nodes (or the documents) into 7 different classes which correspond to the papers’ research areas. This is a single-label multi-class classification problem with Single Mode data representation setting.
我们将数据集视为无向图 ,其中边表示一个文档引用了另一文档,反之亦然。 该数据集中没有边缘特征。 此任务的目标是将节点(或文档)分类为7种不同的类别,分别对应于论文的研究领域。 这是一个单标签多类别分类问题 单模式数据表示设置。
This implementation is also an example of Transductive Learning, where the neural network sees all data, including the test dataset, during the training. This is contrast to Inductive Learning — which is the typical Supervised Learning — where the test data is kept separate during the training.
此实现方式也是Transductive Learning的示例,在训练过程中,神经网络可以查看所有数据,包括测试数据集。 这与归纳学习(典型的监督学习)相反,归纳学习在训练过程中将测试数据保持独立。