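Why GraphSAGE matters:
1. It is one of the most widely used graph convolutional network models (alongside GCN and GAT)
2. It performs inductive learning
3. Unlike earlier methods that learn node embeddings directly, it learns aggregator functions
4. It explores several aggregators (mean, pooling, LSTM)
5. It is a classic baseline for graph representation learning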
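Paper structure:
I. Abstract
The abstract introduces the wide range of applications of graphs and states the motivation: inductive learning on graphs. The model learns a set of functions that sample a node's neighbors and aggregate them into a vector representation. The main points are:
1. It proposes an inductive model that can produce representations for unseen nodes and unseen graphs
2. GraphSAGE obtains node features by learning a set of functions
3. It samples and aggregates the features of a node's neighbors, then concatenates the result with the node's own features to obtain the node representation
4. GraphSAGE achieves the best results among the compared methods in both transductive and inductive settings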
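II. Introduction
The introduction again covers the wide use of graphs and notes that earlier work mostly targets static graphs, whereas GraphSAGE can handle new nodes and even new graphs. It surveys algorithms such as DeepWalk, Node2vec, and GCN, and states that the core of the proposed algorithm is training aggregate functions.
III. Related Work
Reviews earlier approaches based on random walks, matrix factorization, and graph convolution.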
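IV. The GraphSAGE Model
This section covers the forward propagation algorithm, the model parameters, and the aggregator architectures.
The GraphSAGE algorithm is given as Algorithm 1 in the paper. Its core is lines 4 and 5: line 4 aggregates the information of all sampled neighbors, and line 5 combines the node's own representation with the aggregated neighbor representation.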
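As a rough illustration of lines 4 and 5 of Algorithm 1, the sketch below updates a single node in plain Python using the mean aggregator. The function names `mean_aggregate` and `update_node`, the toy vectors, and the weight matrix `W` are made up for this example, and the normalization step of the algorithm is left out.

```python
def mean_aggregate(neighbor_vectors):
    """Line 4: element-wise mean of a non-empty list of equal-length vectors."""
    count = len(neighbor_vectors)
    return [sum(column) / count for column in zip(*neighbor_vectors)]


def update_node(self_vector, neighbor_vectors, weight):
    """Line 5: h_v = ReLU(W * CONCAT(h_v, h_N(v)))."""
    # List concatenation plays the role of CONCAT
    combined = self_vector + mean_aggregate(neighbor_vectors)
    return [max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weight]


# Toy example (all numbers are hypothetical): 2-dimensional inputs, 1-dimensional output
h_v = [1.0, 2.0]
h_neighbors = [[0.0, 4.0], [2.0, 0.0]]
W = [[1.0, 0.0, 1.0, 0.5]]  # shape: 1 x (2 + 2)
h_v_next = update_node(h_v, h_neighbors, W)
print(h_v_next)
```

The weight matrix has twice as many columns as the input dimension because the node's own vector and the aggregated neighbor vector are concatenated before the linear map.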
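The paper then presents the objective function (Section 3.2). The model can be trained with or without supervision. The unsupervised objective matches that of earlier graph algorithms: nodes that are closely related in the graph structure should end up with similar embeddings.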
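The unsupervised objective in the paper rewards a high dot product between a node and a node that co-occurs with it on a short random walk, and a low dot product with negative samples. The following is a minimal sketch for one positive pair, assuming the negative samples have already been drawn; the function names and the toy embeddings are hypothetical, and the expectation over negatives is approximated by the mean over the given samples.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unsupervised_loss(z_u, z_pos, z_negs, Q=None):
    """-log(sigmoid(z_u . z_pos)) - Q * mean_n log(sigmoid(-z_u . z_n))."""
    Q = len(z_negs) if Q is None else Q
    pos_term = -log_sigmoid(dot(z_u, z_pos))
    neg_term = -Q * sum(log_sigmoid(-dot(z_u, z_n)) for z_n in z_negs) / len(z_negs)
    return pos_term + neg_term


# Toy embeddings: `close` points the same way as z_u, `far` points the opposite way
z_u, close, far = [1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]
good = unsupervised_loss(z_u, close, [far])  # related node is similar
bad = unsupervised_loss(z_u, far, [close])   # related node is dissimilar
print(good, bad)
```

The loss is lower when the related node has the similar embedding, which is the behavior the objective is meant to encourage.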
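It then describes several choices for the aggregate function, namely Mean, LSTM, and Pooling. The appendix of the paper also gives the minibatch version of the algorithm.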
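Of the three aggregators, the pooling variant is easy to show in a few lines: each neighbor vector goes through a small fully connected layer, and an element-wise max is taken over the neighbors. This is only a sketch in plain Python; `pool_aggregate`, `w_pool`, `b_pool`, and the toy numbers are assumptions made for illustration.

```python
def pool_aggregate(neighbor_vectors, w_pool, b_pool):
    """Element-wise max over ReLU(W_pool * h_u + b) for every neighbor u."""
    transformed = []
    for h in neighbor_vectors:
        transformed.append([
            max(0.0, sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(w_pool, b_pool)
        ])
    return [max(column) for column in zip(*transformed)]


# Toy example with an identity transform and zero bias (hypothetical values)
neighbors = [[1.0, -1.0], [-2.0, 3.0]]
w_pool = [[1.0, 0.0], [0.0, 1.0]]
b_pool = [0.0, 0.0]
pooled = pool_aggregate(neighbors, w_pool, b_pool)
print(pooled)
```

Like the mean, the max does not depend on the order of the neighbors, whereas an LSTM aggregator processes them as a sequence.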
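V. Experiments
Experimental setup, dataset selection, transductive learning experiments, parameter analysis, and an analysis of how the different aggregate functions affect the model.
This section mainly describes the experimental parameters and the datasets, and closes with a comparison of the results.
VI. Theoretical Analysis && Conclusion
The conclusion summarizes that GraphSAGE is inductive and that neighbor aggregation can use different aggregators. It also discusses several future directions, such as subgraph embedding and neighbor sampling strategies.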
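Contributions:
1. Inductive learning
2. A study of several aggregators
3. Some theoretical analysis
Key points:
1. Model architecture
2. Sampling of neighbor nodes
3. Batch training
Takeaways:
1. The inductive learning approach
2. The discussion of several aggregate functions
3. Batch training with sampled neighbors is efficient
4. GCN, GAT, and GraphSAGE are the classic baselines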
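Neighbor sampling and batch training, listed among the key points above, can be illustrated by computing which nodes a minibatch actually needs. The sketch below expands a batch outward one layer at a time and samples a bounded number of neighbors per node. The helper names, the toy graph, and the choice to keep all neighbors when a node has fewer than `k` are assumptions for illustration, not the paper's exact minibatch procedure.

```python
import random


def sample_neighbors(adj, node, k, rng):
    """Sample up to k neighbors without replacement (all of them if there are at most k)."""
    neighbors = sorted(adj[node])
    return set(neighbors) if len(neighbors) <= k else set(rng.sample(neighbors, k))


def nodes_needed(adj, batch, sample_sizes, rng):
    """Return the node sets required at each depth, starting from the batch itself."""
    layers = [set(batch)]
    for k in sample_sizes:
        frontier = set(layers[-1])
        for node in layers[-1]:
            frontier |= sample_neighbors(adj, node, k, rng)
        layers.append(frontier)
    return layers


# Toy undirected graph as adjacency sets (hypothetical)
adj = {0: {1, 2, 3}, 1: {0, 4}, 2: {0}, 3: {0, 5}, 4: {1}, 5: {3}}
layers = nodes_needed(adj, batch=[0], sample_sizes=[2, 2], rng=random.Random(0))
print([sorted(layer) for layer in layers])
```

Because each node contributes at most `k` neighbors per layer, the work per batch is bounded by the sample sizes rather than by the node degrees.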
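VII. Coding
Dataset: Cora (the code below uses Cora; the paper's own experiments use other datasets)
The dataset consists of two files:
cora.cites lists the edges, one pair of node ids per line
cora.content gives the features and the label of each node, one node per line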
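To make the two file formats concrete, here is a small sketch that parses a few in-memory lines laid out the same way (id, features, label for the content file; two ids for the cites file). The ids, the 3-dimensional features, and the label `Toy_Class` are made up for illustration; real Cora records have 1433 features.

```python
from collections import defaultdict


def parse_content(lines):
    """Each line: <node id> <feature values...> <class label>."""
    features, labels, node_map, label_map = [], [], {}, {}
    for i, line in enumerate(lines):
        info = line.split()
        node_map[info[0]] = i  # node id -> row index
        features.append([float(x) for x in info[1:-1]])
        labels.append(label_map.setdefault(info[-1], len(label_map)))
    return features, labels, node_map


def parse_cites(lines, node_map):
    """Each line: <node id> <node id>; edges are stored as undirected."""
    adj_lists = defaultdict(set)
    for line in lines:
        src, dst = line.split()
        u, v = node_map[src], node_map[dst]
        adj_lists[u].add(v)
        adj_lists[v].add(u)
    return adj_lists


# Hypothetical miniature files
content = ["11 0 1 0 Neural_Networks", "22 1 0 0 Toy_Class", "33 0 0 1 Neural_Networks"]
cites = ["11 22", "11 33"]
features, labels, node_map = parse_content(content)
adj_lists = parse_cites(cites, node_map)
print(labels, dict(adj_lists))
```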
example:
cora.cites
35 1033
35 103482
35 103515
35 1050679
35 1103960
35 1103985
35 1109199
35 1112911
35 1113438
35 1113831
35 1114331
35 1117476
35 1119505
35 1119708
35 1120431
35 1123756
35 1125386
35 1127430
35 1127913
.....
cora.content
31336 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 Neural_Networks
......
""" Load and preprocess the data """
import numpy as np
from collections import defaultdict

def load_cora():
    num_nodes = 2708
    num_feats = 1433
    feat_data = np.zeros((num_nodes, num_feats))
    labels = np.empty((num_nodes, 1), dtype=np.int64)
    node_map = {}   # paper id -> row index
    label_map = {}  # class name -> integer label
    with open('../cora/cora.content') as fp:
        for i, line in enumerate(fp):
            info = line.strip().split()
            # info[0] is the paper id, info[1:-1] the features, info[-1] the class name
            feat_data[i, :] = [float(ss) for ss in info[1:-1]]
            node_map[info[0]] = i
            if info[-1] not in label_map:
                label_map[info[-1]] = len(label_map)
            labels[i] = label_map[info[-1]]
    adj_lists = defaultdict(set)
    with open('../cora/cora.cites') as fp:
        for line in fp:
            info = line.strip().split()
            uid = node_map[info[0]]
            target_uid = node_map[info[1]]
            # Store every citation as an undirected edge
            adj_lists[uid].add(target_uid)
            adj_lists[target_uid].add(uid)
    return feat_data, labels, adj_lists
""" Build the aggregate function """
import torch
import torch.nn as nn
from torch.autograd import Variable
import random

class MeanAggregator(nn.Module):
    def __init__(self, features, cuda=False, gcn=False):
        super(MeanAggregator, self).__init__()
        # features: callable that maps a LongTensor of node ids to their feature rows
        self.features = features
        self.cuda = cuda
        self.gcn = gcn

    def forward(self, nodes, to_neighs, num_sample=10):
        # Sample at most num_sample neighbors per node (Algorithm 1, line 4)
        if num_sample is not None:
            # random.sample requires a sequence (sets are rejected from Python 3.11), hence list(...)
            samp_neighs = [set(random.sample(list(to_neigh), num_sample))
                           if len(to_neigh) >= num_sample else to_neigh
                           for to_neigh in to_neighs]
        else:
            samp_neighs = to_neighs
        if self.gcn:
            # GCN-style: add each node to its own neighborhood (set union, not "+")
            samp_neighs = [samp_neigh | {nodes[i]} for i, samp_neigh in enumerate(samp_neighs)]
        unique_nodes_list = list(set.union(*samp_neighs))
        unique_nodes = {n: i for i, n in enumerate(unique_nodes_list)}
        # mask[i, j] = 1 if unique node j is a sampled neighbor of batch node i
        mask = torch.zeros(len(samp_neighs), len(unique_nodes))
        column_indices = [unique_nodes[n] for samp_neigh in samp_neighs for n in samp_neigh]
        row_indices = [i for i in range(len(samp_neighs)) for j in range(len(samp_neighs[i]))]
        mask[row_indices, column_indices] = 1
        if self.cuda:
            mask = mask.cuda()
        # Row-normalize the mask so that the matrix product below computes a mean
        num_neigh = mask.sum(1, keepdim=True)
        mask = mask.div(num_neigh)
        if self.cuda:
            embed_matrix = self.features(torch.LongTensor(unique_nodes_list).cuda())
        else:
            embed_matrix = self.features(torch.LongTensor(unique_nodes_list))
        to_feats = mask.mm(embed_matrix)
        return to_feats
""" Combine a node's own features with its aggregated neighbor features """
import torch
import torch.nn as nn
from torch.nn import init
import torch.nn.functional as F

class Encoder(nn.Module):
    """
    Encodes a node using the 'convolutional' GraphSage approach
    """
    def __init__(self, features, feature_dim,
                 embed_dim, adj_lists, aggregator,
                 num_sample=10,
                 base_model=None, gcn=False, cuda=False,
                 feature_transform=False):
        super(Encoder, self).__init__()
        self.features = features
        # Dimension before the transformation (input hidden size)
        self.feat_dim = feature_dim
        self.adj_lists = adj_lists
        # Aggregator that produces the aggregated neighbor embedding
        self.aggregator = aggregator
        self.num_sample = num_sample
        if base_model is not None:
            self.base_model = base_model
        self.gcn = gcn
        # Dimension after the transformation (output hidden size)
        self.embed_dim = embed_dim
        self.cuda = cuda
        # Note: this overrides the cuda flag passed to the aggregator's constructor
        self.aggregator.cuda = cuda
        # Shape of W = output dimension x input dimension.
        # With gcn=False the input is "self vector || aggregated neighbor vector", so it has 2 * feat_dim columns;
        # with gcn=True nothing is concatenated and the input has feat_dim columns.
        self.weight = nn.Parameter(
            torch.FloatTensor(embed_dim, self.feat_dim if self.gcn else 2 * self.feat_dim))
        # xavier_uniform_ is the in-place initializer; the name without the underscore is deprecated
        init.xavier_uniform_(self.weight)

    def forward(self, nodes):
        """
        Generates embeddings for a batch of nodes.
        nodes -- list of nodes
        """
        neigh_feats = self.aggregator.forward(nodes, [self.adj_lists[int(node)] for node in nodes],
                                              self.num_sample)
        if not self.gcn:
            if self.cuda:
                self_feats = self.features(torch.LongTensor(nodes).cuda())
            else:
                self_feats = self.features(torch.LongTensor(nodes))
            # Concatenate the self vector and the aggregated neighbor vector (CONCAT in Algorithm 1, line 5)
            combined = torch.cat([self_feats, neigh_feats], dim=1)
        else:
            # Use only the aggregated neighbor vector, without the self vector (replaces CONCAT in Algorithm 1, line 5)
            combined = neigh_feats
        # Apply the layer: multiply by W and apply ReLU (Algorithm 1, line 5)
        combined = F.relu(self.weight.mm(combined.t()))
        # Node embeddings after one GNN layer, shape: embed_dim x number of nodes
        return combined
""" Define the overall model """
class SupervisedGraphSage(nn.Module):
    def __init__(self, num_classes, enc):
        super(SupervisedGraphSage, self).__init__()
        # enc is set to enc2 here (the output of two GNN layers)
        self.enc = enc
        self.xent = nn.CrossEntropyLoss()
        # Fully connected weight matrix that maps embeddings to num_classes scores for classification
        self.weight = nn.Parameter(torch.FloatTensor(num_classes, enc.embed_dim))
        init.xavier_uniform_(self.weight)

    def forward(self, nodes):
        # embeds is the node embedding output by the two GNN layers, shape: embed_dim x nodes
        embeds = self.enc(nodes)
        # Map the hidden size to num_classes (= 7); the softmax is applied inside the cross-entropy loss
        scores = self.weight.mm(embeds)
        # Transpose to nodes x num_classes
        return scores.t()

    def loss(self, nodes, labels):
        # Forward pass
        scores = self.forward(nodes)
        # Cross-entropy loss defined in __init__
        return self.xent(scores, labels.squeeze())
""" Train the model """
import time
import numpy as np
# Assumption: f1_score is scikit-learn's; the original snippet calls it without importing it
from sklearn.metrics import f1_score

def run_cora():
    # Seed the random number generators
    np.random.seed(1)
    random.seed(1)
    # Number of nodes in Cora
    num_nodes = 2708
    # Load Cora:
    # feat_data: features
    # labels: labels
    # adj_lists: adjacency lists, dict (key: node, value: set of neighbors)
    feat_data, labels, adj_lists = load_cora()
    # Input feature matrix X, shape = number of nodes x feature dimension
    features = nn.Embedding(2708, 1433)
    # Fill X with the features; these parameters are not updated
    features.weight = nn.Parameter(torch.FloatTensor(feat_data), requires_grad=False)
    # features.cuda()
    # Two GNN layers in total
    # First GNN layer
    # Aggregate neighbors with the mean, Algorithm 1 line 4
    # (the Encoder overwrites the aggregator's cuda flag, so it is set to False here for consistency)
    agg1 = MeanAggregator(features, cuda=False)
    # Feed the result through the layer, Algorithm 1 line 5
    # (gcn=True: only the aggregated neighbor information is used, without concatenating the self vector)
    enc1 = Encoder(features, 1433, 128, adj_lists, agg1, gcn=True, cuda=False)
    # Second GNN layer
    # The output of the first layer is passed in as the input features.
    # .t() transposes, because Encoder outputs a matrix of shape embed_dim x nodes
    agg2 = MeanAggregator(lambda nodes : enc1(nodes).t(), cuda=False)
    # enc1.embed_dim = 128; the output dimension is again 128
    enc2 = Encoder(lambda nodes : enc1(nodes).t(), enc1.embed_dim, 128, adj_lists, agg2,
                   base_model=enc1, gcn=True, cuda=False)
    # Number of neighbors to sample. The attribute read by Encoder is num_sample;
    # the original spelling num_samples had no effect, so a run with it would use the default of 10
    enc1.num_sample = 5
    enc2.num_sample = 5
    # 7-class classification
    # enc2 yields the node embeddings after two GNN layers
    graphsage = SupervisedGraphSage(7, enc2)
    # graphsage.cuda()
    # Shuffle the node order
    rand_indices = np.random.permutation(num_nodes)
    # Split into test, validation, and training sets
    test = rand_indices[:1000]
    val = rand_indices[1000:1500]
    train = list(rand_indices[1500:])
    # Optimize with SGD and set the learning rate
    optimizer = torch.optim.SGD(filter(lambda p : p.requires_grad, graphsage.parameters()), lr=0.7)
    # Training time of each batch
    times = []
    # Train for 100 batches
    for batch in range(100):
        # Take 256 nodes as one batch
        batch_nodes = train[:256]
        # Shuffle the training set so the next batch is random
        random.shuffle(train)
        # Record the start time
        start_time = time.time()
        optimizer.zero_grad()
        # Cross-entropy loss defined in SupervisedGraphSage
        loss = graphsage.loss(batch_nodes,
                              torch.LongTensor(labels[np.array(batch_nodes)]))
        # Backpropagate and update the parameters
        loss.backward()
        optimizer.step()
        # Record the end time
        end_time = time.time()
        times.append(end_time - start_time)
        print(batch, loss.data)
    # Run validation
    val_output = graphsage.forward(val)
    # Compute the micro F1 score
    print("Validation F1:", f1_score(labels[val], val_output.data.numpy().argmax(axis=1), average="micro"))
    # Average training time per batch
    print("Average batch time:", np.mean(times))
""" Model output """
run_cora()
0 tensor(1.9649)
1 tensor(1.9406)
2 tensor(1.9115)
3 tensor(1.8925)
4 tensor(1.8731)
5 tensor(1.8354)
6 tensor(1.8018)
7 tensor(1.7535)
8 tensor(1.6938)
9 tensor(1.6029)
10 tensor(1.6312)
11 tensor(1.5248)
12 tensor(1.4800)
13 tensor(1.4503)
14 tensor(1.4162)
15 tensor(1.3210)
16 tensor(1.2243)
17 tensor(1.2255)
18 tensor(1.0978)
19 tensor(1.1330)
20 tensor(0.9534)
21 tensor(0.9112)
22 tensor(0.9170)
23 tensor(0.7924)
24 tensor(0.8008)
25 tensor(0.7142)
26 tensor(0.7839)
27 tensor(0.8878)
28 tensor(1.2177)
29 tensor(0.9943)
30 tensor(0.8073)
31 tensor(0.6588)
32 tensor(0.6254)
33 tensor(0.5622)
34 tensor(0.5158)
35 tensor(0.4763)
36 tensor(0.5298)
37 tensor(0.5419)
38 tensor(0.5098)
39 tensor(0.4122)
40 tensor(0.4262)
41 tensor(0.4451)
42 tensor(0.4126)
43 tensor(0.4409)
44 tensor(0.3913)
45 tensor(0.4496)
46 tensor(0.4365)
47 tensor(0.4601)
48 tensor(0.4714)
49 tensor(0.4090)
50 tensor(0.4145)
51 tensor(0.3428)
52 tensor(0.3454)
53 tensor(0.3531)
54 tensor(0.3131)
55 tensor(0.2719)
56 tensor(0.3519)
57 tensor(0.3286)
58 tensor(0.3125)
59 tensor(0.2529)
60 tensor(0.3033)
61 tensor(0.2332)
62 tensor(0.3049)
63 tensor(0.3026)
64 tensor(0.3770)
65 tensor(0.3811)
66 tensor(0.3223)
67 tensor(0.2450)
68 tensor(0.2620)
69 tensor(0.2846)
70 tensor(0.2482)
71 tensor(0.3044)
72 tensor(0.4133)
73 tensor(0.3156)
74 tensor(0.4421)
75 tensor(0.2596)
76 tensor(0.2585)
77 tensor(0.2639)
78 tensor(0.2035)
79 tensor(0.2328)
80 tensor(0.1748)
81 tensor(0.1730)
82 tensor(0.1978)
83 tensor(0.1614)
84 tensor(0.1890)
85 tensor(0.1227)
86 tensor(0.1568)
87 tensor(0.1527)
88 tensor(0.2365)
89 tensor(0.2297)
90 tensor(0.1787)
91 tensor(0.1920)
92 tensor(0.1864)
93 tensor(0.1254)
94 tensor(0.1678)
95 tensor(0.1336)
96 tensor(0.1562)
97 tensor(0.2531)
98 tensor(0.2392)
99 tensor(0.2089)
Validation F1: 0.864
Average batch time: 0.047979302406311035
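The validation score above is a micro-averaged F1. For single-label multi-class predictions such as these, micro F1 equals plain accuracy, which the sketch below checks on a toy example. The function `micro_f1` and the label lists are hypothetical and written in plain Python for illustration; they are not the scikit-learn implementation.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives over all classes."""
    classes = set(y_true) | set(y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for c in classes for t, p in pairs if t == c and p == c)
    fp = sum(1 for c in classes for t, p in pairs if t != c and p == c)
    fn = sum(1 for c in classes for t, p in pairs if t == c and p != c)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 4 of 5 predictions are correct
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = micro_f1(y_true, y_pred)
print(score, accuracy)
```

Every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so pooled precision and recall both reduce to the fraction of correct predictions.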