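Data Types
PyTorch Geometric defines its own data types.
A graph is made up of nodes and the edges between them, so building a graph in PyTorch Geometric requires two ingredients: nodes and edges. PyTorch Geometric provides torch_geometric.data.Data for this purpose. A Data object has five main attributes, none of which is required; any of them may be left empty.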
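- data.x: node features, with shape [num_nodes, num_node_features].
- data.edge_index: edges between nodes, stored in COO format with dtype torch.long and shape [2, num_edges].
- data.pos: node coordinates, with shape [num_nodes, num_dimensions].
- data.y: labels. With one label per node the shape is [num_nodes, *]; with a single label for the whole graph the shape is [1, *].
- data.edge_attr: edge features, with shape [num_edges, num_edge_features].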
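In fact, a Data object is not limited to these attributes. For example, data.face can be added to store, as a tensor, the connectivity of the triangles in a 3D mesh.
Other attributes can be added as well, as shown below: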
data = Data(x=x, edge_index=edge_index)
data.train_idx = torch.tensor([...], dtype=torch.long)
data.train_mask = torch.tensor([...], dtype=torch.bool)
data.test_mask = torch.tensor([...], dtype=torch.bool)
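Simple Example
Creating a Graph
Create a graph with the networkx package, then convert it to a Data object with torch. Note that node IDs must start from 0 and that indices must be integers.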
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import community as community_louvain
# build a graph
G = nx.Graph()
edgelist = [(0, 1), (0, 2), (1, 3)] # networkx adds nodes in the order they first appear in the edges
G.add_edges_from(edgelist)
# plot the graph
fig, ax = plt.subplots(figsize=(4,4))
option = {'font_family':'serif', 'font_size':'15', 'font_weight':'semibold'}
nx.draw_networkx(G, node_size=400, **option) #pos=nx.spring_layout(G)
plt.axis('off')
plt.show()
The resulting graph is the path 2 - 0 - 1 - 3.
Creating a Data Instance
Create a Data object from the networkx graph.
import torch
from torch_geometric.data import Data
# One-hot node features: an identity matrix with one row per node.
x = torch.eye(G.number_of_nodes(), dtype=torch.float)
# Sparse adjacency matrix in COO format.
# networkx >= 2.7 provides to_scipy_sparse_array; older releases used to_scipy_sparse_matrix.
adj = nx.to_scipy_sparse_array(G).tocoo()
row = torch.from_numpy(adj.row.astype(np.int64))
col = torch.from_numpy(adj.col.astype(np.int64))
edge_index = torch.stack([row, col], dim=0)
# Compute communities.
partition = community_louvain.best_partition(G)
y = torch.tensor([partition[i] for i in range(G.number_of_nodes())])
# Select a single training node for each community
# (we just use the first one).
train_mask = torch.zeros(y.size(0), dtype=torch.bool)
for i in range(int(y.max()) + 1):
    train_mask[(y == i).nonzero(as_tuple=False)[0]] = True
data = Data(x=x, edge_index=edge_index, y=y, train_mask=train_mask)
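Inspect the value of each variable in turn. In the code above:
- x is the node feature matrix, set here to the identity matrix.
- adj is the sparse representation of the adjacency matrix of G. When printed, each entry shows a node pair on the left (an edge) and the value of that edge on the right. adj is symmetric.
- row and col are the row and column indices of the nonzero entries of adj.
- edge_index is the form in which PyTorch Geometric represents the edge list. It contains two lists, row first and col second; elements at the same position in row and col form one edge. Note that edge_index can encode edge direction. In an undirected graph each edge appears twice, so (0, 1) and (1, 0) denote the same edge; in a directed graph they are two different edges.
- partition is the result of running the Louvain algorithm for community detection on G. Here nodes 0 and 2 belong to one community, and nodes 1 and 3 to the other.
- y holds the community label of each node.
- train_mask is the training-set mask used for semi-supervised node classification: in each class exactly one node has its label marked as known (True), and all the others are unknown (False).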
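To illustrate the point about edge direction, the sketch below builds an edge_index tensor directly from the same edge list in plain PyTorch, without going through the adjacency matrix. The variable names are illustrative. The columns come out in a different order than in the adjacency-based version, which should not matter for message passing.

```python
import torch

edgelist = [(0, 1), (0, 2), (1, 3)]
# One column per edge, shape [2, 3]: read as directed edges u -> v.
directed = torch.tensor(edgelist, dtype=torch.long).t().contiguous()
# Append the reversed edges v -> u to obtain an undirected graph, shape [2, 6].
undirected = torch.cat([directed, directed.flip(0)], dim=1)
print(undirected)
```

Used on its own, directed would describe a directed graph with three edges; undirected stores each of those edges twice, once per direction.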
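Built-in Methods
Next, let's look at what the data instance itself provides.
The data instance exposes many methods and properties, for example its keys and values, the number of nodes, the number of edges, the number of node features, whether it contains isolated nodes or self-loops, and whether it is directed or undirected.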
Adding Attributes
Let's try adding other attributes.
For example, a test_mask attribute can be added to mark certain nodes as the test set.
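Node Classification
We use the simple graph above for a node classification task, where the classes are the communities found by the Louvain algorithm. Nodes 0 and 1 are the training data, and nodes 2 and 3 are the test data.
We build a graph convolutional network with two convolutional layers. The first has input dimension 4 and output dimension 16; the second has input dimension 16 and output dimension 2. The first layer is followed by an activation function (ReLU) and dropout.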
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # data.num_node_features is 4 here (one-hot features for 4 nodes).
        self.conv1 = GCNConv(data.num_node_features, 16)
        self.conv2 = GCNConv(16, 2)
    def forward(self):
        # The model reads the global data object directly.
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        # Log-probabilities over the two classes, one row per node.
        return F.log_softmax(x, dim=1)
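For intuition about what one graph convolution layer computes, here is a dense sketch in plain PyTorch of the GCN propagation rule, D^(-1/2) (A + I) D^(-1/2) X W, where D is the degree matrix of A + I. It is an illustration only and not how GCNConv is implemented (GCNConv operates on the sparse edge_index and also adds a bias); the weight matrix W is random and hypothetical.

```python
import torch

torch.manual_seed(0)
# Dense adjacency matrix of the four-node graph (edges 0-1, 0-2, 1-3).
A = torch.tensor([[0., 1., 1., 0.],
                  [1., 0., 0., 1.],
                  [1., 0., 0., 0.],
                  [0., 1., 0., 0.]])
X = torch.eye(4)          # one-hot node features, as in the article
W = torch.randn(4, 16)    # hypothetical weights: 4 input dims -> 16 output dims

A_hat = A + torch.eye(4)                    # add self-loops
deg = A_hat.sum(dim=1)                      # node degrees, self-loop included
D_inv_sqrt = torch.diag(deg.pow(-0.5))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
H = torch.relu(A_norm @ X @ W)              # one layer followed by ReLU
print(H.shape)
```

Each row of H mixes the features of a node with those of its neighbors, which is why nodes that are close in the graph tend to receive similar representations.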
The optimizer, training function, and test function are defined as follows:
# Use the default GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model, data = Net().to(device), data.to(device)
optimizer = torch.optim.Adam([
    dict(params=model.conv1.parameters(), weight_decay=5e-4),
    dict(params=model.conv2.parameters(), weight_decay=0)
], lr=0.01)  # Only perform weight-decay on first convolution.
def train():
    # Switch back to training mode; test() puts the model in eval mode.
    model.train()
    optimizer.zero_grad()
    out = model()
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
@torch.no_grad()
def test():
    model.eval()
    logits, accs = model(), []
    # Iterating data(...) yields (name, value) pairs for the requested attributes.
    for _, mask in data('train_mask', 'test_mask'):
        pred = logits[mask].max(1)[1]
        acc = pred.eq(data.y[mask]).sum().item() / mask.sum().item()
        accs.append(acc)
    return accs
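The accuracy computation inside test() can be seen in isolation in the sketch below. The logits, labels, and mask are made-up values, and argmax(dim=1) is used as an equivalent of max(1)[1].

```python
import torch

# Hypothetical log-probabilities for 4 nodes and 2 classes.
logits = torch.log(torch.tensor([[0.9, 0.1],
                                 [0.2, 0.8],
                                 [0.6, 0.4],
                                 [0.7, 0.3]]))
labels = torch.tensor([0, 1, 0, 1])
mask = torch.tensor([False, False, True, True])  # evaluate nodes 2 and 3 only

pred = logits[mask].argmax(dim=1)                # predicted class per masked node
acc = (pred == labels[mask]).sum().item() / mask.sum().item()
print(acc)
```

Here node 2 is classified correctly and node 3 is not, so the accuracy over the masked nodes is one half.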
Train for ten epochs and print the accuracy on the training and test sets:
log = 'Epoch: {:03d}, Train: {:.4f}, Test: {:.4f}'
for epoch in range(1, 11):
    train()
    print(log.format(epoch, *test()))
The output is as follows:
Epoch: 001, Train: 0.5000, Test: 0.5000
Epoch: 002, Train: 0.5000, Test: 0.5000
Epoch: 003, Train: 0.5000, Test: 1.0000
Epoch: 004, Train: 1.0000, Test: 1.0000
Epoch: 005, Train: 1.0000, Test: 1.0000
Epoch: 006, Train: 1.0000, Test: 1.0000
Epoch: 007, Train: 1.0000, Test: 1.0000
Epoch: 008, Train: 1.0000, Test: 1.0000
Epoch: 009, Train: 1.0000, Test: 1.0000
Epoch: 010, Train: 1.0000, Test: 1.0000
On this simple graph, the graph convolutional network converges after four epochs of training and classifies every node correctly.
Complete Code
The complete code is as follows:
import numpy as np
import networkx as nx
import community as community_louvain
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv
# build a graph
G = nx.Graph()
edgelist = [(0, 1), (0, 2), (1, 3)]  # networkx adds nodes in the order they first appear in the edges
G.add_edges_from(edgelist)
# One-hot node features: an identity matrix with one row per node.
x = torch.eye(G.number_of_nodes(), dtype=torch.float)
# Sparse adjacency matrix in COO format.
# networkx >= 2.7 provides to_scipy_sparse_array; older releases used to_scipy_sparse_matrix.
adj = nx.to_scipy_sparse_array(G).tocoo()
row = torch.from_numpy(adj.row.astype(np.int64))
col = torch.from_numpy(adj.col.astype(np.int64))
edge_index = torch.stack([row, col], dim=0)
# Compute communities.
partition = community_louvain.best_partition(G)
y = torch.tensor([partition[i] for i in range(G.number_of_nodes())])
# Select a single training node for each community
# (we just use the first one).
train_mask = torch.zeros(y.size(0), dtype=torch.bool)
for i in range(int(y.max()) + 1):
    train_mask[(y == i).nonzero(as_tuple=False)[0]] = True
data = Data(x=x, edge_index=edge_index, y=y, train_mask=train_mask)
# Use every node that is not in the training set as a test node.
remaining = (~data.train_mask).nonzero(as_tuple=False).view(-1)
data.test_mask = torch.zeros(y.size(0), dtype=torch.bool)
data.test_mask[remaining] = True
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(data.num_node_features, 16)
        self.conv2 = GCNConv(16, 2)
    def forward(self):
        # The model reads the global data object directly.
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)
# Use the default GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model, data = Net().to(device), data.to(device)
optimizer = torch.optim.Adam([
    dict(params=model.conv1.parameters(), weight_decay=5e-4),
    dict(params=model.conv2.parameters(), weight_decay=0)
], lr=0.01)  # Only perform weight-decay on first convolution.
def train():
    # Switch back to training mode; test() puts the model in eval mode.
    model.train()
    optimizer.zero_grad()
    out = model()
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
@torch.no_grad()
def test():
    model.eval()
    logits, accs = model(), []
    for _, mask in data('train_mask', 'test_mask'):
        pred = logits[mask].max(1)[1]
        acc = pred.eq(data.y[mask]).sum().item() / mask.sum().item()
        accs.append(acc)
    return accs
log = 'Epoch: {:03d}, Train: {:.4f}, Test: {:.4f}'
for epoch in range(1, 11):
    train()
    print(log.format(epoch, *test()))