Learning the DGL Framework from the Official Docs, Day 3: A Few Things About Graphs

Article link: https://blog.csdn.net/beilizhang/article/details/108507584

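This article covers creating, manipulating, and managing the features of a DGLGraph, including building graphs from external data sources, running graphs on the GPU, and working with heterogeneous graphs. It also covers adding, accessing, and converting node and edge features, as well as saving and loading graphs.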


Main reference: https://docs.dgl.ai/guide/graph-graphs-nodes-edges.html

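Creating a DGLGraph

The previous post covered several ways to create a graph with dgl.DGLGraph(). A DGLGraph can also be created with dgl.graph().

As with dgl.DGLGraph((u, v)), in dgl.graph((u, v)) the arguments u and v are the list of source (head) nodes and the list of destination (tail) nodes. The elements at the same position in the two lists define one edge.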

import dgl
import torch as th

# edges 0->1, 0->2, 0->3, 1->3
u, v = th.tensor([0, 0, 0, 1]), th.tensor([1, 2, 3, 3])
g = dgl.graph((u, v))
print(g)  # the number of nodes is inferred from the largest node ID in the given edges

Print the node IDs:

# Node IDs
print(g.nodes())

Print the edges, in the same form used at creation time: a list of source nodes and a list of destination nodes.

# Edge end nodes
print(g.edges())

Print the full edge information. In addition to the source and destination node lists, this includes the edge IDs.

# Edge end nodes and edge IDs
print(g.edges(form='all'))

If the node with the largest ID is isolated, the number of nodes must be set explicitly when the graph is created:

# If the node with the largest ID is isolated (meaning no edges),
# then one needs to explicitly set the number of nodes
g = dgl.graph((u, v), num_nodes=8)
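To make the node-count rule concrete, here is a minimal pure-Python sketch of how the count follows from the edge lists. The helper count_nodes is hypothetical and written only for illustration; it is not DGL's implementation, and it uses plain lists instead of tensors.

```python
def count_nodes(u, v, num_nodes=None):
    """Return the node count for edge lists u -> v (illustration only)."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    # Without a hint, the count is the largest node ID seen in any edge, plus one.
    inferred = max(list(u) + list(v), default=-1) + 1
    if num_nodes is None:
        return inferred
    if num_nodes < inferred:
        raise ValueError("num_nodes is smaller than the largest node ID + 1")
    return num_nodes

u = [0, 0, 0, 1]
v = [1, 2, 3, 3]
edges = list(zip(u, v))                # position i of u and v defines edge i
inferred_count = count_nodes(u, v)     # IDs 0..3 appear in the edges
explicit_count = count_nodes(u, v, 8)  # nodes 4..7 exist but have no edges
print(edges, inferred_count, explicit_count)
```

With the edges used above, the inferred count is 4, while passing 8 keeps four extra isolated nodes that no edge could reveal.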

A directed graph can also be converted to a bidirected one:

bg = dgl.to_bidirected(g)
print(bg.edges())
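What "bidirected" means can be sketched in plain Python: every edge gets a reverse edge, and duplicate edges are dropped. The helper to_bidirected_edges below is hypothetical and only models the resulting edge set; the edge order that DGL itself returns may differ.

```python
def to_bidirected_edges(u, v):
    """Return sorted (src, dst) pairs holding every edge and its reverse, without duplicates."""
    pairs = set(zip(u, v))
    pairs |= {(dst, src) for src, dst in pairs}
    return sorted(pairs)

u = [0, 0, 0, 1]
v = [1, 2, 3, 3]
bidirected = to_bidirected_edges(u, v)
print(bidirected)
```

For the four edges of the example graph this yields eight directed edges, one in each direction for every connected pair.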

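Bit width of IDs

DGL can store node and edge IDs as either 32-bit or 64-bit integers. If both the number of nodes and the number of edges in a graph are below $2^{31}-1$, 32-bit integers can be used to save memory.
DGL uses 64-bit integers by default. The ID type can be set when the graph is created, or changed afterwards with long() and int().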

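# example edges (any edge list works here): 2->3, 5->5, 3->0
edges = th.tensor([2, 5, 3]), th.tensor([3, 5, 0])
g64 = dgl.graph(edges)  # DGL uses int64 by default
print(g64.idtype)

g32 = dgl.graph(edges, idtype=th.int32)  # create an int32 graph
print(g32.idtype)

g64_2 = g32.long()  # convert to int64
print(g64_2.idtype)

g32_2 = g64.int()  # convert to int32
print(g32_2.idtype)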
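The trade-off between the two ID types is simple arithmetic, sketched below. The graph size is a made-up figure, and the estimate counts only one source ID and one destination ID per edge, so it is a rough lower bound rather than a measurement of what DGL actually allocates.

```python
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit integer can hold

def id_bytes(num_edges, bits):
    """Bytes needed to store one source ID and one destination ID per edge."""
    return num_edges * 2 * (bits // 8)

num_edges = 100_000_000  # hypothetical graph size, chosen for illustration
fits_int32 = num_edges < INT32_MAX
mem32 = id_bytes(num_edges, 32)
mem64 = id_bytes(num_edges, 64)
print(INT32_MAX, fits_int32, mem32, mem64)
```

Under these assumptions the 32-bit edge list needs half the memory of the 64-bit one.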

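Node and edge features

1. Use g.ndata['x'] to add or access a node feature.

2. Use g.edata['x'] to add or access an edge feature.

3. Only numerical types can be used as features. A feature can be a scalar, a vector, or a multi-dimensional tensor.

4. A node feature and an edge feature may share the same name.

5. All values stored under one feature name must have the same dimensionality and data type.

6. For a weighted graph, the weights can be stored as an edge feature.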

Building a DGLGraph from external data

1. A graph can be created from a SciPy sparse matrix or from a NetworkX graph.

import dgl
import torch as th
import scipy.sparse as sp
spmat = sp.rand(100, 100, density=0.05) # 5% nonzero entries
dgl.from_scipy(spmat)                   # from SciPy

import networkx as nx
nx_g = nx.path_graph(5) # a chain 0-1-2-3-4
dgl.from_networkx(nx_g) # from networkx

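Note that the graph converted from nx.path_graph(5) has 8 edges. The NetworkX graph is undirected while a DGLGraph is always directed, so each undirected edge is converted into two directed edges.

To avoid this, build a directed graph with networkx.DiGraph():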

nxg = nx.DiGraph([(2, 1), (1, 2), (2, 3), (0, 0)])
dgl.from_networkx(nxg)
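The edge doubling for undirected input can be reproduced without either library. The two helpers below are hypothetical and only model the counting argument for the chain 0-1-2-3-4; they say nothing about how the conversion is implemented internally.

```python
def path_edges(n):
    """Undirected edges of a chain 0-1-...-(n-1)."""
    return [(i, i + 1) for i in range(n - 1)]

def as_directed(undirected_edges):
    """Expand each undirected edge {a, b} into the directed edges a->b and b->a."""
    directed = []
    for a, b in undirected_edges:
        directed.append((a, b))
        directed.append((b, a))
    return directed

undirected = path_edges(5)          # the chain has 4 undirected edges
directed = as_directed(undirected)  # each one becomes 2 directed edges
print(len(undirected), len(directed))
```

Four undirected edges become eight directed ones, which matches the edge count of the graph converted from nx.path_graph(5).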

2. A graph can be loaded from disk.
1) CSV format

2) JSON/GML format

3) DGL binary format
The two APIs dgl.save_graphs() and dgl.load_graphs() save and load graphs in this format.
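For the CSV case, one possible approach is to read an edge list into two parallel lists using only the standard library. The CSV content and the column names 'src' and 'dst' are assumptions made for this sketch, and read_edge_list is a hypothetical helper.

```python
import csv
import io

# Hypothetical CSV content; the column names 'src' and 'dst' are an assumption.
csv_text = "src,dst\n0,1\n0,2\n0,3\n1,3\n"

def read_edge_list(handle):
    """Read the 'src' and 'dst' columns into two parallel lists of integer node IDs."""
    u, v = [], []
    for row in csv.DictReader(handle):
        u.append(int(row["src"]))
        v.append(int(row["dst"]))
    return u, v

u, v = read_edge_list(io.StringIO(csv_text))
print(u, v)
```

The two lists have the same (u, v) layout that dgl.graph() expects, so they can be converted to tensors and passed on.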

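Heterogeneous graphs

A heterogeneous graph can have nodes and edges of different types.

Creating a heterogeneous graph

The input is a dictionary whose entries have the form 'relation: node tuple'.
A relation is a triple (source node type, edge type, destination node type).
A node tuple has the form (U, V), where U and V are the source node list and the destination node list.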

import dgl
import torch as th

# Create a heterograph with 3 node types and 3 edge types.
graph_data = {
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
    ('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
}
g = dgl.heterograph(graph_data)
print(g.ntypes)
# ['disease', 'drug', 'gene']
print(g.etypes)
# ['interacts', 'interacts', 'treats']
print(g.canonical_etypes)
# [('drug', 'interacts', 'drug'),
#  ('drug', 'interacts', 'gene'),
#  ('drug', 'treats', 'disease')]
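To see how much information the dictionary alone carries, the sketch below derives the node types, a node count per type, and an edge count per relation from it. It uses plain lists instead of tensors, and summarize is a hypothetical helper: it assumes the node count of a type is the largest ID seen for that type plus one, which is a model of the behavior rather than DGL's code.

```python
graph_data = {
    ('drug', 'interacts', 'drug'): ([0, 1], [1, 2]),
    ('drug', 'interacts', 'gene'): ([0, 1], [2, 3]),
    ('drug', 'treats', 'disease'): ([1], [2]),
}

def summarize(graph_data):
    """Return (node types, inferred node count per type, edge count per relation)."""
    num_nodes = {}
    num_edges = {}
    for (src_type, etype, dst_type), (u, v) in graph_data.items():
        # A type's count is driven by the largest ID seen for it in any relation.
        num_nodes[src_type] = max(num_nodes.get(src_type, 0), max(u) + 1)
        num_nodes[dst_type] = max(num_nodes.get(dst_type, 0), max(v) + 1)
        num_edges[(src_type, etype, dst_type)] = len(u)
    return sorted(num_nodes), num_nodes, num_edges

ntypes, num_nodes, num_edges = summarize(graph_data)
print(ntypes, num_nodes, num_edges)
```

For this data the sketch gives 3 drug nodes, 4 gene nodes, and 3 disease nodes, with the node types listed in alphabetical order.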

A homogeneous graph and a bipartite graph can both be treated as special cases of a heterogeneous graph:

# A homogeneous graph
dgl.heterograph({('node_type', 'edge_type', 'node_type'): (u, v)})
# A bipartite graph
dgl.heterograph({('source_type', 'edge_type', 'destination_type'): (u, v)})

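The metagraph represents the schema (ontology) of a heterogeneous graph

The metagraph contains only the node types and the edge types that connect them: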

print(g)
# Graph(num_nodes={'disease': 3, 'drug': 3, 'gene': 4},
#       num_edges={('drug', 'interacts', 'drug'): 2,
#                  ('drug', 'interacts', 'gene'): 2,
#                  ('drug', 'treats', 'disease'): 1},
#       metagraph=[('drug', 'drug', 'interacts'),
#                  ('drug', 'gene', 'interacts'),
#                  ('drug', 'disease', 'treats')])
print(g.metagraph().edges())
# OutMultiEdgeDataView([('drug', 'drug'), ('drug', 'gene'), ('drug', 'disease')])
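The relationship between the canonical edge types and the metagraph can be sketched in a few lines of plain Python. The variable names are illustrative; the triple order (source type, destination type, edge type) follows the printed output above.

```python
canonical_etypes = [
    ('drug', 'interacts', 'drug'),
    ('drug', 'interacts', 'gene'),
    ('drug', 'treats', 'disease'),
]

# One metagraph node per node type.
meta_nodes = sorted({t for src, _, dst in canonical_etypes for t in (src, dst)})
# One metagraph edge per relation, written as (source type, destination type, edge type).
meta_edges = [(src, dst, etype) for src, etype, dst in canonical_etypes]
print(meta_nodes)
print(meta_edges)
```

The metagraph has one node per node type and one edge per relation, no matter how many actual nodes and edges the graph holds.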

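Working with multiple types

1. When accessing nodes and edges, the node type or edge type must be specified.

2. To access node and edge features, use g.nodes['node_type'].data['feat_name'] and g.edges['edge_type'].data['feat_name'].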

# Get the number of all nodes in the graph
print(g.num_nodes())
# 10
# Get the number of drug nodes
print(g.num_nodes('drug'))
# 3
# Nodes of different types have separate IDs,
# hence not well-defined without a type specified
print(g.nodes())
# DGLError: Node type name must be specified if there are more than one node types.
print(g.nodes('drug'))
# tensor([0, 1, 2])

3. If the graph has only one node type or one edge type, the type does not need to be specified.

g = dgl.heterograph({
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'is similar', 'drug'): (th.tensor([0, 1]), th.tensor([2, 3]))
})
print(g.nodes())
# tensor([0, 1, 2, 3])
# To set/get a feature when there is a single type, no need to use the new syntax
g.ndata['hv'] = th.ones(4, 1)

Edge type subgraphs

Extracting only certain relations from a heterogeneous graph yields an edge type subgraph:

g = dgl.heterograph({
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
    ('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
})
g.nodes['drug'].data['hv'] = th.ones(3, 1)

# Retain relations ('drug', 'interacts', 'drug') and ('drug', 'treats', 'disease')
# All nodes for 'drug' and 'disease' will be retained
eg = dgl.edge_type_subgraph(g, [('drug', 'interacts', 'drug'),
                                 ('drug', 'treats', 'disease')])
print(eg)
# Graph(num_nodes={'disease': 3, 'drug': 3},
#       num_edges={('drug', 'interacts', 'drug'): 2, ('drug', 'treats', 'disease'): 1},
#       metagraph=[('drug', 'drug', 'interacts'), ('drug', 'disease', 'treats')])
# The associated features will be copied as well
print(eg.nodes['drug'].data['hv'])
# tensor([[1.],
#         [1.],
#         [1.]])
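Which node types survive in an edge type subgraph can be worked out from the relation list alone. The sketch below operates on a plain dictionary with lists instead of tensors; keep_relations is a hypothetical helper and not part of DGL.

```python
graph_data = {
    ('drug', 'interacts', 'drug'): ([0, 1], [1, 2]),
    ('drug', 'interacts', 'gene'): ([0, 1], [2, 3]),
    ('drug', 'treats', 'disease'): ([1], [2]),
}

def keep_relations(graph_data, relations):
    """Keep only the listed relations and report which node types remain."""
    kept = {rel: graph_data[rel] for rel in relations}
    remaining_ntypes = sorted({t for src, _, dst in kept for t in (src, dst)})
    return kept, remaining_ntypes

kept, remaining_ntypes = keep_relations(
    graph_data,
    [('drug', 'interacts', 'drug'), ('drug', 'treats', 'disease')],
)
print(remaining_ntypes)  # 'gene' is gone because no kept relation touches it
print(len(kept))
```

This matches the printed subgraph above, which lists only 'disease' and 'drug' under num_nodes.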

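Converting a heterogeneous graph to a homogeneous graph

Use dgl.to_homogeneous(). Nodes and edges of all types are renumbered consecutively from 0, and the specified features are merged.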

g = dgl.heterograph({
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))})
g.nodes['drug'].data['hv'] = th.zeros(3, 1)
g.nodes['disease'].data['hv'] = th.ones(3, 1)
g.edges['interacts'].data['he'] = th.zeros(2, 1)
g.edges['treats'].data['he'] = th.zeros(1, 2)

# By default, it does not merge any features
hg = dgl.to_homogeneous(g)
print('hv' in hg.ndata)
# False

# Copy edge features
# For feature copy, it expects features to have
# the same size and dtype across node/edge types
hg = dgl.to_homogeneous(g, edata=['he'])
# DGLError: Cannot concatenate column 'he' with shape Scheme(shape=(2,), dtype=torch.float32) and shape Scheme(shape=(1,), dtype=torch.float32)

# Copy node features
hg = dgl.to_homogeneous(g, ndata=['hv'])
print(hg.ndata['hv'])
# tensor([[1.],
#         [1.],
#         [1.],
#         [0.],
#         [0.],
#         [0.]])
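The order of the merged 'hv' rows, ones first and zeros second, is consistent with renumbering the node types in sorted order, 'disease' before 'drug'. The sketch below models that renumbering in plain Python. The sorted-order rule is an assumption inferred from the printed output, and homogeneous_ids is a hypothetical helper.

```python
num_nodes = {'drug': 3, 'disease': 3}  # node counts per type from the example above

def homogeneous_ids(num_nodes):
    """Map (node type, per-type ID) to one consecutive ID, visiting types in sorted order."""
    mapping = {}
    next_id = 0
    for ntype in sorted(num_nodes):
        for local_id in range(num_nodes[ntype]):
            mapping[(ntype, local_id)] = next_id
            next_id += 1
    return mapping

mapping = homogeneous_ids(num_nodes)
hv = {'drug': 0.0, 'disease': 1.0}  # per-type feature values used in the example
# List the feature value of each node in the order of its new ID.
merged_hv = [hv[ntype] for (ntype, _), _ in sorted(mapping.items(), key=lambda kv: kv[1])]
print(mapping[('disease', 0)], mapping[('drug', 0)])
print(merged_hv)
```

Under this assumption the disease nodes receive IDs 0 to 2 and the drug nodes receive IDs 3 to 5.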

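Using a DGLGraph on the GPU

There are two ways to do this. One is to pass two GPU tensors when constructing the DGLGraph. The other is to construct the DGLGraph on the CPU first and then copy it to the GPU with to().

Note also that a DGLGraph on the GPU only accepts feature data that is on the GPU.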

import dgl
import torch as th
u, v = th.tensor([0, 1, 2]), th.tensor([2, 3, 4])
g = dgl.graph((u, v))
g.ndata['x'] = th.randn(5, 3)  # original feature is on CPU
print(g.device)
# device(type='cpu')
cuda_g = g.to('cuda:0')  # accepts any device objects from backend framework
print(cuda_g.device)
# device(type='cuda', index=0)
print(cuda_g.ndata['x'].device)       # feature data is copied to GPU too
# device(type='cuda', index=0)

# A graph constructed from GPU tensors is also on GPU
u, v = u.to('cuda:0'), v.to('cuda:0')
g = dgl.graph((u, v))
print(g.device)
# device(type='cuda', index=0)