Learning the DGL Framework with the Official Docs, Day 3: A Few Things About Graphs

Main reference: https://docs.dgl.ai/guide/graph-graphs-nodes-edges.html

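Creating a DGLGraph

The previous post covered several ways to create a graph with dgl.DGLGraph(). The guide notes that dgl.graph() can also create a DGLGraph.

As with dgl.DGLGraph((u, v)), in dgl.graph((u, v)) u and v are the source-node list and the destination-node list. The elements at the same position in the two lists define one edge.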

import dgl
import torch as th

# edges 0->1, 0->2, 0->3, 1->3
u, v = th.tensor([0, 0, 0, 1]), th.tensor([1, 2, 3, 3])
g = dgl.graph((u, v))
print(g) # the number of nodes is inferred from the max node ID in the given edges

Print the node IDs:

# Node IDs
print(g.nodes())
# tensor([0, 1, 2, 3])

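Print the edges. They come back in the same form used at creation: a source-node list and a destination-node list.

# Edge end nodes
print(g.edges())
# (tensor([0, 0, 0, 1]), tensor([1, 2, 3, 3]))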

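Print the full edge information. Besides the source-node and destination-node lists, this also returns the edge IDs.

# Edge end nodes and edge IDs
print(g.edges(form='all'))
# (tensor([0, 0, 0, 1]), tensor([1, 2, 3, 3]), tensor([0, 1, 2, 3]))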

If the node with the largest ID is isolated, the number of nodes must be set explicitly when the graph is created:

# If the node with the largest ID is isolated (meaning no edges),
# then one needs to explicitly set the number of nodes
g = dgl.graph((u, v), num_nodes=8)

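A directed graph can also be converted into a bidirected one, in which every edge has a matching reverse edge:

bg = dgl.to_bidirected(g)
print(bg.edges())
# (tensor([0, 0, 0, 1, 1, 2, 3, 3]), tensor([1, 2, 3, 0, 3, 0, 0, 1]))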
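
To make the effect of the conversion concrete, here is a minimal pure-Python sketch of the idea: add the reverse of every edge and drop duplicates. The helper to_bidirected_pairs is a hypothetical name used only for illustration. It shows the concept and is not how DGL implements dgl.to_bidirected().

```python
def to_bidirected_pairs(src, dst):
    """Return sorted (u, v) pairs holding every edge and its reverse, deduplicated."""
    pairs = set(zip(src, dst))
    # Add the reverse of each edge; the set removes any duplicates
    pairs |= {(v, u) for u, v in pairs}
    return sorted(pairs)


# Same edges as above: 0->1, 0->2, 0->3, 1->3
edges = to_bidirected_pairs([0, 0, 0, 1], [1, 2, 3, 3])
print(edges)
# [(0, 1), (0, 2), (0, 3), (1, 0), (1, 3), (2, 0), (3, 0), (3, 1)]
```

The four directed edges become eight, one in each direction.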

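Bit width of IDs

DGL can store node and edge IDs as either 32-bit or 64-bit integers. If a graph has fewer than $2^{31}-1$ nodes or edges, use 32-bit integers to save memory.
DGL uses 64-bit integers by default. The ID type can be set when the graph is created, or changed afterwards with long() and int().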

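edges = th.tensor([2, 5, 3]), th.tensor([3, 5, 0])  # edges 2->3, 5->5, 3->0
g64 = dgl.graph(edges)  # DGL uses int64 by default
print(g64.idtype)

g32 = dgl.graph(edges, idtype=th.int32)  # create an int32 graph
print(g32.idtype)

g64_2 = g32.long()  # convert to int64
print(g64_2.idtype)

g32_2 = g64.int()  # convert to int32
print(g32_2.idtype)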
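
As a rough back-of-the-envelope check of the memory argument, the sketch below computes the largest signed 32-bit value and the bytes needed to hold just the source and destination ID of every edge. The helper id_bytes and the edge count are made up for illustration, and the estimate ignores everything else DGL stores for a graph.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def id_bytes(num_edges, bits):
    """Bytes needed to store one source ID and one destination ID per edge."""
    return 2 * num_edges * (bits // 8)


num_edges = 10_000_000  # hypothetical graph size
print(INT32_MAX)                # 2147483647
print(id_bytes(num_edges, 32))  # 80000000
print(id_bytes(num_edges, 64))  # 160000000
```

Under these assumptions, 32-bit IDs halve the space taken by the edge list.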

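Node and edge features

1. Use g.ndata['x'] to add or access a node feature.

2. Use g.edata['x'] to add or access an edge feature.

3. Only numerical types can be used as features. A feature can be a scalar, a vector, or a multi-dimensional tensor.

4. A node feature may have the same name as an edge feature.

5. Features stored under the same name must have the same dimensionality and data type.

6. For a weighted graph, the weights can be stored as an edge feature.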

Building a DGLGraph from external data

1. A graph can be created from a SciPy sparse matrix or a NetworkX graph.

import dgl
import torch as th
import scipy.sparse as sp
spmat = sp.rand(100, 100, density=0.05) # 5% nonzero entries
dgl.from_scipy(spmat)                   # from SciPy

import networkx as nx
nx_g = nx.path_graph(5) # a chain 0-1-2-3-4
dgl.from_networkx(nx_g) # from networkx

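Note that the graph converted from nx.path_graph(5) has 8 edges rather than 4. The NetworkX graph is undirected while a DGLGraph is always directed, so each undirected edge becomes two directed edges.

To avoid this, build a directed graph with networkx.DiGraph():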

nxg = nx.DiGraph([(2, 1), (1, 2), (2, 3), (0, 0)])
dgl.from_networkx(nxg)

2. Loading graphs from disk
1) CSV format

2) JSON/GML format

3) DGL binary format
The two APIs dgl.save_graphs() and dgl.load_graphs() save and load graphs in this format.

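Heterogeneous graphs

A heterogeneous graph can have nodes and edges of different types.

Creating a heterogeneous graph

The input is a dictionary whose entries have the form "relation: node tuple".
The relation is a triple: (source node type, edge type, destination node type).
The node tuple has the form (U, V), where U and V are the source-node list and the destination-node list.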

import dgl
import torch as th

# Create a heterograph with 3 node types and 3 edge types.
graph_data = {
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
    ('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
}
g = dgl.heterograph(graph_data)
print(g.ntypes)
# ['disease', 'drug', 'gene']
print(g.etypes)
# ['interacts', 'interacts', 'treats']
print(g.canonical_etypes)
# [('drug', 'interacts', 'drug'),
#  ('drug', 'interacts', 'gene'),
#  ('drug', 'treats', 'disease')]

Homogeneous graphs and bipartite graphs can both be treated as special cases of heterogeneous graphs:

# A homogeneous graph
dgl.heterograph({('node_type', 'edge_type', 'node_type'): (u, v)})
# A bipartite graph
dgl.heterograph({('source_type', 'edge_type', 'destination_type'): (u, v)})

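The metagraph represents the schema (ontology) of a heterogeneous graph

The metagraph contains only the node types and the edge types that connect them: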

print(g)
# Graph(num_nodes={'disease': 3, 'drug': 3, 'gene': 4},
#       num_edges={('drug', 'interacts', 'drug'): 2,
#                  ('drug', 'interacts', 'gene'): 2,
#                  ('drug', 'treats', 'disease'): 1},
#       metagraph=[('drug', 'drug', 'interacts'),
#                  ('drug', 'gene', 'interacts'),
#                  ('drug', 'disease', 'treats')])
print(g.metagraph().edges())
# OutMultiEdgeDataView([('drug', 'drug'), ('drug', 'gene'), ('drug', 'disease')])

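Working with multiple types

1. When accessing nodes and edges, the node type or edge type must be specified.

2. To access node and edge features, use g.nodes['node_type'].data['feat_name'] and g.edges['edge_type'].data['feat_name'].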

# Get the number of all nodes in the graph
print(g.num_nodes())
# 10
# Get the number of drug nodes
print(g.num_nodes('drug'))
# 3
# Nodes of different types have separate IDs,
# hence not well-defined without a type specified
print(g.nodes())
# DGLError: Node type name must be specified if there are more than one node types.
print(g.nodes('drug'))
# tensor([0, 1, 2])

3. If the graph has only one node type or one edge type, that type does not need to be specified.

g = dgl.heterograph({
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'is similar', 'drug'): (th.tensor([0, 1]), th.tensor([2, 3]))
})
print(g.nodes())
# tensor([0, 1, 2, 3])
# To set/get feature with a single type, no need to use the new syntax
g.ndata['hv'] = th.ones(4, 1)

Edge type subgraphs

Extracting only some of the relations from a heterogeneous graph gives an edge type subgraph:

g = dgl.heterograph({
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
    ('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
})
g.nodes['drug'].data['hv'] = th.ones(3, 1)

# Retain relations ('drug', 'interacts', 'drug') and ('drug', 'treats', 'disease')
# All nodes for 'drug' and 'disease' will be retained
eg = dgl.edge_type_subgraph(g, [('drug', 'interacts', 'drug'),
                                 ('drug', 'treats', 'disease')])
print(eg)
# Graph(num_nodes={'disease': 3, 'drug': 3},
#       num_edges={('drug', 'interacts', 'drug'): 2, ('drug', 'treats', 'disease'): 1},
#       metagraph=[('drug', 'drug', 'interacts'), ('drug', 'disease', 'treats')])
# The associated features will be copied as well
print(eg.nodes['drug'].data['hv'])
# tensor([[1.],
#         [1.],
#         [1.]])

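Converting a heterogeneous graph to a homogeneous graph

Use dgl.to_homogeneous(). It renumbers the nodes of all types, and likewise the edges of all types, consecutively from 0, and merges the features you specify.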

g = dgl.heterograph({
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))})
g.nodes['drug'].data['hv'] = th.zeros(3, 1)
g.nodes['disease'].data['hv'] = th.ones(3, 1)
g.edges['interacts'].data['he'] = th.zeros(2, 1)
g.edges['treats'].data['he'] = th.zeros(1, 2)

# By default, it does not merge any features
hg = dgl.to_homogeneous(g)
print('hv' in hg.ndata)
# False

# Copy edge features
# For feature copy, it expects features to have
# the same size and dtype across node/edge types
hg = dgl.to_homogeneous(g, edata=['he'])
# DGLError: Cannot concatenate column 'he' with shape Scheme(shape=(2,), dtype=torch.float32) and shape Scheme(shape=(1,), dtype=torch.float32)

# Copy node features
hg = dgl.to_homogeneous(g, ndata=['hv'])
print(hg.ndata['hv'])
# tensor([[1.],
#         [1.],
#         [1.],
#         [0.],
#         [0.],
#         [0.]])

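Using DGLGraph on a GPU

There are two ways to do this. One is to pass two GPU tensors when constructing the DGLGraph. The other is to build the DGLGraph on the CPU first and then copy it to the GPU with to().

Also, a DGLGraph on the GPU only accepts feature data that is on the GPU.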

import dgl
import torch as th
u, v = th.tensor([0, 1, 2]), th.tensor([2, 3, 4])
g = dgl.graph((u, v))
g.ndata['x'] = th.randn(5, 3)  # original feature is on CPU
print(g.device)
# device(type='cpu')
cuda_g = g.to('cuda:0')  # accepts any device objects from backend framework
print(cuda_g.device)
# device(type='cuda', index=0)
print(cuda_g.ndata['x'].device)       # feature data is copied to GPU too
# device(type='cuda', index=0)

# A graph constructed from GPU tensors is also on GPU
u, v = u.to('cuda:0'), v.to('cuda:0')
g = dgl.graph((u, v))
print(g.device)
# device(type='cuda', index=0)