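Task01: Basic Graph Theory, Environment Setup, and the PyG Library
Theorem on the number of walks
Let $A$ be the adjacency matrix of a graph and $A^n$ its $n$-th power. Then each entry $A^n[i,j]$ is the number of walks of length $n$ from node $i$ to node $j$.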
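A small numerical check of the theorem may help. The sketch below uses NumPy (an assumption; any matrix library would do) on a hypothetical 4-node directed graph. It computes $A^n$ with `numpy.linalg.matrix_power` and compares every entry against a brute-force enumeration of all walks of length $n$.

```python
import itertools

import numpy as np

# Hypothetical example graph: directed edges 0->1, 1->2, 2->0, 2->3, 3->1.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]
num_nodes = 4

# Adjacency matrix: A[i, j] = 1 if there is an edge i -> j.
A = np.zeros((num_nodes, num_nodes), dtype=np.int64)
for i, j in edges:
    A[i, j] = 1


def count_walks_brute_force(A, i, j, n):
    """Count node sequences v0=i, v1, ..., vn=j in which every consecutive pair is an edge."""
    count = 0
    # Try every choice of the n-1 intermediate nodes.
    for middle in itertools.product(range(A.shape[0]), repeat=n - 1):
        path = (i, *middle, j)
        if all(A[u, v] for u, v in zip(path, path[1:])):
            count += 1
    return count


n = 3
An = np.linalg.matrix_power(A, n)
# Every entry of A^n must equal the brute-force walk count.
for i in range(num_nodes):
    for j in range(num_nodes):
        assert An[i, j] == count_walks_brute_force(A, i, j, n)
print(An)
```

For example, `An[2, 2]` is 2, matching the two closed walks of length 3 from node 2: 2→0→1→2 and 2→3→1→2.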
PyG Installation and PyTorch Environment
- torch=1.8.0
- cuda=10.2
- torch-geometric=1.7.0
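A quick check like the following can confirm that an environment matches the versions above. This sketch reads installed package metadata through the standard-library `importlib.metadata` (Python 3.8+), so it runs even if torch itself fails to import. It assumes the distributions are registered as `torch` and `torch-geometric`, the names used on PyPI. `check_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by PyPI distribution name (assumed).
EXPECTED = {"torch": "1.8.0", "torch-geometric": "1.7.0"}


def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every package that does not match."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # torch wheels may carry a local suffix such as "+cu102"; compare only the part before "+".
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for package, (wanted, installed) in problems.items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
    if not problems:
        print("torch and torch-geometric match the listed versions")
```

The CUDA version is not part of the package metadata. Once torch imports, `torch.version.cuda` reports the CUDA version the installed build was compiled against (10.2 for the setup above).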
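Assignment Description
- Implement a class that inherits from the `Data` class and represents an "organization-author-paper" network. The network contains three node types ("organization", "author", and "paper") and two edge types ("author-organization" and "author-paper"). Requirements for the class: 1) store the attributes of each node type in a separate attribute; 2) store each edge type in a separate attribute (edges have no attributes); 3) implement one method per node type that returns the number of nodes of that type.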
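Approach
This is clearly a heterogeneous graph, whereas PyG's `Data` class is a data container for homogeneous graphs. As a result, the existing `x` and `edge_index` attributes of `Data` cannot meet these requirements directly.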
For simplicity, element 0 of each row of `x_heter` serves as a type flag, $x\_heter[i][0] \in \{\text{"author"}, \text{"organization"}, \text{"paper"}\}$, and determines the node type. Likewise, element 0 of each row of `edge_index_heter` serves as a type flag, $edge\_index\_heter[i][0] \in \{\text{"author2organization"}, \text{"author2paper"}\}$, and determines the edge type. Unlike `Data.edge_index`, which has shape `[2, num_edges]`, the `edge_index_heter` used here has shape `(num_edges, 3)`, with one row `[edge_type, source, target]` per edge.
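These string-keyed rows cannot be passed to `Data.edge_index` directly, because PyG expects integer node indices in a `[2, num_edges]` layout. The sketch below shows one possible conversion. It assumes each node is identified by `row[1]`: the author or organization name, or the paper id, as in this post's test data. `build_index` and `to_coo` are hypothetical helpers, not PyG APIs. The resulting nested lists could then be wrapped with `torch.tensor(..., dtype=torch.long)`.

```python
def build_index(x_heter):
    """Give every node a 0-based index within its own type, keyed by row[1] (assumed node id)."""
    index = {}
    for row in x_heter:
        node_type, key = row[0], row[1]
        per_type = index.setdefault(node_type, {})
        per_type[key] = len(per_type)
    return index


def to_coo(edge_index_heter, index, edge_type, src_type, dst_type):
    """Convert rows [edge_type, src, dst] of one edge type into [[sources], [targets]]."""
    sources, targets = [], []
    for flag, src, dst in edge_index_heter:
        if flag == edge_type:
            sources.append(index[src_type][src])
            targets.append(index[dst_type][dst])
    return [sources, targets]
```

With this post's test data, `to_coo(edge_index, build_index(x), 'author2paper', 'author', 'paper')` gives `[[0, 1, 2, 2], [0, 1, 2, 3]]`. Wangwu (author 2) appears twice because he wrote two papers.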
Class definition:
from torch_geometric.data import Data


class OAP(Data):
    def __init__(self, x_heter=None, edge_index_heter=None):
        super(OAP, self).__init__()
        # One attribute per node type; each entry keeps the row without its type flag
        self.organization_nodes = []
        self.author_nodes = []
        self.paper_nodes = []
        # "or []" allows OAP() to be constructed with no arguments
        for instance in x_heter or []:
            if instance[0] == 'author':
                self.author_nodes.append(instance[1:])
            elif instance[0] == 'organization':
                self.organization_nodes.append(instance[1:])
            elif instance[0] == 'paper':
                self.paper_nodes.append(instance[1:])
        # One attribute per edge type; each entry is [source, target] (edges have no attributes)
        self.edge_author2organization = []
        self.edge_author2paper = []
        for edge in edge_index_heter or []:
            if edge[0] == 'author2organization':
                self.edge_author2organization.append(edge[1:])
            elif edge[0] == 'author2paper':
                self.edge_author2paper.append(edge[1:])

    # Requirement 3: one method per node type that returns its node count
    def num_organization_nodes(self):
        return len(self.organization_nodes)

    def num_author_nodes(self):
        return len(self.author_nodes)

    def num_paper_nodes(self):
        return len(self.paper_nodes)
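Each new node or edge type in `OAP` requires another `if` branch. The sketch below is an alternative design, not the assignment's reference solution: it is table-driven, so a mapping from type flag to attribute name controls the grouping and a new type needs only one more dictionary entry. To keep it runnable without PyG, it falls back to `object` as the base class when `torch_geometric` cannot be imported. `OAPTable`, `NODE_ATTRS`, `EDGE_ATTRS`, and `_group` are hypothetical names.

```python
try:
    from torch_geometric.data import Data as _Base
except ImportError:  # assumption: fall back so the sketch also runs without PyG
    _Base = object


class OAPTable(_Base):
    # Maps the type flag in column 0 to the attribute that stores those rows.
    NODE_ATTRS = {'organization': 'organization_nodes',
                  'author': 'author_nodes',
                  'paper': 'paper_nodes'}
    EDGE_ATTRS = {'author2organization': 'edge_author2organization',
                  'author2paper': 'edge_author2paper'}

    def __init__(self, x_heter=None, edge_index_heter=None):
        super().__init__()
        groups = self._group(x_heter, self.NODE_ATTRS)
        groups.update(self._group(edge_index_heter, self.EDGE_ATTRS))
        for attr, rows in groups.items():
            setattr(self, attr, rows)

    @staticmethod
    def _group(rows, attrs):
        """Split rows by their flag; each stored row drops the flag, as in OAP."""
        grouped = {attr: [] for attr in attrs.values()}
        for row in rows or []:
            attr = attrs.get(row[0])
            if attr is not None:  # rows with an unknown flag are ignored, as in OAP
                grouped[attr].append(row[1:])
        return grouped

    def num_organization_nodes(self):
        return len(self.organization_nodes)

    def num_author_nodes(self):
        return len(self.author_nodes)

    def num_paper_nodes(self):
        return len(self.paper_nodes)
```

The attribute names match those of `OAP`, so code written against one class works with the other.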
Test case:
# Two organizations: the first has one author with 1 paper;
# the second has two authors, one with 1 paper and the other with 2 papers
x = [['author', 'Zhangsan'],
     ['author', 'Lisi'],
     ['author', 'Wangwu'],
     ['organization', 'DeepMind'],
     ['organization', 'MIT'],
     ['paper', 0, 'ResNet', 'ref-100000'],
     ['paper', 1, 'GAN', 'ref-10000'],
     ['paper', 2, 'GAT', 'ref-5000'],
     ['paper', 3, 'GCN', 'ref-2000']]
edge_index = [['author2organization', 'Zhangsan', 'DeepMind'],
              ['author2organization', 'Lisi', 'MIT'],
              ['author2organization', 'Wangwu', 'MIT'],
              ['author2paper', 'Zhangsan', 0],
              ['author2paper', 'Lisi', 1],
              ['author2paper', 'Wangwu', 2],
              ['author2paper', 'Wangwu', 3]]
data = OAP(x_heter=x, edge_index_heter=edge_index)
print("data.num_paper_nodes = ", data.num_paper_nodes())
print("data.num_author_nodes = ", data.num_author_nodes())
print("data.num_organization_nodes = ", data.num_organization_nodes())
Test output: