Graph Neural Networks Task 1

wk_43245857

Article link: https://blog.csdn.net/weixin_43245857/article/details/117967941

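Task 1: Basic Graph Theory and PyG Library Setup

1. Graph theory basics
- 1.1 Basic concepts in graph theory
- 1.2 Machine learning tasks for GNNs
2. Environment setup and using the PyG library

1. Graph theory basics

1.1 Basic concepts in graph theory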

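1) Graph definition: a graph is defined by two sets, the set of nodes V and the set of edges E.
Nodes and edges can also carry attributes, which may be categorical or numerical.
2) Adjacency matrix: A is an N×N matrix whose entry A_ij describes the edge from node v_i to node v_j (0 if there is none, otherwise 1 or the edge weight).
Depending on whether edges have a direction and a weight, graphs are undirected or directed, and unweighted or weighted.
3) Node degree: the number of edges incident to the node; in a weighted graph, the sum of the weights of those edges. For an undirected graph this is the sum of row i of A.
4) Neighbors: the nodes directly adjacent to v_i, written N(v_i).
5) Walk: an alternating sequence node → edge → node → edge ... in which nodes and edges may repeat.
6) Path: a walk in which no node is repeated.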
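
The concepts above can be checked on a tiny example. The following is a minimal sketch in plain Python (no PyG needed); the 4-node edge list and the helper names is_walk and is_path are made up for illustration and do not come from any library.

```python
# A small undirected, unweighted graph with 4 nodes (indices 0..3).
# The edge list is an assumption made up for illustration.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
num_nodes = 4

# Adjacency matrix A: A[i][j] = 1 if there is an edge between v_i and v_j.
A = [[0] * num_nodes for _ in range(num_nodes)]
for i, j in edges:
    A[i][j] = 1
    A[j][i] = 1  # undirected, so A is symmetric

# Degree of v_i = sum of row i of A.
degrees = [sum(row) for row in A]

# Neighbors N(v_i) = all v_j with a non-zero entry A[i][j].
neighbors = {i: [j for j in range(num_nodes) if A[i][j]] for i in range(num_nodes)}


def is_walk(nodes):
    """True if every consecutive pair of nodes is joined by an edge."""
    return all(A[u][v] for u, v in zip(nodes, nodes[1:]))


def is_path(nodes):
    """A path is a walk that visits no node twice."""
    return is_walk(nodes) and len(set(nodes)) == len(nodes)


print(degrees)
print(neighbors[2])
print(is_walk([0, 1, 2, 0, 2, 3]), is_path([0, 1, 2, 0, 2, 3]))
print(is_walk([0, 1, 2, 3]), is_path([0, 1, 2, 3]))
```

The sequence 0, 1, 2, 0, 2, 3 is a walk but not a path because nodes 0 and 2 repeat, while 0, 1, 2, 3 is both.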

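1.2 Machine learning tasks for GNNs

The main tasks are node prediction, edge (link) prediction, graph prediction, node clustering, and graph generation.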

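2. Environment setup and using the PyG library

2.1 Environment setup

I only wanted a quick introduction to GNNs and my GPU is not good enough, so for now I used two approaches:
1) Install the CPU version
The main steps are: create a new environment in Anaconda (omitted), install PyTorch, then install torch-geometric. The wheel index in the commands below has to match the installed PyTorch build (here 1.8.0, CPU only).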

pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
pip install torch-geometric

2) Colab environment setup
This was my first time using Colab; I got a basic understanding of it from this post:
https://blog.csdn.net/w1520039381/article/details/117515712
The setup itself is:

! pip install  -U torch_geometric
! pip install  -U torch_sparse
! pip install  -U torch_scatter
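
After either route it helps to confirm what actually got installed. The following is a minimal sketch using only the standard library (importlib.metadata, Python 3.8+); the helper name installed_versions and the REQUIRED list are assumptions for illustration, and how strictly distribution names are matched (hyphen versus underscore) may depend on the Python version.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as they appear in the pip commands above.
REQUIRED = ["torch", "torch-scatter", "torch-sparse", "torch-geometric"]

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

The reported torch version is the one the wheel URL for torch-scatter and torch-sparse should correspond to.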

2.2 Using the Data class

From the documentation, a rough overview of the basic attributes and methods of the Data class:

from torch_geometric.datasets import KarateClub

dataset = KarateClub()
data = dataset[0]  # the dataset holds a single graph
print(data)
print('==============================================================')
# Basic information about the graph
print(f'Number of nodes: {data.num_nodes}')  # number of nodes
print(f'Number of edges: {data.num_edges}')  # number of edges
print(f'Number of node features: {data.num_node_features}')  # dimension of the node features
print(f'Number of node features: {data.num_features}')  # alias, also the node feature dimension
print(f'Number of edge features: {data.num_edge_features}')  # dimension of the edge features
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')  # average node degree
print(f'Is coalesced: {data.is_coalesced()}')  # whether edge indices are sorted and free of duplicates
print(f'Number of training nodes: {data.train_mask.sum()}')  # nodes used for training
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')  # fraction of nodes used for training
# Note: newer PyG releases name the next two methods has_isolated_nodes() and has_self_loops()
print(f'Contains isolated nodes: {data.contains_isolated_nodes()}')  # whether the graph has isolated nodes
print(f'Contains self-loops: {data.contains_self_loops()}')  # whether the graph has self-loop edges
print(f'Is undirected: {data.is_undirected()}')  # whether the graph is undirected
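
To make the quantities printed above more concrete, here is a plain-Python sketch of how they can be derived from an edge list in COO layout (two rows: source nodes and target nodes), which is the layout PyG uses for edge_index. The toy graph is made up for illustration, and the code only mimics the meaning of these checks; it is not PyG's implementation.

```python
# edge_index in COO layout: row 0 holds source nodes, row 1 holds target nodes.
# An undirected edge {u, v} is stored twice, as (u, v) and as (v, u).
# Toy graph made up for illustration; node 4 has no edges at all.
num_nodes = 5
edge_index = [
    [0, 1, 0, 2, 1, 2, 2, 3],  # sources
    [1, 0, 2, 0, 2, 1, 3, 2],  # targets
]

pairs = list(zip(edge_index[0], edge_index[1]))
pair_set = set(pairs)

num_edges = len(pairs)              # each direction is counted separately
avg_degree = num_edges / num_nodes  # same formula as in the printout above

# A self-loop is an edge whose source equals its target.
has_self_loops = any(u == v for u, v in pairs)

# A node is isolated if no edge (ignoring self-loops) touches it.
touched = {n for u, v in pairs if u != v for n in (u, v)}
has_isolated_nodes = len(touched) < num_nodes

# The graph is undirected if every edge also appears reversed.
is_undirected = all((v, u) in pair_set for u, v in pairs)

print(num_edges, avg_degree, has_self_loops, has_isolated_nodes, is_undirected)
```

Because both directions are stored, num_edges / num_nodes equals the average degree of an undirected graph.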

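2.3 Using the Dataset class

Taking the Planetoid dataset as an example, inspect the properties of a dataset. In practice the dataset often fails to download. One fix is to open the source file that defines Planetoid (planetoid.py in torch_geometric/datasets) and change the download URL from

url = 'https://github.com/kimiyoung/planetoid/raw/master/data'

to a copy of the same data hosted on Gitee:

url = 'https://gitee.com/jiajiewu/planetoid/raw/master/data'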
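
As an alternative to editing the installed source file, the URL can be overridden at runtime before the dataset is constructed. The following is a minimal sketch, assuming the download base is exposed as the class attribute url (as the assignment above suggests); the helper use_mirror is hypothetical, and whether the Gitee mirror is still reachable is not verified here.

```python
MIRROR_URL = 'https://gitee.com/jiajiewu/planetoid/raw/master/data'


def use_mirror(dataset_cls, mirror_url):
    """Point a dataset class at another download base URL; return the old one."""
    old_url = getattr(dataset_cls, 'url', None)
    dataset_cls.url = mirror_url
    return old_url


try:
    from torch_geometric.datasets import Planetoid
except ImportError:
    Planetoid = None  # PyG is not installed, so there is nothing to patch

if Planetoid is not None:
    previous_url = use_mirror(Planetoid, MIRROR_URL)
    print(f'Planetoid.url changed from {previous_url} to {Planetoid.url}')
```

The override only lasts for the current Python session, so it has to run before the first Planetoid(...) call that triggers a download.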

from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/dataset/Cora', name='Cora')
# Cora()
a1 = len(dataset)  # 1
a2 = dataset.num_classes  # 7
a3 = dataset.num_node_features  # 1433
data = dataset[0]
# Data(edge_index=[2, 10556], test_mask=[2708],
#      train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])
a5 = data.is_undirected()  # True
a6 = data.train_mask.sum().item()  # 140
a7 = data.val_mask.sum().item()  # 500
a8 = data.test_mask.sum().item()  # 1000

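2.4 Homework

Assignment:
Implement a class that inherits from Data and is dedicated to representing an "institution-author-paper" network. The network contains three types of nodes ("institution", "author" and "paper") and two types of edges ("author-institution" and "author-paper"). Requirements for the class:
1) store the attributes of the different node types in different attributes;
2) store the different edge types in different attributes (edges have no attributes);
3) implement, one by one, methods that return the number of nodes of each type.
In practice the basic Data class may not cover every need, in which case you define your own class by inheriting from it: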

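from torch_geometric.data import Data


class new_network(Data):
    """Institution-author-paper network stored in a single Data object."""

    def __init__(self, inst_x=None, paper_x=None, author_x=None,
                 author_inst_edge_index=None, author_paper_edge_index=None,
                 author_inst_edge_attr=None, author_paper_edge_attr=None,
                 y=None, **kwargs):
        # Defaults of None are an added assumption: they let the class be
        # instantiated without arguments, which some PyG operations may do.
        super().__init__(**kwargs)
        # 1) Store the features of each node type in its own attribute
        self.inst_x = inst_x
        self.paper_x = paper_x
        self.author_x = author_x
        # 2) Store each edge type in its own attribute
        self.author_inst_edge_index = author_inst_edge_index
        self.author_paper_edge_index = author_paper_edge_index
        # The assignment says edges have no attributes, so these may stay None
        self.author_inst_edge_attr = author_inst_edge_attr
        self.author_paper_edge_attr = author_paper_edge_attr

        self.y = y

    # 3) One accessor per node type. All three are properties, so they are
    #    read without parentheses, e.g. net.get_author_nodes_num
    @property
    def get_author_nodes_num(self):
        return self.author_x.shape[0]

    @property
    def get_paper_nodes_num(self):
        return self.paper_x.shape[0]

    @property
    def get_inst_nodes_num(self):
        return self.inst_x.shape[0]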