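[Machine Learning] Renmin University of China, Gaoling School Machine Learning Lab 1: Lab Notes on Autoregressive Models for Graph Sequences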

Latest recommended article published on 2024-05-06 22:49:37

Isabella...伊莎贝拉

Article link: https://blog.csdn.net/weixin_55762279/article/details/123248576

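Gaoling School of Artificial Intelligence, second-year spring semester, Machine Learning Lab 1: Lab Notes on Autoregressive Models for Graph Sequences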

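Tools: PyCharm / Jupyter Notebook

Environment: Anaconda (PyTorch and PyTorch Geometric are installed inside a conda environment; PyCharm can install them as well, but that route is very error-prone).

My understanding of Bernoulli sampling: given a probability p, draw a random value uniformly from [0, 1); if the value is less than p the sample is 1, otherwise it is 0.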
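A minimal sketch of this idea in PyTorch, assuming the compared value is a uniform draw from [0, 1). The names `p`, `u`, `samples` and `builtin` are illustrative and not part of the lab code; `torch.bernoulli` is the built-in that samples from a tensor of probabilities.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

p = 0.3                      # probability of drawing a 1 (illustrative value)
u = torch.rand(100_000)      # uniform random values in [0, 1)
samples = (u < p).float()    # 1 where the draw falls below p, otherwise 0

# The fraction of ones should be close to p
print(samples.mean().item())

# torch.bernoulli samples every entry independently from its own probability
builtin = torch.bernoulli(torch.full((100_000,), p))
print(builtin.mean().item())
```

Both printed means should land close to 0.3, which is the sense in which thresholding a uniform draw and Bernoulli sampling agree.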

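I. Overall approach (for reference only):

1. Three tasks:

Task 1: Using the two libraries PyTorch and PyTorch Geometric (do not add or remove any of the imports in the code provided by the instructor), implement a simulator for graph sequences based on the given autoregressive scheme. The given formula must be used.

A graph sequence is a sequence of graphs, where "graph" means the graph data structure. The first K graphs are generated randomly and stored in a suitable data structure; after that, the most recent K graphs are used again and again to predict the next graph, until the total number of graphs reaches the given limit.

Task 2: Implement a data loader for the generated graph sequence that outputs each graph together with its K previous graphs.
Task 3: The approach is much the same as in Task 1, but with a nonlinear autoregressive model that uses the sigmoid function.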

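2. In plain words:

GraphSeqGenerator.initialization():

Randomly generate the probability matrices of the first K undirected graphs as a three-dimensional tensor of size (K, N, N), where N is the number of nodes in a graph. Each slice is a two-dimensional tensor holding a symmetric matrix whose diagonal is all zeros. The adjacency matrices of the first K graphs are then obtained from these probability matrices by Bernoulli sampling.

GraphSeqGenerator.sampling():

Take a probability matrix (stored as a tensor) and obtain the corresponding adjacency matrix by Bernoulli sampling.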
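One point worth spelling out: sampling every entry independently generally gives an asymmetric matrix, which is not a valid undirected graph. A hedged sketch of one way around this (not necessarily the intended solution) is to keep only the draws above the diagonal and mirror them. The helper name `sample_symmetric` is hypothetical.

```python
import torch

def sample_symmetric(prob: torch.Tensor) -> torch.Tensor:
    """Bernoulli-sample an undirected adjacency matrix from an (N, N) probability matrix."""
    draws = torch.bernoulli(prob)            # independent 0/1 draw for every entry
    upper = torch.triu(draws, diagonal=1)    # keep only the entries above the diagonal
    return upper + upper.t()                 # mirror them below the diagonal

prob = torch.full((5, 5), 0.5)   # illustrative probability matrix
adj = sample_symmetric(prob)
print(adj)
```

Because the diagonal is dropped by `diagonal=1`, the result has no self-loops regardless of the values on the diagonal of `prob`.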

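GraphSeqGenerator.simulation():

Simulate the graph sequence on top of the two functions above.

Use the linear autoregressive model to predict the probability matrix of graph K+1, pass that probability matrix to sampling() to obtain the adjacency matrix, and append the new matrix to the previous K matrices, so the tensor grows to size (K+1, N, N). Repeat until the sequence reaches the required length.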
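The shapes involved in one prediction step can be sketched as follows. This is an illustration under assumptions: `K`, `N`, and the random stand-in graphs are made up, symmetry is ignored for brevity, and thresholding a uniform draw stands in for sampling().

```python
import torch

torch.manual_seed(0)

K, N = 3, 5                                        # illustrative order and node count
a = torch.rand(K, 1, 1) + 0.1
a = a / a.sum()                                    # K coefficients that sum to 1
graphs = torch.randint(0, 2, (K, N, N)).float()    # stand-in for the K initial graphs

# Weighted sum over the K axis: (K, 1, 1) * (K, N, N) broadcasts, the sum gives (N, N)
prob_next = (a * graphs[-K:]).sum(dim=0)

# Stand-in for sampling(): 1 where a uniform draw falls below the probability
new_graph = (torch.rand(N, N) < prob_next).float()

# Add a leading axis so the new graph can be concatenated: (N, N) -> (1, N, N)
graphs = torch.cat((graphs, new_graph.unsqueeze(0)), dim=0)
print(graphs.shape)  # torch.Size([4, 5, 5])
```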

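GraphSeqGenerator2():

Starting from what has already been implemented, replace the linear autoregressive scheme with the newly given nonlinear one. (Put simply, only the code for the formula changes; everything else stays the same.)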
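A small worked example of the nonlinear step, assuming the update used in the GraphSeqGenerator2 code in this post: sigmoid of the weighted sum of (graph - 0.5). Since every entry of a graph is 0 or 1 and the coefficients sum to 1, the weighted sum lies in [-0.5, 0.5], so the edge probability stays between sigmoid(-0.5) and sigmoid(0.5). The helper `nonlinear_prob` is a hypothetical name.

```python
import torch

K, N = 3, 5                        # illustrative order and node count
a = torch.rand(K, 1, 1) + 0.1
a = a / a.sum()                    # coefficients that sum to 1

all_zeros = torch.zeros(K, N, N)   # no edge in any of the K previous graphs
all_ones = torch.ones(K, N, N)     # an edge in every one of them

def nonlinear_prob(history: torch.Tensor) -> torch.Tensor:
    # centre the 0/1 entries at zero, take the weighted sum, squash with sigmoid
    return torch.sigmoid((a * (history - 0.5)).sum(dim=0))

low = nonlinear_prob(all_zeros)[0, 1].item()
high = nonlinear_prob(all_ones)[0, 1].item()
print(round(low, 3), round(high, 3))  # 0.378 0.622
```

Unlike the linear model, this update can never produce a probability of exactly 0 or 1, so an edge can always appear or disappear in the next graph.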

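II. Original assignment

The original assignment, with my own interpretation added.

III. Preparation

Install PyTorch and PyTorch Geometric.

I followed an installation tutorial found online. Link: "pytorch无坑超详细图文CPU版小白安装教程(配gpu版链接、conda命令教程)" (a detailed, illustrated beginner's tutorial for installing the CPU version of PyTorch, with a link to the GPU version and a conda command tutorial).

Read it carefully. On my first attempt I started installing without activating the environment, so unsurprisingly it failed. Wait for each step to finish before moving on to the next one.
Also, do not add the same channel twice with conda config --add channels; otherwise the re-added channel is moved to the top of the channel list.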

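IV. Step-by-step walkthrough

!!!! Hint:

        self.a = torch.rand(self.order, 1, 1) + 0.1
        self.a = self.a / torch.sum(self.a)

This generates K coefficients, stored as a tensor of size (K, 1, 1), each lying in (0, 1) and all summing to 1.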
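A quick check of those properties, as a sketch with an illustrative `K` (in the lab code the value comes from `self.order`):

```python
import torch

K = 4                                # illustrative order
a = torch.rand(K, 1, 1) + 0.1        # every entry lies in [0.1, 1.1), so none is zero
a = a / torch.sum(a)                 # normalize so the K entries sum to 1

print(a.shape)         # torch.Size([4, 1, 1])
print(a.sum().item())  # 1.0 up to floating-point rounding
print(a.flatten())     # K positive weights, each below 1
```

The trailing dimensions of size 1 are what let `a` broadcast against a (K, N, N) stack of graphs, so each graph is scaled by its own coefficient.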

1. First function: GraphSeqGenerator.initialization() (TODO):

    def initialization(self) -> torch.FloatTensor:
        """
        Initialize K undirected graphs and formulate them as a float tensor with size (K, N, N)
        :return: a torch float tensor with size (K, N, N)
        """
        # TODO: change the following code to achieve the initialization function
        # Uniform random values in [0, 1) for every node pair of each of the K graphs
        noise = torch.rand(self.order, self.num_nodes, self.num_nodes)
        # Same effect as Bernoulli sampling: an entry becomes 1 with probability self.sparsity
        edges = (noise < self.sparsity).float()
        # Thresholding (like torch.bernoulli) treats (i, j) and (j, i) independently, so the result
        # is not symmetric. Keep the strict upper triangle, which also zeroes the diagonal,
        # and mirror it. Thresholding before zeroing the diagonal matters: a zero on the
        # diagonal would otherwise pass the threshold and become a self-loop.
        upper = torch.triu(edges, diagonal=1)
        graphs = upper + upper.transpose(1, 2)
        return graphs

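2. Second function: GraphSeqGenerator.sampling() (TODO):

    @staticmethod
    def sampling(prob_edges: torch.Tensor) -> torch.FloatTensor:  # turn a probability matrix into an adjacency matrix
        """
        Sample an adjacency matrix of an undirected graph from a probability matrix
        :param prob_edges: (N, N) shaped matrix
        :return: a torch float tensor with size (N, N)
        """
        # TODO: Change the code below to sample an adjacency matrix of an undirected graph from a probability matrix
        # Bernoulli sampling: an entry becomes 1 when a uniform draw falls below its probability
        # (comparing prob_edges against a fixed 0.5 would be deterministic, not sampling)
        noise = torch.rand_like(prob_edges)
        edges = (noise < prob_edges).float()
        # Same symmetrization as in initialization(): mirror the strict upper triangle
        upper = torch.triu(edges, diagonal=1)
        adj_matrix = upper + upper.t()

        # print(adj_matrix)  # printing the result (or single-stepping in a debugger) makes problems easy to spot
        return adj_matrix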

3. Third function: GraphSeqGenerator.simulation() (TODO):

    def simulation(self, length: int = None) -> list:
        """
        Simulate a graph sequence based on the initialization and sampling functions, and the autoregressive mechanism
        :param length: number of graphs to simulate (defaults to self.length)
        :return: a list of PyTorch Geometric "Data" objects, one per graph
        """
        if length is None:
            length = self.length
        # TODO: 1) simulate graphs via the auto-regressive model;
        #  2) Convert the format of the graph sequence to "Data" Type defined in PyTorch Geometric;
        #  Hint: please check the function "dense_to_sparse" and the usage of "Data" class

        graphs = self.initialization()  # adjacency matrices of the first K graphs, size (K, N, N)
        # Predict graph i from the K graphs before it: multiply each of them by its coefficient
        # in self.a and sum over the K axis. .sum(dim=0) gives the same result as
        # .cumsum(dim=0)[-1]; see the Reference section for tensor.cumsum().
        for i in range(self.order, length):
            prob_edges = (self.a * graphs[i - self.order:i]).sum(dim=0)
            adj_matrix = self.sampling(prob_edges)  # Bernoulli sampling
            # torch.cat needs tensors with the same number of dimensions,
            # so expand the new matrix from (N, N) to (1, N, N) before appending it
            graphs = torch.cat((graphs, adj_matrix.unsqueeze(0)), dim=0)
        graphs = graphs[:length]  # keeps the size at (length, N, N) even when length < K

        # Open question: all graphs are predicted and stacked into one tensor first and only then
        # converted to Data; saving each graph to a .pt file as it is predicted was not considered.

        # Store each graph in the form Data expects: the node feature is the degree of the node,
        # and edge_index / edge_attr are the sparse form of the adjacency matrix
        graph_data = []
        for i in range(length):
            features = graphs[i].sum(dim=0).view(-1, 1)  # node degrees, size (N, 1)
            edge_index, edge_attr = dense_to_sparse(graphs[i])
            graph_data.append(Data(x=features, edge_index=edge_index, edge_attr=edge_attr))

        # visualize the graph sequence: the tensor can be saved directly as an image grid
        save_image(graphs.view(length, 1, self.num_nodes, self.num_nodes), 'graphs.png',
                   nrow=int(length ** 0.5))
        return graph_data
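To make the sparse form concrete, the sketch below rebuilds it with plain PyTorch for a three-node graph. It mimics, as far as I understand it, what `dense_to_sparse` returns for a single (N, N) matrix; it is an illustration, not the library's implementation.

```python
import torch

# A small undirected graph with edges 0-1 and 1-2
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])

edge_index = adj.nonzero().t()                   # size (2, E): row 0 = source, row 1 = target
edge_attr = adj[edge_index[0], edge_index[1]]    # the value stored at each edge
degree = adj.sum(dim=0).view(-1, 1)              # node feature: degree, size (N, 1)

print(edge_index)
# tensor([[0, 1, 1, 2],
#         [1, 0, 2, 1]])
print(degree.flatten())  # tensor([1., 2., 1.])
```

Each undirected edge appears twice in `edge_index`, once per direction, which is why a symmetric adjacency matrix is needed in the first place.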

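4. Second class: SyntheticDataset(Dataset), __getitem__(self, idx: int) (TODO):

Use Batch.from_data_list() (the Batch class of PyTorch Geometric) to combine the K graphs that precede idx, taken from self.data (the list of graphs), into one batch.
graph_current is the graph (data) at position idx.

    def __getitem__(self, idx: int):
        """
        Given the index of a graph, output this graph and its K previous graphs
        :param idx: the index of a graph
        :return: a Batch holding the K previous graphs, and the Data object of the graph at idx
        """
        # TODO: Change the code below to achieve the dataset sampler
        #   Hint: 1) graphs_history need to call the functions of the "Batch" Class;
        #         2) Be careful about the range of the index.
        # ASSUMPTION: the order K is available as self.order; use whichever attribute the provided class defines.
        # Only indices K .. len(self.data) - 1 have K previous graphs.
        # If __len__ counts only those graphs, shift idx by K before using it.
        if idx < self.order or idx >= len(self.data):
            raise IndexError("idx must leave room for K previous graphs")
        graphs_history = Batch.from_data_list(self.data[idx - self.order:idx])
        graph_current = self.data[idx]
        return graphs_history, graph_current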
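The hint about the index range can be illustrated with plain Python lists. This is a sketch only: `history_window` is a hypothetical helper, and the strings stand in for Data objects. In the real dataset the history list would be passed to Batch.from_data_list().

```python
def history_window(sequence: list, idx: int, order: int):
    """Return (the `order` items before position idx, the item at idx)."""
    if not order <= idx < len(sequence):
        raise IndexError(f"idx must be in [{order}, {len(sequence) - 1}]")
    return sequence[idx - order:idx], sequence[idx]

sequence = ["g0", "g1", "g2", "g3", "g4"]   # stand-ins for Data objects
history, current = history_window(sequence, idx=3, order=2)
print(history, current)  # ['g1', 'g2'] g3
```

Without the range check, an index smaller than the order would produce a negative slice start and silently return the wrong graphs instead of failing.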

5. Third class: GraphSeqGenerator2() (TODO):

Only simulation() needs to change; everything else is essentially copied from the first class.
Only the modified function is shown below:

    def simulation(self, length: int = None) -> list:
        """
        Simulate a graph sequence based on the initialization and sampling functions, and the autoregressive mechanism
        :param length: number of graphs to simulate (defaults to self.length)
        :return: a list of PyTorch Geometric "Data" objects, one per graph
        """
        if length is None:
            length = self.length
        # TODO: 1) simulate graphs via the auto-regressive model;
        #  2) Convert the format of the graph sequence to "Data" Type defined in PyTorch Geometric;
        #  Hint: please check the function "dense_to_sparse" and the usage of "Data" class

        graphs = self.initialization()  # adjacency matrices of the first K graphs, size (K, N, N)

        for i in range(self.order, length):
            # weighted sum of the K previous graphs, centred at zero by subtracting 0.5
            prob_edges = (self.a * (graphs[i - self.order:i] - 0.5)).sum(dim=0)
            prob_edges = torch.sigmoid(prob_edges)  # apply the nonlinear function
            adj_matrix = self.sampling(prob_edges)  # Bernoulli sampling
            # expand (N, N) to (1, N, N) so the new graph can be appended along dim 0
            graphs = torch.cat((graphs, adj_matrix.unsqueeze(0)), dim=0)
        graphs = graphs[:length]  # keeps the size at (length, N, N) even when length < K

        # Open question: all graphs are predicted and stacked into one tensor first and only then
        # converted to Data; saving each graph to a .pt file as it is predicted was not considered.

        # Store each graph in the form Data expects: the node feature is the degree of the node,
        # and edge_index / edge_attr are the sparse form of the adjacency matrix
        graph_data = []
        for i in range(length):
            features = graphs[i].sum(dim=0).view(-1, 1)  # node degrees, size (N, 1)
            edge_index, edge_attr = dense_to_sparse(graphs[i])
            graph_data.append(Data(x=features, edge_index=edge_index, edge_attr=edge_attr))

        # visualize the graph sequence: the tensor can be saved directly as an image grid
        save_image(graphs.view(length, 1, self.num_nodes, self.num_nodes), 'graphs.png',
                   nrow=int(length ** 0.5))
        return graph_data

Reference

tensor.cumsum() function: "剖析 | torch.cumsum维度详解" (an analysis of the dim argument of torch.cumsum)
Batch.from_data_list() function:
