MindSpore Developer Gathering: Reproducing the GAT Paper

This post shares the implementation process and lessons learned from reproducing the GAT paper with MindSpore.

Main Task 

Task Description 

Following the paperswithcode link in the task list, study the network model described in the paper and reproduce it with the MindSpore framework. Task requirements: 

  1. Development should be based on the latest MindSpore version (Nightly); interfaces from other AI frameworks must not be used. For installation of the latest version, see https://www.mindspore.cn/install and choose the Nightly build. 
  2. The task has two parts: 

    ① [Inference] Convert the network model and weights to MindSpore format and run inference; 

    ② [Training] After inference is complete, finish training from scratch (from 0 to 1); 

  3. Submit the code to a personal GitHub repository and add the implementation under the corresponding paper on paperswithcode. 
  4. Once the code is finished, share the project's implementation process and experience in a post on the MindSpore forum, CSDN, Zhihu, or another platform.  

In this project, our objective is to perform inference and training of Graph Attention Networks (GAT) using the MindSpore framework. We will achieve this by converting the PyTorch implementation of GAT created by Diego999, which can be found on GitHub at the following page: https://github.com/Diego999/pyGAT/tree/master.

Environment 

To meet the requirement of using the MindSpore Nightly version, you can easily download it using the command pip install mindspore-dev -i https://pypi.tuna.tsinghua.edu.cn/simple. This will install the development version of MindSpore, incorporating the latest features and updates. By utilizing the Tsinghua mirror, you can ensure efficient and reliable downloads. 
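
After installation, a quick sanity check confirms that MindSpore imports correctly and can run on the target device (run_check is part of MindSpore's public API):

import mindspore

# Print the installed version and run a small built-in verification computation.
print(mindspore.__version__)
mindspore.run_check()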

For the implementation, I decided to use the mindspore_2.0_train image. This image provides a preconfigured environment designed for training tasks with MindSpore 2.0, and it includes the tools and libraries needed to work with MindSpore and carry out the GAT model implementation. 

(Screenshot: selecting the mindspore_2.0_train environment image)

Dataset 

In the paper's implementation, they utilize the Cora dataset. The Cora dataset is a commonly used benchmark dataset in the field of graph representation learning. It consists of scientific publications classified into different research areas. The dataset contains a citation network where nodes represent papers and edges represent citations between them. 

Cora is a dataset containing 2708 scientific papers, grouped into seven distinct categories. The citation network comprises 10556 connections. Each paper is represented by a binary word vector, which indicates whether a particular word from the 1433-word dictionary is present or absent. 

Cora Dataset: 

  • Total Nodes: 2708 
  • Total Edges: 10556 
  • Number of Classes: 7 

You can download the Cora dataset from the following link: Cora dataset. This link provides access to the archived version of the dataset. Typically, the Cora dataset is composed of the following three files: 

  1. cora.content: This file contains the node features and labels. Each line represents a node in the graph and has the format <paper_id> <1433 binary word indicators> <class_label> (see the example lines after this list). 
  2. cora.cites: This file represents the citation relationships between papers; each line lists the ID of the cited paper followed by the ID of the citing paper.  
  3. cora.names: This file contains the mapping of class labels to their respective names.  
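
For illustration, lines of the two data files look roughly like this (the IDs below are only examples; each feature vector actually has 1433 binary entries, abridged here):

# cora.content: <paper_id> <1433 binary word indicators> <class_label>
31336	0 0 0 1 0 ... 0	Neural_Networks

# cora.cites: <cited_paper_id> <citing_paper_id>
35	1033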

I have processed the dataset in a manner similar to the data processing steps used in the PyTorch implementation. The data processing pipeline for the Cora dataset typically involves the following steps: 

import numpy as np
import scipy.sparse as sp
import mindspore
from mindspore import Tensor

from mindspore.ops import operations as P
from mindspore import dtype as mstype

def encode_onehot(labels):
    # The classes must be sorted before encoding to enable static class encoding.
    # In other words, make sure the first class always maps to index 0.
    
    classes = sorted(list(set(labels)))
    
    classes_dict = {c: np.identity(len(classes))[i, :] for i, c in enumerate(classes)}
    
    labels_onehot = np.array(list(map(classes_dict.get, labels)), dtype=np.int32)
    
    return labels_onehot

def normalize_adj(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))    
    r_inv_sqrt = np.power(rowsum, -0.5).flatten()  
    r_inv_sqrt[np.isinf(r_inv_sqrt)] = 0.
    r_mat_inv_sqrt = sp.diags(r_inv_sqrt)   #D^{-0.5}
    return mx.dot(r_mat_inv_sqrt).transpose().dot(r_mat_inv_sqrt)
    
def normalize_features(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))    
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx

def accuracy(output, labels):
    # Find the indices of the maximum values along dimension 1 (columns)
    preds = P.Argmax(axis=1)(output)

    # Cast the predictions tensor to the same data type as the labels tensor
    cast = P.Cast()
    preds = cast(preds, labels.dtype)

    # Compute element-wise equality between predictions and labels
    correct = P.Equal()(preds, labels)

    # Cast the correct tensor to float and sum the correct predictions
    correct = cast(correct, mstype.float32)
    correct_sum = P.ReduceSum()(correct)

    # Compute the accuracy by dividing the sum of correct predictions by the number of labels
    accuracy = correct_sum / len(labels)

    return accuracy

def load_data(path, dataset):
    """Load citation network dataset (cora only for now)"""
    print('Loading {} dataset...'.format(dataset))

    idx_features_labels = np.genfromtxt("{}{}.content".format(path, dataset), dtype=np.dtype(str))
    
    features = sp.csr_matrix(idx_features_labels[:, 1:-1], dtype=np.float32)
    labels = encode_onehot(idx_features_labels[:, -1])

    # build graph
    idx = np.array(idx_features_labels[:, 0], dtype=np.int32)
    idx_map = {j: i for i, j in enumerate(idx)} 
    edges_unordered = np.genfromtxt("{}{}.cites".format(path, dataset), dtype=np.int32) 
    edges = np.array(list(map(idx_map.get, edges_unordered.flatten())), dtype=np.int32).reshape(edges_unordered.shape)
    
    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])), shape=(labels.shape[0], labels.shape[0]), dtype=np.float32)
    

    # build symmetric adjacency matrix
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

    features = normalize_features(features) 
    adj = normalize_adj(adj + sp.eye(adj.shape[0])) #adj = D^{-0.5}SD^{-0.5}, S=A+I

    idx_train = np.arange(140)
    idx_val = np.arange(200, 500)
    idx_test = np.arange(500, 1500)

    adj = Tensor(np.array(adj.todense()), dtype=mindspore.float32)
    features = Tensor(np.array(features.todense()), dtype=mindspore.float32)
    labels = Tensor(np.where(labels)[1], dtype=mindspore.int32)

    idx_train = Tensor(idx_train, dtype=mindspore.int64)
    idx_val = Tensor(idx_val, dtype=mindspore.int64)
    idx_test = Tensor(idx_test, dtype=mindspore.int64)

    return adj, features, labels, idx_train, idx_val, idx_test
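
A quick way to sanity-check this pipeline is to load the dataset and inspect the resulting tensor shapes (a minimal sketch, assuming the Cora files sit in ./cora/):

adj, features, labels, idx_train, idx_val, idx_test = load_data("./cora/", "cora")
print(adj.shape)       # (2708, 2708)
print(features.shape)  # (2708, 1433)
print(labels.shape)    # (2708,)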

  Converting the network model and weights to MindSpore format and running inference 

To convert a .pth file into a MindSpore .ckpt file, I performed the conversion manually. Here are the steps I followed: 

  1. Reading the .pth file: I loaded the .pth file using PyTorch's functionality and stored the model parameters in a dictionary. 
  2. Storing parameters in a dictionary: I extracted the model parameters from the loaded .pth file and saved them into a dictionary. The keys of the dictionary corresponded to the parameter names for convenience. 
  3. Saving the checkpoint: Using MindSpore's checkpoint functionality, I saved the parameters dictionary as a .ckpt file. This allowed me to create a MindSpore-compatible checkpoint file that could be used for further inference or training. 

Reusing the original parameter names in the model implementation made the conversion smoother and kept the parameter names consistent between the PyTorch model and the MindSpore checkpoint. 

converter.py 

import torch
import mindspore as ms

def pytorch2mindspore():
    """read pth file"""
    par_dict = torch.load('inference/model.pth', map_location=lambda storage, loc: storage)
    params_list = []
    for name in par_dict:
        param_dict = {}
        parameter = par_dict[name]
        param_dict['name'] = name
        param_dict['data'] = ms.Tensor(parameter.numpy())
        params_list.append(param_dict)
    ms.save_checkpoint(params_list, './inference/model.ckpt')
    
pytorch2mindspore()

The converted checkpoint file looks like the screenshot below. During the implementation of the model, I used the parameter names "attn_s" and "attn_d" to ensure a smooth parameter loading process and minimize the chance of errors. These parameter names can be found in the implementation of the GATLayer. 

By explicitly using these parameter names, I aimed to match them with the corresponding parameters in the checkpoint file, ensuring a correct and error-free loading process. 

(Screenshot: parameters stored in the converted MindSpore checkpoint)
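
Before wiring the checkpoint into the network, it can help to list the parameter names it contains and compare them with the names declared in the GATLayer implementation below (a small sketch using MindSpore's load_checkpoint; the path is the one produced by converter.py):

import mindspore as ms

param_dict = ms.load_checkpoint('./inference/model.ckpt')
for name, value in param_dict.items():
    # Each entry maps a parameter name to its tensor; these names must match the
    # Parameter names declared in the model for load_param_into_net to succeed.
    print(name, tuple(value.shape))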

    

# One GAT layer

import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore.ops import operations as P
from mindspore.common.initializer import initializer, XavierUniform

class GATLayer(nn.Cell):
    """
    MindSpore Implementation of the Graph Attention layer.
    """
    def __init__(self,in_features, out_features, dropout, alpha, training= True, concat=True):
        super(GATLayer, self).__init__()
        
        self.training      = training
        self.dropout       = dropout        # drop prob = 0.6
        self.in_features   = in_features    # 
        self.out_features  = out_features   # 
        self.alpha         = alpha          # LeakyReLU with negative input slope, alpha = 0.2
        self.concat        = concat         # concat = True for all layers except the output layer.
        gain = 1.414
        # Initialize the weight matrix W
        self.attn_s = ms.Parameter(initializer(XavierUniform(gain), [in_features, out_features], ms.float32), name="attn_s_{}".format(out_features))
        self.attn_d = ms.Parameter(initializer(XavierUniform(gain), [2 * out_features,1], ms.float32), name="attn_d_{}".format(out_features))
        
        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def construct(self, input, adj):
        # Linear Transformation
        h = P.MatMul()(input, self.attn_s)

        e = self._prepare_attentional_mechanism_input(h)
        # Masked Attention 
        zero_vec = -9e15*ops.ones_like(e)
        attention = ops.where(adj > 0, e, zero_vec)
        
        softmax = ops.Softmax(axis=1)
        attention = softmax(attention)
        if self.training:
            attention = ops.dropout(attention, p = self.dropout)
        h_prime   = ops.matmul(attention, h)
        if self.concat:
            return ops.elu(h_prime)
        else:
            return h_prime
    def _prepare_attentional_mechanism_input(self,h):
        h1 = ops.matmul(h, self.attn_d[:self.out_features, :])
        h2 = ops.matmul(h, self.attn_d[self.out_features:, :])
        # broadcast add
        e = h1 + h2.T
        return self.leakyrelu(e)
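
As a quick shape check, a single layer can be exercised on random inputs (a minimal sketch; the sizes are just Cora's dimensions with eight hidden units):

import numpy as np
import mindspore as ms

layer = GATLayer(in_features=1433, out_features=8, dropout=0.6, alpha=0.2, training=False)
x   = ms.Tensor(np.random.rand(2708, 1433).astype(np.float32))
adj = ms.Tensor(np.ones((2708, 2708), dtype=np.float32))   # dummy fully-connected graph
out = layer(x, adj)
print(out.shape)  # (2708, 8)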

Build Multi-head: 

class MultiHeadGATLayer(nn.Cell):
    """
    MindSpore Implementation of the Multi-head Graph Attention layer.
    """
    def __init__(self, input_feature_size, output_size, nclass, dropout, alpha, nheads, training = True):
        super(MultiHeadGATLayer, self).__init__()
        self.dropout=dropout
        self.training = training
        self.attentions = nn.CellList()
        for _ in range(nheads):
            attention = GATLayer(in_features= input_feature_size, out_features=output_size, dropout=dropout, alpha=alpha, training = self.training, concat=False)
            self.attentions.append(attention)
        
        self.out_att = GATLayer(in_features= output_size*nheads, out_features = nclass, dropout=dropout, training = self.training, alpha=alpha, concat=True)

    def construct(self, x, adj):
        if self.training:
            x= ops.dropout(x, p = self.dropout)
        x=ops.cat([att(x,adj) for att in self.attentions], axis = 1)
        if self.training:
            x=ops.dropout(x, p = self.dropout)
        #elu = nn.ELU()
        x= ops.elu(self.out_att(x,adj))
        
        return ops.log_softmax(x, axis = 1)

Inference 

To verify the effectiveness of the converted MindSpore checkpoint, we first present the performance of the original PyTorch implementation of GAT, followed by inference with the MindSpore port. Both implementations yielded the same accuracy, as demonstrated in the results below. 

  • PyTorch implementation inference result: 

(Screenshot: PyTorch implementation inference result)

  • Converted MindSpore inference result: 

(Screenshot: converted MindSpore checkpoint inference result)

Both implementations exhibit the same level of accuracy. 
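
For reference, the MindSpore inference run follows the pattern below: build the network, load the converted checkpoint, and evaluate on the test indices (a condensed sketch, assuming the utils.py and model.py modules from this post):

import mindspore as ms
from utils import load_data, accuracy
from model import MultiHeadGATLayer

adj, features, labels, idx_train, idx_val, idx_test = load_data("./cora/", "cora")

net = MultiHeadGATLayer(input_feature_size=1433, output_size=8, nclass=7,
                        dropout=0.6, alpha=0.2, nheads=8, training=False)
ms.load_param_into_net(net, ms.load_checkpoint("./inference/model.ckpt"))

net.set_train(False)
output = net(features, adj)
print("test accuracy:", accuracy(output[idx_test], labels[idx_test]))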

Training  

Now, let's train our MindSpore GAT implementation and assess its accuracy. 

Only 20 nodes per class are used for training; however, honoring the transductive setup, the training algorithm has access to all of the nodes' feature vectors. The predictive power of the trained model is evaluated on 1000 test nodes, with additional nodes reserved for validation (the same split as in Kipf & Welling (2017)). In the load_data function above this corresponds to indices 0-140 for training (20 nodes × 7 classes), 200-500 for validation, and 500-1500 for testing. 

train.py 

import time
import os
import argparse
import glob
import random
import numpy as np
import mindspore as ms
import mindspore
import mindspore.nn as nn
from mindspore import Model, ops, load_checkpoint, load_param_into_net, save_checkpoint
from mindspore.common import dtype as mstype
from mindspore.nn import Cell
from mindspore.ops import GradOperation
import mindspore.context as context

from utils import load_data, accuracy
from model import MultiHeadGATLayer


# Training settings
parser = argparse.ArgumentParser(description='GAT')
parser.add_argument('--path', type=str, default="./cora/", help='path of the cora dataset directory.')
parser.add_argument('--device', type=str, default="GPU", help='GPU training.')
parser.add_argument('--fastmode', action='store_true', default=False, help='Validate during training pass.')
parser.add_argument('--seed', type=int, default=72, help='Random seed.')
parser.add_argument('--epochs', type=int, default=1000, help='Number of epochs to train.')
parser.add_argument('--lr', type=float, default=0.005, help='Initial learning rate.')
parser.add_argument('--weight_decay', type=float, default=5e-4, help='Weight decay (L2 loss on parameters).')
parser.add_argument('--hidden', type=int, default=8, help='Number of hidden units.')
parser.add_argument('--nb_heads', type=int, default=8, help='Number of head attentions.')
parser.add_argument('--dropout', type=float, default=0.6, help='Dropout rate (1 - keep probability).')
parser.add_argument('--alpha', type=float, default=0.2, help='Alpha for the leaky_relu.')
parser.add_argument('--patience', type=int, default=100, help='Patience')

args = parser.parse_args()

device_id= 0

context.set_context(device_target=args.device, mode=context.GRAPH_MODE, device_id=device_id)

random.seed(args.seed)

dataset = "cora"

# Load data
adj, features, labels, idx_train, idx_val, idx_test = load_data(args.path, dataset)

model = MultiHeadGATLayer(input_feature_size = 1433, output_size=args.hidden, nclass = 7, dropout= args.dropout, alpha = args.alpha, nheads =args.nb_heads, training = True)
loss_fn = nn.NLLLoss()
optimizer = nn.optim.Adam(model.trainable_params(), learning_rate=args.lr, weight_decay=args.weight_decay)

# Define forward function
def forward_fn(features, adj, labels):
    logits = model(features, adj)
    loss = loss_fn(logits[idx_train], labels[idx_train])
   # acc_train = accuracy(logits[idx_train], labels[idx_train])
    
    return loss, logits

# Get gradient function
grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)


# Define function of one-step training
def train_step(features, adj, labels):
    (loss, logits), grads = grad_fn(features, adj, labels)
    optimizer(grads)
    acc_train = accuracy(logits[idx_train], labels[idx_train])
    return loss, acc_train, logits

def train_loop(model, features, adj, labels):
    t = time.time()
    model.set_train()
    
    loss_train ,acc_train, output = train_step(features, adj, labels)
    if not args.fastmode:
        model.set_train(False)
        output = model(features,adj)
    loss_val = loss_fn(output[idx_val], labels[idx_val])
    acc_val = accuracy(output[idx_val], labels[idx_val])

    print('Epoch: {:d}'.format(epoch+1),
          'loss_train: {:.4f}'.format(loss_train.asnumpy()),
          'acc_train: {:.4f}'.format(acc_train.asnumpy()),
          'loss_val: {:.4f}'.format(loss_val.asnumpy()),
          'acc_val: {:.4f}'.format(acc_val.asnumpy()),
          'time: {:.4f}s'.format(time.time() - t))

    return loss_val.asnumpy()

def test_loop(model, features, adj, labels, loss_fn):
    model.set_train(False)
    pred = model(features, adj)
    loss_test = loss_fn(pred[idx_test], labels[idx_test])
    acc_test = accuracy(pred[idx_test], labels[idx_test])
  

    print('Testing Results\n',
          'loss_test: {:.4f}'.format(loss_test.asnumpy()),
          'acc_test: {:.4f}'.format(acc_test.asnumpy()))

#Training
t_total = time.time()
loss_values = []
bad_counter = 0
best = args.epochs + 1
best_epoch = 0
print(args)
for epoch in range(args.epochs):
    print(f"Epoch {epoch+1}\n-------------------------------")
    loss_values.append(train_loop(model, features, adj, labels))
    save_checkpoint(model, '{}.ckpt'.format(epoch))
    if loss_values[-1] < best:
        best = loss_values[-1]
        best_epoch = epoch
        bad_counter = 0
    else:
        bad_counter += 1

    if bad_counter == args.patience:
        break

    files = glob.glob('*.ckpt')
    for file in files:
        epoch_nb = int(file.split('.')[0])
        if epoch_nb < best_epoch:
            os.remove(file)

files = glob.glob('*.ckpt')
for file in files:
    epoch_nb = int(file.split('.')[0])
    if epoch_nb > best_epoch:
        os.remove(file)

print("Optimization Finished!")
print("Total time elapsed: {:.4f}s".format(time.time() - t_total))


#Testing
param_dict = load_checkpoint('{}.ckpt'.format(best_epoch))
load_param_into_net(model, param_dict)
test_loop(model, features, adj, labels, loss_fn)
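
With the Cora files placed under ./cora/, training is launched with the flags defined in the argparse section above, for example:

python train.py --path ./cora/ --device GPU --epochs 1000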

The complete code for the MindSpore GAT implementation, including 'utils.py', 'model.py', and the inference code, can be found in the Gitee repository: MindSpore-GAT论文复现: MindSpore开发者群英会-GAT论文复现

Accuracy

The accuracy of my MindSpore GAT implementation is very close to that of the original PyTorch implementation. The results are presented below: 

(Screenshot: MindSpore GAT training result)

After numerous training runs of the network, I observed that the accuracy consistently falls within the range of 0.82 to 0.84.

Conclusion 

In conclusion, the conversion of the GAT model from PyTorch to MindSpore was successful. Across a series of training runs, the MindSpore implementation performed robustly, with accuracy consistently between 0.82 and 0.84. This is comparable to the original PyTorch implementation, confirming the effectiveness and reliability of the conversion process. These results also show that models can be moved between frameworks while preserving performance and accuracy, which opens avenues for future work where models are adapted to different frameworks depending on the requirements or constraints of a given project. 

Further

Throughout the implementation, I experimented with both mindspore.nn.Dropout and mindspore.ops.dropout to determine which was more beneficial for training. When I employed mindspore.nn.Dropout, the model tended to overfit, reaching 100% accuracy on the training set. Both variants randomly zero elements of the input tensor with probability 1 − keep_prob during training, drawing samples from a Bernoulli distribution. A key distinction, however, is that nn.Dropout scales the output by a factor of 1/keep_prob during training, which may explain the observed difference in performance. The result when using nn.Dropout is shown below. 

(Screenshot: training result with nn.Dropout, showing overfitting on the training set)
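
For reference, the two variants are used as follows (a minimal sketch against MindSpore 2.0's interfaces, where both accept a drop probability p; nn.Dropout is a Cell whose behaviour follows the set_train state, while ops.dropout is called functionally, as in the GATLayer above):

import numpy as np
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops

x = ms.Tensor(np.ones((4, 4), dtype=np.float32))

# Functional variant used in this post: the drop probability is passed on every call.
y1 = ops.dropout(x, p=0.6)

# Cell variant: the probability is fixed at construction time and the layer is
# enabled or disabled through the enclosing network's set_train() state.
drop = nn.Dropout(p=0.6)
drop.set_train(True)
y2 = drop(x)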

Thank you!

ንጉስ ዳዊት በቃሉ

The source code is available here: MindSpore-GAT论文复现: MindSpore开发者群英会-GAT论文复现 
