Task 03 Node Representation Learning on Networks

1. Representation Learning
  • In machine learning:

    • Images -> vectors, videos -> vectors, ...: virtually every deep learning model can be framed as a representation learning problem.
    • Challenge: how do we transfer the success achieved on images/videos to representation learning on graph-structured data?
  • Convolutional neural networks: a powerful tool for representation learning.


  • A CNN on images, seen from the graph perspective: a regular grid graph in Euclidean space

    • Scale invariance
    • Multi-scale structure
  • Why does convolution work on images?

    • The local structure is the same everywhere in an image
    • The convolution kernel matches that local structure
    • This is convolution based on spatial position (spatial, as opposed to spectral)
  • Goal: extend CNNs from Euclidean space to topological space -> graph convolution.

2. Graph Convolutional Networks
  • Graph convolutional network (GCN):

    • Input: the adjacency matrix (num_nodes x num_nodes) and the feature matrix (num_nodes x num_input_features)

    • Output: a new feature matrix (num_nodes x num_output_features)

    • Multiple layers can be stacked

    • At the node level, each layer aggregates a node's own features with those of its neighborhood

      $$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$

      where $\tilde{A} = A + I_{N}$ is the adjacency matrix with self-loops (so a node's own features are aggregated together with its neighbors' features),

      $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$ is the (diagonal) degree matrix of $\tilde{A}$,

      $H$ is the matrix of node representations, i.e. the feature matrix, and $W$ is the weight matrix of model parameters.

      The matrix product of $\tilde{A}$ (symmetrically normalized), $H$, and $W$ plays the role of the convolution operation.

      $\sigma(\cdot)$ is the activation function.

    • Intuitive understanding of the GCN formula (figure)

    • Two-layer GCN architecture & loss function

      $$Z = f(X, A) = \operatorname{softmax}\left(\hat{A}\, \operatorname{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right), \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$$

      $$\mathcal{L} = -\sum_{l \in \mathcal{Y}_{L}} \sum_{f=1}^{F} Y_{l f} \ln Z_{l f}$$

      The cross-entropy loss is computed only over the labeled node set $\mathcal{Y}_{L}$.

    • Derivation idea behind GCN: approximate spectral-domain filtering directly in the graph's topological (spatial) domain, reducing the number of learnable parameters.

    • Another way to understand GCN: a (weighted) aggregation of neighbor node features; see the NumPy sketch below.
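
      To ground these formulas, here is a minimal NumPy sketch of the symmetric normalization and the two-layer forward pass on a made-up 3-node path graph (all shapes and values are illustrative only, not from the original experiments):

      import numpy as np

      # Toy path graph 0-1-2 (made up for illustration)
      A = np.array([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
      X = np.random.randn(3, 4)                 # feature matrix, N x F_in
      W0 = np.random.randn(4, 8)                # first-layer weights W^(0)
      W1 = np.random.randn(8, 2)                # second-layer weights W^(1)

      A_tilde = A + np.eye(3)                   # A~ = A + I_N (add self-loops)
      d = A_tilde.sum(axis=1)                   # degrees of A~
      A_hat = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)  # D~^{-1/2} A~ D~^{-1/2}

      def relu(x):
          return np.maximum(0., x)

      def softmax(x):
          e = np.exp(x - x.max(axis=1, keepdims=True))
          return e / e.sum(axis=1, keepdims=True)

      # Two-layer GCN: Z = softmax(A^ ReLU(A^ X W0) W1)
      Z = softmax(A_hat @ relu(A_hat @ X @ W0) @ W1)
      print(Z.shape)                            # (3, 2): per-node class probabilities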

3. Graph Attention Networks: GAT (extending the aggregation weights)
  • In GCN the aggregation weights are fixed in advance by the adjacency matrix, e.g. $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$.

  • GAT introduces a self-attention mechanism: the importance of each neighbor is computed from the features of the current node and of that neighbor, and these importance scores replace the fixed adjacency weights in the convolution. (This increases the computational cost.)

  • Advantage: exploiting the similarity between node features reflects the adjacency information better than fixed weights; see the sketch after the formulas.

    $$\alpha_{ij} = \frac{\exp\left(\operatorname{LeakyReLU}\left(\vec{\mathbf{a}}^{T}\left[\mathbf{W} \vec{h}_{i} \,\|\, \mathbf{W} \vec{h}_{j}\right]\right)\right)}{\sum_{k \in \mathcal{N}_{i}} \exp\left(\operatorname{LeakyReLU}\left(\vec{\mathbf{a}}^{T}\left[\mathbf{W} \vec{h}_{i} \,\|\, \mathbf{W} \vec{h}_{k}\right]\right)\right)}$$

    $$\vec{h}_{i}^{\prime} = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_{i}} \alpha_{ij}^{k} \mathbf{W}^{k} \vec{h}_{j}\right)$$
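
    A minimal single-head NumPy sketch of these two equations (the toy neighborhood, feature sizes, and random values are all made up; in practice PyG's GATConv implements this):

    import numpy as np

    rng = np.random.default_rng(0)
    h = rng.standard_normal((3, 4))        # features of node 0 and its two neighbors
    W = rng.standard_normal((4, 8))        # shared linear transform W, F x F'
    a = rng.standard_normal(16)            # attention vector a, size 2F'

    def leaky_relu(x, slope=0.2):
        return np.where(x > 0, x, slope * x)

    Wh = h @ W                             # W h_j for every node
    # attention logits e_0j = LeakyReLU(a^T [W h_0 || W h_j]) over neighborhood {0, 1, 2}
    e = np.array([leaky_relu(a @ np.concatenate([Wh[0], Wh[j]])) for j in range(3)])
    alpha = np.exp(e) / np.exp(e).sum()    # softmax -> attention coefficients alpha_0j
    h0_new = np.tanh(alpha @ Wh)           # h'_0 = sigma(sum_j alpha_0j W h_j)
    print(alpha.round(3), h0_new.shape)    # coefficients sum to 1; shape (8,)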

4. PyG Code Practice
  • The GCNConv module in PyG

    Constructor interface:

    GCNConv(in_channels: int, out_channels: int, improved: bool = False, cached: bool = False, add_self_loops: bool = True, normalize: bool = True, bias: bool = True, **kwargs)
    

    Parameters:

    • in_channels: input feature dimension;
    • out_channels: output feature dimension;
    • improved: if True, uses $\mathbf{\hat{A}} = \mathbf{A} + 2\mathbf{I}$, which gives extra weight to the central node's own information;
    • cached: whether to cache the computation of $\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$ for later reuse; this should only be set to True in transductive learning scenarios;
    • add_self_loops: whether to add self-loop edges to the adjacency matrix;
    • normalize: whether to add self-loops and compute the symmetric normalization coefficients on the fly;
    • bias: whether to include a learnable bias term.
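
    A quick usage sketch of GCNConv (the toy graph and feature sizes below are made up for illustration):

    import torch
    from torch_geometric.nn import GCNConv

    conv = GCNConv(in_channels=16, out_channels=32, cached=True)  # cached=True: transductive setting
    x = torch.randn(4, 16)                        # 4 nodes, 16 features each
    edge_index = torch.tensor([[0, 1, 1, 2],      # source nodes
                               [1, 0, 2, 1]])     # target nodes
    out = conv(x, edge_index)
    print(out.shape)                              # torch.Size([4, 32])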
  • Building the MLP, GCN, and GAT models (models.py)

    import torch
    import torch.nn.functional as F
    from torch.nn import Linear
    from torch_geometric.nn import GCNConv
    from torch_geometric.nn import GATConv

    class MLP(torch.nn.Module):
        def __init__(self, in_channels, hidden_channels, out_channels):
            super(MLP, self).__init__()
            torch.manual_seed(42)
            self.lin_one = Linear(in_channels, hidden_channels)
            self.lin_two = Linear(hidden_channels, out_channels)

        def forward(self, x, edge_index):
            # edge_index is unused: an MLP looks at node features only
            x = self.lin_one(x)
            x = x.relu()
            x = F.dropout(x, p=0.5, training=self.training)
            return self.lin_two(x)

    class GCN(torch.nn.Module):
        def __init__(self, in_channels, hidden_channels, out_channels):
            super(GCN, self).__init__()
            torch.manual_seed(42)
            self.conv1 = GCNConv(in_channels, hidden_channels)
            self.conv2 = GCNConv(hidden_channels, out_channels)

        def forward(self, x, edge_index):
            x = self.conv1(x, edge_index)
            x = x.relu()
            x = F.dropout(x, p=0.5, training=self.training)
            return self.conv2(x, edge_index)

    class GAT(torch.nn.Module):
        def __init__(self, in_channels, hidden_channels, out_channels):
            super(GAT, self).__init__()
            torch.manual_seed(42)
            self.conv1 = GATConv(in_channels, hidden_channels)
            self.conv2 = GATConv(hidden_channels, out_channels)

        def forward(self, x, edge_index):
            x = self.conv1(x, edge_index)
            x = x.relu()
            x = F.dropout(x, p=0.5, training=self.training)
            return self.conv2(x, edge_index)
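
    A quick shape check for the three models (toy sizes, made up for illustration; assumes the code above is saved as models.py):

    import torch
    from models import MLP, GCN, GAT

    x = torch.randn(4, 8)                         # 4 nodes, 8 input features
    edge_index = torch.tensor([[0, 1, 2, 3],
                               [1, 2, 3, 0]])
    for Net in (MLP, GCN, GAT):
        model = Net(8, 16, 3)                     # in, hidden, out channels
        print(Net.__name__, model(x, edge_index).shape)  # torch.Size([4, 3])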
    
  • Building the training pipeline (train.py)

    import argparse
    import torch
    from models import MLP,GCN,GAT
    from torch_geometric.datasets import Planetoid
    from torch_geometric.transforms import NormalizeFeatures
    
    # argument parsing
    model_names = ['MLP', 'GCN', 'GAT']
    optimizer_names = ['sgd', 'adam']
    parser = argparse.ArgumentParser(description='Graph neural network')
    parser.add_argument('--model', default= 'GCN', choices=model_names)
    parser.add_argument('--optimizer','-o', default='adam', choices=optimizer_names)
    parser.add_argument('--epochs', default=100, type=int, metavar='N')
    parser.add_argument('--lr', '--learning-rate', default=0.01, type=float, metavar='LR')
    parser.add_argument('--weight-decay', '--wd', default=5e-4, type=float, metavar='W')
    # download the dataset
    dataset = Planetoid(root = "./dataset", name = 'Cora', transform=NormalizeFeatures())
    
    def main():
        # set dataset
        data = dataset[0]
        node_features = data.x.cuda()
        node_label = data.y.cuda()
        edge_index = data.edge_index.cuda()
        in_channels = dataset.num_features
        hidden_channels = 16
        out_channels = dataset.num_classes
        train_index = data.train_mask
        test_index = data.test_mask
        # initialize model
        args = parser.parse_args()
        if args.model == 'MLP':
            model = MLP(in_channels, hidden_channels, out_channels)
        elif args.model == 'GCN':
            model = GCN(in_channels, hidden_channels, out_channels)
        elif args.model == 'GAT':
            model = GAT(in_channels, hidden_channels, out_channels)
        else:
            raise ValueError('Unsupported or unknown architecture')
        # set random seeds
        torch.manual_seed(42)
        torch.cuda.manual_seed_all(0)
        # move model to GPU
        model.cuda()
        # define loss function
        criterion = torch.nn.CrossEntropyLoss().cuda()
    
        # define optimizer
        if args.optimizer == 'adam':
            optimizer = torch.optim.Adam(model.parameters(), lr = args.lr, weight_decay = args.weight_decay)
        elif args.optimizer == 'sgd':
            optimizer = torch.optim.SGD(model.parameters(), lr = args.lr, weight_decay = args.weight_decay)
        for epoch in range(1,args.epochs+1):
            loss = train(node_features, node_label, edge_index, model, train_index, criterion, optimizer)
            if epoch % 10 == 0:
                print(f'Epoch:{epoch:03d},loss:{loss:.4f}')
        test_acc = test(node_features, node_label, edge_index, model, test_index)
        print(f'Test Acc:{test_acc:.4f}')
    
    def train(node_features, node_label, edge_index, model, train_index, criterion, optimizer):
        # start training
        model.train()
        # zero out gradients
        optimizer.zero_grad()
        # compute output logits
        out = model(node_features, edge_index)
        # compute loss
        loss = criterion(out[train_index], node_label[train_index])
        loss.backward()
        optimizer.step()
        return loss
    
    def test(node_features, node_label, edge_index, model, test_index):
        # start testing
        model.eval()
        # compute output vector
        out = model(node_features, edge_index)
        # use the class with highest probability
        pred = out.argmax(dim=1)
        test_correct = pred[test_index]== node_label[test_index]
        test_acc = int(test_correct.sum()) / int(test_index.sum())
        return test_acc
    
    if __name__ == '__main__':
        main()
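
    Assuming the two files are saved as models.py and train.py, training can be launched with, for example, python train.py --model GCN --epochs 100 --lr 0.01. Note that the script moves the model and data to the GPU via .cuda(), so a CUDA-capable device is required as written.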
    
5. Analysis of Results
  • Quantitative results:

    # MLP
    Epoch:010,loss:1.8935
    Epoch:020,loss:1.7579
    Epoch:030,loss:1.5602
    Epoch:040,loss:1.3132
    Epoch:050,loss:1.0011
    Epoch:060,loss:0.8857
    Epoch:070,loss:0.7286
    Epoch:080,loss:0.6340
    Epoch:090,loss:0.5930
    Epoch:100,loss:0.5329
    Test Acc:0.5560
    # GCN
    Epoch:010,loss:1.8752
    Epoch:020,loss:1.7340
    Epoch:030,loss:1.5649
    Epoch:040,loss:1.3471
    Epoch:050,loss:1.1623
    Epoch:060,loss:0.9756
    Epoch:070,loss:0.8021
    Epoch:080,loss:0.7483
    Epoch:090,loss:0.6294
    Epoch:100,loss:0.5731
    Test Acc:0.8090
    # GAT
    Epoch:010,loss:1.8769
    Epoch:020,loss:1.7326
    Epoch:030,loss:1.5355
    Epoch:040,loss:1.2970
    Epoch:050,loss:0.9762
    Epoch:060,loss:0.8291
    Epoch:070,loss:0.6570
    Epoch:080,loss:0.5441
    Epoch:090,loss:0.5030
    Epoch:100,loss:0.4347
    Test Acc:0.7870
    
  • Visualizing the distribution of node representations:

    To visualize the node representations, we first embed the high-dimensional representations into a 2D plane with t-SNE, then scatter-plot the nodes in that plane.

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE
    
    def visualize(h, color):
        z = TSNE(n_components=2).fit_transform(out.detach().cpu().numpy())
        plt.figure(figsize=(10,10))
        plt.xticks([])
        plt.yticks([])
    
        plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
        plt.show()
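
    For example, the "untrained" plots below can be produced by feeding the output of a freshly initialized model to visualize (a sketch reusing the GCN model and the Cora dataset from the earlier scripts):

    import torch
    from models import GCN
    from torch_geometric.datasets import Planetoid
    from torch_geometric.transforms import NormalizeFeatures

    dataset = Planetoid(root="./dataset", name='Cora', transform=NormalizeFeatures())
    data = dataset[0]
    model = GCN(dataset.num_features, 16, dataset.num_classes)  # untrained weights
    model.eval()
    with torch.no_grad():
        out = model(data.x, data.edge_index)
    visualize(out, color=data.y)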
    
  • Untrained node representations vs. the distribution after MLP training (figure)

  • Untrained node representations vs. the distribution after GCN training (figure)

  • Untrained node representations vs. the distribution after GAT training (figure)

Reference:
Datawhale June study group: Graph Neural Networks
