4,500 positive examples (similar pairs) and 4,500 negative examples
Training set: 6,299 pairs; test set: 2,701 pairs
————————————————
Graph Match
Similarity classification
GNN framework (GraphEmbeddingNet)
Each epoch runs through all batches, 63 batches in total
Each input batch holds 100 graph pairs; every graph has 189 nodes, so 189*2*100 = 37,800 nodes in total
node_features [37800, 90]
edge_features None
from_idx 99154
to_idx 99154
graph_idx 37800
labels 100
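A minimal sketch of how one such packed batch can be laid out (the graph_idx construction and the ±1 labels are assumptions; the note only lists the shapes):

import torch

# One packed batch, following the shapes listed above.
n_pairs, n_nodes, feat_dim = 100, 189, 90
total_nodes = 2 * n_pairs * n_nodes                      # 189 * 2 * 100 = 37800

node_features = torch.randn(total_nodes, feat_dim)       # [37800, 90]
# graph_idx[i] is the id (0..199) of the graph that node i belongs to (assumed layout)
graph_idx = torch.arange(2 * n_pairs).repeat_interleave(n_nodes)   # [37800]
# from_idx / to_idx are edge endpoints indexing into the packed node tensor
# (99154 edges in the example batch); fabricated here just for illustration
from_idx = torch.randint(0, total_nodes, (99154,))
to_idx = torch.randint(0, total_nodes, (99154,))
labels = torch.randint(0, 2, (n_pairs,)) * 2 - 1          # assumed +1 similar / -1 dissimilar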
GraphEmbeddingNet(
(_encoder): GraphEncoder(
(MLP1): Sequential(
(0): Linear(in_features=90, out_features=32, bias=True)
)
# i.e. [37800,90] x [90,32] -> [37800,32]  (1)
)
(_prop_layers): ModuleList(
# i.e. the [2,99154,32] edge inputs (source/target node states) are concatenated into a [99154,64] edge_inputs tensor as input
(0): GraphPropLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
# i.e. [99154,64] x [64,64] =ReLU=> x [64,64] -> [99154,64],
# then aggregated into [37800,64] and passed on to the reverse message net
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
# i.e. the current output is [37800,64]; the matrix addition leaves the size unchanged  (2)
# (2) is unsqueezed to [1,37800,64]  (3), and (1) is unsqueezed to [1,37800,32]  (4)
# (3) and (4) are fed into the GRU together
(GRU): GRU(64, 32)
# i.e. the [1,37800,32] output is squeezed back to [37800,32]
)
# i.e. each propagation layer takes a [37800,32] input
(1): GraphPropLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(GRU): GRU(64, 32)
)
(2): GraphPropLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(GRU): GRU(64, 32)
)
(3): GraphPropLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(GRU): GRU(64, 32)
)
(4): GraphPropLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(GRU): GRU(64, 32)
)
)
# i.e. the 5 propagation-layer outputs and the encoder output are stacked into a [6,37800,32] tensor
(_aggregator): GraphAggregator(
(MLP1): Sequential(
(0): Linear(in_features=32, out_features=256, bias=True)
)
# i.e. [37800,32] x [32,256] -> [37800,256],
# split into a left and a right [37800,128] half, multiplied element-wise to give [37800,128],
# then aggregated per graph into [200,128]
(MLP2): Sequential(
(0): Linear(in_features=128, out_features=128, bias=True)
)
# i.e. [200,128] x [128,128] -> [200,128]
)
)
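A minimal sketch of the gated aggregation just described; the sigmoid on the left half is an assumption (the note only mentions an element-wise product of the two halves):

import torch

node_states = torch.randn(37800, 32)                      # output of the last prop layer
graph_idx = torch.arange(200).repeat_interleave(189)      # node -> graph id

h = torch.nn.Linear(32, 256)(node_states)                 # MLP1: [37800, 256]
gates, values = h[:, :128], h[:, 128:]                    # two [37800, 128] halves
gated = torch.sigmoid(gates) * values                     # [37800, 128] (sigmoid assumed)

graph_states = torch.zeros(200, 128)
graph_states.index_add_(0, graph_idx, gated)              # per-graph sum -> [200, 128]
graph_states = torch.nn.Linear(128, 128)(graph_states)    # MLP2: [200, 128]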
At this point each of the 200 graphs is represented by a 128-dimensional vector.
Splitting the [200,128] matrix horizontally gives an upper and a lower [100,128] block, x and y respectively;
each (x, y) row pair corresponds to one graph pair, one label, and one loss,
so each batch yields a 100-dimensional loss vector.
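A minimal sketch of this pairing and loss step; the Euclidean distance and margin form are assumptions, the note only states that each pair yields one loss:

import torch

graph_states = torch.randn(200, 128)                      # aggregator output
x, y = graph_states[:100], graph_states[100:]             # upper / lower halves, [100, 128] each
labels = torch.randint(0, 2, (100,)) * 2 - 1              # assumed +1 / -1 per pair

dist = torch.sum((x - y) ** 2, dim=1)                     # [100] squared distances (assumed)
loss = torch.relu(1.0 - labels * (1.0 - dist))            # [100], one loss per pair (margin form assumed)
batch_loss = loss.mean()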
GMN framework (GraphMatchingNet)
GraphMatchingNet(
(_encoder): GraphEncoder(
(MLP1): Sequential(
(0): Linear(in_features=90, out_features=32, bias=True)
)
)
(_prop_layers): ModuleList(
(0): GraphPropMatchingLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
# i.e. the output is [37800,64]  (2)
# the [37800,32] tensor (1) is split into 200 blocks of 189 rows each;
# for each adjacent block pair x and y, first compute a [189,189] similarity matrix,
# then multiply it with y and x respectively to get a pair of attentions attention_x, attention_y [189,32]
# collect them into [200,189,32] and reshape back into [37800,32]  (5)
# (5) = (1) - (5)
# then concatenate (2) and (5) into [37800,96]
(GRU): GRU(96, 32)
)
(1): GraphPropMatchingLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(GRU): GRU(96, 32)
)
(2): GraphPropMatchingLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(GRU): GRU(96, 32)
)
(3): GraphPropMatchingLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(GRU): GRU(96, 32)
)
(4): GraphPropMatchingLayer(
(_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(_reverse_message_net): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
)
(GRU): GRU(96, 32)
)
)
(_aggregator): GraphAggregator(
(MLP1): Sequential(
(0): Linear(in_features=32, out_features=256, bias=True)
)
(MLP2): Sequential(
(0): Linear(in_features=128, out_features=128, bias=True)
)
)
)
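A minimal sketch of the cross-graph attention step inside each GraphPropMatchingLayer, following the comments above; the softmax normalization and the pairing of adjacent 189-row blocks are assumptions:

import torch

n_pairs, n_nodes, dim = 100, 189, 32
h = torch.randn(2 * n_pairs, n_nodes, dim)                # (1) reshaped into 200 blocks of 189 rows

x, y = h[0::2], h[1::2]                                   # adjacent blocks form a pair, [100, 189, 32]
sim = torch.einsum('bik,bjk->bij', x, y)                  # [100, 189, 189] similarity per pair
a_x = torch.softmax(sim, dim=2) @ y                       # attention_x: [100, 189, 32]
a_y = torch.softmax(sim, dim=1).transpose(1, 2) @ x       # attention_y: [100, 189, 32]

# (5) = (1) - attention, stitched back into the packed [37800, 32] layout
cross = torch.stack([x - a_x, y - a_y], dim=1).reshape(-1, dim)    # [37800, 32]
# cross is then concatenated with the [37800, 64] messages (2) into the [37800, 96] GRU input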
——————————————————————
GCN (with histogram)
Similarity classification
Each epoch re-reads all batches
Each batch holds 128 pairs, i.e. 50 batches in total
Model input for one graph pair: the [189,104] node-attribute matrices (one-hot encoded from 1-D label vectors), the edge lists (988 and 1002 edges for the first pair), and the label target
SimGNN(
# pass each of the two graphs through its own three GCN layers
(convolution_1): GCNConv(104, 128)
# i.e. [189,104] x [104,128] -> [189,128]
# 988 edges =add self-loops=> 1177 =aggregate/propagate=> 1177
(relu)
(dropout)
(convolution_2): GCNConv(128, 64)
(relu)
(dropout)
(convolution_3): GCNConv(64, 32)
# i.e. output [189,32]
(histogram)
# i.e. multiply the two [189,32] matrices to get [189,189], flatten to [35721,1], and compute a [1,16] histogram from it
# pass each of the two [189,32] matrices through its own attention module
(attention): AttentionModule()
# i.e. [189,32] x [32,32] -> [189,32] =mean=> 32 =tanh=> 32 => [32,1]
# [189,32] x [32,1] =sigmoid=> [189,1]
# [32,189] x [189,1] -> [32,1]
# both [32,1] graph embeddings are fed into the tensor layer
(tensor_network): TenorNetworkModule()
# i.e. first view the original [32,32,16] weight tensor as [32,512]
# [1,32] x [32,512] -> [1,512] =view=> [32,16]
# [16,32] x [32,1] -> [16,1]  (1)
# cat the original two [32,1] embeddings into [64,1]
# [16,64] x [64,1] -> [16,1]  (2); then (2) + (1) + [16,1] bias =ReLU=> [16,1]
# transpose and concatenate with the histogram into [1,32]
(fully_connected_first): Linear(in_features=32, out_features=16, bias=True)
(relu)
# i.e. [1,32] x [32,16] -> [1,16] (without the histogram the weight would be [16,16])
(scoring_layer): Linear(in_features=16, out_features=1, bias=True)
(sigmoid)
#[1,16]x[16,1]->[1,1]
)
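A minimal sketch of the histogram feature and the attention pooling described in the comments above; torch.histc with 16 bins and the histogram normalization are assumptions based on the shapes in the trace:

import torch

h1 = torch.randn(189, 32)                                 # GCN output of graph 1
h2 = torch.randn(189, 32)                                 # GCN output of graph 2

# histogram feature: pairwise node-score matrix flattened into a 16-bin histogram
scores = torch.mm(h1, h2.t()).view(-1, 1)                 # [35721, 1]
hist = torch.histc(scores, bins=16)
hist = (hist / hist.sum()).view(1, -1)                    # [1, 16] (normalization assumed)

# attention pooling: global context -> sigmoid node weights -> weighted sum
W = torch.randn(32, 32)                                   # attention weight matrix
context = torch.tanh(torch.mean(torch.mm(h1, W), dim=0))  # [32]
weights = torch.sigmoid(torch.mm(h1, context.view(-1, 1)))  # [189, 1]
embedding1 = torch.mm(h1.t(), weights)                    # [32, 1]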
A loss is computed for each graph pair in the batch,
and the losses are summed per batch for backpropagation.
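And a sketch of the tensor-network scoring plus the final layers, following the shapes in the comments (all weights here are random placeholders, not the module's trained parameters):

import torch
import torch.nn.functional as F

k = 16                                                    # number of tensor slices
e1, e2 = torch.randn(32, 1), torch.randn(32, 1)           # attention-pooled graph embeddings

W = torch.randn(32, 32, k)                                # [32, 32, 16] tensor weights
V = torch.randn(k, 64)                                    # weights on the concatenated pair
b = torch.randn(k, 1)                                     # bias

scoring = torch.mm(e1.t(), W.view(32, -1)).view(32, k)    # [1, 512] -> [32, 16]
term1 = torch.mm(scoring.t(), e2)                         # [16, 1]   (1)
term2 = torch.mm(V, torch.cat([e1, e2]))                  # [16, 1]   (2)
ntn = F.relu(term1 + term2 + b)                           # [16, 1]

hist = torch.randn(1, 16)                                 # the [1, 16] histogram from above
x = torch.cat([ntn.t(), hist], dim=1)                     # [1, 32]
x = F.relu(torch.mm(x, torch.randn(32, 16)))              # fully_connected_first
score = torch.sigmoid(torch.mm(x, torch.randn(16, 1)))    # scoring_layer -> [1, 1]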
————————————
Graph-BERT
Category classification
188 graphs in total: 169 for training, 19 for testing
Inputs
x [169,28] # e^x
Nominally node attributes; in fact, for each graph it is just the node-index list stored in nx (networkx), padded with (max node count + 1) where the graph has fewer nodes
For example:
[0, 1, 13, 2, 3, 11, 4, 5, 6, 10, 7, 20, 8, 9, 15, 12, 14, 19, 16, 17, 18, 21, 22]
↓
[0, 1, 13, 2, 3, 11, 4, 5, 6, 10, 7, 20, 8, 9, 15, 12, 14, 19, 16, 17, 18, 21, 22, 29, 29, 29, 29, 29]
d [169,28] # e^d node degrees
w [169,28,28] # e^w edge weights
which can be viewed as adjacency matrices
wl [169,28] # e^r WL
node_color_dict is keyed by node index with value 1, i.e. a dict recording which nodes exist.
For example: {0: 1, 1: 1, 13: 1, 2: 1, 3: 1, 11: 1, 4: 1, 5: 1, 6: 1, 10: 1, 7: 1, 20: 1, 8: 1, 9: 1, 15: 1, 12: 1, 14: 1, 19: 1, 16: 1, 17: 1, 18: 1, 21: 1, 22: 1}
node_neighbor_dict is built by traversing 2-hop neighbors; it also records which nodes exist, somewhat like an adjacency list.
For example: {0: {1: 1, 13: 1}, 1: {0: 1, 2: 1}, 13: {0: 1, 12: 1}, 2: {1: 1, 3: 1, 11: 1}, 3: {2: 1, 4: 1}, 11: {2: 1, 10: 1, 12: 1}, 4: {3: 1, 5: 1}, 5: {4: 1, 6: 1, 10: 1}, 6: {5: 1, 7: 1, 20: 1}, 10: {11: 1, 5: 1, 9: 1}, 7: {6: 1, 8: 1}, 20: {6: 1, 21: 1, 22: 1}, 8: {7: 1, 9: 1}, 9: {10: 1, 8: 1, 15: 1}, 15: {9: 1, 14: 1, 16: 1}, 12: {13: 1, 11: 1, 14: 1}, 14: {15: 1, 12: 1, 19: 1}, 19: {14: 1, 18: 1}, 16: {15: 1, 17: 1}, 17: {16: 1, 18: 1}, 18: {19: 1, 17: 1}, 21: {20: 1}, 22: {20: 1}}
Take node 13's neighbor dict {0: 1, 12: 1},
extract its values [1, 1],
prepend node_color_dict[13] (which is just another 1), giving ['1', '1', '1'],
join these into '1_1_1',
and hash it with hashlib.md5.
Collecting the hashes of all nodes yields a deduplicated dict,
for example {'4eb90ba61276b0e27cee6f190e612949': 1, '9e8973112eebad7f27f0b762abd14d1e': 2, 'ec308451c1d095c528cfa3c009ea7235': 3},
and each node's hash is then mapped to the corresponding value of this dict,
for example {13: 1, 0: 1, 1: 1, 2: 2, 3: 1, 11: 2, 4: 1, 5: 2, 6: 2, 10: 2, 7: 1, 20: 2, 8: 1, 9: 2, 15: 2, 12: 2, 14: 2, 19: 1, 16: 1, 17: 1, 18: 1, 21: 3, 22: 3}.
This dict is updated over several such iterations;
finally, taking the dict's values gives this graph's WL labels: [1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3] (see the sketch after the input list).
y_true 169
context_idx_list 0
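A minimal sketch of the WL relabelling described above (md5 hashing of a node's own label joined with its neighbors' labels); the function name, the neighbor-dict structure and the number of iterations are assumptions:

import hashlib

def wl_labels(neighbors, iterations=2):
    # neighbors: dict node -> list of neighbor nodes (assumed structure)
    colors = {n: 1 for n in neighbors}                    # initial color 1, as in node_color_dict
    for _ in range(iterations):
        new_colors = {}
        for n, nbrs in neighbors.items():
            # prepend the node's own color to its neighbors' colors and hash the joined string
            parts = [str(colors[n])] + [str(colors[m]) for m in nbrs]
            new_colors[n] = hashlib.md5('_'.join(parts).encode()).hexdigest()
        # map each distinct hash to a small integer label (the deduplicated dict)
        mapping = {h: i + 1 for i, h in enumerate(dict.fromkeys(new_colors.values()))}
        colors = {n: mapping[h] for n, h in new_colors.items()}
    return list(colors.values())

# e.g. wl_labels({0: [1], 1: [0, 2], 2: [1]}) labels a 3-node path graph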
MethodGraphBertGraphClassification(
# first w goes through a 'none' residual step, which in fact just returns two Nones
# with 'raw' residual handling, w would instead go through the two Linear layers below (784 = 28*28)
(res_h): Linear(in_features=784, out_features=32, bias=True)
(res_y): Linear(in_features=784, out_features=2, bias=True)
# producing residual_h and residual_y respectively
# residual_h is added to the output of every BertLayer
# residual_y is added to the cls_y result
(bert): MethodGraphBert(
(embeddings): BertEmbeddings(
(raw_feature_embeddings): Linear(in_features=28, out_features=32, bias=True)
# i.e. processes w: [169,28,28] x [28,32] -> [169,28,32]
(tag_embeddings): Embedding(1000, 32)
# i.e. processes x: [169,28] =embed=> [169,28,32]
(degree_embeddings): Embedding(1000, 32)
# i.e. processes d: [169,28] =embed=> [169,28,32]
(wl_embeddings): Embedding(1000, 32)
# i.e. processes wl: [169,28] =embed=> [169,28,32]
# the four embeddings are then summed
(LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.5, inplace=False)
# output  (1)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=32, out_features=32, bias=True)
(key): Linear(in_features=32, out_features=32, bias=True)
(value): Linear(in_features=32, out_features=32, bias=True)
# i.e. apply three [169,28,32] x [32,32] -> [169,28,32] projections to (1) to get Q, K, V
# view Q, K, V as [169,28,2,16] each,
# permute to [169,2,28,16]
# compute QK^T: [169,2,28,16] x [169,2,16,28] -> [169,2,28,28]
# scale by dividing by √16, then softmax
(dropout): Dropout(p=0.3, inplace=False)
# after dropout, multiply by V: [169,2,28,28] x [169,2,28,16] -> [169,2,28,16]
# permute back to [169,28,2,16] and view as [169,28,32], output  (2)
)
(output): BertSelfOutput(
(dense): Linear(in_features=32, out_features=32, bias=True)
# i.e. (2): [169,28,32] x [32,32] -> [169,28,32]  (3)
(dropout): Dropout(p=0.5, inplace=False)
# dropout on (3)
(LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
# (1) and (3) are summed and layer-normalized to give (4)
# (4) is then summed with (2) and output as (5)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=32, out_features=32, bias=True)
# i.e. (5): [169,28,32] x [32,32] -> [169,28,32],
# then through a GELU activation to give (6)
)
(output): BertOutput(
(dense): Linear(in_features=32, out_features=32, bias=True)
# i.e. (6): [169,28,32] x [32,32] -> [169,28,32]  (7)
(dropout): Dropout(p=0.5, inplace=False)
(LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
# dropout on (7), sum with (5), layer-normalize, and output  (8)
)
)
(1): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=32, out_features=32, bias=True)
(key): Linear(in_features=32, out_features=32, bias=True)
(value): Linear(in_features=32, out_features=32, bias=True)
(dropout): Dropout(p=0.3, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=32, out_features=32, bias=True)
(LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.5, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=32, out_features=32, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=32, out_features=32, bias=True)
(LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.5, inplace=False)
)
)
)
)
(pooler): BertPooler(
# take a [169,32] slice (the first of the 28 positions) of the output (9) [169,28,32]
(dense): Linear(in_features=32, out_features=32, bias=True)
#[169,32]x[32,32]->[169,32]
(activation): Tanh()
)
# the tanh output (10) is returned together with (9); (10) does not actually seem to be used anywhere
)
# (9) [169,28,32] is averaged over the second dimension, i.e. the mean of the 28 [169,32] matrices, giving (11)
(cls_y): Linear(in_features=32, out_features=2, bias=True)
# (11): [169,32] x [32,2] -> [169,2], then log_softmax
)
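A minimal sketch of the multi-head self-attention reshape traced in the comments above (2 heads of size 16; the projections here are fresh Linear layers, not the trained ones):

import math
import torch

batch, seq, dim, heads = 169, 28, 32, 2
head_dim = dim // heads                                   # 16

x = torch.randn(batch, seq, dim)                          # embedding output (1)
Wq, Wk, Wv = (torch.nn.Linear(dim, dim) for _ in range(3))

def split_heads(t):
    # [169, 28, 32] -> [169, 28, 2, 16] -> [169, 2, 28, 16]
    return t.view(batch, seq, heads, head_dim).permute(0, 2, 1, 3)

q, k, v = split_heads(Wq(x)), split_heads(Wk(x)), split_heads(Wv(x))
scores = q @ k.transpose(-1, -2) / math.sqrt(head_dim)    # [169, 2, 28, 28]
probs = torch.softmax(scores, dim=-1)                     # softmax over the key positions
ctx = probs @ v                                           # [169, 2, 28, 16]
out = ctx.permute(0, 2, 1, 3).reshape(batch, seq, dim)    # back to [169, 28, 32]   (2)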
The output is a [169,2] matrix and the labels form a vector of length 169.
The loss is computed with F.cross_entropy, which applies log_softmax followed by nll_loss: the [169,2] predictions are reduced to 169 per-graph negative log-likelihoods against the labels and averaged into a single value,
i.e. each graph contributes one loss value.
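A minimal sketch of this classification head and loss (mean over the 28 positions, a linear classifier, then cross-entropy); the classifier here is a fresh Linear layer standing in for cls_y:

import torch
import torch.nn.functional as F

seq_output = torch.randn(169, 28, 32)                     # (9): encoder output
labels = torch.randint(0, 2, (169,))                      # one class label per graph

pooled = seq_output.mean(dim=1)                           # (11): [169, 32]
logits = torch.nn.Linear(32, 2)(pooled)                   # cls_y stand-in: [169, 2]
loss = F.cross_entropy(logits, labels)                    # 169 per-graph NLLs averaged to one scalar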
————————————————————
RWGNN
Split into folds: 188 graphs in total, 94 for testing, 84 for training, 10 for validation
The graphs have different node counts; node features are all 7-dimensional
graph_indicator_batch marks, for each node of the graphs in the current batch, which graph it belongs to:
it takes values 0 .. j-i (i is the start index of the current batch, j = min(i+64, 84)),
i.e. batch 0 corresponds to 0, 1, 2, ..., 64 and batch 1 corresponds to 0, 1, ..., 64,
where each value is repeated as many times as that graph has nodes;
its role is to tag, in the packed node matrix, the index of the graph each node belongs to
adj_train per batch: [1155,1155] [1161,1161] [381,381]
features_train per batch: [1155,7] [1161,7] [381,7]
graph_indicator_train per batch: 1155 1161 381
y_train per batch: 64 64 24
Taking the first batch as an example: the input is 64 graphs with 1155 nodes in total
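A minimal sketch of how such a graph_indicator vector can be built and used (the node counts below are made up; the real first batch has 1155 nodes):

import torch

node_counts = torch.randint(10, 30, (64,))                # assumed per-graph node counts
graph_indicator = torch.arange(64).repeat_interleave(node_counts)   # one graph id per node

# pooling node-level values into per-graph sums with the indicator
node_values = torch.randn(int(node_counts.sum()), 4)
graph_sums = torch.zeros(64, 4).index_add_(0, graph_indicator, node_values)   # [64, 4]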
RW_NN(
# first deduplicate graph_indicator_train to get the graph indices (unique) and per-graph node counts (counts)
# take a learnable parameter of shape [hidden_graphs, (size_hidden_graphs*(size_hidden_graphs-1))//2], i.e. [16, C(5,2)] = [16,10], denoted adj_hidden
# and build adj_hidden_norm of shape [hidden_graphs, size_hidden_graphs, size_hidden_graphs] = [16,5,5]
(Relu)
# fill the upper triangle of each [5,5] matrix with relu(adj_hidden), then add its transpose
(fc): Linear(in_features=7, out_features=4, bias=True)
# i.e. [1155,7] x [7,4] -> [1155,4]
(sigmoid): Sigmoid()
# denote the activated result x
# take a learnable parameter of shape [hidden_graphs, size_hidden_graphs, hidden_dim], denoted z [16,5,4]
# multiply via torch.einsum("abc,dc->abd", (z, x)) to get zx: [16,5,4] x [1155,4] -> [16,5,1155]
# then 2 loop iterations follow
# iteration 1
# first build 16 identity matrices of size 5, i.e. eye [16,5,5]
# multiply via torch.einsum("abc,acd->abd", (eye, z)) to get o: [16,5,5] x [16,5,4] -> [16,5,4]
# multiply via torch.einsum("abc,dc->abd", (o, x)) to get t: [16,5,4] x [1155,4] -> [16,5,1155]
(dropout)
# t = zx ∘ t, i.e. [16,5,1155] ∘ [16,5,1155] -> [16,5,1155] element-wise multiplication
# build a zero tensor temp [16,5,64] from t and the total number of graphs
# then accumulate via temp.index_add_(2, graph_indicator, t), which is equivalent to
# for j in range(len(graph_indicator)):   # 1155 nodes
#     temp[:, :, graph_indicator[j]] += t[:, :, j]
# then sum temp over dim=1 and transpose to get t [64,16]
# iteration 2
# x = adj_train * x, i.e. [1155,1155] x [1155,4] -> [1155,4]
# multiply via torch.einsum("abc,acd->abd", (adj_hidden_norm, z)) to get z: [16,5,5] x [16,5,4] -> [16,5,4]
# multiply via torch.einsum("abc,dc->abd", (z, x)) to get t: [16,5,4] x [1155,4] -> [16,5,1155]
(dropout)
# t = zx ∘ t, i.e. [16,5,1155] ∘ [16,5,1155] -> [16,5,1155] element-wise multiplication
# build a zero tensor temp [16,5,64] as before
# accumulate via temp.index_add_(2, graph_indicator, t)
# sum temp over dim=1 and transpose to get t [64,16]
# concatenate the t from both iterations to get [64,32]
(bn): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(fc1): Linear(in_features=32, out_features=32, bias=True)
# i.e. [64,32] x [32,32] -> [64,32]
(relu): ReLU()
(dropout): Dropout(p=0.2, inplace=False)
(fc2): Linear(in_features=32, out_features=2, bias=True)
# i.e. [64,32] x [32,2] -> [64,2]
#log_softmax
)
Each graph ends up with a 2-dimensional vector; downstream this is still a binary classification with cross-entropy, i.e. each batch yields one loss.
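A minimal sketch of the two random-walk steps traced in the comments (einsum products, dropout omitted, a dense identity matrix standing in for the sparse batch adjacency; the real code's normalization and loop details may differ):

import torch

hidden_graphs, hidden_size, hidden_dim, n_graphs, n_nodes = 16, 5, 4, 64, 1155

x = torch.sigmoid(torch.randn(n_nodes, hidden_dim))       # fc + sigmoid output
adj = torch.eye(n_nodes)                                  # stand-in for the [1155, 1155] adjacency
adj_hidden_norm = torch.randn(hidden_graphs, hidden_size, hidden_size)   # symmetrized hidden adjacency
z = torch.randn(hidden_graphs, hidden_size, hidden_dim)   # learnable hidden-node features
graph_indicator = torch.randint(0, n_graphs, (n_nodes,))  # node -> graph id

zx = torch.einsum('abc,dc->abd', z, x)                    # [16, 5, 1155]
outputs = []
for step in range(2):
    if step == 0:                                         # iteration 1: identity on the hidden graphs
        eye = torch.eye(hidden_size).expand(hidden_graphs, -1, -1)
        o = torch.einsum('abc,acd->abd', eye, z)
    else:                                                 # iteration 2: one walk step on both graphs
        x = torch.mm(adj, x)
        o = torch.einsum('abc,acd->abd', adj_hidden_norm, z)
    t = zx * torch.einsum('abc,dc->abd', o, x)            # [16, 5, 1155] element-wise product
    temp = torch.zeros(hidden_graphs, hidden_size, n_graphs)
    temp.index_add_(2, graph_indicator, t)                # scatter node columns into their graphs
    outputs.append(torch.sum(temp, dim=1).t())            # [64, 16]

features = torch.cat(outputs, dim=1)                      # [64, 32] -> bn, fc1, relu, dropout, fc2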