paper: Heterogeneous Graph Attention Network | The World Wide Web Conference
GNN, a powerful graph representation technique
problem: it has not been fully considered in graph neural networks for the heterogeneous graph, which contains different types of nodes and links.
- heterogeneity
- rich semantic information
solution: HAN (Heterogeneous Graph Attention Network)
- node-level attention: learn the importance between a node and its meta-path based neighbors
- semantic-level attention: is able to learn the importance of different meta-paths.
-> model can generate node embedding by aggregating features from meta-path based neighbors in a hierarchical manner.
1. Background
GAT: leverages attention mechanism for the homogeneous graph which includes only one type of nodes or links.
GAT only works on homogeneous graphs, i.e., GAT can only pick one meta-path at a time for prediction. But HAN is really just integrating the outputs of multiple GAT-style homo-graph models; not much technical depth there. What a sneaky move, honestly!
- As a matter of fact, real-world graphs usually come with multiple types of nodes and edges, widely known as heterogeneous information networks (HIN).
- meta-path, a composite relation connecting two objects, is a widely used structure to capture semantics, e.g. the meta-path Movie-Actor-Movie (MAM) -> meta-path based neighbors can only be obtained by following a meta-path.
- a heterogeneous graph contains more comprehensive information and rich semantics. Depending on the meta-path, the relation between nodes in a heterogeneous graph can take on different semantics.
(1) heterogeneity of graph
HINs describe real-world graphs.
different meta-paths have different semantics, yet GNNs can't be applied to heterogeneous graphs directly.
-> how to handle the heterogeneity and preserve the diverse feature information simultaneously?
(2) semantic-level attention
background: different meta-paths in a heterogeneous graph may extract diverse semantic information.
problem: how to select the most meaningful meta-paths and fuse the semantic information for a specific task. Treating different meta-paths equally is impractical and will weaken the semantic information brought by some useful meta-paths.
solution: semantic-level attention aims to learn the importance of each meta-path and assign proper weights to them.
(3) Node-level attention
problem: how to distinguish the subtle differences among a node's meta-path based neighbors and select the informative ones, i.e., how to design a model that discovers these differences and learns their weights properly.
e.g. for the movie The Terminator, the neighbors reached via a meta-path are not all equally relevant.
solution: node-level attention aims to learn the importance of meta-path based neighbors and assign different attention values to them.
(4) HAN
solution: HAN
- node-level attention: learn meta-path based neighbors' attention values
- semantic-level attention: learn meta-paths' attention values
-> our model can get the optimal combination of neighbors and multiple meta-paths in a hierarchical manner, which enables the learned node embeddings to better capture the complex structure and rich semantic information in a heterogeneous graph.
contributions
- first attempt to study heterogeneous graph neural networks based on the attention mechanism -> GNN can be applied directly to heterogeneous graphs.
- HAN
- superiority -> good interpretability for heterogeneous graph analysis.
So plain HAN isn't enough: whatever it does or fails to do, Fin-Event does better. What now?
Looking forward to a Re-HAN.
2. Related Work
2.1 GNN
- GCN, a spectral approach, which designs a graph convolutional network via a localized first-order approximation of spectral graph convolutions (graph Fourier transform).
- GraphSAGE, a non-spectral approach, which runs a neural-network-based aggregator over a fixed-size node neighborhood and generates embeddings by aggregating features from a node's local neighborhood.
- GAT, proposed to learn the importance between a node and its neighbors and fuse the neighbors to perform node classification.
2.2 Network Embedding
Network Representation Learning (NRL) is proposed to embed a network into a low-dimensional space while preserving the network structure and properties, so that the learned embeddings can be applied to downstream network tasks.
Heterogeneous graph embedding mainly focuses on preserving the meta-path based structural information.
limitation: these methods need to conduct a grid search to find the optimal weights of meta-paths.
3. Preliminary
background
- different meta-paths always reveal different semantics.
- Given a meta-path Φ, each node has a set of meta-path based neighbors, which can reveal diverse structural information and rich semantics in a heterogeneous graph.
- graph neural networks have been proposed to deal with arbitrary graph-structured data; however, all of them are designed for homogeneous networks.
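As a concrete illustration of meta-path based neighbors, here is a minimal sketch (toy data, made-up names, not from the paper): the MAM relation is just the product of the movie-actor biadjacency matrix with its transpose.

```python
# Minimal sketch: meta-path based neighbors from a toy heterogeneous graph.
# The biadjacency matrix below is made up for illustration.
import numpy as np

# movie_actor[i, j] = 1 if movie i features actor j (3 movies, 2 actors)
movie_actor = np.array([[1, 0],
                        [1, 1],
                        [0, 1]])

# Meta-path Movie-Actor-Movie (MAM): two movies are neighbors iff they share
# at least one actor, i.e., the composite relation is the product of the
# two single-hop relations.
mam = (movie_actor @ movie_actor.T) > 0
np.fill_diagonal(mam, False)  # drop self-loops for readability

for i in range(mam.shape[0]):
    print(f"MAM neighbors of movie {i}: {np.where(mam[i])[0].tolist()}")
# movies 0 and 2 share no actor, so they are not MAM neighbors of each other
```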
4. Proposed Model
a semi-supervised GNN for heterogeneous graphs.
(1) node-level attention -> learn the weights of meta-path based neighbors and aggregate them to get the semantic-specific node embedding.
for node i, under a single meta-path (i.e., one kind of semantics), compute the neighbor weights.
(2) semantic-level attention -> can tell the difference between meta-paths and get the optimal weighted combination of the semantic-specific node embeddings.
for node i, compute the weights of the different meta-paths.
4.1 Node-level attention
Isn't this just multi-head attention, embedding each homo-graph separately? -> there is no neighbor-sampling step, so it is less thorough than Fin-Event!
problem: due to the heterogeneity of nodes, different types of nodes have different feature spaces.
solution: design a type-specific transformation matrix to project the features of different types of nodes into the same feature space.
<- the type-specific transformation matrix is based on node type rather than edge type.
- asymmetry: node-level attention can preserve asymmetry, which is a critical property of heterogeneous graphs.
ideas: challenge: converting a heterogeneous graph into homogeneous graphs loses much semantic and structural information.
- problem-1: it fails to learn the meta-path importance well.
- problem-2: heterogeneous elements can't be fed into a GNN directly; the graph has to be converted into homogeneous graphs first.
the attention weight is generated for a single meta-path; it is semantic-specific and able to capture one kind of semantic information.
-> multi-head attention: repeat the node-level attention K times and concatenate the learned embeddings as the semantic-specific embedding.
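A minimal PyTorch sketch of node-level attention for one meta-path (the module name, shapes, and GAT-style score are my own reading of the paper, not the authors' code): a type-specific projection first maps raw features into a common space, then K independent attention heads aggregate the meta-path based neighbors and are concatenated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeLevelAttention(nn.Module):
    def __init__(self, in_dim, out_dim, num_heads=8):
        super().__init__()
        self.num_heads, self.out_dim = num_heads, out_dim
        # type-specific transformation: projects this node type's raw
        # features into the shared space (one projection per head)
        self.proj = nn.Linear(in_dim, out_dim * num_heads, bias=False)
        # per-head attention vector a, applied to the ordered pair [h_i || h_j]
        self.attn = nn.Parameter(torch.empty(num_heads, 2 * out_dim))
        nn.init.xavier_uniform_(self.attn)

    def forward(self, x, adj):
        # x: (N, in_dim) features; adj: (N, N) meta-path adjacency incl. self-loops
        N = x.size(0)
        h = self.proj(x).view(N, self.num_heads, self.out_dim)   # (N, K, d)
        # e_ij = LeakyReLU(a^T [h_i || h_j]); the ordered concat keeps the
        # score asymmetric (e_ij != e_ji in general)
        src = (h * self.attn[:, :self.out_dim]).sum(-1)          # (N, K)
        dst = (h * self.attn[:, self.out_dim:]).sum(-1)          # (N, K)
        e = F.leaky_relu(src.unsqueeze(1) + dst.unsqueeze(0))    # (N, N, K)
        # softmax only over each node's meta-path based neighbors
        alpha = torch.softmax(
            e.masked_fill((adj == 0).unsqueeze(-1), float("-inf")), dim=1)
        z = torch.einsum("ijk,jkd->ikd", alpha, h)               # (N, K, d)
        return F.elu(z.reshape(N, self.num_heads * self.out_dim))

# toy usage: 4 nodes, 5-dim features, one meta-path adjacency with self-loops
x = torch.randn(4, 5)
adj = torch.eye(4) + torch.tensor([[0., 1, 0, 0],
                                   [1, 0, 1, 0],
                                   [0, 1, 0, 1],
                                   [0, 0, 1, 0]])
z = NodeLevelAttention(in_dim=5, out_dim=8)(x, adj)              # (4, 64)
```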
4.2 Semantic-level attention
Isn't this just using an nn.Linear to merge multiple homo-graph embeddings into one? Fin-Event's version isn't much either: it simply concatenates them. Simple enough.
need to fuse the multiple semantics revealed by the meta-paths.
ideas: semantics
the "semantics" here is only the narrow, traditional NLP notion of meaning in word sequences; in the broader sense, semantics also covers shape, image, timbre, color, and other attributes that pin down what makes an object unique.
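A matching sketch of semantic-level attention (again my own names and shapes, assuming the paper's formulation): a one-layer MLP plus a semantic vector q scores each meta-path, the scores are averaged over all nodes, softmaxed into weights β, and used to weight-sum the semantic-specific embeddings.

```python
# Minimal sketch of semantic-level attention; an assumption-based reading of
# the paper, not the authors' code.
import torch
import torch.nn as nn

class SemanticLevelAttention(nn.Module):
    def __init__(self, in_dim, hidden_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.q = nn.Parameter(torch.empty(hidden_dim, 1))
        nn.init.xavier_uniform_(self.q)

    def forward(self, z):
        # z: (P, N, d) semantic-specific embeddings, one slice per meta-path
        w = (self.mlp(z) @ self.q).mean(dim=1)      # (P, 1): per-meta-path score
        beta = torch.softmax(w, dim=0)              # (P, 1): meta-path weights
        return (beta.unsqueeze(-1) * z).sum(dim=0)  # (N, d): fused embedding

# toy usage: fuse PAP and PLP embeddings of 3025 nodes into one embedding
z_pap, z_plp = torch.randn(3025, 64), torch.randn(3025, 64)
z_final = SemanticLevelAttention(in_dim=64)(torch.stack([z_pap, z_plp]))
```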
4.3 Analysis of the proposed model
4.4 Classification
problem: the variance of graph-structured data can be quite high.
solution: repeat the process 10 times and report the average results.
HAN -> designed for heterogeneous graphs, it successfully captures the rich semantics and shows its superiority.
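A tiny sketch of that repeat-and-average protocol; train_eval is a hypothetical placeholder standing in for one full HAN train+test run, and the number it returns here is a dummy, not a reported result.

```python
import numpy as np

def train_eval(seed: int) -> float:
    """Hypothetical stand-in: would train HAN with this seed, return test acc."""
    rng = np.random.default_rng(seed)
    return 0.88 + 0.01 * rng.standard_normal()  # placeholder value only

accs = [train_eval(seed) for seed in range(10)]
print(f"test acc: {np.mean(accs):.4f} +/- {np.std(accs):.4f}")
```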
4.5 Analysis
- node-level attention: learn the attention values between a node and its neighbors within a specific meta-path
- semantic-level attention: learn the attention values among the diverse meta-paths.
- with node-level and semantic-level attention, the importance of nodes and meta-paths can be fully considered.
4.6 HAN vs. Fin-Event
- Fin-Event uses an intra_gnn to learn and merge node_neighbors, and an inter_gnn to merge multiple homo-graph embeddings -> final adj matrix
- HAN uses multi-head attention to merge node neighbors, then merges the multiple homo-graph embeddings, learning the meta-path importance in the process.
- Fin-Event does RL-based neighbor sampling, while HAN does not.
5. HAN Implementation
5.1 Gripes about the tf2 HAN code
the original TensorFlow implementation: GitHub - Jhy1993/HAN: Heterogeneous Graph Neural Network
the whole graph is fed in, graph embeddings for PAP and PLP are generated separately, and then merged into one comprehensive embedding.
- feat_in_list
- bias_in_list
- lbl_in, y_train, (1,3025,3)
- msk_in, y_train, (1,3025)
5.1.1 HAN mask
HAN's mask really is a boolean list of length (3025,), unlike the Fin-Event mask, which holds indices.
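A tiny illustration of the two mask conventions; the size 3025 follows these notes, while the index values are made up.

```python
# Boolean mask (HAN style) vs. index mask (Fin-Event style); both select the
# same rows. Index values below are made up for illustration.
import numpy as np

train_idx = np.array([0, 5, 7])           # index mask
train_msk = np.zeros(3025, dtype=bool)    # boolean mask, shape (3025,)
train_msk[train_idx] = True

logits = np.random.randn(3025, 3)
assert (logits[train_msk] == logits[train_idx]).all()
```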
5.1.2 HAN train
HAN predicts class probabilities on all the data, then uses train_msk to select the train_pred_probabilities of the training rows.
This HAN training setup is really odd; it looks wrong no matter how you view it!
Because pytorch requires_grad automatically tracks all tensor computations, and backward() plus optimizer.step() run only after the whole forward pass, on the surface only the training rows enter the loss function, yet the backward pass still flows through the computation over all the data. This quietly dilutes the training gradient, especially with a split as lopsided as 600 train vs. 2125 test samples.
After I changed it to mini-batch training, it felt much better! GNN_models/HAN/HAN_torch at master · yuyongsheng1990/GNN_models · GitHub
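A runnable miniature of the full-graph pattern described above; nn.Linear is a hypothetical stand-in for the whole HAN model, and the sizes follow these notes.

```python
# Miniature of the criticized training step: forward on ALL 3025 nodes, loss
# on only the 600 masked training rows. nn.Linear stands in for HAN.
import torch
import torch.nn as nn
import torch.nn.functional as F

N, C = 3025, 3
model = nn.Linear(64, C)                     # hypothetical stand-in model
features = torch.randn(N, 64)
labels = torch.randint(0, C, (N,))
train_msk = torch.zeros(N, dtype=torch.bool)
train_msk[:600] = True                       # 600 training nodes, as in the notes

optimizer = torch.optim.Adam(model.parameters(), lr=5e-3)
logits = model(features)                     # forward pass over the whole graph
loss = F.cross_entropy(logits[train_msk], labels[train_msk])
loss.backward()                              # backward over the full-graph forward
optimizer.step()
optimizer.zero_grad()
```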
5.1.3 Problem 2: train acc changes, but why does val acc never change?
probably the parameters barely change, so the predictions barely change -> not converging!!!
5.1.4 y_train, y_val, y_test are all (3025,3)
5.1.5 too little train data, too much test data
With so little training data and so much test data, isn't this basically transfer learning? No, it's worse than transfer learning: a transfer-learning model is at least similar to the target task and its parameters have been trained, while HAN throws an almost raw model at prediction, so of course the results are poor!
-> improvement: train: 2125; test: 600
5.2 Analysis of the pytorch_HAN code
1) pytorch_HAN does not use the 3-column y_train of tf_HAN; it defines its own my_label, whose shape goes from (3025,3) to (600,).
-> I also changed the whole-graph input embedding generation to mini-batch.
2) Shouldn't multi-head attention split x into 8 parts? Both HAN_tf2 and pytorch_HAN instead replicate the head 8 times and leave x unchanged (see the sketch after this list).
3) embed_list, with length = the number of homogeneous graphs, stores each homogeneous graph's embedding, (3025,1,64).
4) a simple attention layer merges the multiple homogeneous-graph embeddings into one final hete-graph embedding.
5) out: a linear layer makes the prediction -> (3025,3)
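To make 2) concrete, here is a hedged sketch contrasting the two multi-head conventions (toy sizes, not either repo's code): Transformer-style splits the feature dimension across heads, while the GAT/HAN convention gives every head its own full projection of the same x and concatenates the head outputs.

```python
# Transformer-style multi-head: split the 64 feature dims into 8 heads of 8.
# GAT/HAN-style multi-head: 8 independent projections of the SAME 64-dim x.
import torch
import torch.nn as nn

x = torch.randn(3025, 64)

heads_split = x.view(3025, 8, 8)                        # split, no new params

projs = nn.ModuleList(nn.Linear(64, 8, bias=False) for _ in range(8))
heads_repl = torch.stack([p(x) for p in projs], dim=1)  # (3025, 8, 8)
out = heads_repl.flatten(1)                             # concat -> (3025, 64)
```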