CS224W: Machine Learning with Graphs - 06 Graph Neural Networks (GNN) 1: GNN Model

GNN Model

0. Limitations of shallow embedding methods

  • $O(|V|)$ parameters are needed: no sharing of parameters between nodes, so every node has its own unique embedding
  • Inherently “transductive”: cannot generate embeddings for nodes not seen during training
  • Do not incorporate node features: features should be leveraged

1. Deep Graph Encoders

0). Deep Methods based on GNN

$\text{ENC}(v)=$ multiple layers of non-linear transformations based on graph structure
Note: all deep encoders can be combined with node similarity functions

1). Modern ML Toolbox

The modern deep learning toolbox is designed for simple sequences and grids, but networks are far more complex:

  • Arbitrary size and complex topological structure (i.e., no spatial locality like grids)
  • No fixed node ordering or reference point
  • Often dynamic and have multimodal features

2. Basics of Deep Learning

To be updated

3. Deep Learning for Graphs

1). A Naive Approach

Join the adjacency matrix and node features, then feed them into a deep neural network (a sketch follows the issue list below)
Issues:

  • $O(|V|)$ parameters
  • Not applicable to graphs of different sizes
  • Sensitive to node ordering
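A minimal sketch of this naive model (assuming PyTorch; the class name, sizes, and toy graph are made up for illustration) makes the issues concrete: the input width depends on $|V|$, the network only accepts graphs with exactly that many nodes, and permuting the node order changes the input.

```python
import torch
import torch.nn as nn

class NaiveGraphMLP(nn.Module):
    """Hypothetical naive model: feed [adjacency row | node features] into an MLP."""
    def __init__(self, num_nodes, feat_dim, hidden_dim):
        super().__init__()
        # Input width grows with num_nodes -> O(|V|) parameters,
        # and the model is tied to graphs with exactly num_nodes nodes.
        self.mlp = nn.Sequential(
            nn.Linear(num_nodes + feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, A, X):
        # Concatenating each adjacency row with that node's features means a
        # different node ordering produces a different input (not permutation invariant).
        return self.mlp(torch.cat([A, X], dim=1))

A = torch.randint(0, 2, (5, 5)).float()   # toy adjacency matrix
X = torch.randn(5, 3)                     # toy node features
out = NaiveGraphMLP(num_nodes=5, feat_dim=3, hidden_dim=16)(A, X)
print(out.shape)                          # torch.Size([5, 1])
```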
2). Convolutional Networks
a). From images to graphs

Goal: generalize convolutions beyond simple lattices and leverage node features/attributes
Problem:

  • There is no fixed notion of locality or sliding window on the graph
  • The graph is permutation invariant (there is no canonical node ordering)

Idea: transform information at the neighbors and combine it:

  • Transform “message” $h_i$ from neighbors: $W_i h_i$
  • Add them up: $\sum_i W_i h_i$
b). Graph convolutional networks

Idea: node’s neighborhood defines a computation graph (determine node computation graph; propagate and transform information)
Basic approach: average information from neighbors and apply a neural network
$h_v^0 = x_v$
$h_v^{l+1} = \sigma\left(W_l \sum_{u\in N(v)} \dfrac{h_u^l}{|N(v)|} + B_l h_v^l\right), \ \forall l\in \{0,\dots,L-1\}$
$z_v = h_v^L$
where

  • $h_v^l$: hidden representation of node $v$ at layer $l$
  • $W_l$: weight matrix for neighborhood aggregation
  • $B_l$: weight matrix for transforming the hidden vector of the node itself
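As a sketch only (assuming PyTorch; the shapes and toy graph are illustrative assumptions), the layer above can be written node-by-node, with $W_l$ and $B_l$ as linear maps shared across all nodes:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One layer of h_v^{l+1} = sigma(W_l * mean_{u in N(v)} h_u^l + B_l * h_v^l)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # W_l: neighborhood aggregation
        self.B = nn.Linear(in_dim, out_dim, bias=False)  # B_l: self transformation

    def forward(self, neighbors, H):
        # neighbors[v] is N(v) (assumed non-empty here); H[v] is h_v^l
        out = []
        for v in range(H.shape[0]):
            neigh_mean = torch.stack([H[u] for u in neighbors[v]]).mean(dim=0)
            out.append(torch.relu(self.W(neigh_mean) + self.B(H[v])))
        return torch.stack(out)

# Toy usage: h_v^0 = x_v, and stacking two layers gives z_v = h_v^2.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
X = torch.randn(3, 4)
layer1, layer2 = GCNLayer(4, 8), GCNLayer(8, 8)
Z = layer2(neighbors, layer1(neighbors, X))   # node embeddings z_v
```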
c). Matrix formulation

Many aggregations can be performed efficiently by (sparse) matrix operations
Let $H^l = [h_1^l \cdots h_{|V|}^l]^T$; then $\sum_{u\in N(v)}h_u^l = A_v H^l$
Let $D$ be the diagonal matrix where $D_{vv} = \mathrm{Deg}(v) = |N(v)|$; then $D_{vv}^{-1} = 1/|N(v)|$
Rewriting the update function in matrix form:
$H^{l+1} = \sigma(\tilde A H^l W_l^T + H^l B_l^T)$
where $\tilde A = D^{-1}A$
This implies that efficient sparse matrix multiplication can be used ($\tilde A$ is sparse)
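A minimal sketch of the matrix-form update with a sparse $\tilde A$ (assuming PyTorch; the toy graph and shapes are illustrative assumptions):

```python
import torch

# Build A as a sparse COO tensor and scale each row v by 1/|N(v)| to get Ã = D^{-1} A.
edges = torch.tensor([[0, 0, 1, 2],
                      [1, 2, 0, 0]])                        # (row, col) indices of A
A = torch.sparse_coo_tensor(edges, torch.ones(4), (3, 3))
deg = torch.sparse.sum(A, dim=1).to_dense().clamp(min=1)    # D_vv = |N(v)|
A_tilde = torch.sparse_coo_tensor(edges, 1.0 / deg[edges[0]], (3, 3))

H = torch.randn(3, 4)   # H^l
W = torch.randn(8, 4)   # W_l
B = torch.randn(8, 4)   # B_l

# H^{l+1} = sigma(Ã H^l W_l^T + H^l B_l^T); the sparse-dense product costs O(#edges).
H_next = torch.relu(torch.sparse.mm(A_tilde, H) @ W.T + H @ B.T)
print(H_next.shape)     # torch.Size([3, 8])
```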

d). How to train a GNN
  • Node embedding $z_v$ is a function of the input graph
  • Supervised setting: minimize the loss $L$
    $\min_\theta L(y, f(z_v))$
    Example: binary node classification
    $L = -\sum_{v\in V}\left[y_v\log(\sigma(z_v^T\theta)) + (1-y_v)\log(1-\sigma(z_v^T\theta))\right]$
    where $\theta$ is the classification weight vector
  • Unsupervised setting: no node labels are available, so use the graph structure as the supervision:
    similar nodes should have similar embeddings
    $L = \sum_{z_u,z_v}\text{CrossEntropy}(y_{uv}, \text{DEC}(z_u,z_v))$
    where $y_{uv}=1$ when nodes $u$ and $v$ are similar and DEC is the decoder (e.g., inner product)
    Node similarity can be defined in many ways, such as random walks (node2vec, DeepWalk, struc2vec), matrix factorization, or node proximity in the graph
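For the supervised case, a short sketch of the binary node-classification loss (assuming PyTorch; `z` stands in for the GNN output and `theta` for the classification weights, both illustrative):

```python
import torch
import torch.nn.functional as F

z = torch.randn(5, 8, requires_grad=True)    # stand-in for node embeddings z_v from the GNN
theta = torch.randn(8, requires_grad=True)   # classification weight vector
y = torch.tensor([1., 0., 1., 1., 0.])       # node labels y_v

logits = z @ theta                            # z_v^T theta
# Equivalent to -mean_v [ y_v log(sigma(z_v^T theta)) + (1-y_v) log(1 - sigma(z_v^T theta)) ]
loss = F.binary_cross_entropy_with_logits(logits, y)
loss.backward()                               # gradients reach theta and, through z_v, the GNN weights
```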
e). Model design: overview
  1. Define a neighborhood aggregation function
  2. Define a loss function on the embeddings
  3. Train on a set of nodes
  4. Generate embeddings for nodes as needed
f). Inductive Capability

The same aggregation parameters ($W_l$ and $B_l$) are shared across all nodes: the number of model parameters is sublinear in $|V|$, and we can generalize to unseen nodes (new graphs or new nodes).
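A small sketch of this inductive behavior (assuming PyTorch; graphs and shapes are toy assumptions): because $W_l$ and $B_l$ have fixed shapes, the same weights apply to graphs of any size.

```python
import torch

W = torch.randn(8, 4)   # W_l, shape independent of |V|
B = torch.randn(8, 4)   # B_l, shape independent of |V|

def gcn_layer(A, H):
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)   # |N(v)|
    return torch.relu((A @ H / deg) @ W.T + H @ B.T)

# Graph available at training time: 3 nodes.
A_train = torch.tensor([[0., 1., 1.],
                        [1., 0., 0.],
                        [1., 0., 0.]])
H_train = gcn_layer(A_train, torch.randn(3, 4))

# Unseen, larger graph: the same W and B are reused without retraining.
A_new = torch.ones(5, 5) - torch.eye(5)
H_new = gcn_layer(A_new, torch.randn(5, 4))
print(H_train.shape, H_new.shape)   # torch.Size([3, 8]) torch.Size([5, 8])
```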
