[Paper Notes] Representation Learning on Graphs: Methods and Applications

zjwreal

Category: Network Representation Learning. Tags: Network Embedding, Survey, Graph Embedding

Article link: https://blog.csdn.net/zjwreal/article/details/87463214

Hamilton W L, Ying R, Leskovec J. Representation learning on graphs: Methods and applications[J]. arXiv preprint arXiv:1709.05584, 2017.

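This paper is a survey of graph representation learning written by PhD students in Jure Leskovec's group at Stanford University. It systematically reviews the current state of the field.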

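Goals

	Graph representation learning
Goal	Convert network information into low-dimensional, dense, real-valued vectors that can be fed to existing machine learning algorithms; represent the original high-dimensional discrete features with low-dimensional continuous ones
Why it is needed	(1) The data is highly sparse (one-hot encodings / adjacency matrix) and high-dimensional ($N \times N$); (2) similarity between nodes is hard to measure
Applications	Node classification, link prediction, community detection, recommender systems
Traditional methods	Hand-picked summary statistics of the graph, such as node degree. Drawbacks: hand-picked features generalize poorly and are time-consuming to design
What the representation captures	(1) The node's global position (neighboring nodes have similar representations); (2) the node's structural role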

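Representation learning based on higher-order relations: extend the similarity between vertices from first-order to higher-order proximity. A different objective function is used for each order, and the distributed representations learned for each order are then concatenated to obtain the final vertex representation.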
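The concatenation step can be illustrated with a minimal sketch. It assumes that "order k" means the k-step random-walk transition matrix, and it uses a truncated SVD of each matrix as a stand-in for the per-order objective; both choices, as well as the function name `higher_order_embedding`, are illustrative assumptions and not something the notes or the survey prescribe.

```python
import numpy as np

def higher_order_embedding(A, num_orders=2, dim=2):
    """Concatenate one low-dimensional embedding per proximity order (illustrative only)."""
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1)          # one-step transition matrix (row-normalized adjacency)
    P_k = np.eye(A.shape[0])
    parts = []
    for _ in range(num_orders):
        P_k = P_k @ P                   # k-step transition probabilities
        U, S, _ = np.linalg.svd(P_k)    # stand-in for a per-order objective (assumption)
        parts.append(U[:, :dim] * np.sqrt(S[:dim]))
    return np.hstack(parts)             # rows are nodes; shape (n, num_orders * dim)

# Toy graph: a path 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = higher_order_embedding(A, num_orders=3, dim=2)
```

Each block of `dim` columns in `emb` comes from one order, so the first `dim` columns are exactly the first-order embedding.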

Node embedding (most common)

Edge embedding: relations in knowledge graphs, link prediction

Sub-graph embedding: substructure embedding, community embedding

Whole-graph embedding

2 Embedding Nodes

Embedding global position	Subtype	Representative work
Shallow embedding	Factorization based	Laplacian eigenmaps, GF, HOPE
Shallow embedding	Random walk based	DeepWalk, node2vec, LINE
Deep learning based	Auto-encoder	SDNE, DNGR
Deep learning based	GCN / neighborhood aggregation	GCN, GraphSAGE

Embedding structural roles	Subtype	Representative work
	Random walk	struc2vec, RolX
	Spectral graph theory	GraphWave

Transductive learning vs Inductive learning

Transductive learning: unlabelled data is the testing data

Inductive learning: unlabelled data is not the testing data

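If the testing data (unlabelled data) is already known during training, the setting is transductive learning.

If the testing data is not known during training, and the trained model is later applied to unseen testing data, the setting is inductive learning.

Put simply, the difference between transductive and inductive learning is whether the samples we want to predict were already seen (used) during training.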

2.1 Encoder-decoder perspective

2.2 Shallow embedding approaches

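Most embedding methods are shallow embedding methods (e.g. node2vec, DeepWalk, Laplacian Eigenmaps), in which mapping a node to its embedding vector amounts to a table lookup:
${\rm ENC}(\mathbf{v}_i)=\mathbf{Zv}_i$
Here $\mathbf{v}_i \in \mathbb{R}^{|V|\times 1}$ is a one-hot vector that selects the column of $\mathbf{Z}$ corresponding to node $v_i$, and $\mathbf{Z} \in \mathbb{R}^{d \times|V|}$ is a matrix containing the embedding vectors of all nodes. Shallow embedding methods train the matrix $\mathbf{Z}$ directly. Their decoder computes a pairwise similarity between nodes.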
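To make the "lookup" reading of ${\rm ENC}(\mathbf{v}_i)=\mathbf{Zv}_i$ concrete, here is a small NumPy sketch. The random `Z`, the sizes, and the helper names are made up for illustration. The inner-product decoder is only one common choice of pairwise similarity and is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, d = 5, 3
Z = rng.normal(size=(d, num_nodes))    # one column per node; this is the only trainable parameter

def one_hot(i, n):
    """Indicator vector v_i of shape (n, 1)."""
    v = np.zeros((n, 1))
    v[i, 0] = 1.0
    return v

def enc(i):
    """ENC(v_i) = Z v_i, which just selects column i of Z."""
    return Z @ one_hot(i, num_nodes)

def dec(z_i, z_j):
    """Pairwise decoder; the inner product is an assumed example, not the only option."""
    return float(np.dot(z_i.ravel(), z_j.ravel()))

z2 = enc(2)
```

In practice the matrix product is never formed; indexing `Z[:, i]` gives the same vector, which is why the encoder is described as a lookup.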

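Drawbacks of existing shallow embedding methods under the encoder-decoder framework:

(1) No parameters are shared between nodes in the encoder. The model is purely a memorized look-up table, so the number of parameters grows as $O(|V|)$, which is costly in both storage and computation. The lack of parameter sharing also greatly limits generalization.
(2) Most current embedding methods use only the network structure and ignore the attributes of the nodes themselves (such as text, images, and statistical features), so the resulting vectors capture only limited information about the network.
(3) Most models perform transductive learning on a static network structure and do not account for nodes appearing and disappearing as the network evolves over time, so they cannot directly generate embeddings for nodes that were absent during training. Because the dynamics of a network are also essential to understanding its properties, this weakness can even degrade embedding quality on dynamic networks.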
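The contrast behind these drawbacks can be sketched in a few lines. The lookup encoder has one column per training node and nothing to return for a new node, while an encoder with shared weights over node features can embed a node it has never seen. The mean-aggregation encoder below is a simplified, hypothetical stand-in for neighborhood-aggregation methods such as GraphSAGE, not their actual architecture; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_train, num_features, d = 5, 3, 4

X = rng.normal(size=(num_train, num_features))   # node attributes (assumed available)
Z_lookup = rng.normal(size=(d, num_train))       # shallow model: O(|V|) parameters
W = rng.normal(size=(num_features, d))           # shared weights: size independent of |V|

def shallow_encode(i):
    """Lookup encoder: only defined for nodes that existed during training."""
    return Z_lookup[:, i]

def aggregate_encode(own_features, neighbor_features):
    """Average the node's features with its neighbors' features, then apply shared weights."""
    h = np.vstack([own_features, neighbor_features]).mean(axis=0)
    return np.tanh(h @ W)

# A node that appears after training, attached to training nodes 0 and 2
x_new = rng.normal(size=num_features)
z_new = aggregate_encode(x_new, X[[0, 2]])
```

Calling `shallow_encode(5)` raises an `IndexError` because no column exists for the new node, whereas `aggregate_encode` reuses `W` and also makes use of the node attributes.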

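2.2.1 Factorization based

Matrix factorization is the traditional approach to node embedding: reduce the dimensionality of the network's adjacency matrix to produce a low-dimensional representation for each node.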
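As a minimal sketch of this idea, the snippet below factorizes a toy adjacency matrix with a truncated SVD. The SVD is used here as a generic example of factorization and is an assumption; the individual methods covered in the survey differ in which matrix they factorize and which loss they use. The function name `svd_embeddings` is illustrative.

```python
import numpy as np

def svd_embeddings(A, dim):
    """Rank-`dim` factorization A ~ Z_src @ Z_dst.T via truncated SVD."""
    U, S, Vt = np.linalg.svd(A)
    scale = np.sqrt(S[:dim])
    Z_src = U[:, :dim] * scale        # rows are nodes (transpose of the d x |V| layout)
    Z_dst = Vt[:dim].T * scale
    return Z_src, Z_dst

# Toy undirected graph: a triangle 0-1-2 with a pendant node 3 attached to node 2
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

Z_src, Z_dst = svd_embeddings(A, dim=2)
A_hat = Z_src @ Z_dst.T               # low-rank reconstruction of the adjacency matrix
reconstruction_error = np.linalg.norm(A - A_hat)
```

With `dim` equal to the number of nodes the reconstruction is exact; smaller `dim` trades reconstruction error for a more compact representation.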

Similarity measure: a deterministic node similarity measure

Laplacian Eigenmaps

${\rm DEC} (\mathbf{z}_i, \mathbf{z}_j) = || \mathbf{z}_i - \mathbf{z}_j||_{2}^{2}$
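A small sketch of how this decoder is used: weight the squared distance of each node pair by a graph similarity (the adjacency weights here, an assumption), and note that embeddings built from the eigenvectors of the graph Laplacian with the smallest nonzero eigenvalues keep that weighted sum small. The code uses the unnormalized Laplacian $L = D - A$ for simplicity, which differs from the generalized eigenproblem of classic Laplacian eigenmaps; the function names are illustrative.

```python
import numpy as np

def dec(z_i, z_j):
    """DEC(z_i, z_j) = ||z_i - z_j||_2^2"""
    return float(np.sum((z_i - z_j) ** 2))

def laplacian_eigenmaps(A, dim):
    """Embed nodes with eigenvectors of L = D - A (simplified, unnormalized variant)."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, 1:dim + 1]           # skip the constant eigenvector (assumes a connected graph)

def pairwise_loss(A, Z):
    """Sum over ordered pairs of DEC(z_i, z_j) weighted by the adjacency entry."""
    n = A.shape[0]
    return sum(A[i, j] * dec(Z[i], Z[j]) for i in range(n) for j in range(n))

# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

Z = laplacian_eigenmaps(A, dim=1)          # rows are nodes
loss = pairwise_loss(A, Z)
```

On this toy graph the one-dimensional embedding places the two triangles on opposite sides of zero, so connected nodes end up close together and the weighted loss stays small.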
