CS224W: Machine Learning with Graphs
Stanford / Winter 2021
01-Intro
-
Why Is Graph Machine Learning So Hard?
-
Arbitrary size and complex topological structure (No spatial locality like grids)
-
No fixed node ordering or reference point
-
Often dynamic and have multimodal features
-
-
CS224W & Representation Learning
-
(Supervised) Machine Learning Lifecycle: features are hand-engineered ("this feature, that feature") every single time; representation learning aims to learn them automatically instead
-
Map nodes to d-dimensional embeddings such that similar nodes in the network are embedded close together
-
-
Different Types of Tasks
-
Node-level
-
Edge-level
-
Community (subgraph) level
-
Graph-level prediction, Graph generation
-
-
Classic Graph ML Tasks
-
Node classification: Predict a property of a node
- Categorize online users/items
-
Link prediction: Predict whether a link is missing between two nodes
- Knowledge graph completion
-
Graph classification: Categorize different graphs
-
Clustering: Detect if nodes form a community
- Social circle detection
-
Other tasks
-
Graph generation: Drug discovery
-
Graph evolution: Physical simulation
-
-
-
Components of a Network
-
Objects: nodes, vertices $N$
-
Interactions: links, edges $E$
-
System: network, graph $G(N, E)$
-
-
How do you define a graph?
- The way you assign links determines the nature of the questions you can study
-
Directed vs. Undirected Graphs
-
Node Degrees
-
Bipartite Graph
-
A graph whose nodes can be divided into two disjoint sets $U$ and $V$ such that every link connects a node in $U$ to one in $V$; that is, $U$ and $V$ are independent sets
-
Authors-to-Papers (they authored)
-
Actors-to-Movies (they appeared in)
-
-
Folded/Projected Bipartite Graphs
- Create a connection between a pair of nodes if they have at least one neighbor in common
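The folding rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `project_bipartite` and the toy author/paper data are hypothetical, and the projection is taken onto the $U$ side (authors), assuming every edge is given as a `(u, v)` pair with `u` in $U$ and `v` in $V$.

```python
from collections import defaultdict
from itertools import combinations


def project_bipartite(edges):
    """Fold a bipartite graph onto its U side.

    edges: iterable of (u, v) pairs with u in U and v in V.
    Returns a set of (u1, u2) pairs (u1 < u2) that share at least
    one neighbor in V.
    """
    members = defaultdict(set)  # v -> set of U-nodes linked to v
    for u, v in edges:
        members[v].add(u)

    projected = set()
    for us in members.values():
        # Every pair of U-nodes attached to the same V-node gets a link.
        for a, b in combinations(sorted(us), 2):
            projected.add((a, b))
    return projected


# Hypothetical toy data: authors (U) linked to the papers (V) they wrote.
authorship = [
    ("alice", "p1"), ("bob", "p1"),
    ("bob", "p2"), ("carol", "p2"),
    ("dave", "p3"),
]
coauthors = project_bipartite(authorship)
print(sorted(coauthors))  # [('alice', 'bob'), ('bob', 'carol')]
```

Here `dave` ends up isolated in the projection because no other author shares a paper with him.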
-
Representing Graphs: Adjacency Matrix
-
Adjacency Matrices are Sparse
-
Networks are Sparse Graphs: Most real-world networks are sparse ($E \ll E_{\max}$ or $k \ll N - 1$)
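To make the quantities in the sparsity condition concrete, here is a minimal sketch on a hypothetical 6-node path graph. It assumes a simple undirected graph (no self-loops, no multi-edges), for which $E_{\max} = N(N-1)/2$ and the average degree is $2E/N$; the helper name `adjacency_matrix` is chosen for illustration only.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix for a simple undirected graph."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # undirected: mirror every edge
    return A


# Hypothetical toy graph: a path 0 - 1 - 2 - 3 - 4 - 5.
N = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
A = adjacency_matrix(N, edges)

degrees = [sum(row) for row in A]   # node degree = row sum
E = sum(degrees) // 2               # each edge is counted twice
E_max = N * (N - 1) // 2            # number of edges in a complete graph
avg_degree = 2 * E / N
density = E / E_max                 # fraction of possible edges present

print(degrees)        # [1, 2, 2, 2, 2, 1]
print(E, E_max)       # 5 15
print(f"average degree = {avg_degree:.2f}, N - 1 = {N - 1}")
```

On a graph this small the gap between $E$ and $E_{\max}$ is modest; in large real-world networks most entries of the matrix are zero, which is why a dense matrix is rarely the storage format of choice.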
-
-
More Types of Graphs
-
Unweighted and Weighted
-
Self-edges (self-loops) and Multigraph
-
-
Connectivity of Undirected Graphs
-
Connected (undirected) graph: Any two vertices can be joined by a path
-
Disconnected graph: Made up of two or more connected components
-
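The adjacency matrix of a disconnected undirected graph can be written in block-diagonal form (after ordering the nodes by component): the nonzero entries are confined to the square blocks on the diagonal, and the off-diagonal blocks contain only zeros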
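The block-diagonal structure can be checked on a small example. The sketch below is an illustration under stated assumptions, not a reference implementation: it finds connected components with a plain breadth-first search and then permutes rows and columns so that nodes of the same component are adjacent. The function names and the 5-node toy graph are hypothetical.

```python
from collections import deque


def connected_components(A):
    """Return the connected components of an undirected graph given
    as a square 0/1 adjacency matrix (list of lists)."""
    n = len(A)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            comp.append(u)
            for v in range(n):
                if A[u][v] and not seen[v]:
                    seen[v] = True
                    queue.append(v)
        components.append(sorted(comp))
    return components


def reorder(A, order):
    """Apply the same permutation to rows and columns of A."""
    return [[A[i][j] for j in order] for i in order]


# Hypothetical toy graph with edges 0-2, 2-4 and 1-3 (two components).
A = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
components = connected_components(A)
order = [v for comp in components for v in comp]
B = reorder(A, order)

print(components)  # [[0, 2, 4], [1, 3]]
for row in B:
    print(row)
# [0, 1, 0, 0, 0]
# [1, 0, 1, 0, 0]
# [0, 1, 0, 0, 0]
# [0, 0, 0, 0, 1]
# [0, 0, 0, 1, 0]
```

After reordering, the top-left 3x3 block and the bottom-right 2x2 block hold all the nonzero entries, and the two off-diagonal blocks are entirely zero.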
-
-
Connectivity of Directed Graphs
-
Strongly connected directed graph: has a path from each node to every other node and vice versa (e.g. a path from A to B and a path from B to A)
-
Weakly connected directed graph: is connected if we disregard the edge directions (i.e. its corresponding undirected graph is connected)
-
Strongly connected components (SCCs) can be identified, but not every node is part of a nontrivial strongly connected component
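The difference between strong and weak connectivity, and the idea of a trivial SCC, can be illustrated with simple reachability searches. This is a hedged sketch rather than an efficient SCC algorithm (it does not implement Tarjan's or Kosaraju's method in full): it relies on the fact that the SCC of a node is the set of nodes it can reach that can also reach it. All function names and the 4-node example are hypothetical, and the node list is assumed to be non-empty.

```python
def reachable(adj, start):
    """Set of nodes reachable from `start` by following adj (depth-first)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def build(nodes, edges):
    """Return forward and reversed adjacency lists of a directed graph."""
    fwd = {n: [] for n in nodes}
    rev = {n: [] for n in nodes}
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    return fwd, rev


def is_strongly_connected(nodes, edges):
    # One root must reach every node, and every node must reach the root.
    fwd, rev = build(nodes, edges)
    root = nodes[0]
    return (len(reachable(fwd, root)) == len(nodes)
            and len(reachable(rev, root)) == len(nodes))


def is_weakly_connected(nodes, edges):
    # Ignore directions: search the corresponding undirected graph.
    fwd, rev = build(nodes, edges)
    undirected = {n: fwd[n] + rev[n] for n in nodes}
    return len(reachable(undirected, nodes[0])) == len(nodes)


def scc_of(nodes, edges, node):
    # Nodes that `node` can reach AND that can reach `node`.
    fwd, rev = build(nodes, edges)
    return reachable(fwd, node) & reachable(rev, node)


# Hypothetical toy graph: a directed cycle A -> B -> C -> A plus C -> D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

print(is_strongly_connected(nodes, edges))  # False
print(is_weakly_connected(nodes, edges))    # True
print(sorted(scc_of(nodes, edges, "A")))    # ['A', 'B', 'C']
print(sorted(scc_of(nodes, edges, "D")))    # ['D']
```

Node `D` has no path back to the cycle, so its SCC contains only itself, which is the trivial case mentioned above.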
-
Supplement From Graph Representation Learning
-
Homogeneous and Heterogeneous Graph
-
Homogeneous Graph: there is only one type of node and one type of edge (relation type)
-
Heterogeneous Graph: there is more than one type of node or edge (relation type)
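One simple way to picture the distinction is to store a type for every node and a relation type for every edge. The sketch below assumes that representation (a node-to-type dictionary plus `(source, relation, target)` triples); the helper `is_heterogeneous` and the toy academic graph are hypothetical and only meant as an illustration.

```python
def is_heterogeneous(node_type, typed_edges):
    """True if the graph has more than one node type or relation type.

    node_type: dict mapping node -> node type.
    typed_edges: iterable of (source, relation, target) triples.
    """
    node_types = set(node_type.values())
    relation_types = {rel for _, rel, _ in typed_edges}
    return len(node_types) > 1 or len(relation_types) > 1


# Hypothetical heterogeneous graph: authors, papers and a venue.
academic_nodes = {"alice": "author", "bob": "author", "p1": "paper", "kdd": "venue"}
academic_edges = [
    ("alice", "writes", "p1"),
    ("bob", "writes", "p1"),
    ("p1", "published_in", "kdd"),
]

# Hypothetical homogeneous graph: users connected by friendships only.
social_nodes = {"u1": "user", "u2": "user", "u3": "user"}
social_edges = [("u1", "friend", "u2"), ("u2", "friend", "u3")]

print(is_heterogeneous(academic_nodes, academic_edges))  # True
print(is_heterogeneous(social_nodes, social_edges))      # False
```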
-
-
Graph or Network?
-
Graph: an abstract data structure; the focus is on theoretical properties at the mathematical, abstract level
-
Network: an instantiation of the abstract data structure; the focus is on the features and properties of the real data itself
-
-
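In node classification, the training data (the nodes in the graph) violate the i.i.d. assumption: we model a set of interconnected nodes rather than a set of i.i.d. samples. (In graph-level tasks, however, each training example is a whole graph, so the i.i.d. assumption holds.)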
-
Link Prediction = Relational Inference