CS224W: Machine Learning with Graphs
Stanford / Winter 2021
14-traditional-generation
Properties of Real-world Graphs (review of Chapter 2)
Degree Distribution
- Degree distribution $P(k)$: the probability that a randomly chosen node has degree $k$
  $$P(k) = N_k / N$$
  $N_k$: the number of nodes with degree $k$
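As a minimal sketch of the definition above, the snippet below counts $N_k$ and divides by $N$. The adjacency-dict representation and the toy graph (a triangle with one pendant node) are assumptions made for illustration only.

```python
from collections import Counter


def degree_distribution(adj):
    """Return {k: P(k)} for an undirected graph given as {node: set(neighbors)}."""
    n = len(adj)
    # N_k: number of nodes whose degree equals k
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: n_k / n for k, n_k in sorted(counts.items())}


# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degree_distribution(toy))
```

On this toy graph, one node has degree 1, two have degree 2, and one has degree 3.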
Clustering Coefficient
- For one node
  $$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$
  $e_i$: the number of edges between the neighbors of node $i$; $k_i$: the degree of node $i$
- Graph clustering coefficient
  $$C = \frac{1}{N} \sum_{i}^{N} C_i$$
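The two formulas above can be sketched directly in code. The adjacency-dict format and the toy graph are assumptions. Setting $C_i = 0$ for nodes with fewer than two neighbors is also an assumed convention, since the formula is undefined there.

```python
from itertools import combinations


def local_clustering(adj, i):
    """C_i = 2 * e_i / (k_i * (k_i - 1)) for node i."""
    k = len(adj[i])
    if k < 2:
        return 0.0  # assumed convention: the formula is undefined for k_i < 2
    # e_i: number of edges between pairs of neighbors of i
    e = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2 * e / (k * (k - 1))


def average_clustering(adj):
    """C = (1 / N) * sum_i C_i."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(toy, 2), average_clustering(toy))
```

Here node 2 has three neighbors with one edge among them, so $C_2 = 1/3$, and the graph average is $(1 + 1 + 1/3 + 0)/4 = 7/12$.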
Connectivity
- Size of the largest connected component
- Largest component = giant component
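One way to measure connectivity as described above is a breadth-first search that collects components and reports the largest one. This is only a sketch; the adjacency-dict format and the toy graph are assumptions.

```python
from collections import deque


def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components


def largest_component_size(adj):
    return max((len(c) for c in connected_components(adj)), default=0)


# Hypothetical toy graph: a path 0-1-2, an edge 3-4, and an isolated node 5.
toy = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
print(largest_component_size(toy))
```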
Path Length
- Diameter: the maximum shortest-path distance between any pair of nodes in a graph
- Average path length for a connected graph or a strongly connected directed graph
  $$\bar{h} = \frac{1}{2 E_{\max}} \sum_{i, j \neq i} h_{ij}$$
  $h_{ij}$: the distance from node $i$ to node $j$; $E_{\max}$: the maximum number of edges, $n(n-1)/2$
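The diameter and $\bar{h}$ can both be computed from all-pairs BFS distances. The sketch below assumes an unweighted, connected, undirected graph stored as an adjacency dict; the 4-node path is a made-up example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average_path_length(adj):
    n = len(adj)
    total, diameter = 0, 0
    for i in adj:
        dist = bfs_distances(adj, i)
        if len(dist) != n:
            raise ValueError("graph must be connected")
        total += sum(dist.values())  # sum over j != i, since h_ii = 0
        diameter = max(diameter, max(dist.values()))
    e_max = n * (n - 1) / 2
    return diameter, total / (2 * e_max)


# Hypothetical example: the path graph 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(diameter_and_average_path_length(path))
```

For the path graph, the ordered-pair distances sum to 20 and $2E_{\max} = 12$, so $\bar{h} = 5/3$ with diameter 3.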
All of the traditional graph generation models below make prior assumptions about the graph generation process.
Erdős-Rényi Random Graphs
- Two variants
  - $G_{np}$: undirected graph on $n$ nodes where each edge $(u,v)$ appears i.i.d. with probability $p$
  - $G_{nm}$: undirected graph with $n$ nodes and $m$ edges picked uniformly at random
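A minimal sketch of the two variants, using only the standard library. The function names `gnp`, `gnm`, and `num_edges` and the `seed` parameter are illustrative choices, not part of the lecture.

```python
import random
from itertools import combinations


def gnp(n, p, seed=None):
    """G_np: include each of the n(n-1)/2 possible edges independently with probability p."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def gnm(n, m, seed=None):
    """G_nm: pick exactly m distinct edges uniformly at random."""
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))
    adj = {u: set() for u in range(n)}
    for u, v in rng.sample(all_pairs, m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def num_edges(adj):
    return sum(len(neighbors) for neighbors in adj.values()) // 2


print(num_edges(gnp(20, 0.2, seed=0)), num_edges(gnm(20, 30, seed=0)))
```

The edge count of `gnp` varies from run to run, while `gnm` always returns exactly `m` edges.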
- Degree distribution of $G_{np}$
  - The degree distribution of $G_{np}$ is binomial:
    $$P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$$
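The binomial formula can be evaluated directly. In the sketch below, the values $n = 100$ and $p = 0.05$ are arbitrary choices for illustration. It checks that the probabilities sum to 1 and that the mean degree equals $p(n-1)$.

```python
from math import comb


def gnp_degree_pmf(n, p):
    """P(k) = C(n-1, k) * p^k * (1-p)^(n-1-k) for k = 0, ..., n-1."""
    return [comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) for k in range(n)]


pmf = gnp_degree_pmf(n=100, p=0.05)
mean_degree = sum(k * q for k, q in enumerate(pmf))
print(sum(pmf), mean_degree)
```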
- Clustering coefficient of $G_{np}$
  - Expected $E[e_i]$
    $$E[e_i] = p \frac{k_i (k_i - 1)}{2}$$
  - Expected $E[C_i]$
    $$E[C_i] = \frac{p \cdot k_i (k_i - 1)}{k_i (k_i - 1)} = p = \frac{\bar{k}}{n-1} \approx \frac{\bar{k}}{n}$$
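The result $E[C_i] = p$ can be checked empirically. The sketch below samples one $G_{np}$ graph and compares its average clustering coefficient with $p$. The sizes, the seed, and the helper names are assumptions for illustration, and a single sample will only be close to $p$, not equal to it.

```python
import random
from itertools import combinations


def gnp(n, p, seed=None):
    """Sample an undirected G_np graph as {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    total = 0.0
    for i, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # assumed convention: C_i = 0 when k_i < 2
        e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
        total += 2 * e / (k * (k - 1))
    return total / len(adj)


graph = gnp(200, 0.1, seed=0)
estimate = average_clustering(graph)
print(f"p = 0.1, average clustering = {estimate:.3f}")
```

Because $p = \bar{k}/(n-1)$, keeping the average degree fixed while $n$ grows drives the clustering coefficient of $G_{np}$ toward zero.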
- Connected components of $G_{np}$
  - Def: Expansion
The Small-World Model
Paper: Collective dynamics of 'small-world' networks
- Key idea: interpolate between regular lattice graphs and the $G_{np}$ random graph, so that the graph has both a high clustering coefficient and a low diameter
- Small-World Model
  - Start with a low-dimensional regular lattice
    - In our case we are using a ring as a lattice
    - Has high clustering coefficient
  - Rewire: introduce randomness (shortcuts)
    - Add/remove edges to create shortcuts that join remote parts of the lattice
    - For each edge, with probability $p$, move the other endpoint to a random node
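The two steps above (ring lattice, then rewiring) can be sketched as follows. The parameter `k` (each node is joined to its `k` nearest ring neighbors, `k` even and smaller than `n`) is an assumption, since the notes only say "a ring". Skipping self-loops and duplicate edges when choosing the new endpoint is also an assumed detail.

```python
import random


def ring_lattice(n, k):
    """Ring of n nodes, each joined to its k nearest neighbors (k even, k < n)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def small_world(n, k, p, seed=None):
    """Rewire each lattice edge (u, v) with probability p to (u, random node)."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for step in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + step) % n
            if rng.random() >= p:
                continue  # keep the lattice edge
            # assumed detail: avoid self-loops and edges that already exist
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj


graph = small_world(n=20, k=4, p=0.3, seed=1)
print(sum(len(nb) for nb in graph.values()) // 2)
```

With `p = 0` the result is the original lattice (high clustering, large diameter), and with `p = 1` every edge is rewired, which approaches a random graph. Rewiring moves edges rather than adding them, so the edge count stays at `n * k / 2`.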
Kronecker Graph Model
Paper: Kronecker Graphs: An Approach to Modeling Networks
- Key idea: a recursive model of network structure
- Kronecker product
  - Define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices
  - A Kronecker graph is obtained by growing a sequence of graphs, iterating the Kronecker product over the initiator matrix $K_1$:
    $$K_1^{[m]} = K_m = \underbrace{K_1 \otimes K_1 \otimes \ldots \otimes K_1}_{m \text{ times}} = K_{m-1} \otimes K_1$$
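The recurrence $K_m = K_{m-1} \otimes K_1$ can be sketched with plain nested lists. The $2 \times 2$ initiator below is a made-up example, and `kron` and `kronecker_power` are illustrative helper names.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def kronecker_power(k1, m):
    """K_m = K_{m-1} (x) K_1, starting from K_1."""
    result = k1
    for _ in range(m - 1):
        result = kron(result, k1)
    return result


K1 = [[1, 1], [1, 0]]  # hypothetical initiator adjacency matrix
K2 = kronecker_power(K1, 2)
for row in K2:
    print(row)
```

Each 1 in $K_1$ is replaced by a copy of $K_1$ and each 0 by a block of zeros, so an $N_1 \times N_1$ initiator yields an $N_1^m \times N_1^m$ adjacency matrix after $m$ steps.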
Stochastic Kronecker Graphs
- Algorithm
  - Create an $N_1 \times N_1$ probability matrix $\Theta_1$ (to introduce randomness, the adjacency matrix is replaced by a probability matrix)
  - Compute the $k^{th}$ Kronecker power $\Theta_k$
  - For each entry $p_{uv}$ of $\Theta_k$, include an edge $(u,v)$ in $K_k$ with probability $p_{uv}$
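The three steps of the algorithm can be sketched naively as below. The values in `THETA1` are made up for illustration, and the function names are hypothetical. The graph is treated as directed, with one coin flip per matrix entry.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def stochastic_kronecker_naive(theta1, k, seed=None):
    """Build Theta_k, then flip one biased coin per entry p_uv."""
    rng = random.Random(seed)
    theta_k = theta1
    for _ in range(k - 1):
        theta_k = kron(theta_k, theta1)
    n = len(theta_k)
    edges = [(u, v) for u in range(n) for v in range(n)
             if rng.random() < theta_k[u][v]]
    return theta_k, edges


THETA1 = [[0.9, 0.5], [0.5, 0.1]]  # hypothetical probability matrix
theta3, edges = stochastic_kronecker_naive(THETA1, k=3, seed=0)
print(len(theta3), len(edges))
```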
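- Generation of Kronecker Graphs
  - The probability matrix has $n^2$ entries, so the naive approach needs $n^2$ coin flips, which is too slow
  - Using the properties of the Kronecker product, each edge is instead placed as follows
    - At every step, choose one of the 4 large cells (quadrants) according to their probabilities and descend into it; repeat until a single entry is reached, then place the edge there
    - If two edge placements collide, simply ignore the collision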
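A minimal sketch of this edge-by-edge procedure follows. The notes do not say how many edges to drop. Using the rounded expected edge count, $(\sum_{r,c} \Theta_1[r][c])^k$, as the default for `num_edges` is an assumption, as are the function name and the values in `THETA1`. Cells are chosen with probability proportional to their entries in $\Theta_1$.

```python
import random


def fast_kronecker_edges(theta1, k, num_edges=None, seed=None):
    """Place edges one at a time by descending k levels into Theta_k."""
    rng = random.Random(seed)
    n1 = len(theta1)
    cells = [(r, c) for r in range(n1) for c in range(n1)]
    weights = [theta1[r][c] for r, c in cells]
    if num_edges is None:
        # assumption: drop as many edges as Theta_k contains in expectation
        num_edges = round(sum(weights) ** k)
    edges = set()
    for _edge in range(num_edges):
        u = v = 0
        for _level in range(k):
            # pick one cell (a quadrant when theta1 is 2 x 2) proportionally to its weight
            r, c = rng.choices(cells, weights=weights)[0]
            u, v = u * n1 + r, v * n1 + c
        edges.add((u, v))  # a set silently ignores colliding placements
    return edges


THETA1 = [[0.9, 0.5], [0.5, 0.1]]  # hypothetical probability matrix
edges = fast_kronecker_edges(THETA1, k=4, seed=0)
print(len(edges))
```

Each edge costs $k$ weighted choices, so the total work scales with the number of edges times $k$ rather than with the $n^2$ entries of $\Theta_k$.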