【GNN】Graph embedding

Originally published as a WeChat article.

Why do we need graph embedding

Machine learning algorithms are designed for continuous data, which is why embeddings map graphs into a continuous vector space.

What is graph embedding

Definition

Graph embedding is an approach used to transform nodes, edges, and their features into a lower-dimensional vector space while maximally preserving properties like graph structure and information. https://towardsdatascience.com/overview-of-deep-learning-on-graph-embeddings-4305c10ad4a4

Representation learning for networks

Graph representation:

Figure: graph representation (adapted from a figure by Jie Tang).

Embedding methods

There are various ways to embed graphs, each with a different level of granularity: embeddings can be performed at the node level, at the sub-graph level, or through strategies like graph walks.

DeepWalk

DeepWalk belongs to the family of graph embedding techniques based on walks, a concept from graph theory: a graph is traversed by moving from one node to another, as long as the two nodes are connected by a common edge.

Figure: steps of DeepWalk (adapted from a figure by Jie Tang).

Random walk

  • Generate $\gamma$ random walks for each vertex.
  • Each random walk has length $l$.
  • Every jump is uniform (each neighbor is chosen with equal probability).

Code for a random walk on an example graph, adapted from a repository on GitHub.

import networkx as nx
import numpy as np
import matplotlib.pyplot as plt

# Create graph
G = nx.Graph()

# Add nodes 0..9 (node 9 is referenced by the edges below)
G.add_nodes_from(range(0, 10))

# Add edges
G.add_edge(0, 1)
G.add_edge(1, 2)
G.add_edge(1, 6)
G.add_edge(6, 4)
G.add_edge(4, 3)
G.add_edge(4, 5)
G.add_edge(6, 7)
G.add_edge(7, 8)
G.add_edge(7, 9)
G.add_edge(2, 3)
G.add_edge(2, 4)
G.add_edge(2, 5)

# Draw graph
#nx.draw(G)
#plt.show()

# Mark red and green nodes (used only for visualization)
red_vertices = [0, 2, 3, 9]
green_vertices = [5, 8]

# Count the total number of steps taken
nsteps = 0

# Run a single walk (increase the range to generate more walks)
for step in range(1, 2):
    # Start the walk at node 1
    vertexid = 1
    # Dictionary mapping each node to the number of times it was visited
    visited_vertices = {}
    # Store the path, starting from the initial node
    path = [vertexid]

    print("Step: %d" % (step))
    # Execute a random walk of length 9 (9 jumps)
    for counter in range(1, 10):
        # Extract the current vertex's neighborhood
        vertex_neighbors = list(G.neighbors(vertexid))
        # The probability of jumping to any neighbor is uniform
        probability = [1. / len(vertex_neighbors)] * len(vertex_neighbors)
        # Choose the next vertex from the neighborhood
        vertexid = np.random.choice(vertex_neighbors, p=probability)
        # Accumulate the number of times each vertex is visited
        visited_vertices[vertexid] = visited_vertices.get(vertexid, 0) + 1

        # Append to the path
        path.append(vertexid)
        nsteps += 1

    # Sort the vertices by visit count, in descending order
    mostvisited = sorted(visited_vertices, key=visited_vertices.get, reverse=True)
    print("Path: ", path)
    # Print the top 10 most visited vertices
    print("Most visited nodes: ", mostvisited[:10])

Random-walk path to the matrix $\Phi$

  • Define a window.
  • The steps in a window of the traversal can be aggregated by arranging the node representation vectors $\mathbf{v} \in \mathbb{R}^d$ next to each other in a matrix $\Phi$, as sketched below.
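
As a concrete illustration of the bullets above, the following sketch (hypothetical names; it assumes each node already has some $d$-dimensional vector) slides a window over a walk and stacks the corresponding node vectors into a matrix Phi:

import numpy as np

d = 8                        # embedding dimension (assumed)
num_nodes = 10
rng = np.random.default_rng(0)

# Hypothetical lookup table: one d-dimensional vector per node
node_vectors = rng.normal(size=(num_nodes, d))

# A walk like those produced by the random-walk code above
walk = [1, 2, 4, 6, 7, 9, 7, 8, 7, 6]

window_size = 5
# Each window of the walk yields a matrix Phi whose rows are the
# representation vectors of the nodes visited inside that window
for start in range(len(walk) - window_size + 1):
    window = walk[start:start + window_size]
    Phi = np.stack([node_vectors[v] for v in window])   # shape (window_size, d)
    print(window, Phi.shape)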

The approach taken by DeepWalk is to complete a series of random walks and estimate the likelihood

$$P(v_i \mid \Phi(v_1), \Phi(v_2), \dots, \Phi(v_{i-1}))$$

i.e., the probability of observing node $v_i$ given the representations of all the previous nodes visited so far in the random walk.

Next, the matrix representing the graph is fed to a neural network to make predictions about a node's features or class. The model used to make these predictions is skip-gram, as sketched below.
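
A minimal sketch of this step, assuming walks like those generated earlier (the walks list below is a stand-in) and using gensim's Word2Vec in skip-gram mode (sg=1):

from gensim.models import Word2Vec

# Walks as generated above; gensim expects each "sentence"
# to be a list of string tokens, so node ids are stringified
walks = [
    [1, 2, 4, 6, 7, 9],
    [0, 1, 6, 4, 3, 2],
    [5, 4, 6, 7, 8, 7],
]
sentences = [[str(v) for v in walk] for walk in walks]

# Train skip-gram (sg=1): each node is a word, each walk a sentence,
# and the window defines which nodes count as a node's context
model = Word2Vec(sentences, vector_size=8, window=3, min_count=0, sg=1, epochs=50)

# Learned embedding of node 4
print(model.wv["4"])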

Node2vec

Idea: use flexible, biased random walks that can trade off between local and global views of the network.

The difference between Node2vec and DeepWalk is subtle but significant. Node2vec introduces a walk bias $\alpha$, parameterized by $p$ and $q$. The parameter $p$ prioritizes a breadth-first-search (BFS) procedure, while the parameter $q$ prioritizes a depth-first-search (DFS) procedure. The decision of where to walk next is therefore influenced by the probabilities $1/p$ and $1/q$:
$$\alpha_{pq}(t, x) = \begin{cases} 1/p & \text{if } d_{tx} = 0 \\ 1 & \text{if } d_{tx} = 1 \\ 1/q & \text{if } d_{tx} = 2 \end{cases}$$

where $t$ is the last node visited, $x$ is the candidate next node, the walk currently resides at node $v$, and $d_{tx}$ is the shortest-path distance between $t$ and $x$.
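
A minimal sketch of this bias (a hypothetical helper, not the reference node2vec implementation), computing the unnormalized transition weights out of node v given the previous node t:

import networkx as nx

def biased_weights(G, t, v, p=1.0, q=2.0):
    """Unnormalized node2vec transition weights from v, given previous node t."""
    weights = {}
    for x in G.neighbors(v):
        if x == t:                   # d_tx = 0: return to the previous node
            weights[x] = 1.0 / p
        elif G.has_edge(t, x):       # d_tx = 1: stay close to t (BFS-like)
            weights[x] = 1.0
        else:                        # d_tx = 2: move away from t (DFS-like)
            weights[x] = 1.0 / q
    return weights

# Example on the graph built earlier: the walk arrived at node 4 from node 2
G = nx.Graph([(0, 1), (1, 2), (1, 6), (6, 4), (4, 3), (4, 5),
              (6, 7), (7, 8), (7, 9), (2, 3), (2, 4), (2, 5)])
print(biased_weights(G, t=2, v=4))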

BFS & DFS

Intuitively, BFS is ideal for learning the local neighborhood of a node, while DFS is better at capturing the global structure of the network.

Graph2vec

A modification of node2vec, graph2vec essentially learns to embed a graph's sub-graphs.

Using an analogy with word2vec: if a document is made of sentences (which are in turn made of words), then a graph is made of sub-graphs (which are in turn made of nodes).

Everything is made of smaller things

Three steps:

  • Sample and re-label all sub-graphs in the graph (see the sketch below);
  • Train the skip-gram model on the sub-graphs;
  • Compute the embedding by providing the sub-graph's ID (index vector) at the input.
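
A minimal sketch of the first step, using Weisfeiler-Lehman-style relabeling (one common way to extract and re-label the rooted sub-graph around each node; wl_relabel is a hypothetical helper, not the reference graph2vec code):

import networkx as nx

def wl_relabel(G, iterations=2):
    """Weisfeiler-Lehman relabeling: each node's final label summarizes
    the rooted sub-graph of the given depth around it."""
    labels = {v: str(G.degree(v)) for v in G.nodes()}   # initial labels
    for _ in range(iterations):
        new_labels = {}
        for v in G.nodes():
            neighbor_labels = sorted(labels[u] for u in G.neighbors(v))
            # Compress (own label + multiset of neighbor labels) into a new label
            new_labels[v] = str(hash(labels[v] + "|" + ",".join(neighbor_labels)))
        labels = new_labels
    return labels

G = nx.Graph([(0, 1), (1, 2), (1, 6), (6, 4), (4, 3), (4, 5),
              (6, 7), (7, 8), (7, 9), (2, 3), (2, 4), (2, 5)])
print(wl_relabel(G))

The relabeled sub-graphs then play the role of words, and whole graphs the role of documents, in the skip-gram training of the second step.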

A brief history of graph embedding

(Figure by Jie Tang.)
