【GNN】Graph embedding

Originally published as a WeChat article.

Why do we need graph embedding

Machine learning algorithms are designed for continuous data, which is why embeddings map graphs into a continuous vector space.

What is graph embedding

Definition

Graph embedding is an approach used to transform nodes, edges, and their features into a lower-dimensional vector space while maximally preserving properties like graph structure and information. https://towardsdatascience.com/overview-of-deep-learning-on-graph-embeddings-4305c10ad4a4

Representation learning for networks

Graph representation:

Figure: graph representation (adapted from a figure by Jie Tang).

Embedding methods

There are various ways to embed graphs, each with a different level of granularity: embeddings can be performed at the node level, at the sub-graph level, or through strategies like graph walks.

DeepWalk

DeepWalk belongs to the family of graph embedding techniques based on walks, a concept from graph theory: a graph is traversed by moving from one node to another, as long as the two nodes are connected by a common edge.

Figure: steps of DeepWalk (adapted from a figure by Jie Tang).

Random walk

  • Generate $\gamma$ random walks for each vertex.
  • Each random walk has length $l$.
  • Every jump is uniform (each neighbor is chosen with equal probability).

Code for a random walk on an example graph, adapted from a repository on GitHub.

import networkx as nx
import numpy as np
import matplotlib.pyplot as plt

# Create graph
G = nx.Graph()

# Add nodes 0..9 (node 9 is referenced by the edges below)
G.add_nodes_from(range(0, 10))

# Add edges
G.add_edge(0, 1)
G.add_edge(1, 2)
G.add_edge(1, 6)
G.add_edge(6, 4)
G.add_edge(4, 3)
G.add_edge(4, 5)
G.add_edge(6, 7)
G.add_edge(7, 8)
G.add_edge(7, 9)
G.add_edge(2, 3)
G.add_edge(2, 4)
G.add_edge(2, 5)

# Draw graph
#nx.draw(G)
#plt.show()

# Mark red and green nodes (used only for visualization)
red_vertices = [0, 2, 3, 9]
green_vertices = [5, 8]

# Count the total number of steps taken
nsteps = 0

# Run a single walk (increase the range to generate more walks)
for step in range(1, 2):
    # Start the walk at node 1
    vertexid = 1
    # Dictionary mapping each node to the number of times it was visited
    visited_vertices = {}
    # Store the path, starting from the initial node
    path = [vertexid]

    print("Step: %d" % (step))
    # Execute a random walk of length 9 (9 jumps)
    for counter in range(1, 10):
        # Extract the current vertex's neighborhood
        vertex_neighbors = list(G.neighbors(vertexid))
        # The probability of jumping to any neighbor is uniform
        probability = [1. / len(vertex_neighbors)] * len(vertex_neighbors)
        # Choose the next vertex from the neighborhood
        vertexid = np.random.choice(vertex_neighbors, p=probability)
        # Accumulate the number of times each vertex is visited
        visited_vertices[vertexid] = visited_vertices.get(vertexid, 0) + 1

        # Append to the path
        path.append(vertexid)
        nsteps += 1

    # Sort the vertices by visit count, in descending order
    mostvisited = sorted(visited_vertices, key=visited_vertices.get, reverse=True)
    print("Path: ", path)
    # Print the top 10 most visited vertices
    print("Most visited nodes: ", mostvisited[:10])

Random-walk path to the matrix $\Phi$

  • Define a window.
  • The steps in a window of the traversal can be aggregated by arranging the node representation vectors $\mathbf{v} \in \mathbb{R}^d$ next to each other in a matrix $\Phi$, as sketched below.
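
As a concrete illustration of the bullets above, the following sketch (hypothetical names; it assumes each node already has some $d$-dimensional vector) slides a window over a walk and stacks the corresponding node vectors into a matrix Phi:

import numpy as np

d = 8                        # embedding dimension (assumed)
num_nodes = 10
rng = np.random.default_rng(0)

# Hypothetical lookup table: one d-dimensional vector per node
node_vectors = rng.normal(size=(num_nodes, d))

# A walk like those produced by the random-walk code above
walk = [1, 2, 4, 6, 7, 9, 7, 8, 7, 6]

window_size = 5
# Each window of the walk yields a matrix Phi whose rows are the
# representation vectors of the nodes visited inside that window
for start in range(len(walk) - window_size + 1):
    window = walk[start:start + window_size]
    Phi = np.stack([node_vectors[v] for v in window])   # shape (window_size, d)
    print(window, Phi.shape)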

The approach taken by DeepWalk is to complete a series of random walks and estimate the likelihood

$$P(v_i \mid \Phi(v_1), \Phi(v_2), \dots, \Phi(v_{i-1}))$$

i.e., the probability of observing node $v_i$ given the representations of all the previous nodes visited so far in the random walk.

Next, the matrix representing the graph is fed to a neural network to make predictions about a node's features or class. The model used to make these predictions is skip-gram, as sketched below.
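
A minimal sketch of this step, assuming walks like those generated earlier (the walks list below is a stand-in) and using gensim's Word2Vec in skip-gram mode (sg=1):

from gensim.models import Word2Vec

# Walks as generated above; gensim expects each "sentence"
# to be a list of string tokens, so node ids are stringified
walks = [
    [1, 2, 4, 6, 7, 9],
    [0, 1, 6, 4, 3, 2],
    [5, 4, 6, 7, 8, 7],
]
sentences = [[str(v) for v in walk] for walk in walks]

# Train skip-gram (sg=1): each node is a word, each walk a sentence,
# and the window defines which nodes count as a node's context
model = Word2Vec(sentences, vector_size=8, window=3, min_count=0, sg=1, epochs=50)

# Learned embedding of node 4
print(model.wv["4"])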

Node2vec

Idea: use flexible, biased random walks that can trade off between local and global views of the network.

The difference between Node2vec and DeepWalk is subtle but significant. Node2vec introduces a walk bias $\alpha$, parameterized by $p$ and $q$. The parameter $p$ prioritizes a breadth-first-search (BFS) procedure, while the parameter $q$ prioritizes a depth-first-search (DFS) procedure. The decision of where to walk next is therefore influenced by the probabilities $1/p$ and $1/q$:
$$\alpha_{pq}(t, x) = \begin{cases} 1/p & \text{if } d_{tx} = 0 \\ 1 & \text{if } d_{tx} = 1 \\ 1/q & \text{if } d_{tx} = 2 \end{cases}$$

where $t$ is the last node visited, $x$ is the candidate next node, the walk currently resides at node $v$, and $d_{tx}$ is the shortest-path distance between $t$ and $x$.
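
A minimal sketch of this bias (a hypothetical helper, not the reference node2vec implementation), computing the unnormalized transition weights out of node v given the previous node t:

import networkx as nx

def biased_weights(G, t, v, p=1.0, q=2.0):
    """Unnormalized node2vec transition weights from v, given previous node t."""
    weights = {}
    for x in G.neighbors(v):
        if x == t:                   # d_tx = 0: return to the previous node
            weights[x] = 1.0 / p
        elif G.has_edge(t, x):       # d_tx = 1: stay close to t (BFS-like)
            weights[x] = 1.0
        else:                        # d_tx = 2: move away from t (DFS-like)
            weights[x] = 1.0 / q
    return weights

# Example on the graph built earlier: the walk arrived at node 4 from node 2
G = nx.Graph([(0, 1), (1, 2), (1, 6), (6, 4), (4, 3), (4, 5),
              (6, 7), (7, 8), (7, 9), (2, 3), (2, 4), (2, 5)])
print(biased_weights(G, t=2, v=4))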

BFS & DFS

Intuitively, BFS is ideal for learning the local neighborhood of a node, while DFS is better at capturing the global structure of the network.

Graph2vec

A modification of node2vec, graph2vec essentially learns to embed a graph's sub-graphs.

Using an analogy with word2vec: if a document is made of sentences (which are in turn made of words), then a graph is made of sub-graphs (which are in turn made of nodes).

Everything is made of smaller things

Three steps:

  • Sample and re-label all sub-graphs in the graph (see the sketch below);
  • Train the skip-gram model on the sub-graphs;
  • Compute the embedding by providing the sub-graph's ID (index vector) at the input.
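
A minimal sketch of the first step, using Weisfeiler-Lehman-style relabeling (one common way to extract and re-label the rooted sub-graph around each node; wl_relabel is a hypothetical helper, not the reference graph2vec code):

import networkx as nx

def wl_relabel(G, iterations=2):
    """Weisfeiler-Lehman relabeling: each node's final label summarizes
    the rooted sub-graph of the given depth around it."""
    labels = {v: str(G.degree(v)) for v in G.nodes()}   # initial labels
    for _ in range(iterations):
        new_labels = {}
        for v in G.nodes():
            neighbor_labels = sorted(labels[u] for u in G.neighbors(v))
            # Compress (own label + multiset of neighbor labels) into a new label
            new_labels[v] = str(hash(labels[v] + "|" + ",".join(neighbor_labels)))
        labels = new_labels
    return labels

G = nx.Graph([(0, 1), (1, 2), (1, 6), (6, 4), (4, 3), (4, 5),
              (6, 7), (7, 8), (7, 9), (2, 3), (2, 4), (2, 5)])
print(wl_relabel(G))

The relabeled sub-graphs then play the role of words, and whole graphs the role of documents, in the skip-gram training of the second step.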

A brief history of graph embedding

(Figure by Jie Tang.)
