Graph
a deep domain.
A graph is a way to visualize a relationship
G = (V, E), E is any relationship on V
the relationship E is intuitively translated as:
if (a,b) in E, then a has an edge to b in G
If E is symmetric, then G is called an undirected graph
If E is not symmetric, then G is called a directed graph
A tree is a special kind of graph: connected and without cycles.
a path with starting vertex = ending vertex is a cycle
Weighted graph
G = (V, E, W)
where W are the weights on edges
e.g. modelling distance, or modelling the strength of a pair (molecules, atoms)
Concepts
- Degree of a node: D(x), the number of edges attached to node x
- Out-degree: number of edges leaving x
- In-degree: number of edges entering x
- Regular graph: a graph in which all nodes have the same degree
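The degree definitions above can be sketched in a few lines of Python. The function names `degrees` and `is_regular`, and the edge-list input format (pairs `(a, b)` read as "a has an edge to b"), are assumptions made for this illustration.

```python
from collections import Counter

def degrees(vertices, edges):
    """Return (out_degree, in_degree) dicts for a directed edge list.

    `edges` is an iterable of (a, b) pairs, read as "a has an edge to b".
    """
    out_deg = Counter()
    in_deg = Counter()
    for a, b in edges:
        out_deg[a] += 1   # the edge leaves a
        in_deg[b] += 1    # the edge enters b
    # Counter returns 0 for missing keys, so isolated vertices get degree 0.
    return ({v: out_deg[v] for v in vertices},
            {v: in_deg[v] for v in vertices})

def is_regular(vertices, undirected_edges):
    """True if every vertex of an undirected graph has the same degree."""
    deg = Counter()
    for a, b in undirected_edges:
        deg[a] += 1
        deg[b] += 1
    return len({deg[v] for v in vertices}) <= 1
```

For example, a triangle is regular (every node has degree 2), while a path on three nodes is not.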
Subgraph and induced subgraph
Connected Graph, disconnected
connected if for any pair of nodes (x, y) there is a path between x and y in the graph
connected components of an undirected graph: we can partition the vertex set V into disjoint subsets such that vertices within each subset are connected, and vertices in different subsets are not
Representation
Adjacency matrix
A is an N by N matrix, A[j, k] = 1 if and only if there is an edge from j to k. A is a symmetric matrix iff the graph is undirected
(caveat: when writing code, indices start at 0; when talking about graphs, they start at 1 … yes, it’s confusing)
Array of linked lists
O(D(x)) for checking an edge and for enumerating all edges from node x
(Consider a social network with 1M nodes (sparse): the array of linked lists seems the better choice… )
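To make the two representations concrete, here is a minimal sketch for an undirected graph on vertices 0..n-1. All function names are hypothetical, and Python lists stand in for the linked lists mentioned in the notes.

```python
def adjacency_matrix(n, edges):
    """N-by-N 0/1 matrix for an undirected graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for j, k in edges:
        A[j][k] = 1
        A[k][j] = 1   # undirected: keep A symmetric
    return A

def adjacency_lists(n, edges):
    """Array of neighbor lists for an undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for j, k in edges:
        adj[j].append(k)
        adj[k].append(j)
    return adj

def has_edge_matrix(A, j, k):
    return A[j][k] == 1    # O(1) lookup

def has_edge_lists(adj, j, k):
    return k in adj[j]     # O(D(j)): scans the neighbors of j
```

The trade-off is space versus lookup time: the matrix always needs N^2 entries (10^12 for 1M nodes), while the lists need space proportional to |V| + |E|.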
Graph related questions
- shortest paths
- independent sets
- cover sets
- minimum cut
- (minimum) spanning trees
- node enumerations
- Travelling salesman problem
- Euler tour
Graph exploration
BFS breadth first search
DFS depth first search (kinda more important)
BFS(X)
start with vertex X (e.g. X = 1), explore all its neighbors first, along the edges attached to X, and mark them as available
mark vertex X unavailable
then take the next available vertex and repeat
edges that are travelled by BFS are put into a set T
the available vertices are kept in a queue (first in, first out)
BFS with distance calculated
Source: d[source] = 0
Each time we explore u and add a vertex v to the available list:
d[v] = d[u] + 1
d eventually is the distance (number of edges) from source to each vertex
all edges used by BFS form a tree
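The BFS-with-distance procedure described above could look like this in Python. This is a sketch under assumptions: the graph is given as adjacency lists `adj` on vertices 0..n-1, and the name `bfs_distances` is made up for the example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return (d, tree_edges) for a BFS from `source`.

    adj[u] is the list of neighbors of u; d[v] stays None if v is unreachable.
    """
    d = [None] * len(adj)
    d[source] = 0
    tree_edges = []              # the set T of edges travelled by BFS
    queue = deque([source])      # the "available" vertices, in FIFO order
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if d[v] is None:     # v has not been discovered yet
                d[v] = d[u] + 1
                tree_edges.append((u, v))
                queue.append(v)
    return d, tree_edges
```

Each vertex enters the queue at most once, so every discovered vertex other than the source contributes exactly one edge to `tree_edges`.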
DFS(X)
a recursive exploration algorithm that
- each function call is an exploration of vertex v
- it will explore the next unvisited vertex u accessible from current vertex v, do recursive exploration on u
- backtrack (go back to the parent in the recursion tree) when every vertex around the current vertex is explored, then look for new edges from the parent to unexplored vertices. Backtracking happens by the nature of recursive function calls
- If no unexplored vertex is available to explore, restart the DFS on a new vertex (needed if the undirected graph is not connected)
Check if explored: use an array
void DFS(int c){
    mark_visited(c) // visited[c] = 1
    do_some_work(c) // such as printing
    for each edge(c, v) from c {
        // this can be any ordering of edges
        if (v is unvisited) then DFS(v)
    }
    // we finished exploring c once the loop exits
}
The edges used by DFS also form a tree, because you won’t visit a node twice
Running time:
If the graph is stored as adjacency lists: O(|V| + |E|)
If the graph is stored as an adjacency matrix: O(|V|^2)
Property:
If graph is connected, then no matter where we start for the DFS/BFS, all the vertices will be visited.
Do the practice
If an undirected graph has C connected components, then DFS must be started C times (once per component) to explore all vertices
If the graph is directed, DFS may need to be restarted from new vertices several times to explore all vertices
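The link between DFS starts and connected components can be sketched as follows. It assumes an undirected graph stored as adjacency lists on vertices 0..n-1; `count_components` is a hypothetical name.

```python
def count_components(adj):
    """Count connected components of an undirected graph (adjacency lists)."""
    visited = [False] * len(adj)

    def dfs(c):
        visited[c] = True
        for v in adj[c]:
            if not visited[v]:
                dfs(v)

    starts = 0
    for s in range(len(adj)):
        if not visited[s]:   # s belongs to a component not explored yet
            starts += 1
            dfs(s)
    return starts
```

Because the DFS is recursive, very deep graphs can exceed Python's default recursion limit; an explicit stack would avoid that.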
Edges in DFS of undirected graph, DFS trees
For the DFS tree:
the first vertex is treated as root
Tree edges: used by the DFS, pointing from v to u when we explore u from v; tree edges form a rooted DFS exploration tree
Backward edges: marked when DFS stands at v and tries to explore u but finds that u is already explored and is an ancestor of v in the tree
Cross edges: edges of the graph that fall into neither case above
Property:
No cross edges for a DFS tree in an undirected graph
No two tree edges can point to the same vertex (as you won’t visit a node twice)
Note: the DFS tree is directed, but if we remove all directions from the tree edges and backward edges, we get back the part of the original graph that DFS visited
Cycle detection for undirected graphs
Problem: given an undirected graph G = (V, E) detect whether the graph has a cycle or not
DFS can solve it
Claim: start from any vertex, DFS reports no backward edge iff no cycle
pf:
if a backward edge is reported then there is a cycle: the backward edge from v to its ancestor u, together with the tree path from u down to v, closes a cycle.
if there is a cycle then a backward edge is reported: DFS will hit some vertex v in the cycle first, and then one of the edges in the cycle has to be a backward edge (since every edge is classified as either a backward edge or a tree edge, and they can’t all be tree edges since no two tree edges can point to the same vertex) [run the algorithm and discuss case by case. Contradiction works quite well in graph theory]
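The claim translates directly into a DFS that stops at the first backward edge. This is a minimal sketch assuming a simple undirected graph (no parallel edges) stored as adjacency lists on vertices 0..n-1; the name `has_cycle` and the `parent` bookkeeping are choices made for this example.

```python
def has_cycle(adj):
    """Detect a cycle in a simple undirected graph given as adjacency lists."""
    visited = [False] * len(adj)

    def dfs(c, parent):
        visited[c] = True
        for v in adj[c]:
            if not visited[v]:
                if dfs(v, c):        # (c, v) is a tree edge
                    return True
            elif v != parent:        # already visited, and not the edge we came by
                return True          # backward edge found, so there is a cycle
        return False

    # Restart from every unvisited vertex so disconnected graphs are covered.
    return any(not visited[s] and dfs(s, -1) for s in range(len(adj)))
```

The `parent` check is needed because each undirected edge appears in both adjacency lists: seeing the parent again is the tree edge itself, not a backward edge.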
Tree as a special graph
Def 1: an undirected graph in which any two vertices are connected by exactly one path is called a tree
Def 2: an undirected graph that is connected and has no cycle is called a tree
Property 1: Any tree with at least two vertices has a vertex of degree 1 (otherwise there is a cycle) [This property is very important for induction]
Property 2: A tree has |V| - 1 edges (proof by induction, using property 1)
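As a small check built on these definitions, the sketch below tests whether an undirected graph is a tree. It relies on the standard converse of Property 2, which the notes do not state: a connected graph with exactly |V| - 1 edges has no cycle. The name `is_tree`, the 0..n-1 vertex labels, and treating the empty graph as "not a tree" are assumptions of this example.

```python
def is_tree(n, edges):
    """Check whether an undirected graph on vertices 0..n-1 is a tree."""
    if n == 0 or len(edges) != n - 1:
        return False             # Property 2 fails (or the graph is empty)
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    # Iterative DFS from vertex 0 to test connectivity.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n
```

A triangle plus an isolated vertex has the right number of edges for four vertices but is rejected because it is not connected.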
Abstract Tree Algorithm: longest path
here tree is unrooted and edges are undirected
Find the longest path in the tree, i.e. find a pair of nodes x and y such that the distance between x and y is maximized
naïve algorithm: BFS from each vertex. O(|V| * |E|)
a more sophisticated one:
First, assume the algorithm starts with some vertex v; we define v as the root and treat the tree as a rooted tree, so the concepts of ancestor and children apply.
Assume we fix the root and there is a longest path P from x to y in T. What are its properties?
P0: x and y are leaves
P1: this path has to go through LCA(x,y) [the lowest common ancestor of the 2 leaves]
P2: root to x or root to y is a longest path from the root to a leaf (i.e. it realizes the depth of the tree)
proof by contradiction: suppose some leaf z, different from x and y, is strictly farther from the root than both x and y
- if the path from root to z does not go through LCA(x, y), then there is clearly a longer path from x to z. The path is x -> LCA(z, LCA(x,y)) -> z
- if x, y, z share the same LCA (z lies below LCA(x, y)), then the path x to z or y to z is longer.
[To do graph proof, you have to draw figures… ]
Find the longest path from the root to a leaf, by one traversal, in O(|E|)
then use that leaf as the new root and find the longest path from it, again in O(|E|)
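The two-traversal idea can be sketched with BFS as the traversal. The tree is assumed to be given as adjacency lists on vertices 0..n-1 with vertex 0 as the arbitrary first root; `farthest` and `tree_longest_path` are hypothetical names.

```python
from collections import deque

def farthest(adj, start):
    """BFS from `start`; return (vertex, distance) of a farthest vertex."""
    dist = {start: 0}
    queue = deque([start])
    far = start
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] > dist[far]:
                    far = v
                queue.append(v)
    return far, dist[far]

def tree_longest_path(adj):
    """Return (x, y, length) for a longest path in an unrooted tree."""
    x, _ = farthest(adj, 0)        # step 1: a deepest leaf from an arbitrary root
    y, length = farthest(adj, x)   # step 2: a farthest vertex from that leaf
    return x, y, length
```

Both traversals are linear in the size of the tree, compared with O(|V| * |E|) for running a BFS from every vertex.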