Graph Search and Connectivity
Generic Graph Search
Goals:
1. find everything findable, i.e. every vertex reachable from the start vertex
2. don't explore anything twice
Generic Algorithm (given graph G, start vertex s)
--- initially mark s explored (all other vertices unexplored)
--- while possible:
--- choose an edge (u, v) with u explored and v unexplored
--- mark v explored
--- at termination, v is explored <=> G has a path from s to v
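A minimal Python sketch of the generic algorithm, assuming the graph is a dict mapping each vertex to a list of its neighbours (the name `generic_search` is hypothetical). It deliberately takes any frontier edge, so it rescans every edge in each round and is slow; BFS and DFS are the two efficient ways of choosing that edge.

```python
def generic_search(graph, s):
    """Return the set of vertices reachable from s (hypothetical helper)."""
    explored = {s}
    while True:
        # all edges (u, v) with u explored and v unexplored
        frontier = [(u, v) for u in explored
                    for v in graph.get(u, []) if v not in explored]
        if not frontier:          # no such edge: nothing more is findable
            return explored
        u, v = frontier[0]        # any choice works for correctness
        explored.add(v)
```

For example, with `graph = {1: [2], 2: [3], 3: [], 4: [1]}`, searching from 1 finds `{1, 2, 3}`; vertex 4 is never found because no path leads from 1 to 4.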
1. Breadth-First Search (BFS): O(m+n) time using a queue (n = # of vertices, m = # of edges)
--- explore nodes in 'layers'
--- can compute shortest paths
--- can compute connected components of an undirected graph
The basics: pseudocode
BFS(Graph G, start vertex s)
    (all nodes initially unexplored)
    mark s as explored
    let Q = queue data structure (FIFO), initialized with s
    while Q is not empty:
        remove the first node of Q, call it v
        for each edge (v, w):
            if w unexplored:
                mark w as explored
                add w to the end of Q
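The pseudocode translates almost line for line into Python. This sketch assumes an adjacency dict (node to list of neighbours), uses `collections.deque` as the FIFO queue, and returns nodes in the order they are removed from Q; the name `bfs` is hypothetical.

```python
from collections import deque

def bfs(graph, s):
    """Return the nodes reachable from s, in BFS (layer-by-layer) order."""
    explored = {s}
    order = []
    q = deque([s])                    # FIFO queue initialized with s
    while q:
        v = q.popleft()               # remove the first node of Q
        order.append(v)
        for w in graph.get(v, []):
            if w not in explored:
                explored.add(w)       # mark w as explored
                q.append(w)           # add w to the end of Q
    return order
```

Marking w when it is added (not when it is removed) guarantees that each node enters the queue at most once, which is what keeps the running time at O(m+n).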
Shortest Paths:
Goal: compute dist(v), the fewest # of edges on a path from s to v
Extra code:
    initialize dist(v) = 0 if v == s, +∞ otherwise
    when considering edge (v, w):
        if w unexplored then set dist(w) = dist(v) + 1
Claim: at termination, dist(v) = i <=> v is in the ith layer (i.e. the shortest s-v path has exactly i edges)
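A sketch of the extra code folded into BFS, under the same adjacency-dict assumption (`bfs_dist` is a hypothetical name). Here the `dist` dict doubles as the explored set: a node is explored exactly when it has a finite distance, and nodes missing from the result have dist = +∞.

```python
from collections import deque

def bfs_dist(graph, s):
    """Return {v: fewest # of edges on an s-v path} for every v reachable from s."""
    dist = {s: 0}                     # dist(s) = 0; absent means +infinity
    q = deque([s])
    while q:
        v = q.popleft()
        for w in graph.get(v, []):
            if w not in dist:         # w unexplored
                dist[w] = dist[v] + 1
                q.append(w)
    return dist
```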
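Undirected Connectivity
Let G = (V, E) be an undirected graph.
Connected components == the 'pieces' of G (formally, the equivalence classes of the relation u~v <=> G has a u-v path)
Goal: compute all connected components (why? to check whether a network is disconnected, for graph visualization, clustering, similarity)
    all nodes unexplored
    (assume labelled 1 to n)
    for i = 1 to n:
        if i not yet explored:
            BFS(G, i)  // discovers precisely i's connected component
Total running time: O(m+n), since each node and edge is processed by only one BFS call.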
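The outer loop can be sketched in Python by inlining BFS and tagging every node with the number of the component that discovered it. Assumptions: nodes are labelled 1..n, `adj` is an adjacency dict listing each undirected edge in both directions, and `connected_components` is a hypothetical name.

```python
from collections import deque

def connected_components(n, adj):
    """Return (number of components, {node: component id}) for nodes 1..n."""
    comp = {}                         # doubles as the explored set
    count = 0
    for i in range(1, n + 1):
        if i not in comp:             # i not yet explored: new component
            count += 1
            comp[i] = count
            q = deque([i])
            while q:                  # BFS(G, i)
                v = q.popleft()
                for w in adj.get(v, []):
                    if w not in comp:
                        comp[w] = count
                        q.append(w)
    return count, comp
```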
2. Depth-First Search (DFS): O(m+n) time using a stack
--- explore aggressively like a maze, backtrack only when necessary
--- compute a topological ordering of a directed acyclic graph (DAG)
--- compute the strongly connected components of a directed graph
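pseudocode:
iterative version: mimic BFS, using a stack instead of a queue (mark a node explored when it is popped, not when it is pushed)
recursive version:
DFS(Graph G, start vertex s)
    mark s as explored
    for every edge (s, v):
        if v unexplored:
            DFS(G, v)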
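Both versions in Python, assuming an adjacency dict; `dfs_recursive` and `dfs_iterative` are hypothetical names. The iterative one marks a node only when it is popped, because a node may be pushed several times before it is first reached along a deep path; skipping already explored pops keeps the search depth-first.

```python
def dfs_recursive(graph, s, explored=None):
    """Return the set of nodes reachable from s (recursive DFS)."""
    if explored is None:
        explored = set()
    explored.add(s)                        # mark s as explored
    for v in graph.get(s, []):
        if v not in explored:
            dfs_recursive(graph, v, explored)
    return explored

def dfs_iterative(graph, s):
    """Same reachability result, using an explicit LIFO stack."""
    explored = set()
    stack = [s]
    while stack:
        v = stack.pop()
        if v in explored:                  # stale duplicate entry
            continue
        explored.add(v)
        for w in graph.get(v, []):
            if w not in explored:
                stack.append(w)
    return explored
```

The recursive version is closer to the pseudocode, but Python's default recursion limit (about 1000 frames) makes the iterative version safer on long paths.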
Application: Topological Sort (DAG)
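Definition: A topological ordering of a directed graph G is a labelling f of G's nodes such that:
1. the f(v)'s are the set {1, 2, ..., n}
2. (u, v) ∈ E => f(u) < f(v)
Note: if G has a directed cycle => no topological ordering exists (f would have to increase strictly all the way around the cycle and come back to its starting value)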
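The two conditions of the definition can be checked directly. A small Python sketch, assuming `f` is a dict that assigns a label to every node and `edges` is a list of (u, v) pairs; `is_topological_ordering` is a hypothetical name.

```python
def is_topological_ordering(nodes, edges, f):
    """Check conditions 1 and 2 of the definition for the labelling f."""
    n = len(nodes)
    # 1. the f(v)'s are exactly the set {1, 2, ..., n}
    if sorted(f[v] for v in nodes) != list(range(1, n + 1)):
        return False
    # 2. every edge (u, v) goes forward: f(u) < f(v)
    return all(f[u] < f[v] for u, v in edges)
```

On a directed cycle such as (1, 2), (2, 1), every labelling fails condition 2, matching the note above.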
Straightforward solution to Topological Sort
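note: every directed acyclic graph has a sink vertex (a node with out-degree 0, i.e. no outgoing edges; following outgoing edges from any node must get stuck within n steps, or a node would repeat and form a cycle). Symmetrically, every DAG also has a source vertex (in-degree 0, i.e. no predecessors).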
To compute topological ordering:
let v be a sink vertex of G
set f(v) = n
recurse on G - {v}
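Equivalently, working from the front with source vertices:
(1) choose a vertex of the directed graph that has no predecessors, and give it the next label 1, 2, ...
(2) delete that vertex from the graph, together with all edges leaving it
(3) repeat the two steps above until the graph is empty (if vertices remain but every one of them has a predecessor, G has a directed cycle)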
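The source-removal steps are commonly known as Kahn's algorithm. This Python sketch avoids physically deleting vertices by keeping in-degree counts: "deleting" a vertex decrements the in-degree of each successor, and a successor becomes a new source when its count reaches 0, which gives O(m+n) time. The input format (a node list plus (u, v) edge pairs) and the name `topo_sort_by_source_removal` are assumptions.

```python
from collections import deque

def topo_sort_by_source_removal(nodes, edges):
    """Return {node: label in 1..n}; raise ValueError if G has a cycle."""
    succ = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(v for v in nodes if indeg[v] == 0)   # current sources
    f, label = {}, 1
    while ready:
        u = ready.popleft()            # (1) a vertex with no predecessors
        f[u] = label
        label += 1
        for v in succ[u]:              # (2) delete u's outgoing edges
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(f) < len(nodes):            # leftover vertices all have predecessors
        raise ValueError("graph has a directed cycle")
    return f
```

The sink-based version above is the mirror image: use out-degrees, and hand out labels n, n-1, ... instead.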
Topological Sort via DFS
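DFS(G, s)
    mark s explored
    for every edge (s, v):
        if v not yet explored:
            DFS(G, v)
    set f(s) = current_label  // s finishes only after everything reachable from it
    current_label--
DFS-loop(Graph G)
    mark all nodes unexplored
    current_label = n  // global variable, tracks the next label to hand out
    for each vertex v:
        if v unexplored:
            DFS(G, v)
Running time: O(m+n). Correctness: for every edge (u, v), v finishes before u, so f(u) < f(v).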
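A Python sketch of DFS-loop, assuming an adjacency dict for G and a list of all nodes; `topological_sort` is a hypothetical name. The `nonlocal` counter plays the role of the global `current_label`. Being recursive, it is only suitable for graphs whose DFS paths stay within Python's recursion limit.

```python
def topological_sort(nodes, graph):
    """Return {node: f(node)} for a DAG, with f(u) < f(v) for every edge (u, v)."""
    explored = set()
    f = {}
    current_label = len(nodes)

    def dfs(s):
        nonlocal current_label
        explored.add(s)
        for v in graph.get(s, []):
            if v not in explored:
                dfs(v)
        f[s] = current_label      # every node reachable from s is already labelled
        current_label -= 1

    for v in nodes:
        if v not in explored:
            dfs(v)
    return f
```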
3. Computing Strong Components: The Algorithm
Strongly Connected Components
Formal Definition: the strongly connected components (SCCs) of a directed graph G are the equivalence classes of the relation:
u~v <=> there is a path u -> v and a path v -> u in G
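The definition can be applied literally: compute what each node reaches, then group u and v together when each reaches the other. This brute-force sketch costs O(n(m+n)), so it is only a test oracle for small graphs, not a replacement for Kosaraju's algorithm; the names `reachable` and `sccs_by_definition` are hypothetical, and G is assumed to be an adjacency dict.

```python
def reachable(graph, s):
    """Set of nodes with a path from s (including s itself, via the empty path)."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def sccs_by_definition(nodes, graph):
    """Equivalence classes of u~v <=> u reaches v and v reaches u."""
    reach = {u: reachable(graph, u) for u in nodes}
    classes, assigned = [], set()
    for u in nodes:
        if u not in assigned:
            cls = {v for v in nodes if v in reach[u] and u in reach[v]}
            classes.append(cls)
            assigned |= cls
    return classes
```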
Kosaraju's Two-Pass Algorithm: 2 passes of DFS = O(m+n) time
1. let Gr = G with all arcs reversed
2. run DFS-loop on Gr <---------- Goal: compute 'magical ordering' of nodes
let f(v) = 'finishing time' of each v
3. run DFS-loop on G <---------- Goal: discover the SCCs one-by-one
processing nodes in decreasing order of finishing times
SCCs = nodes with the same 'leader'
pseudocode:
DFS(G, i)
    mark i as explored
    set leader(i) = node s  // used in the 2nd pass
    for each arc (i, j):
        if j not yet explored:
            DFS(G, j)
    t++
    set f(i) = t  // i's finishing time (used in the 1st pass)
DFS-loop(Graph G)
    global variable t = 0  // # of nodes processed so far (for finishing times in 1st pass)
    global variable s = NULL  // current source vertex (for leaders in 2nd pass)
    Assume nodes labelled 1 to n  // in the 2nd pass, each node v is labelled by its finishing time f(v)
    for i = n down to 1:
        if i not yet explored:
            s = i
            DFS(G, i)
Python Code:
```python
import sys
import threading

# DFS below is recursive and can recurse very deeply on this input, so raise
# the recursion limit and give the worker thread a 64 MB stack (this must be
# set before the thread is created).
threading.stack_size(67108864)
sys.setrecursionlimit(300000)

def DFS(edges, i, index):
    global t, vertices, new_vertices, s, compare
    if index == 1:  # 1st pass (on Gr): vertices[k-1] is node k
        vertices[i-1][1] = True  # mark i explored
    elif index == 2:  # 2nd pass (on G): vertices[f-1] is the node with finishing time f
        vertices[compare[i]-1][1] = True
        vertices[compare[i]-1].append(s)  # set leader(i) = node s
    for v in edges.get(i, []):
        if index == 1 and not vertices[v-1][1]:
            DFS(edges, v, index)
        elif index == 2 and not vertices[compare[v]-1][1]:
            DFS(edges, v, index)
    if index == 1:
        t = t + 1  # i's finishing time
        vertices[i-1].append(t)
        temp = vertices[i-1].copy()
        temp[1] = False  # reset to unexplored for the 2nd pass
        new_vertices.append(temp)  # appended in order of finishing time
        compare[i] = t  # node number -> finishing time

def DFS_loop(edges, index):
    global t, s
    t = 0  # for finishing times in 1st pass
    n = len(vertices)
    for i in range(1, n+1):
        v = vertices[n-i]  # scan from the last entry down to the first
        if not v[1]:
            s = v[0]  # current source vertex = leader in the 2nd pass
            DFS(edges, v[0], index)

def main():
    global vertices, new_vertices, compare
    vertices = list()  # [number, explored]
    new_vertices = list()  # [number, explored, t, s]
    edges = dict()  # {1: [2, 5, 6, ...], ...}
    edges_rev = dict()  # {2: [8, 9, 5, ...], ...}
    compare = dict()  # node number -> finishing time from the 1st pass
    for i in range(875714):  # SCC.txt has 875,714 nodes, labelled 1..875714
        vertices.append([i+1, False])
    with open('SCC.txt') as f:
        for line in f:  # each line is one arc: "tail head"
            parts = line.split()
            if len(parts) < 2:  # skip blank lines
                continue
            tail, head = int(parts[0]), int(parts[1])
            edges.setdefault(tail, []).append(head)
            edges_rev.setdefault(head, []).append(tail)
    DFS_loop(edges_rev, 1)  # 1st pass on Gr: compute finishing times
    vertices = new_vertices  # sorted by finishing time; not reused, so no copy needed
    DFS_loop(edges, 2)  # 2nd pass on G, in decreasing order of finishing time
    result = dict()
    for item in vertices:  # nodes with the same 'leader' form one SCC
        result[item[3]] = result.get(item[3], 0) + 1
    r = sorted(result.values(), reverse=True)
    print(r[:10])  # sizes of the 10 largest SCCs

if __name__ == '__main__':
    thread = threading.Thread(target=main)
    thread.start()
    thread.join()
```
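The program above depends on a raised recursion limit and a 64 MB thread stack. A sketch of the same two passes with explicit stacks avoids both. It assumes nodes are labelled 1..n and arcs are given as (tail, head) pairs; the names `kosaraju` and `largest_scc_sizes` are hypothetical. Pass 1 keeps one iterator per stack frame, so a node is recorded only after all of its arcs in Gr have been processed, which is exactly its finishing time. Pass 2 only needs reachability, so a plain stack suffices.

```python
from collections import Counter

def kosaraju(n, arcs):
    """Return leader[v] for v in 1..n; nodes sharing a leader form one SCC."""
    graph = {v: [] for v in range(1, n + 1)}
    rev = {v: [] for v in range(1, n + 1)}
    for u, v in arcs:
        graph[u].append(v)
        rev[v].append(u)

    # Pass 1: DFS-loop on Gr (i = n down to 1), recording finishing order.
    explored = [False] * (n + 1)
    finish_order = []
    for i in range(n, 0, -1):
        if explored[i]:
            continue
        explored[i] = True
        stack = [(i, iter(rev[i]))]
        while stack:
            node, arcs_left = stack[-1]
            for w in arcs_left:
                if not explored[w]:
                    explored[w] = True
                    stack.append((w, iter(rev[w])))
                    break
            else:                          # no unexplored arc left: node finishes
                stack.pop()
                finish_order.append(node)

    # Pass 2: DFS-loop on G in decreasing order of finishing time.
    leader = [0] * (n + 1)
    for s in reversed(finish_order):
        if leader[s]:
            continue
        leader[s] = s
        stack = [s]
        while stack:
            u = stack.pop()
            for w in graph[u]:
                if not leader[w]:
                    leader[w] = s
                    stack.append(w)
    return leader

def largest_scc_sizes(n, arcs, k=10):
    """Sizes of the k largest SCCs, largest first."""
    leader = kosaraju(n, arcs)
    return sorted(Counter(leader[1:]).values(), reverse=True)[:k]
```

With the arcs of SCC.txt read into a list of pairs as in main(), `largest_scc_sizes(875714, arcs)` would compute the same top-10 list.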