Design pattern for graph processing.
Since we consider a large number of graph-processing algorithms, our initial design goal is to decouple our implementations from the graph representation. To do so, we develop, for each given task, a task-specific class so that clients can create objects to perform the task. Generally, the constructor does some preprocessing to build data structures so as to efficiently respond to client queries. A typical client program builds a graph, passes that graph to an algorithm implementation class (as argument to a constructor), and then calls client query methods to learn various properties of the graph. We use the term source to distinguish the vertex provided as argument to the constructor from the other vertices in the graph.
Depth-first search
To search a graph, invoke a recursive method that visits vertices. To visit a vertex:
■ Mark it as having been visited.
■ Visit (recursively) all the vertices that are adjacent to it and that have not yet been marked.
This method is called depth-first search (DFS). DFS marks all the vertices connected to a given source in time proportional to the sum of their degrees.
public class DepthFirstSearch {
private boolean[] marked; // marked[v] = is there an s-v path?
private int count; // number of vertices connected to s
// single source
public DepthFirstSearch(Graph G, int s) {
marked = new boolean[G.V()];
dfs(G, s);
}
// depth first search from v
private void dfs(Graph G, int v) {
count++;
marked[v] = true;
for (int w : G.adj(v)) {
if (!marked[w]) {
dfs(G, w);
}
}
}
// is there an s-v path?
public boolean marked(int v) {
return marked[v];
}
// number of vertices connected to s
public int count() {
return count;
}
}
Finding paths
The single-source paths problem is fundamental to graph processing, below is a DFS-based implementation of Paths that extends the DepthFirstSearch.
public class DepthFirstPaths {
private boolean[] marked; // marked[v] = is there an s-v path?
private int[] edgeTo; // edgeTo[v] = last edge on s-v path
private final int s; // source vertex
public DepthFirstPaths(Graph G, int s) {
this.s = s;
edgeTo = new int[G.V()];
marked = new boolean[G.V()];
dfs(G, s);
}
// depth first search from v
private void dfs(Graph G, int v) {
marked[v] = true;
for (int w : G.adj(v)) {
if (!marked[w]) {
edgeTo[w] = v;
dfs(G, w);
}
}
}
// is there a path between s and v?
public boolean hasPathTo(int v) {
return marked[v];
}
// return a path between s to v; null if no such path
public Iterable<Integer> pathTo(int v) {
if (!hasPathTo(v)) return null;
Stack<Integer> path = new Stack<Integer>();
for (int x = v; x != s; x = edgeTo[x])
path.push(x);
path.push(s);
return path;
}
}
Breadth-first search
The paths discovered by depth-first search depend not just on the graph, but also on the representation and the nature of the recursion. Naturally, we are often interested in solving the following problem:
Single-source shortest paths. Given a graph and a source vertex s, support queries of the form Is there a path from s to a given target vertex v? If so, find a shortest such path (one with a minimal number of edges).
The classical method for accomplishing this task, called breadth-first search (BFS ), is also the basis of numerous algorithms for processing graphs.
public class BreadthFirstPaths {
private static final int INFINITY = Integer.MAX_VALUE;
private boolean[] marked; // marked[v] = is there an s-v path
private int[] edgeTo; // edgeTo[v] = previous edge on shortest s-v path
private int[] distTo; // distTo[v] = number of edges shortest s-v path
// single source
public BreadthFirstPaths(Graph G, int s) {
marked = new boolean[G.V()];
distTo = new int[G.V()];
edgeTo = new int[G.V()];
bfs(G, s);
assert check(G, s);
}
// multiple sources
public BreadthFirstPaths(Graph G, Iterable<Integer> sources) {
marked = new boolean[G.V()];
distTo = new int[G.V()];
edgeTo = new int[G.V()];
for (int v = 0; v < G.V(); v++) distTo[v] = INFINITY;
bfs(G, sources);
}
// BFS from single source
private void bfs(Graph G, int s) {
Queue<Integer> q = new Queue<Integer>();
for (int v = 0; v < G.V(); v++) distTo[v] = INFINITY;
distTo[s] = 0;
marked[s] = true;
q.enqueue(s);
while (!q.isEmpty()) {
int v = q.dequeue();
for (int w : G.adj(v)) {
if (!marked[w]) {
edgeTo[w] = v;
distTo[w] = distTo[v] + 1;
marked[w] = true;
q.enqueue(w);
}
}
}
}
// BFS from multiple sources
private void bfs(Graph G, Iterable<Integer> sources) {
Queue<Integer> q = new Queue<Integer>();
for (int s : sources) {
marked[s] = true;
distTo[s] = 0;
q.enqueue(s);
}
while (!q.isEmpty()) {
int v = q.dequeue();
for (int w : G.adj(v)) {
if (!marked[w]) {
edgeTo[w] = v;
distTo[w] = distTo[v] + 1;
marked[w] = true;
q.enqueue(w);
}
}
}
}
// is there a path between s (or sources) and v?
public boolean hasPathTo(int v) {
return marked[v];
}
// length of shortest path between s (or sources) and v
public int distTo(int v) {
return distTo[v];
}
// shortest path between s (or sources) and v; null if no such path
public Iterable<Integer> pathTo(int v) {
if (!hasPathTo(v)) return null;
Stack<Integer> path = new Stack<Integer>();
int x;
for (x = v; distTo[x] != 0; x = edgeTo[x])
path.push(x);
path.push(x);
return path;
}
// check optimality conditions for single source
private boolean check(Graph G, int s) {
// check that the distance of s = 0
if (distTo[s] != 0) {
StdOut.println("distance of source " + s + " to itself = " + distTo[s]);
return false;
}
// check that for each edge v-w dist[w] <= dist[v] + 1
// provided v is reachable from s
for (int v = 0; v < G.V(); v++) {
for (int w : G.adj(v)) {
if (hasPathTo(v) != hasPathTo(w)) {
StdOut.println("edge " + v + "-" + w);
StdOut.println("hasPathTo(" + v + ") = " + hasPathTo(v));
StdOut.println("hasPathTo(" + w + ") = " + hasPathTo(w));
return false;
}
if (hasPathTo(v) && (distTo[w] > distTo[v] + 1)) {
StdOut.println("edge " + v + "-" + w);
StdOut.println("distTo[" + v + "] = " + distTo[v]);
StdOut.println("distTo[" + w + "] = " + distTo[w]);
return false;
}
}
}
// check that v = edgeTo[w] satisfies distTo[w] + distTo[v] + 1
// provided v is reachable from s
for (int w = 0; w < G.V(); w++) {
if (!hasPathTo(w) || w == s) continue;
int v = edgeTo[w];
if (distTo[w] != distTo[v] + 1) {
StdOut.println("shortest path edge " + v + "-" + w);
StdOut.println("distTo[" + v + "] = " + distTo[v]);
StdOut.println("distTo[" + w + "] = " + distTo[w]);
return false;
}
}
return true;
}
}
For any vertex v reachable from s, BFS computes a shortest path from s to v (no path from s to v has fewer edges), also BFS takes time proportional to V-E in the worst case. Note that we can also use BFS to implement the Search API that we implemented with DFS, since the solution depends on only the ability of the search to examine every vertex and edge connected to the source.
As implied at the outset, DFS and BFS are the first of several instances that we will examine of a general approach to searching graphs. We put the source vertex on the data structure, then perform the following steps until the data structure is empty:
■ Take the next vertex v from the data structure and mark it.
■ Put onto the data structure all unmarked vertices that are adjacent to v. The algorithms differ only in the rule used to take the next vertex from the data structure (least recently added for BFS, most recently added for DFS). This difference leads to completely different views of the graph, even though all the vertices and edges connected to the source are examined no matter what rule is used.
Connected components
Our next direct application of depth-first search is to find the connected components of a graph.
public class CC {
private boolean[] marked; // marked[v] = has vertex v been marked?
private int[] id; // id[v] = id of connected component containing v
private int[] size; // size[id] = number of vertices in given component
private int count; // number of connected components
public CC(Graph G) {
marked = new boolean[G.V()];
id = new int[G.V()];
size = new int[G.V()];
for (int v = 0; v < G.V(); v++) {
if (!marked[v]) {
dfs(G, v);
count++;
}
}
}
// depth first search
private void dfs(Graph G, int v) {
marked[v] = true;
id[v] = count;
size[count]++;
for (int w : G.adj(v)) {
if (!marked[w]) {
dfs(G, w);
}
}
}
// id of connected component containing v
public int id(int v) {
return id[v];
}
// size of connected component containing v
public int size(int v) {
return size[id[v]];
}
// number of connected components
public int count() {
return count;
}
// are v and w in the same connected component?
public boolean areConnected(int v, int w) {
return id(v) == id(w);
}
}
How does the DFS-based solution for graph connectivity in CC compare with the union-find approach? In theory, DFS is faster than union-find because it provides a constant-time guarantee, which union-find does not; in practice, this difference is negligible, and union-find is faster because it does not have to build a full representation of the graph. More important, union-find is an online algorithm (we can check whether two vertices are connected in near-constant time at any point, even while adding edges), whereas the DFS solution must first preprocess the graph. Therefore, for example, we prefer union-find when determining connectivity is our only task or when we have a large number of queries intermixed with edge insertions but may find the DFS solution more appropriate for use in a graph ADT because it makes efficient use of existing infrastructure.
The problems that we have solved with DFS are fundamental. It is a simple approach, and recursion provides us a way to reason about the computation and develop compact solutions to graph-processing problems. Two additional examples, for solving the following problems, are given in the following.
Cycledetection. Support this query: Is a given graph acylic?
Two-colorability. Support this query:Can the vertices of a given graph be assigned one of two colors in such a way that no edge connects vertices of the same color? which is equivalent to this question: Is the graph bipartite?
public class Cycle {
private boolean[] marked;
private int[] edgeTo;
private Stack<Integer> cycle;
public Cycle(Graph G) {
if (hasSelfLoop(G)) return;
if (hasParallelEdges(G)) return;
marked = new boolean[G.V()];
edgeTo = new int[G.V()];
for (int v = 0; v < G.V(); v++)
if (!marked[v])
dfs(G, -1, v);
}
// does this graph have a self loop?
// side effect: initialize cycle to be self loop
private boolean hasSelfLoop(Graph G) {
for (int v = 0; v < G.V(); v++) {
for (int w : G.adj(v)) {
if (v == w) {
cycle = new Stack<Integer>();
cycle.push(v);
cycle.push(v);
return true;
}
}
}
return false;
}
// does this graph have two parallel edges?
// side effect: initialize cycle to be two parallel edges
private boolean hasParallelEdges(Graph G) {
marked = new boolean[G.V()];
for (int v = 0; v < G.V(); v++) {
// check for parallel edges incident to v
for (int w : G.adj(v)) {
if (marked[w]) {
cycle = new Stack<Integer>();
cycle.push(v);
cycle.push(w);
cycle.push(v);
return true;
}
marked[w] = true;
}
// reset so marked[v] = false for all v
for (int w : G.adj(v)) {
marked[w] = false;
}
}
return false;
}
public boolean hasCycle() {
return cycle != null;
}
public Iterable<Integer> cycle() {
return cycle;
}
private void dfs(Graph G, int u, int v) {
marked[v] = true;
for (int w : G.adj(v)) {
// short circuit if cycle already found
if (cycle != null) return;
if (!marked[w]) {
edgeTo[w] = v;
dfs(G, v, w);
}
// check for cycle (but disregard reverse of edge leading to v)
else if (w != u) {
cycle = new Stack<Integer>();
for (int x = v; x != w; x = edgeTo[x]) {
cycle.push(x);
}
cycle.push(w);
cycle.push(v);
}
}
}
}
public class TwoColor {
private boolean[] marked;
private boolean[] color;
private boolean isTwoColorable = true;
public TwoColor(Graph G) {
marked = new boolean[G.V()];
color = new boolean[G.V()];
for (int s = 0; s < G.V(); s++)
if (!marked[s])
dfs(G, s);
}
private void dfs(Graph G, int v) {
marked[v] = true;
for (int w : G.adj(v))
if (!marked[w]) {
color[w] = !color[v];
dfs(G, w);
} else if (color[w] == color[v]) isTwoColorable = false;
}
public boolean isBipartite() {
return isTwoColorable;
}
}
Symbol graphs
Typical applications involve processing graphs defined in files or on web pages, using strings, not integer indices, to define and refer to vertices. To accommodate such applications, we define an input format with the following properties:
■ Vertex names are strings.
■ A specified delimiter separates vertex names (to allow for the possibility of spaces in names).
■ Each line represents a set of edges, connecting the first vertex name on the line to each of the other vertices named on the line.
■ The number of vertices V and the number of edges E are both implicitly defined.
public class SymbolGraph {
private ST<String, Integer> st; // string -> index
private String[] keys; // index -> string
private Graph G;
public SymbolGraph(String filename, String delimiter) {
st = new ST<String, Integer>();
// First pass builds the index by reading strings to associate
// distinct strings with an index
In in = new In(filename);
while (in.hasNextLine()) {
String[] a = in.readLine().split(delimiter);
for (int i = 0; i < a.length; i++) {
if (!st.contains(a[i]))
st.put(a[i], st.size());
}
}
// inverted index to get string keys in an array
keys = new String[st.size()];
for (String name : st.keys()) {
keys[st.get(name)] = name;
}
// second pass builds the graph by connecting first vertex on each
// line to all others
G = new Graph(st.size());
in = new In(filename);
while (in.hasNextLine()) {
String[] a = in.readLine().split(delimiter);
int v = st.get(a[0]);
for (int i = 1; i < a.length; i++) {
int w = st.get(a[i]);
G.addEdge(v, w);
}
}
}
public boolean contains(String s) {
return st.contains(s);
}
public int index(String s) {
return st.get(s);
}
public String name(int v) {
return keys[v];
}
public Graph G() {
return G;
}
}
The above implementation builds three data structures:
■ A symbol table st with String keys (vertex names) and int values (indices)
■ An array keys[] that serves as an inverted index, giving the vertex name associated with each integer index
■ A Graph G built using the indices to refer to vertices
SymbolGraph uses two passes through the data to build these data structures, primarily because the number of vertices V is needed to build the Graph. In typical real world applications, keeping the value of V and E in the graph definition file (as in our Graph constructor at the beginning of this section) is somewhat inconvenient—with SymbolGraph, we can maintain files such as routes.txt or movies.txt by adding or deleting entries without regard to the number of different names involved.