Graph DFS - gpertea/javaTeach GitHub Wiki

Performing a "Depth First Search" (DFS) in a graph

The main goal in a graph traversal is to "visit" all the possible vertexes that can be accessed starting from a specific vertex ("source"). The Depth First Search strategy simply involves going down every edge as far as possible from the source vertex, and backtracking whenever there are no more unvisited vertexes down such a path.

A DFS traversal is very easy to understand when applied to a tree (i.e. a graph without back-edges).

Tree DFS

Because a generic graph can have back-edges which can generate cycles, we need to attach a "visited" property to a vertex in order to avoid going "in circles" when following the edges.

Graph DFS

The simplest and most intuitive implementation of a DFS traversal algorithm is by using recursion. Here it is the pseudocode (using some naming convention loosely based on our Graph java class implementation):

recursiveDFS(Vertex v) :
   v.visited = true
   for each edge e in src.adjList:
       if not e.to.visited
           recursiveDFS(e.to)

Notice on the animation above that there is a "stack" of vertexes (the yellow rectangles on the right) as they are "visited" in this traversal algorithm. As the recursion goes deeper, the current vertex v passed to the recursiveDFS(v) procedure is pushed onto the stack and the recursion moves on to the unvisited "neighbors" (linked vertexes) of v. When those neighbors are processed, the stack is "popped" as recursiveDFS() returns so the recursion continues with the next neighbor of the predecessor vertex and so on. When the "for each edge e" loop in the pseudocode above does not find find any unvisited neighbor for vertex v, then the recursion stops.

An interesting application of the DFS algorithm could be finding a path to the exit in a maze, where each "corner" or crossroad in a maze is a vertex (and so are the start and exit locations), and a corridor connecting them is an edge:

Graph DFS

In this application the DFS should actually stop the recursion in the pseudocode above when e.to is the same with the exit location (vertex) - that's how the maze was "solved", when an "exit path" was found.

Keeping track of the traversal "paths" during DFS should be easy as it mirrors the recursion stack -- the order in which the parameter Vertex v is assigned to an actual vertex in the graph in the call stack for recursiveDFS(v).

Keeping track of the actual path in Java, in this recursive implementation of DFS, can be implemented using a Stack< Vertex<T> > (using our Graph generic class implementation), and this stack should simply mirror the call stack with respect to the current Vertex v: we should push vertex v onto this stack when entering recursiveDFS(), and pop it off the stack just before exiting recursiveDFS(). So if we want to keep track of the path to the current vertex, we create this stack data structure as an external variable (or we can pass it as a second parameter to the recursiveDFS() method, but that would make the recursive call unnecessarily costly). The pseudocode looks like this:

path = new Stack<Vertex>() 
recursiveDFS(Vertex v) :
   path.push(v)
   v.visited = true
   for each edge e in src.adjList:
       if not e.to.visited
           recursiveDFS(e.to)
   path.pop(v)

Inspecting the value of the path variable at any point (e.g. printing it) after path.push(v) can show the full traversal path from the source vertex (the one passed for the very first call to recursiveDFS()) to the current vertex v being visited.

So if we want to show the path to the current vertex v being visited, we would simply show the content of this path Stack.

Practice: Implement and apply the above recursiveDFS() method in the Graph class and apply it to this graph, starting with vertex 0:

Simple Cyclic Graph

You should use a Graph<String> and use "V0", "V1" etc. as vertex data, instead of the simple numbers there. You may want to modify the toString() method of the Vertex class so it no longer prints the adjList content, but just the vertex data. Use addVertex() and addUndirectedEdge() methods to build the graph above and then call recursiveDFS("V0"). This requirement translates into implementing two recursiveDFS() methods, with these prototypes:

public void recursiveDFS(Vertex<T> v) //this is implemented following the pseudocode above
public void recursiveDFS(T srcData) //this should find the corresponding `Vertex<T> src` in the graph
                                     //and then simply call recursiveDFS(src)

The recursiveDFS(Vertex<T> v) method should also print the current vertex being visited (print something like "Visiting vertex "+v and the next line should show the path Stack (preferably with some indentation, like: "\tpath to it was: "+path ). Note that path should be a (protected or private) instance variable (field) of the Graph class, you can add it like this, after the vertices declaration:

  private Stack< Vertex<T> > path = new Stack<>();

Eliminating sub-paths (path redundancy)

As you implement and run the practice code above you will notice that some of the traversal paths printed there are quite redundant (and the one consisting of just V0 is just silly, right?). Printing such paths is not very useful to do for every vertex we visit, because many "paths" shown this way would be rather redundant as they are just parts of the "longer" paths covering them (they are "sub-paths"). So we'd rather show only the "longest" unique paths, those corresponding to each "complete" traversal, which is a path leading up to a vertex where we start backtracking, right? So we should only print the content of the path stack when we reach an end of a path, i.e. a vertex for which there are no adjacent vertexes that aren't visited yet. How do you do that?

Hint: use a boolean pathEnd variable in recursiveDFS() which is true only if no adjacent unvisited vertexes were found for Vertex v. That means that no further recursion will take place at that vertex v, which means there is no path "forward" from there into any "unexplored" vertexes. So if pathEnd is true after that for loop, only then you should print the path stack.

⚠️ **GitHub.com Fallback** ⚠️