Project Strategy - xzhangpeijin/MovieChainRunner GitHub Wiki
##Strategies for finding the longest chain
###Preprocessing
####Graph generation:
Graph is generated by looping through all pairs of vertices and checking if they overlap
If there are n vertices and the longest vertex is k words, the generation is O(kn^2)
####Graph format:
Graphs are stored using in adjacency list and out adjacency lists. This allows for O(n) lookup for an edge existing in the worst case but O(1) random access of edges.
####Component generation:
Connected components are generated by performing BFS while assuming that all edges are undirected. This is only O(E) where E is the number of edges. This allowed us to reduce the largest graph from 6561 to 4266
####Graph reduction
We perform forward bfs and backwards bfs on every node in the reduced graph, and combine all the reachable vertices from the two searches. If the size of the combined vertices is less than 315 (our path cutoff), then we removed the vertex from our graph. This allowed us to reduce our graph from 4266 to 3240.
###Path finding strategies
####Naive Random Walks
With naive random walks, we simply start at a random node, and at each step pick a random valid node to move to. Although we may eventually hit a long path, there are way too many short paths in the graph which makes the chances of hitting a long path very slim. Average path lengths with naive walks tend to be ~40.
####Intelligent Walking
Note that when walking, we can see what our in and out vertices are. Consider the nodes reachable from each in and out vertex. If the size of that set plus our currently visited path is not larger than our target path length, then we can see that visiting that vertex would make it impossible to get a path that long. Since the reachability set can change depending on which vertices we've already visited, accurately doing this walking requires recalculating the reachability sets after every step. Intelligent walking heuristics found longest paths of up to around 280.
####Component Extension
We reduce the reduced graph to its center strongly connected component of 642 and performed new "extension" walks by finding long paths within the connected component and extending this path into the reduced graph. We use this method for path finding since we predict that the size 3240 graph is not fully connected, some nodes are unable to reach other nodes. However all nodes are able to reach within this 642 size component so if we find a "center" path and search outwards we should get better results. Performing these walks gives us longest paths of around 310.