Project Progress - xzhangpeijin/MovieChainRunner GitHub Wiki

##Progress log for the project

####Initial runs

Component generation was the obvious first step. We examined the connected components of the graph and found that most components were < 10 vertices in size. The remaining largest component was a graph of size 4266.

Initial naive random walks on the 4266 size graph produced longest paths of around length 20 on average.

Further graph reduction was performed by looking at the reachable nodes from every vertex. If a node was able to reach less vertices than our desired path length, we removed the node from the graph. This reduced our size to 3240.

Naive random walks on the reduced graph produced longest paths of around length 50 on average.

####Intelligent walking

First phase intelligent walking was to pass the walker a map of the reachable nodes from each node in the graph. We only select nodes whose reachable nodes are large enough such that if we were able to hit all those nodes, our final path would be more than 315. This improvement brought our path lengths up to around 175.

Second pass intelligent walking was to generate reachability maps in real time. This means that we account for the fact that we cannot revisit any nodes already in our path. However this also means that we have to calculate reachablility at every step, which is computationally intensive. This slows down our walk speed by a lot. These random walks produced paths of around 220 on average.

Deterministic walking was done by starting at every vertex and walking in the same manner as the second phase intelligent walking. However, instead of picking a random valid node, we pick the node with largest reachability set at each point. This produces a longest path of length 277. However note that the random walk is able to hit this 277, and also has the probability of hitting a longer path given enough runtime. In practice we found that our random walk was able to hit 286 after around an hour of runtime and could likely hit a higher number given more time.

####Further improvements

We explored the reachability set of every node in our reduced graph and found that a large majority of them we either around 1650 or 3240. We found that the majority of these reachability sets have a large overlap and a small number of nodes which differ between the two. This led us to the hypothesis that the graph structure is a single large strongly connected component with several smaller graph structures branching off from this main component. Thus we expect that longest paths will have the majority of their vertices within this connected component structure, and additional lengths can be found by searching outwards from this component into the remainder of the reduced graph.

Thus, our strategy was to take the intersection of all of these reachability sets and search on those. Then, take the paths found by our searches and expand outwards into the reduced graph to complete our path finding. Our longest paths found in the strongly connected component were generally around 280 in length and could be extended into the reduced graphs for paths of around length 300.