Prob 0015 ‐ Lattice Paths - maccergit/Euler GitHub Wiki

The first task is to find an approach to counting the paths that can be automated. A lattice with a single square shows a few important features :

  • From any vertex, there are only 2 choices - right or down.
  • Vertices on the right or bottom edges of the lattice have only a single choice.
  • Moving right or down from a vertex means you now have the same problem, but with a smaller lattice, as you cannot move up or left to expand the new lattice.
  • Because there are two moves from a vertex, the total number of paths from that vertex is the number of paths after moving down plus the number of paths after moving right - the sum of the paths from each possible next vertex. This also holds for vertices on the edges, where the sum collapses to a single term, since the edge cuts off the other path.

From this, we come up with a recursive function :

\displaystyle paths(\text{vertex}) = \begin{dcases}
                                       1 & \text{vertex is lower-right corner} \\
                                       paths(\text{vertex}\downarrow) & \text{vertex is on right edge} \\
                                       paths(\text{vertex}\rightarrow) & \text{vertex is on bottom edge} \\
                                       paths(\text{vertex}\downarrow) + paths(\text{vertex}\rightarrow) & \text{otherwise}
                                     \end{dcases}

01

A direct implementation of the recursive function works, but is very slow - so slow we can't realistically get the answer to the problem :

A conservative estimate is that the execution time triples for each additional cell in the lattice, which yields a run time of over 2.5 hours for the answer to the problem.
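The original listing is not reproduced here, but a direct translation of the recursive function might look like the following sketch (the `size` parameter and the (row, col) vertex coordinates are illustrative choices, not the wiki's own names) :

```python
def paths(row, col, size):
    """Naive recursion: count the paths from (row, col) to the
    lower-right corner (size, size) of a size x size lattice."""
    if row == size and col == size:    # lower-right corner
        return 1
    if col == size:                    # right edge: only "down" remains
        return paths(row + 1, col, size)
    if row == size:                    # bottom edge: only "right" remains
        return paths(row, col + 1, size)
    return paths(row + 1, col, size) + paths(row, col + 1, size)

print(paths(0, 0, 10))  # 184756 paths through a 10 x 10 lattice
```

Every call fans out into two more calls, which is where the exponential run time comes from.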

02

Observing the execution of the recursive algorithm reveals that it recomputes the path count for the same vertices over and over again, which makes this a strong candidate for memoization. Fortunately, the previous problem showed that we don't need to roll our own memoization approach in Python - we can simply add the "@functools.cache" decorator to the recursive function. Not only is this fast enough to get our answer, but it's fast enough to go far beyond the limit specified in the problem :

The biggest issue with this approach is that we can only go a bit beyond the original limit - even though it appears we have the CPU to explore further, Python runs out of stack space. While we could alter the stack size allocation, let's try other approaches.
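A sketch of the memoized version - the only change from the naive recursion is the decorator (function and parameter names are again illustrative) :

```python
import functools

@functools.cache                 # memoize on the (row, col, size) arguments
def paths(row, col, size):
    if row == size and col == size:
        return 1
    if col == size:
        return paths(row + 1, col, size)
    if row == size:
        return paths(row, col + 1, size)
    return paths(row + 1, col, size) + paths(row, col + 1, size)

print(paths(0, 0, 20))  # 137846528820 - the answer to the problem
```

Each vertex is now computed exactly once, but the recursion depth still grows with the lattice size, which is why large limits eventually exhaust the stack.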

03

We can take what we learned from the recursive approach to develop an iterative approach. Since each cell depends on the values below and to the right, if we start at the bottom right corner of the grid and move upwards and leftwards, the cells we visit will only use values we have already calculated - so there is no need to recurse. This means iterating through the grid once to get the final result, and if we store the results in a grid as we calculate them, we don't need to keep intermediate results on the stack. Since Python does not have a native 2-D array type, we will start by using a NumPy array.

Unfortunately, the obvious approach is not very satisfying - while it is fast enough to solve the original problem, we discover that NumPy's default "int" datatype, a fixed-size machine integer, overflows long before we reach the sizes where the recursive approach ran out of stack :

We can address the overflow by telling NumPy to use a Python integer object, instead of a NumPy integer - the Python version will automatically promote to a big integer as needed. This works - although the speed is slow compared to the memoized approach, it is fast enough to show that we no longer blow away the stack with large limits (in my environment, the recursive approach cannot get to a limit of 210) :
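A sketch of the iterative NumPy version with "dtype=object", so the cells hold Python big integers instead of fixed-size NumPy ones (the function name is illustrative) :

```python
import numpy as np

def paths(size):
    # One value per vertex; dtype=object stores Python ints, which
    # promote to arbitrary precision instead of silently overflowing.
    grid = np.ones((size + 1, size + 1), dtype=object)
    # The bottom row and right column stay at 1 (only one path out);
    # fill the rest from the bottom-right corner, moving up and left.
    for row in range(size - 1, -1, -1):
        for col in range(size - 1, -1, -1):
            grid[row, col] = grid[row + 1, col] + grid[row, col + 1]
    return grid[0, 0]

print(paths(20))  # 137846528820
```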

04

Another way to implement the iterative approach is to use a list of lists to hold the grid. It's almost 2x faster than using NumPy :
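The list-of-lists version might look like this - same sweep, but plain Python lists (which already hold big integers) instead of a NumPy array :

```python
def paths(size):
    # (size + 1) x (size + 1) vertices, all seeded to 1 so the bottom
    # row and right column are already correct
    grid = [[1] * (size + 1) for _ in range(size + 1)]
    for row in range(size - 1, -1, -1):
        for col in range(size - 1, -1, -1):
            grid[row][col] = grid[row + 1][col] + grid[row][col + 1]
    return grid[0][0]

print(paths(20))  # 137846528820
```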

05

Yet another iterative way involves using a dictionary to hold the computed values in a "virtual grid". For the key, we use the (column, row) position in the grid. The idea here is that a hash lookup is very fast compared to walking a list, although there is overhead in computing the hash value. Unfortunately, the result is about the same as the NumPy result :
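A sketch of the dictionary-backed "virtual grid", keyed by (row, col) tuples :

```python
def paths(size):
    grid = {}                                  # (row, col) -> path count
    for row in range(size, -1, -1):
        for col in range(size, -1, -1):
            if row == size or col == size:     # bottom or right edge
                grid[row, col] = 1
            else:
                grid[row, col] = grid[row + 1, col] + grid[row, col + 1]
    return grid[0, 0]

print(paths(20))  # 137846528820
```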

06

Before leaving the iterative approach, let's see if we can vectorize what we can - NumPy includes ways to speed up vector operations. The tricky part is that, while we can add a row to a row, we need to repeatedly add a rolling sum of the already calculated values across the row. If we did this in a loop, we would gain no benefit - but NumPy includes the "cumsum()" function, which returns an array of the cumulative sums of an array's items. Since the result is computed "left to right", and we are consuming the array "right to left", we need to flip the array before and after taking the cumulative sum. The end result is the fastest iterative result so far (by a small amount) :
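This works because each vertex equals the sum of the row-below values at its position and to its right, so each new row of the grid is a right-to-left cumulative sum of the row below it. A sketch :

```python
import numpy as np

def paths(size):
    row = np.ones(size + 1, dtype=object)      # bottom row of vertices
    for _ in range(size):
        # A right-to-left cumulative sum of the row below:
        # flip, cumsum, flip back.
        row = np.flip(np.cumsum(np.flip(row)))
    return row[0]

print(paths(20))  # 137846528820
```

Only one row needs to be kept in memory at a time, and the inner loop is replaced by a single vector operation.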

07

Debugging the previous implementations required examination of the grid, which revealed a familiar pattern : Pascal's triangle. Thinking further about how the grid is created confirmed this is the case - and Pascal's triangle is well known as being tied to binomial coefficients (this is also discussed in the problem discussion paper). Knowing this, we can use the binomial formula to get a closed form for the solution :

\displaystyle \binom{2\times{limit}}{limit} = \prod_{i = 1}^{limit}{\frac{limit + i}{i}}

This is much faster and scales to large limits :
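The closed-form product translates into a short loop. The integer division is exact at every step, because the running product is always itself a binomial coefficient :

```python
def paths(limit):
    result = 1
    for i in range(1, limit + 1):
        # after step i, result == C(limit + i, i), so // i is exact
        result = result * (limit + i) // i
    return result

print(paths(20))  # 137846528820
```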

08

Rather than looping and computing a running product, if we note that the denominators multiply out to the product 1..limit, that is just the factorial of $limit$. Since math.prod() and math.factorial() are both available, let's try using those on the formula to see if the libraries help - and they do :
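A sketch using the two library functions - the numerator is the product of the terms $limit + i$, and the denominator is $limit!$ :

```python
import math

def paths(limit):
    # numerator: (limit + 1) * (limit + 2) * ... * (2 * limit)
    # denominator: limit! - the division is exact
    return math.prod(range(limit + 1, 2 * limit + 1)) // math.factorial(limit)

print(paths(20))  # 137846528820
```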

09

Examination of the formula reveals that if you have a fast factorial function, it can be simplified even further to :

\displaystyle \binom{2\times{limit}}{limit} = \frac{(2 \times limit)!}{(limit!)^2}

and this reduces the execution time even more :
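The simplified formula is a one-liner, leaning on Python's fast big-integer factorial :

```python
import math

def paths(limit):
    # (2 * limit)! / (limit!)^2 - exact integer division
    return math.factorial(2 * limit) // math.factorial(limit) ** 2

print(paths(20))  # 137846528820
```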

10

It turns out that the SciPy library contains a full implementation of the binomial formula. However, it's important to specify the "exact = True" option, otherwise it will use the Gamma function to estimate factorials - and we are looking for exact answers. It is not as fast as the previous solution - I suspect it ends up computing extra factorials and does some other conversions or checks to make it more robust :
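A sketch using SciPy's binomial coefficient function (this assumes SciPy is installed) :

```python
from scipy.special import comb

def paths(limit):
    # exact=True forces exact integer arithmetic; without it, comb()
    # estimates the result in floating point via the Gamma function
    return comb(2 * limit, limit, exact=True)

print(paths(20))  # 137846528820
```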

11

There is even an implementation of the binomial formula in the standard "math" library. For some reason, these library solutions match the slowest of the closed form solutions - but even then, they are fast enough for most needs - and the resulting code is very easy to read. However, it shows that just because a library is out there does not mean it will be the fastest implementation. Unless you need the absolute best performance, I would still recommend using this form - it's plenty fast, and is the easiest to code and understand. As the library implementation improves, code using the library will automatically get those improvements with updates to the library :
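The standard-library version is the shortest of all - math.comb() has been available since Python 3.8 :

```python
import math

def paths(limit):
    return math.comb(2 * limit, limit)   # exact big-integer result

print(paths(20))  # 137846528820
```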