Prob 0014 ‐ Longest Collatz Sequence - maccergit/Euler GitHub Wiki

Some of the earlier sub-optimal solutions can take around one minute to generate the timing plot.

01

The direct approach is to generate the Collatz sequence for each number in the range, keeping track of which sequence is the longest so far. The scaling of the solution is not obvious from inspection - but appears to be linear :

02

The discussion page for the problem notes that we only need to look at the top half of the range, as all lengths in the bottom half will be shorter than the lengths in the top half. Note the use of Python's // operator to do integer division to avoid introducing floating point values. This does not cut the execution time completely in half, as we are dropping the shortest paths and keeping the longest ones - but it still is a good performance improvement :

03

The solution can also be reworked to a recursive approach by noting that the length if the sequence is the same as one more than the length of the sequence starting with the next Collatz number. However, the recursive approach appears to be quite a bit slower :

04

The discussion paper also mentions that when the current number is odd, we can skip a step : since $3n + 1$ will be even in cases where $n$ is odd, then we know the sequence must have another element (as $1$ is not even), which will be $\frac{3n + 1}{2}$, and we can start the sequence with that element and say the length is $2$ longer than the resulting sequence. While it helps the plain recursive case, it does not help on the direct approach, as once the direct approach starts generating the sequence it keeps going until it hits the end - intermediate odd values don't get to skip a step :

05

If we combine both optimizations to the recursive approach, the result is faster than the plain direct approach - but not quite as fast as the optimized direct approach :

06

Examination of how the Collatz lengths are produces reveals a lot of recalculation of values we already calculated, which then calls for memoization to avoid the expensive recalculation of lengths we already know. Note that it might be tempting to add memoization to the routine that calculates the next Collatz number, but we only need to memoization on the length routine - a number already seen will get handled there, and we will never get a cache hit on the second cache, only cache misses. Also note that this optimization does not lend itself to the direct approach, as that approach does not use sub-sequences. This is much faster :

07

This implementation uses the Python standard library to implement memoization by using a function decorator in stead of a home grown implementation. While it's much faster than without the memoization, it's also much slower than the "roll your own" memoization - probably because the standard implementation needs to do error checking, be more general, and so on :

08

If we combine all the optimizations (only look at top half, odd values skip a step, memoization), we get the fully optimized version - a bit better than 100x faster than the plain direct approach :

09

...or we can use the standard library provided approach to memoization. It's considerably slower than the fully optimized home grown approach, but it's still under a second of execution time - and it's much easier to use :

Final Notes

Even though the recursive approach was slower than then direct approach, the recursive approach allowed the use of the optimizations that did not lend themselves to the direct approach. Since the optimizations were all orthogonal to each other, we could use them together to get the biggest performance improvement.