Prob 0003 ‐ Largest Prime Factor - maccergit/Euler GitHub Wiki

A classic "hard" CS problem - the difficulty of factoring large numbers is the root of modern cryptographic methods (although once quantum computing becomes common, new cryptographic algorithms will be needed, as factoring is much easier with quantum algorithms). Since factoring (and prime generation) is so well studied, this is a rich area to explore with several alternative solutions.

01

The naive approach is to find a bunch of primes, and then use trial division to determine which primes are factors. Some immediate optimizations are :

  • Use a Sieve of Eratosthenes approach to generate primes. We can quickly find the next element in the sequence by returning entries as we build the sieve, rather than building the entire sieve at once.
  • Note that no factor will be larger than $\sqrt{limit}$, so use that as an internal limit on searches, building the sieve, etc... Even though the resulting implementation is fast enough to get the example result in 50 µsec, it's not really fast enough for the actual problem. The timing graph shows 1 billion factors in 400 msec, but the problem is 2 orders of magnitude larger. Since the graph appears to have a linear trend, extrapolating by 2 orders of magnitude give about 400 sec for the result - which is almost 7 minutes. While there is no time limit on problems in Project Euler (this is different from the Rosalind problems), we are looking for implementations that run under 1 min (preferably, in a few seconds or less).

02

This version does not worry about computing primes, but simply finds factors from smallest to largest using pure trial division. This means it won't skip over composite numbers - but it also does not take any time trying to compute primes. We know the results are primes, as it returns the smallest factor available and then divides by the found number and factors the result. Without the overhead of generating primes (so we can skip composites), the end result is much faster. In fact, it's so much faster that I put it in my "utils" module to be used by later problems.

NOTE : The values used for the timing plot may seem unusual. The algorithm is sensitive to the value being factored, and it takes much longer to factor a large prime (which is just itself as the factor) than composite value of the same magnitude. I thus provided values that I knew would be composed of 2's and 5's (aka "10's") by ending the values with 0's. To add variety, once the 0's are removed, the remaining digits are a multiple of 5 (because they end with 5) - but once that 5 is factored out, then they end with 1. At this point, all bets are off - they may be a multiple of 3, or they may even be prime. I note the value provided for the problem was also carefully chosen to be a composite of much smaller factors.

03

SymPy includes its own sieve. Using a standard library instead of rolling your own has several benefits :

  • Code can be cleaner. A well written library will abstract out much of the problem space, hiding the details of the implementation. However, a poorly written library can leak details - but this is also likely to be an issue with "roll your own", leading to undesired code coupling.
  • Libraries like SymPy can include implementations that use techniques beyond what I know. These implementations may be faster, or more robust, or better tested - and I get the benefit of this by simply replacing my code with the library call. Ideally, one is a drop in replacement for the other, and avoids the need to write code to address any "impedance mismatch".
  • As libraries are improved, I may gain the benefits of the updates without writing any new code by simply updating the library. Thus, we return to the original algorithm, but use the sieve from SymPy - and it turns out SymPy's sieve is about 1000x faster (note the scale is µsec, not msec) :

04

Of course, why use the library to implement a factoring algorithm when the library provides one? SymPy also includes its own factoring algorithm, which I expect takes advantage of advanced factoring techniques :

05

The pyprimesieve library is a wrapper around primesieve, which is a very fast sieve implementation. Unfortunately, pyprimesieve does not appear to have any recent updates, so we may not be getting the benefits of the latest primesieve. However, the suggested primesieve-python package did not want to install, while pyprimesieve installed fine. pyprimesieve gives the best results so far, and will be my preferred prime number library.