Prob 0010 ‐ Summation of Primes

While in theory this can be done using brute force methods (such as trial division), even fast machines are too slow for this approach - they are fine up to about 100K, but not 1M. Thus, we will concentrate on various Sieve of Eratosthenes approaches. As the Wikipedia article notes, there has been a lot of research into various sieve implementations.
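
For reference, a brute force baseline along these lines (a sketch, not code from this repository) tests each candidate by trial division up to its square root - it works, but slows down badly as the limit grows:

```python
def sum_primes_trial_division(limit):
    """Brute force baseline: test each candidate by trial division up to its square root."""
    def is_prime(n):
        if n < 2:
            return False
        if n % 2 == 0:
            return n == 2        # 2 is the only even prime
        divisor = 3
        while divisor * divisor <= n:
            if n % divisor == 0:
                return False
            divisor += 2         # only odd divisors need checking
        return True

    return sum(n for n in range(2, limit) if is_prime(n))
```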

01

A simple implementation of the sieve, optimized to use each newly found prime as the next filter in the sieve - this gives us effectively linear behavior (the classic sieve is $\mathcal{O}(n \log \log n)$) :
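
A minimal sketch of this approach (not necessarily the exact code that was timed) keeps a dict of candidate values and marks composites as each prime is found:

```python
def sum_primes_simple(limit):
    """Simple Sieve of Eratosthenes over a dict of candidate -> "marked" flags."""
    sieve = {n: False for n in range(2, limit)}
    total = 0
    for current in range(2, limit):
        if not sieve[current]:                       # still unmarked, so prime
            total += current
            for multiple in range(2 * current, limit, current):
                sieve[multiple] = True               # mark every multiple as composite
    return total

print(sum_primes_simple(2_000_000))                  # 142913828922
```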

02

The previous implementation does a lot of repeated filtering of entries in the sieve - "2" filtered out all even values, but then subsequent primes also filter out even multiples of those primes, so 6, 10, 14, 22, ... are filtered again. Likewise, "3" filtered out all multiples of 3, but subsequent primes also filter out multiples of those primes, including 15, 21, 33, ... Thus, composites that have multiple unique prime factors get filtered multiple times (once for each unique prime factor) - "8" is filtered only once (it is a power of 2, so 2 is its only unique prime factor), while "66" is filtered 3 times (once each for 2, 3, and 11). While it appears we cannot completely avoid duplicates, we can considerably reduce them by starting with $current^2$ and then filtering with a stride of $2\times{current}$ - for odd $n$, $\frac{n^x - n^2}{2n}$ is always an integer (since $n^x - n^2 = n^2(n^{x-2} - 1)$, and $n^{x-2} - 1$ is even when $n$ is odd), so using a stride of $2n$ starting with $n^2$ will filter out all of the powers of $n$ in the sieve when $n > 2$. The end result also has near-linear behavior, but is about 2x faster :
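
A sketch of this variation (assumed, not the exact timed code) - 2 is handled as a special case so even numbers still get filtered, while every odd prime starts at its square and strides by twice its value:

```python
def sum_primes_strided(limit):
    """Sieve that starts marking at current**2 and, for odd primes, strides by 2 * current."""
    sieve = {n: False for n in range(2, limit)}
    total = 0
    for current in range(2, limit):
        if not sieve[current]:
            total += current
            # 2 keeps its natural stride; odd primes skip the even multiples already marked by 2
            stride = current if current == 2 else 2 * current
            for multiple in range(current * current, limit, stride):
                sieve[multiple] = True
    return total
```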

03

The previous methods accumulate a list of the primes found as the sieve is built, and then sum the items of the list to produce the result. However, the sieve itself (as a dict) already includes the prime values as the keys of the elements that have not been marked. A slightly cleaner version of the code avoids the separate list and instead sums the keys of the unmarked elements. While a bit cleaner, the code appears to run at about the same speed (the slight difference may be within the margin of error - I would need to do many more runs and apply statistical analysis to know if this is the case) :
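
A sketch of that variation - the only change from the previous one is that the sum is taken over the unmarked keys at the end instead of being accumulated in a separate list:

```python
def sum_primes_from_keys(limit):
    """Same sieve as before, but sum the unmarked keys instead of keeping a separate list."""
    sieve = {n: False for n in range(2, limit)}
    for current in range(2, limit):
        if not sieve[current]:
            stride = current if current == 2 else 2 * current
            for multiple in range(current * current, limit, stride):
                sieve[multiple] = True
    # the keys that were never marked are exactly the primes below the limit
    return sum(key for key, marked in sieve.items() if not marked)
```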

04

Yet another modification is to remove the composite elements from the dict, instead of simply marking them. By the time we are done, the dict contains just the primes (we use a dict for fast access to an element by value), and we no longer need to skip over marked items when summing. However, we do need to be careful to use dict access methods that handle cases where the key is not present, and the overhead of removing items from the dict appears to be larger than any gains we see - the end result has slightly worse performance (although we would really need to use statistical analysis to be sure) :
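
A sketch of the removal-based variation (the actual code may differ), using dict.pop() with a default so that deleting an already-removed key is harmless:

```python
def sum_primes_by_removal(limit):
    """Delete composite keys from the dict; whatever keys survive are the primes."""
    sieve = dict.fromkeys(range(2, limit))
    for current in range(2, limit):
        if current not in sieve:                     # already removed, so composite
            continue
        stride = current if current == 2 else 2 * current
        for multiple in range(current * current, limit, stride):
            sieve.pop(multiple, None)                # tolerate keys that are already gone
    return sum(sieve)                                # summing a dict sums its keys
```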

05

So far, we have been using a dict to provide fast access to elements by their value, to avoid iterating through lists to get to a particular index. However, NumPy provides fast arrays in memory (like classic C arrays : $startLocation + index \times dataLength$) along with optimized vector math operations. Experience has shown that simply converting to NumPy arrays and hoping that fast array access alone will speed things up is a loss - the conversion between NumPy and regular Python data introduces too much overhead. However, NumPy shines when vector operations can be used, because we then get parallelization via SIMD (and, for some operations, multiple threads). It turns out that we can get about another 2x improvement :
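
One way to get vector operations into the sieve (a sketch, not necessarily the code measured above) is to keep a NumPy boolean array and clear each prime's multiples with a single slice assignment:

```python
import numpy as np

def sum_primes_numpy(limit):
    """Sieve over a NumPy boolean array; each prime's multiples are cleared in one vectorized store."""
    is_prime = np.ones(limit, dtype=bool)
    is_prime[:2] = False                             # 0 and 1 are not prime
    for current in range(2, int(limit ** 0.5) + 1):
        if is_prime[current]:
            is_prime[current * current::current] = False
    return int(np.flatnonzero(is_prime).sum())       # the surviving indices are the primes
```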

06

SymPy also has a routine that returns all the primes between two values - so it's trivial to write the code to sum them up. However, the result is not very fast :
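
Assuming the routine in question is sympy.primerange(), the whole solution reduces to a single sum over a generator:

```python
from sympy import primerange

def sum_primes_sympy(limit):
    """Let SymPy generate the primes in [2, limit) and simply add them up."""
    return sum(primerange(2, limit))
```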

07

We also have pyprimesieve, which is our fastest prime generator so far. It includes a built-in function to compute the sum of the primes up to a limit - all we need to do is return the resulting value. It runs so fast that we can easily go beyond the problem limit and still be orders of magnitude faster :
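
Assuming the built-in function referred to here is pyprimesieve.primes_sum(), the entire solution collapses to a single call:

```python
import pyprimesieve

def sum_primes_pyprimesieve(limit):
    """pyprimesieve does both the sieving and the summation in optimized native code."""
    return pyprimesieve.primes_sum(limit)
```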