Prob 0009 ‐ Special Pythagorean Triplet - maccergit/Euler GitHub Wiki

This is a rich area for researching possible approaches, as there are many possible ways to solve this other than the direct approach. Even the direct approach has optimizations based on geometry. Among the various approaches, none are obviously better than the others, so will be interesting to see how things turn out - almost all of them require some sort of trial and error approach, and the optimizations involve reducing the number of trials.

01

For the direct approach, we can still do some optimizations.

  • We know that the limit is the sum of the three sides of the triangle, which means it is the perimeter of the triangle. As a result, we also know that $c = limit - a - b$. We also know that $c < a + b$, because no side of a triangle can be longer than the sum of the other two sides (visualize a triangle where two sides are almost in line with each other - the other side is also almost in line with them, and is almost as long as the the first two sides laid end to end). Of course, $a, b, c > 0$, and $c > \frac{limit}{3}$ because it's the longest side, and the shortest it can be while still being larger than the other sides is an equilateral triangle. These then provide another constraint :
\displaystyle \begin{align}
  c &< a + b \tag{1} \\
  limit - a - b &< a + b, \text{ (add a and b to each side)} \tag{2} \\
  limit &< 2a + 2b \tag{3} \\
  \frac{limit}{2} &< a + b \tag{4}
\end{align}

We also know that $a < b < c$, as none of the sides can be equal - if $a = b$, and they are integers, then $c$ will be irrational, and thus a non-integer.

We can then let the longest side "c" range in the integer values from $\frac{limit}{3}..limit$, and then let the next longest side "b" range in the integer values from $\frac{limit - c}{2}..(limit - c)$. Given two sides, we automatically know the third, and thus have a trial triplet - but it may not be a Pythagorean triplet, so we then test to see if $a^2 + b^2 = c^2$ - once we have one, we can stop :

While this is in the millisecond range, and is plenty fast to solve the problem, it also appears to have approximately quadratic behavior - so this would not scale well to much larger limits : 10000 runs in 3.5 seconds, and I extrapolate that 100000 would take almost 6 minutes, and 1 million would take almost 10 hours.

02

Another easy optimization is the realization that of "a" and "b", one must be odd and the other must be even. Also, we can iterate over the allowed ranges for "a" and "b", which are smaller than the allowed ranges for "c". Finally, the filtered generator comprehension syntax used in the solution above does not stop when the solution is found, but instead runs the loops to completion and then returns the first good result. However, including the trial check in the loops allows for short-circuiting the loops once a solution is found. This runs much faster :

However, we only changed the multiplier of the quadratic behavior, and I predict it would still take a couple hours to process with a limit of a million.

03

If we use the perimeter formula of $c = limit - a - b$, plug it into the Pythagorean formula to eliminate "c", and then solve for "b", we get an equation for "b" in terms of "a" and the limit :

\displaystyle b = limit + \frac{limit^2}{2(a - limit)}

This allows us to iterate only over the range for "a", and then compute "b" and "c". If we use integer division, we can then run the same trial to verify the result is a Pythagorean triplet, and stop once we find the solution. This is orders of magnitude faster :

Further investigation reveals that not only does this run the original problem in µsec times, but it appears to scale in a linear fashion - so even a limit of 1 million runs in 70 milliseconds. However, we are still generating triplets that are not Pythagorean triplets, and thus need to test for this.

04

What if we could generate all primitive Pythagorean triplets? Then, we would just need to see which ones have the correct perimeter to get the solution. A little research reveals that there are many ways to generate Pythagorean triplets. One way uses "p" and "q", natural numbers that cannot both be odd. Given this, then :

\displaystyle \begin{align}
  a &= 2pq \\
  b &= p^2 - q^2 \\
  c &= p^2 + q^2
\end{align}

While fairly fast, it's not as fast the the fastest solution we have see so far - and the scale is quadratic (as expected, since we have nested loops), so the time rapidly grows - again, it would take a couple hours for a 1 million limit.

05

A variant of the approach above is the following - for natural numbers "p" and "q", both odd, $p > q > 0$ :

\displaystyle \begin{align}
  a &= pq \\
  b &= \frac{p^2 - q^2}{2} \\
  c &= \frac{p^2 + q^2}{2}
\end{align}

This might initially look like just a scaled version of the previous approach - but note that both numbers are odd, rather than being a mix of odd/even. This means we can iterate over every other number, rather than every number :

The large spike in the middle results in a graph scale that hides the difference between the approaches. However, deeper investigation reveals that the second approach is about 6x faster than the first (20 µsec vs. 125 µsec). For 1000, this is the fastest result so far. Unfortunately, it still has quadratic behavior, with 10000 taking about 1 second, and I expect 1 million to take over 2 hours.

TODO : The discussion page for the problem carries this further - I have yet to implement these extended approaches.

06

Being a set of equations and constants over the set of integers, this problem is a "Diophantine equation". Exploring this can be a bit of a black hole and rapidly gets into advanced math. However, SymPy includes the ability to solve Diophantine equations, so let's give this a try. Working interactively on the Python command line with a trivial case of ${a, b, c} = {3, 4, 5}$, we know the perimeter is 12 :

>>> from sympy.solvers.diophantine import diophantine
>>> from sympy.abc import a, b, c
>>> eqs = diophantine(a + b + c - 12)
>>> eqs
{(t_0, t_0 + t_1, -2*t_0 - t_1 + 12)}

Note that we had to adjust the formula for perimeter to make it equal zero, as that is a requirement for the library. The result is a set of tuples of three equations in two unknowns, one equation for each of "a", "b", and "c". In this case, we get only one tuple as a solution - so we pull it out of the set to make it easier to access the pieces of the tuple, and we get :

>>> eq = list(eqs)[0] # Can't subscript sets, so turn into a list to get the first item
>>> eq
(t_0, t_0 + t_1, -2*t_0 - t_1 + 12)

Note that the result "eq" is displayed using $\LaTeX$ formatting (underscore for subscript) - if we run it through the formatter, we get $(t_0, t_0 + t_1, -2*t_0 - t_1 + 12)$, which corresponds to :

  \displaystyle \begin{align}
      & \tag{tuple index} \\
    a &= t_0 \tag{0} \\
    b &= t_0 + t_1 \tag{1} \\
    c &= 12 - 2t_0 - t_1 \tag{2}
  \end{align}

Of course, the problem is we don't know the values for $t_0$ or $t_1$. We then substitute the values into the Pythagorean equation to get solutions for $t_0$ and $t_1$ :

>>> results = diophantine(eq[0] ** 2 + eq[1] ** 2 - eq[2] ** 2)
>>> results
{(15, 21), (48, -34), (3, 1), (11, -71), (14, 34), (36, -21), (10, -34), (20, 1), (-12, 21), (-24, 34), (13, 71), (18, 6), (9, -21), (4, -1), (8, -14), (30, -14), (-60, 71), (-6, 14), (24, -6), (0, 6), (6, -6), (84, -71), (21, -1), (16, 14)}

Again, we had to adjust the equation to make it equal to zero, and instead of "a", "b", and "c", we used the equivalent values from the previously solved equation. We thus now have pairs of $t_0$ and $t_1$ that will produce Pythagorean triples that also produce a perimeter of 12. All we need to do now is substitute those values back in to get "a", "b", and "c". Note that there may be duplicates and negative values, so they will need to be filtered out to get the values we want. When substituting values for $t_0$ and $t_1$ in SymPy, you will need the defined symbols for them. $t_0$ is already known - it's the entry in "eq[0]". However, we don't have $t_1$ separated out as its own symbol. To get that, we can simply subtract "eq[0]" from "eq[1]", as that leaves only $t_1$ - which can then be bound to a specific value as a substitution :

>>> t0 = eq[0]
>>> t1 = eq[1] - eq[0]
>>> result_list = list(results) # again, we can't index a set - and we don't want the order to possibly change underneath us
>>> result = result_list[0]
>>> result
(15, 21)
>>> [x.subs({t0 : result[0], t1 : result[1]}) for x in eq]
[15, 36, -39]
>>> result = result_list[1]
>>> result
(48, -34)
>>> [x.subs({t0 : result[0], t1 : result[1]}) for x in eq]
[48, 14, -50]
>>> result = result_list[2]
>>> result
(3, 1)
>>> [x.subs({t0 : result[0], t1 : result[1]}) for x in eq]
[3, 4, 5]

The remaining challenge then is to write a program to automate all of this for an arbitrary limit :

While this may seem fairly slow (due to the overhead of solving equations symbolically), further experimentation reveals that solving for 1 million is still done in under a second.

07

TODO