Optimization of Izzo algorithm - poliastro/poliastro GitHub Wiki

First pass

Analysis with `line_profiler`

With the first version (5e80ac6) and a M = 1 case, 95 % of the time is spent iterating:

Total time: 0.000702905 s
File: c:\users\jlcr\development\poliastro\src\poliastro\iod\izzo.py
Function: _find_xy at line 112

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   112                                           @profile
   113                                           def _find_xy(ll, T, M, numiter, rtol):
   114                                               """Computes all x, y for given number of revolutions.
   115                                           
   116                                               """
   117                                               # For abs(ll) == 1 the derivative is not continuous
   118         1           10     10.0      0.4      assert abs(ll) < 1
   119         1            3      3.0      0.1      assert T > 0  # Mistake on original paper
   120                                           
   121         1           15     15.0      0.7      M_max = int(np.floor(T / pi))
   122         1           20     20.0      0.9      T_00 = np.arccos(ll) + ll * np.sqrt(1 - ll ** 2)  # T_xM
   123                                           
   124                                               # Refine maximum number of revolutions if necessary
   125         1            4      4.0      0.2      if T < T_00 + M_max * pi and M_max > 0:
   126                                                   T_min = _compute_T_min(ll, M_max)
   127                                                   if T < T_min:
   128                                                       M_max -= 1
   129                                           
   130                                               # Check if a feasible solution exist for the given number of revolutions
   131                                               # This departs from the original paper in that we do not compute all solutions
   132         1            2      2.0      0.1      if M > M_max:
   133                                                   raise ValueError("No feasible solution, try M <= {:d}".format(M_max))
   134                                           
   135                                               # Initial guess
   136         1           32     32.0      1.4      for x_0 in _initial_guess(T, ll, M):
   137                                                   # Start Householder iterations from x_0 and find x, y
   138         1            3      3.0      0.1          x = householder(_tof_equation, x_0, args=(T, ll, M), tol=rtol, maxiter=numiter,
   139         1            2      2.0      0.1                          fprime=_tof_equation_p, fprime2=_tof_equation_pp,
   140         1         2157   2157.0     95.4                          fprime3=_tof_equation_ppp)
   141         1           10     10.0      0.4          y = _compute_y(x, ll)
   142                                           
   143         1            2      2.0      0.1          yield x, y

It takes 20 evaluations of the objective function, 10 of the first derivative and 5 of the second derivative:

Total time: 0.000303245 s
File: c:\users\jlcr\development\poliastro\src\poliastro\iod\izzo.py
Function: _tof_equation at line 173

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   173                                           @profile
   174                                           def _tof_equation(x, T, ll, M):
   175                                               """Time of flight equation.
   176                                           
   177                                               """
   178        20          198      9.9     20.3      y = _compute_y(x, ll)
   179        20           51      2.5      5.2      if M == 0 and np.sqrt(0.6) < x < np.sqrt(1.4):
   180                                                   eta = y - ll * x
   181                                                   S_1 = (1 - ll - x * eta) * .5
   182                                                   Q = 4 / 3 * hyp2f1(3, 1, 5 / 2, S_1)
   183                                                   T_ = (eta ** 3 * Q + 4 * ll * eta) * .5
   184                                               else:
   185        20          389     19.4     39.9          psi = _compute_psi(x, ll)
   186        20          278     13.9     28.5          T_ = ((psi + M * pi) / np.sqrt(np.abs(1 - x ** 2)) - x + ll * y) / (1 - x ** 2)
   187                                           
   188        20           59      3.0      6.1      return T_ - T

Total time: 0.000290493 s
File: c:\users\jlcr\development\poliastro\src\poliastro\iod\izzo.py
Function: _tof_equation_p at line 190

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   190                                           @profile
   191                                           def _tof_equation_p(x, _, ll, M):
   192                                               # TODO: What about derivatives when x approaches 1?
   193        10           93      9.3     10.0      y = _compute_y(x, ll)
   194        10          764     76.4     81.8      T = _tof_equation(x, 0.0, ll, M)
   195        10           77      7.7      8.2      return (3 * T * x - 2 + 2 * ll ** 3 * x / y) / (1 - x ** 2)

Total time: 0.000302312 s
File: c:\users\jlcr\development\poliastro\src\poliastro\iod\izzo.py
Function: _tof_equation_pp at line 197

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   197                                           @profile
   198                                           def _tof_equation_pp(x, _, ll, M):
   199         5           45      9.0      4.6      y = _compute_y(x, ll)
   200         5          363     72.6     37.3      T = _tof_equation(x, 0.0, ll, M)
   201         5          511    102.2     52.6      dT = _tof_equation_p(x, _, ll, M)
   202         5           53     10.6      5.5      return (3 * T + 5 * x * dT + 2 * (1 - ll ** 2) * ll ** 3 / y ** 3) / (1 - x ** 2)

Seems a bit too much for me for a Halley iteration scheme, perhaps there is some problem with the initial guesses but I am not sure.

Three ideas for optimizing:

Accelerate the objective function and its derivatives with numba
Lower the number of function evaluations (using a Householder iteration scheme)
Optimizing costly operations (sqrt, divisions, repeated results)

Convergence and function evaluation analysis

After adding a print(x) to the objective function these are the results:

[py35] C:\Users\jlcr\Development\poliastro\tests>kernprof.exe -l test_iod.py
-0.575030110669
-0.575030110669
-0.575030110669
-0.575030110669
-0.404426593027
-0.404426593027
-0.404426593027
-0.404426593027
-0.178580173437
-0.178580173437
-0.178580173437
-0.178580173437
-0.244168286816
-0.244168286816
-0.244168286816
-0.244168286816
-0.243620093697
-0.243620093697
-0.243620093697
-0.243620093697
Wrote profile results to test_iod.py.lprof

This is explained because the derivatives are also calling the function, usually with the same arguments. We can see that five iterations are needed for convergence. According to Izzo, Gooding "built upon these results and published a procedure achieving high precision in only three iterations for all geometries", so five iterations still looks like it is too much.

Ideas for optimization:

Carefully analyze the convergence of the algorithm. If it is true that high precision can be achieved with three iterations, then either the initial guesses or the objective function and derivatives are wrong.
Caching or saving the results. Notice that the cache parameter to numba.jit does not cache evaluations, but compilation. Also, for thousands or millions of evaluations this has consequences on memory usage.
Use an ad-hoc iteration scheme so we can save function evaluations in variables. This goes against reusing but my use case is fair: derivatives might need higher level evaluations.

Gooding:

It was arranged, in testing, that the relative error in x [...] after a pre-set number of iterations should be computed, as well as the relative error (residual) in the associated value of T; the smaller of these two relative errors, ϵ say, was recorded for each test case, and after three iterations it was found that e never exceeded 1e-13.

For completeness, it is remarked that when the process was reduced to two iterations, the maximum value of ϵ was about 2 · 1e-6 [...] For a single iteration, on the other hand, the maximum value of e was 4.3 · 1e-3 [...] Finally, the maximum value of ϵ prior to any iteration, i.e. due to the starter itself, was about 0.5, being associated with the possible over-estimate (in x_0) by 100 %, when φ is small.

For M = 0, the results are better, converging after three iterations:

[py35] C:\Users\jlcr\Development\poliastro\tests>kernprof.exe -l test_iod.py
-0.58835115554
-0.58835115554
-0.58835115554
-0.58835115554
-0.622624710086
-0.622624710086
-0.622624710086
-0.622624710086
-0.622329168341
-0.622329168341
-0.622329168341
-0.622329168341
Wrote profile results to test_iod.py.lprof

Second pass

A closer look on the convergence for M = 0 helped me realize that the ϵ mentioned in (Gooding) and the tolerance check performed by scipy.optimize.newton are not at all equivalent, hence they cannot be compared. The algorithm is working well for single revolution cases.

Apparently the Halley's method implemented in SciPy does not reach cubic convergence (see https://github.com/scipy/scipy/issues/5922 for discussion), and after implementing a custom method the expected rates were attained.

Third pass

The number of function evaluations has been reduced by implementing an ad-hoc iteration scheme, therefore we no longer reuse scipy.optimize.newton calling convention.

Further opportunities of optimizing:

Computing y and psi is expensive. We should minimize the number of function calls, and possibly reuse the computation of (1 - x **2) too.
?

From here, I think it is time to start applying numba.jit where useful.