Optimization of Izzo algorithm - poliastro/poliastro GitHub Wiki
First pass
line_profiler
Analysis with With the first version (5e80ac6) and a M = 1
case, 95 % of the time is spent iterating:
Total time: 0.000702905 s
File: c:\users\jlcr\development\poliastro\src\poliastro\iod\izzo.py
Function: _find_xy at line 112
Line # Hits Time Per Hit % Time Line Contents
==============================================================
112 @profile
113 def _find_xy(ll, T, M, numiter, rtol):
114 """Computes all x, y for given number of revolutions.
115
116 """
117 # For abs(ll) == 1 the derivative is not continuous
118 1 10 10.0 0.4 assert abs(ll) < 1
119 1 3 3.0 0.1 assert T > 0 # Mistake on original paper
120
121 1 15 15.0 0.7 M_max = int(np.floor(T / pi))
122 1 20 20.0 0.9 T_00 = np.arccos(ll) + ll * np.sqrt(1 - ll ** 2) # T_xM
123
124 # Refine maximum number of revolutions if necessary
125 1 4 4.0 0.2 if T < T_00 + M_max * pi and M_max > 0:
126 T_min = _compute_T_min(ll, M_max)
127 if T < T_min:
128 M_max -= 1
129
130 # Check if a feasible solution exist for the given number of revolutions
131 # This departs from the original paper in that we do not compute all solutions
132 1 2 2.0 0.1 if M > M_max:
133 raise ValueError("No feasible solution, try M <= {:d}".format(M_max))
134
135 # Initial guess
136 1 32 32.0 1.4 for x_0 in _initial_guess(T, ll, M):
137 # Start Householder iterations from x_0 and find x, y
138 1 3 3.0 0.1 x = householder(_tof_equation, x_0, args=(T, ll, M), tol=rtol, maxiter=numiter,
139 1 2 2.0 0.1 fprime=_tof_equation_p, fprime2=_tof_equation_pp,
140 1 2157 2157.0 95.4 fprime3=_tof_equation_ppp)
141 1 10 10.0 0.4 y = _compute_y(x, ll)
142
143 1 2 2.0 0.1 yield x, y
It takes 20 evaluations of the objective function, 10 of the first derivative and 5 of the second derivative:
Total time: 0.000303245 s
File: c:\users\jlcr\development\poliastro\src\poliastro\iod\izzo.py
Function: _tof_equation at line 173
Line # Hits Time Per Hit % Time Line Contents
==============================================================
173 @profile
174 def _tof_equation(x, T, ll, M):
175 """Time of flight equation.
176
177 """
178 20 198 9.9 20.3 y = _compute_y(x, ll)
179 20 51 2.5 5.2 if M == 0 and np.sqrt(0.6) < x < np.sqrt(1.4):
180 eta = y - ll * x
181 S_1 = (1 - ll - x * eta) * .5
182 Q = 4 / 3 * hyp2f1(3, 1, 5 / 2, S_1)
183 T_ = (eta ** 3 * Q + 4 * ll * eta) * .5
184 else:
185 20 389 19.4 39.9 psi = _compute_psi(x, ll)
186 20 278 13.9 28.5 T_ = ((psi + M * pi) / np.sqrt(np.abs(1 - x ** 2)) - x + ll * y) / (1 - x ** 2)
187
188 20 59 3.0 6.1 return T_ - T
Total time: 0.000290493 s
File: c:\users\jlcr\development\poliastro\src\poliastro\iod\izzo.py
Function: _tof_equation_p at line 190
Line # Hits Time Per Hit % Time Line Contents
==============================================================
190 @profile
191 def _tof_equation_p(x, _, ll, M):
192 # TODO: What about derivatives when x approaches 1?
193 10 93 9.3 10.0 y = _compute_y(x, ll)
194 10 764 76.4 81.8 T = _tof_equation(x, 0.0, ll, M)
195 10 77 7.7 8.2 return (3 * T * x - 2 + 2 * ll ** 3 * x / y) / (1 - x ** 2)
Total time: 0.000302312 s
File: c:\users\jlcr\development\poliastro\src\poliastro\iod\izzo.py
Function: _tof_equation_pp at line 197
Line # Hits Time Per Hit % Time Line Contents
==============================================================
197 @profile
198 def _tof_equation_pp(x, _, ll, M):
199 5 45 9.0 4.6 y = _compute_y(x, ll)
200 5 363 72.6 37.3 T = _tof_equation(x, 0.0, ll, M)
201 5 511 102.2 52.6 dT = _tof_equation_p(x, _, ll, M)
202 5 53 10.6 5.5 return (3 * T + 5 * x * dT + 2 * (1 - ll ** 2) * ll ** 3 / y ** 3) / (1 - x ** 2)
Seems a bit too much for me for a Halley iteration scheme, perhaps there is some problem with the initial guesses but I am not sure.
Three ideas for optimizing:
- Accelerate the objective function and its derivatives with numba
- Lower the number of function evaluations (using a Householder iteration scheme)
- Optimizing costly operations (
sqrt
, divisions, repeated results)
Convergence and function evaluation analysis
After adding a print(x)
to the objective function these are the results:
[py35] C:\Users\jlcr\Development\poliastro\tests>kernprof.exe -l test_iod.py
-0.575030110669
-0.575030110669
-0.575030110669
-0.575030110669
-0.404426593027
-0.404426593027
-0.404426593027
-0.404426593027
-0.178580173437
-0.178580173437
-0.178580173437
-0.178580173437
-0.244168286816
-0.244168286816
-0.244168286816
-0.244168286816
-0.243620093697
-0.243620093697
-0.243620093697
-0.243620093697
Wrote profile results to test_iod.py.lprof
This is explained because the derivatives are also calling the function, usually with the same arguments. We can see that five iterations are needed for convergence. According to Izzo, Gooding "built upon these results and published a procedure achieving high precision in only three iterations for all geometries", so five iterations still looks like it is too much.
Ideas for optimization:
- Carefully analyze the convergence of the algorithm. If it is true that high precision can be achieved with three iterations, then either the initial guesses or the objective function and derivatives are wrong.
- Caching or saving the results. Notice that the
cache
parameter tonumba.jit
does not cache evaluations, but compilation. Also, for thousands or millions of evaluations this has consequences on memory usage. - Use an ad-hoc iteration scheme so we can save function evaluations in variables. This goes against reusing but my use case is fair: derivatives might need higher level evaluations.
Gooding:
It was arranged, in testing, that the relative error in
x
[...] after a pre-set number of iterations should be computed, as well as the relative error (residual) in the associated value ofT
; the smaller of these two relative errors, ϵ say, was recorded for each test case, and after three iterations it was found that e never exceeded 1e-13.
For completeness, it is remarked that when the process was reduced to two iterations, the maximum value of ϵ was about 2 · 1e-6 [...] For a single iteration, on the other hand, the maximum value of e was 4.3 · 1e-3 [...] Finally, the maximum value of ϵ prior to any iteration, i.e. due to the starter itself, was about 0.5, being associated with the possible over-estimate (in
x_0
) by 100 %, when φ is small.
For M = 0
, the results are better, converging after three iterations:
[py35] C:\Users\jlcr\Development\poliastro\tests>kernprof.exe -l test_iod.py
-0.58835115554
-0.58835115554
-0.58835115554
-0.58835115554
-0.622624710086
-0.622624710086
-0.622624710086
-0.622624710086
-0.622329168341
-0.622329168341
-0.622329168341
-0.622329168341
Wrote profile results to test_iod.py.lprof
Second pass
A closer look on the convergence for M = 0
helped me realize that the ϵ mentioned in (Gooding) and the tolerance check performed by scipy.optimize.newton
are not at all equivalent, hence they cannot be compared. The algorithm is working well for single revolution cases.
Apparently the Halley's method implemented in SciPy does not reach cubic convergence (see https://github.com/scipy/scipy/issues/5922 for discussion), and after implementing a custom method the expected rates were attained.
Third pass
The number of function evaluations has been reduced by implementing an ad-hoc iteration scheme, therefore we no longer reuse scipy.optimize.newton
calling convention.
Further opportunities of optimizing:
- Computing y and psi is expensive. We should minimize the number of function calls, and possibly reuse the computation of
(1 - x **2)
too. - ?
From here, I think it is time to start applying numba.jit
where useful.