Performance upgrades v1.2.3 – v2.0.0 - Jusas/WatneyAstrometry GitHub Wiki

Solver performance (v2.0.0)

Solver performance (particularly for blind solves) has improved significantly in v2.0.0 compared to v1.2.3 (see Performance Metrics for v1.2.3), so new benchmarks were run to visualize and analyze the differences. A new benchmarking tool, the ConsoleSolverBenchmarkTool (found here), was also written for this and future benchmarking needs, making it much easier to run benchmarks as new, improved versions are published.

The solver CLI application has been benchmarked to give an idea of the solve times you can expect on different hardware. The CLI application was chosen for benchmarking because it is likely used more than the Web API.

The CLI application is mostly constrained by the CPU and file I/O. The following factors come into play:

  • Every solve is a one-shot run of the program, which means the same initialization tasks are performed every time.
  • Quad databases are read from disk rather than loaded into memory ahead of time, since reading the entire databases would take considerable time. It would also be wasteful: in practice, a solution is found before the whole database has been read.
  • The raw math is done by the CPU, so the more cores you have and the faster they are, the faster the solve will go.
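The early-exit reading strategy described above can be sketched roughly as follows. This is a simplified Python illustration, not Watney's actual implementation; `count_matches` is a hypothetical stand-in for the real quad-matching step:

```python
def solve_lazily(db_path, count_matches, matches_needed=8, chunk_size=64 * 1024):
    """Scan the quad database chunk by chunk and stop as soon as enough
    matches are found, so the whole file is rarely read from disk."""
    matches = 0
    bytes_read = 0
    with open(db_path, "rb") as db:
        while chunk := db.read(chunk_size):
            bytes_read += len(chunk)
            matches += count_matches(chunk)  # stand-in for quad matching
            if matches >= matches_needed:
                break  # solution found before reaching end of file
    return bytes_read
```

With a well-populated database and a typical image, the loop exits long before the file ends, which is why preloading the entire database would mostly be wasted I/O.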

In many cases, disk I/O is where the bottleneck lies. Hence, many of the optimizations come simply from limiting disk I/O operations in clever ways as much as possible (and there is still some untapped potential here).

Note that the Web API behaves a bit differently: the same initialization is not repeated for every solve, since multiple images are fed to a single running process, and RAM could be used more aggressively to skip disk I/O. It will therefore benefit from additional memory-based optimizations in the future (adding support for reading the database into memory during app startup would likely bring significant performance improvements when enough RAM is available). That said, these benchmarks should also reflect the API's performance gains, as the solve process itself is identical.

Benchmarking setup

Developer's note: as time allows, more setups will likely be added to this list and the charts updated.

What was benchmarked

The numbers specifically show the time spent solving. This excludes the time spent on image reads and star detection; these were left out because they were constant - no improvements were made in that area. You might also be using a different method for star extraction, e.g. performing star extraction outside Watney with SExtractor and feeding the XYLS to Watney via STDIN or a file. So, this benchmark covers only the raw solve times.
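To make the measurement boundary concrete, here is a hedged sketch of how a benchmark harness separates the two phases (illustrative only; the actual ConsoleSolverBenchmarkTool is a .NET application, and the function names here are ours):

```python
import time

def timed_solve(detect_stars, solve, image):
    """Time star detection and solving separately; only solve_s
    corresponds to the number reported in these benchmarks."""
    t0 = time.perf_counter()
    stars = detect_stars(image)   # excluded from the reported time
    t1 = time.perf_counter()
    solution = solve(stars)       # the measured solve phase
    t2 = time.perf_counter()
    return {"detect_s": t1 - t0, "solve_s": t2 - t1, "solution": solution}
```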

Hardware

These benchmarks were performed on an older gaming laptop with the following specs:

Acer Nitro 5 (2017)

  • CPU: Intel Core i5-7300HQ, 4 cores @ 2.50 GHz
  • Mem: 16 GiB
  • Disk: 500 GB Western Digital WD Green SN3000 NVMe
  • OS: Kubuntu 25.10

Images

The benchmarking material consisted of these 9 images (the dataset is available here):

| Image | Size | Field radius (deg) | Notes |
|-------|------|--------------------|-------|
| bm_1_heart-nebula.jpg | 3912 x 2956 | 0.40 | |
| bm_1_ic1795.fits | 4656 x 3520 | 0.47 | Solver run using settings which resulted in failure on purpose, to test full sweep performance |
| bm_1_m31.fits | 4656 x 3520 | 1.94 | |
| bm_1_m33_green.fits | 2328 x 1760 | 1.94 | |
| bm_1_m81.png | 1128 x 834 | 0.46 | |
| bm_1_ngc383.fits | 4656 x 3520 | 0.47 | |
| bm_1_ngc925.fits | 4656 x 3520 | 0.47 | |
| bm_1_ngc1491.fits | 4656 x 3520 | 0.47 | |
| bm_1_ngc7331.fits | 1392 x 1040 | 0.76 | |

Parameters

The benchmarking parameters were as follows:

| Parameter | Values | Description |
|-----------|--------|-------------|
| Iterations | 5 | How many solves were run with the same settings; results were then averaged |
| Sampling variations | 1, 2, 4, 6, 8, 12, 16 | Different sampling values used |
| Radius variations (deg) | (0.5 .. 16), (0.5 .. 8), (0.5 .. 4) | The min/max radius limits in the solver arguments |
| Offset variations | (0, 0), (0, 1) | Value variations for --lower-density-offset and --higher-density-offset |
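The parameter matrix above multiplies out to a fair number of runs. A quick Python sketch of the combination count (the variable names are ours, not the tool's):

```python
from itertools import product

iterations = 5
samplings = [1, 2, 4, 6, 8, 12, 16]
radius_ranges = [(0.5, 16), (0.5, 8), (0.5, 4)]
offsets = [(0, 0), (0, 1)]  # (--lower-density-offset, --higher-density-offset)

combos = list(product(samplings, radius_ranges, offsets))
runs_per_image = iterations * len(combos)
# 42 parameter combinations, 210 timed solves per image, 1890 across 9 images
print(len(combos), runs_per_image, 9 * runs_per_image)  # 42 210 1890
```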

Results

The full benchmark run included 9 images and multiple combinations of parameters, but for the sake of brevity, only a selection of 5 images and 3 sampling parameters is represented here in graphs. The remaining images show more or less the same performance trends.

Overall results

The overall run time difference of the benchmarks between versions was:

| Version | .NET version | Total run time |
|---------|--------------|----------------|
| v1.2.3 | .NET 6 | 02:43:15 |
| v2.0.0 | .NET 10 | 00:58:05 |
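Those totals work out to roughly a 2.8x speedup; a quick check:

```python
def to_seconds(hms):
    """Convert an HH:MM:SS run time string to total seconds."""
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s

v1 = to_seconds("02:43:15")  # 9795 s
v2 = to_seconds("00:58:05")  # 3485 s
print(f"speedup: {v1 / v2:.2f}x")  # speedup: 2.81x
```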

This is a pretty good indicator of the overall performance boost.

Brief image analysis

Images

  • bm_1_heart-nebula.jpg
  • bm_1_m81.png
  • bm_1_m31.fits
  • bm_1_ngc1491.fits
  • bm_1_ic1795.fits

Sampling

  • 1 (no sampling)
  • 4
  • 16

Higher density offset

  • Always 1 (includes more database quads in calculations, hence being slower, but generally brings more matches)

Min/max radius

  • Always 0.5 .. 16

Additional remarks

  • Depending on the image size, Watney's built-in star detection would add 0.05 .. 0.38 seconds on top of the solve time on this setup.

bm_1_heart-nebula.jpg

images/v2_perf_1.png

As we can see, the difference in the numbers here is significant: without sampling, the solve time has come down from ~9 s to ~2 s. A trend also starts to form: a limited sampling value like 4 is the sweet spot. Interestingly, the solve speed with a sampling value of 16 is now significantly up compared to no sampling. This reflects the changes made to the calculations: certain operations are now cheaper even when sampling is not used. Optimizations at work here.

bm_1_m81.png

images/v2_perf_2.png

This graph tells pretty much the same story. Performance gains differ from image to image depending on several factors, but the general trend is the same.

bm_1_m31.fits

images/v2_perf_3.png

For this image, the solution was always found quickly with the selected default settings, requiring very few search cycles. Interestingly, the operations are now optimized enough that there is no distinguishable difference between the different sampling settings in such a quick solve.

bm_1_ngc1491.fits

images/v2_perf_4.png

Steady improvement.

bm_1_ic1795.fits

images/v2_perf_5.png

This image was purposely run with settings that would not solve it (adding --lower-density-offset 1 and --higher-density-offset 1 would make it solve), so that the solver would run a full sweep and show the numbers when the maximum number of solve cycles is run. It tells the story well: the gains really are significant.

Conclusions

Three major factors contributed to the performance improvements:

  • Koen van Leeuwen's contributions: smarter calculations, memory-mapped files, and reduced file I/O (major)
  • Sorting- and caching-based shortcuts, and short-circuiting loops in a few critical places (moderate)
  • .NET 10 general improvements (very minor)
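The memory-mapped-file technique mentioned above lets the OS page in only the parts of the quad database that are actually touched, instead of streaming whole files through read buffers. A minimal Python sketch of the idea (Watney itself is written in C#, so this illustrates the technique, not its actual code):

```python
import mmap

def read_quad_record(path, offset, length):
    """Map the database file and slice out one record; only the pages
    covering [offset, offset + length) need to be faulted in from disk."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[offset:offset + length])
```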

Further file I/O improvements have already been identified; they will require restructuring the DB file format, but should bring noticeable gains on top of what we have now. These will be implemented in future versions.