Performance upgrades v1.2.3 - v2.0.0 - Jusas/WatneyAstrometry GitHub Wiki
Solver performance (v2.0.0)
Solver performance (particularly blind solves) has increased significantly in v2.0.0 compared to v1.2.3 (see Performance Metrics for v1.2.3), so new benchmarks were run to visualize and analyze the differences. A new benchmarking tool was also written for this and future benchmarking needs, making it much easier to run benchmarks as new, improved versions are published. This is the ConsoleSolverBenchmarkTool, found here
The solver CLI application has been benchmarked to give an idea of the solve times you can expect with different hardware. The CLI application was chosen as the benchmark target because it is likely used more than the Web API.
The CLI application is mostly constrained by the CPU and file I/O. These factors come into play:
- Every solve is a one-shot run of the program, which means the same initialization tasks are performed every time.
- Quad databases are read from disk and are not loaded into memory ahead of time, since that would require reading the databases in full and would take considerable time. It would also be wasteful, since in practice a solution is found before the entire database has been read.
- The raw math is done by the CPU, so the more cores and the faster the CPU you have, the faster that part goes.
In many cases, disk I/O is where the bottleneck lies, which is why many of the optimizations come from limiting disk I/O operations in clever ways as much as possible (and there is still some untapped potential here).
Note that the Web API behaves a bit differently: since the process stays running and multiple images are fed to it, the same initialization is not repeated every time, and RAM could be used more aggressively to skip disk I/O. It will therefore benefit from additional memory-based optimizations in the future (adding support for reading the DB into memory during app startup would likely bring significant performance improvements when enough RAM is available). That said, these benchmarks should also reflect the API's performance gains, as the solve process itself is identical.
Benchmarking setup
Developer's note: as time allows, more setups will likely be added to this list and charts updated.
What was benchmarked
The numbers specifically show the time spent solving. This excludes the time used for image reads and star detection. These were left out because they are constant: no improvements were made in that area. You might also be using a different method for star extraction, e.g. performing star extraction outside Watney with SExtractor and feeding the XYLS to Watney via STDIN or a file. So this benchmark covers only the raw solve times.
Hardware
These benchmarks were performed on an older gaming laptop with the following specs:
Acer Nitro 5 (2017)
- CPU: Intel Core i5-7300HQ, 4 cores @ 2.50 GHz
- Mem: 16 GiB
- Disk: 500GB Western Digital WD Green SN3000 NVMe
- OS: Kubuntu 25.10
Images
The benchmarking material was these 9 images (the dataset is available here):
| Image | Size | Field radius (deg) | Notes |
|---|---|---|---|
| bm_1_heart-nebula.jpg | 3912 x 2956 | 0.40 | |
| bm_1_ic1795.fits | 4656 x 3520 | 0.47 | Solver run using settings which resulted in failure on purpose to test full sweep performance |
| bm_1_m31.fits | 4656 x 3520 | 1.94 | |
| bm_1_m33_green.fits | 2328 x 1760 | 1.94 | |
| bm_1_m81.png | 1128 x 834 | 0.46 | |
| bm_1_ngc383.fits | 4656 x 3520 | 0.47 | |
| bm_1_ngc925.fits | 4656 x 3520 | 0.47 | |
| bm_1_ngc1491.fits | 4656 x 3520 | 0.47 | |
| bm_1_ngc7331.fits | 1392 x 1040 | 0.76 | |
Parameters
The benchmarking parameters were as follows:
| Parameter | Values | Description |
|---|---|---|
| Iterations | 5 | How many solves were run with the same settings, results were then averaged |
| Sampling variations | 1, 2, 4, 6, 8, 12, 16 | Different sampling values used |
| Radius variations (deg) | (0.5 .. 16), (0.5 .. 8), (0.5 .. 4) | The max/min radius limits in the solver arguments |
| Offset variations | (0, 0), (0, 1) | Value variations for --lower-density-offset and --higher-density-offset |
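Assuming the full cross-product of these parameters over the 9 images (the wiki does not state whether every combination was actually run, so treat this as an illustrative upper bound), the total number of solver invocations works out as follows:

```python
# Illustrative count of benchmark runs, assuming a full cross-product of the
# parameters listed above (an assumption; the actual run may have skipped
# some combinations).
images = 9
iterations = 5             # repeats per setting, later averaged
sampling_variations = 7    # 1, 2, 4, 6, 8, 12, 16
radius_variations = 3      # (0.5..16), (0.5..8), (0.5..4)
offset_variations = 2      # (0, 0) and (0, 1)

settings = images * sampling_variations * radius_variations * offset_variations
total_solves = settings * iterations
print(settings, total_solves)  # -> 378 1890
```

Under that assumption, each version's total run time in the next section would cover on the order of 1,890 individual solves.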
Results
The full benchmark run included 9 images and many different parameter combinations, but for brevity, only a selection of 5 images with 3 sampling parameters is presented here as graphs. The remaining images show more or less the same performance characteristics.
Overall results
The overall run time difference of the benchmarks between versions was:
| Version | .NET version | Total run time |
|---|---|---|
| v1.2.3 | .NET 6 | 02:43:15 |
| v2.0.0 | .NET 10 | 00:58:05 |
This is a pretty good indicator of the overall performance boost.
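A quick back-of-the-envelope conversion of those wall-clock times into a speedup factor:

```python
# Convert the total run times from the table above into seconds and compute
# the v1.2.3 -> v2.0.0 speedup factor.
def to_seconds(hms: str) -> int:
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

v1 = to_seconds("02:43:15")  # 9795 s
v2 = to_seconds("00:58:05")  # 3485 s
print(f"{v1 / v2:.2f}x faster")  # -> 2.81x faster
```

In other words, the full benchmark suite completed roughly 2.8 times faster on v2.0.0 than on v1.2.3.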
Brief image analysis
Images
- bm_1_heart-nebula.jpg
- bm_1_m81.png
- bm_1_m31.fits
- bm_1_ngc1491.fits
- bm_1_ic1795.fits
Sampling
- 1 (no sampling)
- 4
- 16
Higher density offset
- Always 1 (includes more database quads in the calculations, hence slower, but generally produces more matches)
Min/max radius
- Always 0.5 .. 16
Additional remarks
- Depending on the image size, Watney's built-in star detection would add 0.05 .. 0.38 seconds on top of the solve time on this setup.
bm_1_heart-nebula.jpg
As we can see, the difference in numbers here is significant: without sampling, the solve time has come down from ~9 s to ~2 s. A trend also starts to form: a limited sampling value like 4 is the sweet spot. Interestingly, the solve speed for a sampling value of 16 is significantly up compared to no sampling. This reflects the changes made to the calculations: certain operations are now cheaper, even when sampling is not used. Optimizations at work here.
bm_1_m81.png
This graph tells pretty much the same story. Performance gains differ from image to image depending on several factors, but the general trend is the same.
bm_1_m31.fits
For this image the solution was always found quickly with the selected default settings, requiring very few search cycles. Interestingly, the operations are now optimized enough that there is no distinguishable difference between sampling settings in such a quick solve.
bm_1_ngc1491.fits
Steady improvement.
bm_1_ic1795.fits
This image was purposefully run with settings that would not solve it (adding --lower-density-offset 1 and --higher-density-offset 1 would make it solve), so that the solver would run a full sweep and we could see the numbers when the maximum number of solve cycles is run. It tells the story well: the gains really are significant.
Conclusions
Three major factors contributed to the performance improvements:
- Koen van Leeuwen's contributions in smarter calculations, using memory mapped files and reducing required file I/O (major)
- Sorting- and caching-based shortcuts, and short-circuiting loops in a few critical places (moderate)
- .NET 10 general improvements (very minor)
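The memory-mapped file approach named in the first bullet can be illustrated with a minimal, generic sketch. This is not Watney's actual code (which is C#); the record layout and file name are invented for the demonstration. It just shows the idea: map a large database file and let the OS page in only the slices that are actually touched, instead of reading the whole file up front.

```python
import mmap
import os
import struct
import tempfile

# A hypothetical fixed-size binary record (4 little-endian floats, 16 bytes).
RECORD = struct.Struct("<4f")

# Build a small stand-in "database" file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "quads.bin")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(RECORD.pack(i, i + 0.5, i + 0.25, i + 0.125))

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    # Jump straight to record 500; records 0..499 are never read into
    # the process explicitly - the OS pages in only the touched region.
    offset = 500 * RECORD.size
    values = RECORD.unpack(mm[offset:offset + RECORD.size])
    print(values[0])  # -> 500.0
```

The win over plain sequential reads is that seeking around a multi-gigabyte quad database no longer costs a read syscall per access, and untouched regions of the file never leave the disk.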
There are still avenues for improvement regarding file I/O that have already been identified. These will require restructuring the DB file format, but should bring noticeable improvements on top of what we have now, and will be implemented in future versions.