Benchmarks - lorenzo-arcioni/HPC-T-Annotator GitHub Wiki
Benchmark 1
A performance analysis was carried out on the entire application to estimate computation times on HPC platforms and to identify the causes of slowdowns on such platforms. The software achieves excellent performance on non-overloaded machines, but performance degrades on machines with high competition for job execution. The ideally achievable performance of the software (expected time) is reported alongside the actual time to give a clear idea of execution times on dedicated or non-overloaded machines: the expected time is the time the entire algorithm would take if all processes started at the same time. For this analysis, the transcriptome of Hyla sarda (Mediterranean tree frog) is used as a reference and annotated with the Diamond tool against the Swiss-Prot database, using 1, 10, 100, 200, and 300 processes.
Number of processes | Expected Time (min) | Actual Time (min) | Speed-Up (Expected) | Speed-Up (Actual) |
---|---|---|---|---|
1 | 101.71 | 101.71 | 1.00 | 1.00 |
10 | 10.42 | 19.51 | 9.76 | 5.21 |
100 | 2.37 | 15.50 | 42.92 | 6.56 |
200 | 2.01 | 10.56 | 50.60 | 9.63 |
300 | 1.38 | 12.22 | 73.70 | 8.32 |
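One way to read the two time columns is sketched below. It assumes each job is logged with a submission, start, and end timestamp; the `JobRecord` type, its field names, and the timings are hypothetical and not taken from the tool. Under that assumption, the expected time is the longest single job run time (as if all jobs had started together), while the actual time runs from the first submission to the last completion, queue waits included.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical record; all values are in minutes.
    submitted: float
    started: float
    ended: float


def expected_time(jobs):
    """Time if every job started at once: the longest job run time."""
    return max(j.ended - j.started for j in jobs)


def actual_time(jobs):
    """Wall-clock time from the first submission to the last completion."""
    return max(j.ended for j in jobs) - min(j.submitted for j in jobs)


# Made-up timings for three jobs; the third waits 8 minutes in the queue.
jobs = [
    JobRecord(submitted=0.0, started=0.0, ended=2.0),
    JobRecord(submitted=0.0, started=1.0, ended=3.5),
    JobRecord(submitted=0.0, started=8.0, ended=10.0),
]

print(expected_time(jobs))  # 2.5
print(actual_time(jobs))    # 10.0
```

In this toy case the queue wait of a single job is enough to make the actual time four times the expected time, which is the kind of gap visible in the table on a busy cluster.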
The expected execution time keeps decreasing as the number of processes increases. This is not surprising: the trend should hold until the number of processes equals the number of sequences, at which point each process is assigned a single sequence. In practice this limit is rarely reachable, because the number of sequences can be very high and that many nodes are not available on physical machines. The actual time, in contrast, does not decrease monotonically: it is lowest with 200 processes (10.56 min) and rises again with 300 (12.22 min), which is consistent with the competition for job execution described above.
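The limit of one sequence per process can be illustrated with a minimal sketch of an even split. How the tool actually partitions its input is not described here, so the `chunk_sizes` function below is an assumption for illustration only; the contig count is the Hyla sarda figure reported in Benchmark 2.

```python
def chunk_sizes(num_sequences, num_processes):
    """Split sequences as evenly as possible, never creating empty chunks."""
    if num_sequences == 0:
        return []
    # More processes than sequences is pointless: cap at one sequence each.
    parts = min(num_sequences, num_processes)
    base, extra = divmod(num_sequences, parts)
    # The first `extra` chunks take one additional sequence.
    return [base + 1 if i < extra else base for i in range(parts)]


sizes = chunk_sizes(1_295_741, 200)
print(len(sizes), max(sizes), min(sizes))  # 200 6479 6478

# With more processes than sequences, each useful process gets exactly one.
print(set(chunk_sizes(220, 300)))  # {1}
```

Once every chunk holds a single sequence, adding processes cannot shorten the expected time any further.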
Benchmark 2
The following benchmark was run with Diamond against the Swiss-Prot protein database, which includes taxonomic information and is approximately 357 MB in size (in the version optimized for the Diamond software). The transcripts of the following organisms were used as benchmark references, and the results were obtained with the blastx function of the Diamond tool.
Species name | Number of contigs | Number of processes | Expected time (min) | Actual time (min) |
---|---|---|---|---|
Altererythrobacter sp. | 220 | 1 | 1.41 | 1.41 |
Bombina pachypus | 190,619 | 1 | 20.59 | 20.59 |
Salamandra salamandra | 1,146,571 | 1 | 98.66 | 98.66 |
Hyla sarda | 1,295,741 | 1 | 101.71 | 101.71 |
Altererythrobacter sp. | 220 | 200 | 1.08 | 2.55 |
Bombina pachypus | 190,619 | 200 | 1.51 | 4.78 |
Salamandra salamandra | 1,146,571 | 200 | 1.97 | 5.31 |
Hyla sarda | 1,295,741 | 200 | 2.01 | 7.56 |
Speed-Up analysis
Speed-up measures how much faster a task completes when it is executed on multiple processors or resources than on a single one. It is defined as the ratio of the time taken with a single processor or resource to the time taken with multiple processors or resources: S(p) = T(1) / T(p), where p is the number of processes.
The software's expected speed-up increases with the number of processes available for computation: the more processes (jobs) the annotation workload is divided among, the greater the expected performance gain. In Benchmark 1 the expected speed-up reaches 73.70 with 300 processes, while the actual speed-up peaks at 9.63 with 200 processes.
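As a check on the definition, the short sketch below recomputes both speed-up columns from the times in the Benchmark 1 table. The `speed_up` helper and the `BENCHMARK_1` name are illustrative, not part of the tool.

```python
# (processes, expected_min, actual_min) copied from the Benchmark 1 table.
BENCHMARK_1 = [
    (1, 101.71, 101.71),
    (10, 10.42, 19.51),
    (100, 2.37, 15.50),
    (200, 2.01, 10.56),
    (300, 1.38, 12.22),
]


def speed_up(serial_time, parallel_time):
    """Ratio of single-process time to multi-process time."""
    return serial_time / parallel_time


# The single-process run is the baseline for both columns.
serial = BENCHMARK_1[0][2]

print("procs  expected  actual")
for procs, expected, actual in BENCHMARK_1:
    print(f"{procs:>5}  {speed_up(serial, expected):8.2f}  {speed_up(serial, actual):6.2f}")
```

The printed values match the Speed-Up columns of the table to two decimal places, for example 50.60 (expected) and 9.63 (actual) with 200 processes.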