Algorithm structure - lorenzo-arcioni/HPC-T-Annotator GitHub Wiki
At a high level, the application works as follows. The master node analyses the input file and, based on its characteristics, dynamically generates the code that the slave nodes will execute. Once the slaves have started, a separate control program manages the rest of the run: it waits until all nodes have completed their computation, merges the partial results, and runs tests that, if passed, guarantee the correctness of the calculation. The control program also collects statistics on the time taken by each node (actual and real) and on the overall computation time.
The scattering algorithm used in the application is cyclic distribution, which assigns the sequences in the input file to the processes in round-robin order. Let N be the number of processes: process 0 is assigned sequence 0, sequence N, sequence 2N, and so on; process 1 is assigned sequence 1, sequence N+1, sequence 2N+1, and so on. In general, process p is assigned every sequence whose index i satisfies i mod N = p. This gives adequate load balancing because sequences in multi-FASTA files are usually ordered from longest to shortest. With a traditional assignment of consecutive sequences, process 0 would receive the first sequences (the longest in the file) and the last process would receive the last (shortest) ones, so the first process would be significantly overloaded and become the bottleneck of the application.
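The following minimal Python sketch illustrates the idea and is not taken from the HPC-T-Annotator code base. The function names (`cyclic_distribution`, `block_distribution`, `total_length`) and the toy sequences are hypothetical, and sequence length is used as a rough stand-in for per-sequence work, which is an assumption.

```python
def cyclic_distribution(sequences, num_processes):
    """Assign sequence i to process i % num_processes (round-robin)."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    # Slicing with a step picks indices rank, rank + N, rank + 2N, ...
    return [sequences[rank::num_processes] for rank in range(num_processes)]


def block_distribution(sequences, num_processes):
    """Assign consecutive chunks of sequences to each process (for comparison)."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    chunk = -(-len(sequences) // num_processes)  # ceiling division
    return [sequences[i * chunk:(i + 1) * chunk] for i in range(num_processes)]


def total_length(sequences):
    """Sum of sequence lengths, used here as a crude proxy for workload."""
    return sum(len(s) for s in sequences)


if __name__ == "__main__":
    # Toy input: eight sequences ordered from longest (8) to shortest (1).
    toy = ["A" * n for n in range(8, 0, -1)]

    for name, scatter in (("cyclic", cyclic_distribution),
                          ("block", block_distribution)):
        loads = [total_length(part) for part in scatter(toy, 2)]
        print(name, loads)
```

With this toy input and two processes, the cyclic scheme yields total lengths of 20 and 16, while the consecutive scheme yields 26 and 10, which shows why the first process becomes the bottleneck when the file is sorted by decreasing length.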