# Running
There are several options available. Note that all paths given as input to the `main.sh` script must be absolute paths, not relative paths.
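If you are unsure of a file's absolute path, standard shell utilities such as `realpath` can print it for you; the file name below is only an illustration:

```bash
# Print the absolute path of a (hypothetical) relative input file
realpath assembly/transcripts.fasta
# -> e.g. /home/user/project/assembly/transcripts.fasta
```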
| Option | Description |
|--------|-------------|
| `-i <file.fasta>` | Path to the query input file in multi-FASTA format. |
| `-d <database>` | Path to the database file. |
| `-b <binary>` | Path to the binary file (Diamond or BLAST). |
| `-T <function>` | Alignment function to use: `blastp` or `blastx`. |
| `-p <number_of_processes>` | Number of processes to split the computation into. The higher the number of processes, the more time is needed for pre-processing; a range from 5 to 500 is recommended. The number of processes must never exceed the number of sequences (a quick way to check this is shown below the table). |
| `-t <threads>` | Number of threads that each process can use. |
| `-h` | Show the usage of the software. |
| `-f <6_BLAST_outformat>` | Tabular BLAST output format (format 6). The default is `-f "6 qseqid sseqid slen qstart qend length mismatch gapopen gaps sseq"`. Make sure that the required information is present in the reference database. |
| `-D` | Use the Diamond software instead of BLAST. |
| `--slurm` | Use this option only if the computation will be run on a cluster with Slurm as the workload manager. |
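Since `-p` must never exceed the number of query sequences, a quick sanity check is to count the FASTA headers in the input file first (the path below is a placeholder):

```bash
# Count the sequences in the multi-FASTA query file
grep -c '^>' /home/user/assembly/slow_fast_degs_hs.fasta
```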
Further options can of course be passed to the BLAST and Diamond software. This is done via prepared files located in the `Bases` directory: simply add the options to the file corresponding to the tool you are using (BLAST or Diamond).
- `blast_additional_options.txt`
- `diamond_additional_options.txt`
For example, in the Diamond additional options file we can insert:

```
--ultra-sensitive --quiet
```

All options must be entered on a single line.
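For instance, the line above can be written into the Diamond options file from the repository root (the relative path simply reflects the `Bases` directory mentioned above):

```bash
# Write the extra Diamond options on a single line
echo "--ultra-sensitive --quiet" > Bases/diamond_additional_options.txt
```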
To run HPC-T-Annotator on an HPC cluster with Slurm as the workload manager, make sure to properly configure all the template files that reside in the `Bases` folder, namely:

- `slurm_controlscript_base.txt`
- `slurm_partial_script_base.txt`
- `slurm_start_base.txt`

Failure to configure these files correctly may compromise the entire execution.
Please note that for execution through the Slurm workload manager, the `--slurm` option must be passed on the command line when running the `main.sh` script.
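The exact contents of these templates depend on your cluster. As a purely illustrative sketch, the kind of Slurm directives you typically need to adapt (partition, account, wall time, memory) look like the following; all values below are hypothetical:

```bash
#SBATCH --partition=normal    # hypothetical partition name on your cluster
#SBATCH --account=my_project  # hypothetical account/allocation
#SBATCH --time=24:00:00       # wall-time limit for each job
#SBATCH --mem=64G             # memory requested per job
```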
After cloning the repository, you can proceed as follows: perform the code generation phase, upload (if necessary) the generated TAR package to the HPC machine, and then start the computation.
There are two methods for generating the scripts. Here is a command-line example using the Diamond suite:

```bash
./main.sh -i /home/user/assembly/slow_fast_degs_hs.fasta -b /home/user/bin/diamond -T blastx -t 48 -D -d /home/user/NR/nr.dmnd -p 50
```
In this case, we will divide the computation (and the input file) into 50 parts that will be processed simultaneously (with 48 threads each). In the end, the outputs of the 50 jobs will be combined into a single file.
Another example using the BLAST suite:

```bash
./main.sh -i /home/user/project/assembly/slow_fast_degs_hs.fasta -b /home/blast/blastx -T blastx -t 48 -d /home/user/DB/nr -p 100
```
In this case we have split the computation into 100 jobs using the BLAST suite.
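If the generation phase was run on a local machine, the resulting TAR package can be copied to the HPC cluster before extracting it, for example with `scp` (host name and destination path below are placeholders):

```bash
# Upload the generated package to the cluster
scp hpc-t-annotator.tar user@hpc.example.org:/home/user/annotation/
```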
Next, extract the generated code:

```bash
tar -zxf hpc-t-annotator.tar && rm hpc-t-annotator.tar
```
Once this is done, you have everything you need to manage and start the computation. If you are on Slurm, simply run:

```bash
sbatch start.sh
```
At the end of the computation, the output will be in the `tmp` directory with the name `final_blast.tsv`.
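While the jobs are running, their status can be checked with the standard Slurm tools, and the merged output can be inspected once everything has finished; for example:

```bash
squeue -u $USER            # list your queued and running jobs
head tmp/final_blast.tsv   # peek at the first alignments in the merged output
wc -l tmp/final_blast.tsv  # total number of alignment records
```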