3 Available flags - Bio2Byte/simsapiper GitHub Wiki

Environment flags

| Flag | Function | Default | Recommendation |
|---|---|---|---|
| `-resume` | Retry the last run without rerunning completed jobs. Use `-resume [hash]` to retry a specific run. | | |
| `-profile standard` | Local execution | | Combine multiple profiles with commas: `-profile server,withconda` |
| `-profile server` | Linux server execution | | |
| `-profile hpc` | HPC execution using SLURM | | |
| `-profile withdocker` | Dependencies via Docker container | | |
| `-profile withapptainer` | Dependencies via Apptainer images | | |
| `-profile withconda` | Dependencies via conda (except T-Coffee) | | |
| `--condaEnvPath` | Full path to conda environment (with `-profile withconda`) | false | Leave unset to have it created automatically from the .yml file for ARM Apple (`-profile standard`) or Linux (`-profile server`) |
| `--apptainerPath` | Full path to Apptainer/Singularity cache directory | `"$(pwd)"` | |
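
As a minimal sketch of how the environment flags combine on a command line, the Python helper below assembles a Nextflow invocation as an argument list. Single-dash flags (`-resume`, `-profile`) are Nextflow options, while double-dash flags are pipeline parameters. The entry script name `simsapiper.nf`, the helper `build_command`, and the example data path are assumptions made for illustration and are not taken from this page.

```python
import shlex


def build_command(profiles, params, resume=False, script="simsapiper.nf"):
    """Assemble a `nextflow run` command line as a list of arguments.

    `script` is an assumed entry point; adjust it to your own checkout.
    """
    argv = ["nextflow", "run", script]
    if resume:
        argv.append("-resume")  # retry the last run, skipping completed jobs
    if profiles:
        # Several profiles are joined with commas and no spaces.
        argv += ["-profile", ",".join(profiles)]
    for name, value in params.items():
        if value is True:
            argv.append(f"--{name}")  # boolean switch such as --magic
        elif value is not False and value is not None:
            argv += [f"--{name}", str(value)]
    return argv


cmd = build_command(
    profiles=["server", "withconda"],
    params={"data": "/home/user/project/data", "magic": True},  # hypothetical path
)
print(shlex.join(cmd))
# nextflow run simsapiper.nf -profile server,withconda --data /home/user/project/data --magic
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without a shell; `shlex.join` is only used here to display it.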

Execution presets

| Flag | Function | Default | Recommendation |
|---|---|---|---|
| `--magic` | Launch a run with recommended settings for all parameters | false | |
| `--minimagic` | Launch a run with recommended settings for small datasets (<50 sequences) | false | |
| `--localmagic` | Launch a run with recommended settings for local structure prediction | false | |

Data input and preprocessing

| Flag | Function | Default | Recommendation |
|---|---|---|---|
| `--data` | Full path to data directory | `$(pwd)/data` | |
| `--structures` | Path to structure files directory | `--data/structures` | |
| `--dsspPath` | Path to DSSP files directory | `--data/dssp` | |
| `--seqs` | Path to sequence files directory | `--data/seqs` | |
| `--seqFormat` | Input sequence format, given as a Biopython format name | fasta | |
| `--seqQC` | Ignore sequences with more than this % of non-standard amino acids | 5 | |
| `--seqLen` | Ignore sequences shorter than this number of characters | 50 | |
| `--dropSimilar` | Collapse sequences at the given % sequence identity | false | 90 |
| `--favoriteSeqs` | Select sequence labels that need to stay in the alignment | false | `"SeqLabel1,SeqLabel2"` |
| `--stopHyperconserved` | Skip input file if it contains only identical sequences | false | |
| `--outFolder` | Set directory name and full path for output files | `$(pwd)/results/simsa_time_of_execution` | |
| `--outName` | Set final MSA file name | finalmsa | |
| `--createSubsets` | Create subsets with at most this % sequence identity | false | 30 |
| `--minSubsetID` | Set minimal % sequence identity for sequences to be in a subset | 20 | `"min"` to collate small CD-HIT clusters |
| `--maxSubsetSize` | Set maximal number of sequences in a subset | true | <400AA: `--maxSubsetSize 100`, >400AA: `--maxSubsetSize 50` |
| `--useSubsets` | User provides multiple sequence files corresponding to subsets. Provide sequences not fitting any subset in a file containing 'orphan' in its filename. | false | |
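
To make the `--seqQC` and `--seqLen` thresholds and the `--maxSubsetSize` recommendation concrete, here is a rough Python sketch of the rules as described in the table. It is not the pipeline's own code: the function names, the set of 20 standard residues, the reading of `--seqQC` as "drop when the percentage is exceeded", and the handling of a length of exactly 400 are all assumptions.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumed definition of "standard"


def passes_filters(seq, seq_qc=5.0, seq_len=50):
    """Return True if a sequence would be kept under the assumed rules."""
    seq = seq.upper()
    if not seq or len(seq) < seq_len:
        return False  # empty, or shorter than --seqLen
    non_standard = sum(1 for aa in seq if aa not in STANDARD_AA)
    percent = 100.0 * non_standard / len(seq)
    return percent <= seq_qc  # assumed: only sequences above --seqQC % are dropped


def recommended_max_subset_size(typical_length):
    """Table recommendation: 100 below 400 residues, otherwise 50.

    The table does not say what to use at exactly 400; 50 is an assumption.
    """
    return 100 if typical_length < 400 else 50


print(passes_filters("ACDEFGHIKLMNPQRSTVWY" * 3))  # True: 60 standard residues
print(passes_filters("ACDEFGHIKL" * 4))            # False: only 40 residues
print(passes_filters("X" * 6 + "A" * 54))          # False: 10% non-standard
print(recommended_max_subset_size(350))            # 100
```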

Structure collection

| Flag | Function | Default | Recommendation |
|---|---|---|---|
| `--retrieve` | Retrieve protein structure models from AFDB | false | |
| `--model` | Predict protein structure models with ESM Atlas | false | |
| `--localModel` | Predict protein structure models with local ESMFold for n hours (GPUs required). Increase n by 1 for every 100 sequences to model. | false | 1 |
| `--strucQC` | Maximal % of sequences not matched to a 3D structure | 5 | |
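
The `--localModel` sizing rule and the `--strucQC` limit can be sketched as simple arithmetic. The snippet below is an illustration under stated assumptions, not the pipeline's implementation: it reads the rule as "start from the recommended 1 hour and add one hour per full 100 sequences", and the function names are hypothetical. What the pipeline does when the `--strucQC` limit is exceeded is not described on this page.

```python
def local_model_hours(n_to_model, base_hours=1):
    """Suggested value for --localModel: one extra hour per full 100 sequences."""
    if n_to_model < 0:
        raise ValueError("n_to_model must be non-negative")
    return base_hours + n_to_model // 100


def structure_coverage_ok(n_sequences, n_with_structure, struc_qc=5.0):
    """Check that the share of sequences without a structure is within --strucQC."""
    if n_sequences <= 0:
        raise ValueError("n_sequences must be positive")
    missing_percent = 100.0 * (n_sequences - n_with_structure) / n_sequences
    return missing_percent <= struc_qc


print(local_model_hours(50))            # 1
print(local_model_hours(250))           # 3
print(structure_coverage_ok(200, 190))  # True: 5% unmatched
print(structure_coverage_ok(200, 180))  # False: 10% unmatched
```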

Alignment and postprocessing

| Flag | Function | Default | Recommendation |
|---|---|---|---|
| `--tcoffeeParams` | Additional parameters for T-Coffee | false | `"--help"` |
| `--mafftParams` | Additional parameters for MAFFT | false | `"--localpair --maxiterate 100"` |
| `--dssp` | Map DSSP code to alignment | false | |
| `--squeeze` | Squeeze alignment towards conserved secondary structure categories | false | |
| `--squeezePerc` | Set minimal occurrence % of anchor element in MSA | 80 | |
| `--tree` | Calculate phylogenetic tree from SIMSA with IQ-TREE 2. Add `-B` for ultrafast bootstrap, or any other parameters. | false | `"-B 10000"` |
| `--reorder` | Order final MSA by input file order | false | |
| `--convertMSA` | Convert final MSA file from fasta to selected file format | false | `"clustal"` |
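
Several of these flags take a whole parameter string for another tool, which the table shows in quotes so that it reaches the pipeline as a single argument. The short Python sketch below shows how such values are quoted when a command line is rendered; the entry script name `simsapiper.nf` is an assumption, and the parameter values are the recommendations from the table.

```python
import shlex

argv = [
    "nextflow", "run", "simsapiper.nf",  # assumed entry script
    "--mafftParams", "--localpair --maxiterate 100",
    "--tree", "-B 10000",
    "--convertMSA", "clustal",
]

command = shlex.join(argv)
print(command)
# nextflow run simsapiper.nf --mafftParams '--localpair --maxiterate 100' --tree '-B 10000' --convertMSA clustal

# Splitting the rendered command gives back the original arguments,
# so each tool's parameter string stays one argument.
assert shlex.split(command) == argv
```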