3 Available flags - Bio2Byte/simsapiper GitHub Wiki

Environment flags

Flag	Function	Default	Recommendation
-resume	Retry the last run without rerunning completed jobs; use -resume [hash] to retry a specific run
-profile standard	Local execution; combine multiple profiles with commas, e.g. -profile server,withconda
-profile server	Linux server execution
-profile hpc	HPC execution using SLURM
-profile withdocker	Dependencies via docker container
-profile withapptainer	Dependencies via apptainer images
-profile withconda	Dependencies via conda (except T-Coffee)
--condaEnvPath	Full path to conda environment (if -profile withconda)	false	If unset, the environment is created automatically from the .yml file for ARM-Apple (-profile standard) or Linux (-profile server)
--apptainerPath	Full path to apptainer/singularity cache directory	"$(pwd)"
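The flags above mix two styles: single-dash options (-profile, -resume) are Nextflow options, while double-dash flags (--condaEnvPath) are pipeline parameters. A minimal sketch of how such a command line could be assembled is shown below. The helper `build_command`, the script name `simsapiper.nf`, and the conda path are illustrative assumptions, not part of the pipeline.

```python
import shlex


def build_command(script, profiles, resume=False, **params):
    """Assemble a `nextflow run` argument list (hypothetical helper)."""
    cmd = ["nextflow", "run", script]
    if profiles:
        # Several profiles are passed as ONE comma-separated value, no spaces.
        cmd += ["-profile", ",".join(profiles)]
    if resume:
        cmd.append("-resume")
    for name, value in params.items():
        if value is True:
            # Boolean switches are passed without a value.
            cmd.append(f"--{name}")
        elif value is not False and value is not None:
            cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_command(
    "simsapiper.nf",                    # assumed script name
    ["server", "withconda"],
    resume=True,
    condaEnvPath="/path/to/conda/env",  # placeholder path
)
print(shlex.join(cmd))
# nextflow run simsapiper.nf -profile server,withconda -resume --condaEnvPath /path/to/conda/env
```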

Execution presets

Flag	Function	Default	Recommendation
--magic	Launch a run with recommended settings for all parameters	false
--minimagic	Launch a run with recommended settings for small datasets (<50 sequences)	false
--localmagic	Launch a run with recommended settings for local structure prediction	false

Data input and preprocessing

Flag	Function	Default	Recommendation
--data	Full path to data directory	$(pwd)/data
--structures	Path to structure files directory	--data/structures
--dsspPath	Path to dssp files directory	--data/dssp
--seqs	Path to sequence files directory	--data/seqs
--seqFormat	Input sequence format according to biopython formats	fasta
--seqQC	Ignore sequences with % non-standard amino acids	5
--seqLen	Ignore sequences shorter than X characters	50
--dropSimilar	Collapse sequences with % sequence identity	false	90
--favoriteSeqs	Select sequence labels that need to stay in the alignment	false	"SeqLabel1,SeqLabel2"
--stopHyperconserved	Skip input file if it contains only identical sequences	false
--outFolder	Set directory name and full path for output files	$(pwd)/results/simsa_time_of_execution
--outName	Set final MSA file name	finalmsa
--createSubsets	Creates subsets of maximally % sequence identity	false	30
--minSubsetID	Sets minimal % sequence identity for sequences to be in a subset	20	"min" to collate small CD-Hit clusters
--maxSubsetSize	Sets maximal number of sequences in a subset	true	<400AA: --maxSubsetSize 100, >400AA: --maxSubsetSize 50
--useSubsets	User provides multiple sequence files corresponding to subsets; provide sequences that do not fit any subset in a file with 'orphan' in its filename	false
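To make the --seqQC and --seqLen thresholds concrete, here is a rough sketch of how such filters could behave. It is not the pipeline's own code. It assumes that "non-standard" means any character outside the 20 canonical amino acid letters and that a sequence sitting exactly on the --seqQC threshold is kept. The second helper encodes the --maxSubsetSize recommendation from the table; treating a length of exactly 400 as "long" is also an assumption.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def passes_input_filters(seq, seq_qc=5.0, seq_len=50):
    """Return True if a sequence would survive both filters (sketch)."""
    seq = seq.upper()
    # --seqLen: ignore sequences shorter than X characters.
    if not seq or len(seq) < seq_len:
        return False
    # --seqQC: ignore sequences with too many non-standard residues.
    # Assumption: the threshold itself is still accepted.
    non_standard = sum(1 for aa in seq if aa not in STANDARD_AA)
    return 100.0 * non_standard / len(seq) <= seq_qc


def recommended_max_subset_size(typical_length):
    """Table recommendation: 100 below 400 AA, 50 above (400 itself assumed long)."""
    return 100 if typical_length < 400 else 50


print(passes_input_filters("A" * 98 + "XX"))  # 2% non-standard, long enough
print(passes_input_filters("A" * 40))         # shorter than 50 characters
print(recommended_max_subset_size(350))
```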

Structure collection

Flag	Function	Default	Recommendation
--retrieve	Retrieve protein structure models from AFDB	false
--model	Predict protein structure models with ESM Atlas	false
--localModel	Predict protein structure models with local ESMFold for n hours (GPUs needed!); increase n by 1 for every 100 sequences to model	false	1
--strucQC	Maximal % of sequences not matched to a 3D structure	5
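The two numeric settings above can be illustrated with a small calculation. This is a sketch under stated assumptions: the hours rule is read as "start at the recommended 1 hour and add 1 for every full 100 sequences", and the --strucQC limit is treated as inclusive. Both function names are hypothetical.

```python
def local_model_hours(n_seqs_to_model, base_hours=1):
    """One reading of the --localModel advice: +1 hour per full 100 sequences."""
    return base_hours + n_seqs_to_model // 100


def within_struc_qc(n_seqs, n_with_structure, struc_qc=5.0):
    """True if the share of sequences without a 3D structure is within --strucQC."""
    if n_seqs <= 0:
        raise ValueError("n_seqs must be positive")
    unmatched_percent = 100.0 * (n_seqs - n_with_structure) / n_seqs
    return unmatched_percent <= struc_qc


print(local_model_hours(250))     # 1 base hour + 2 full hundreds
print(within_struc_qc(200, 190))  # 5% unmatched
print(within_struc_qc(200, 180))  # 10% unmatched
```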

Alignment and postprocessing

Flag	Function	Default	Recommendation
--tcoffeeParams	Additional parameters for T-Coffee	false	"--help"
--mafftParams	Additional parameters for MAFFT	false	"--localpair --maxiterate 100"
--dssp	Map DSSP code to alignment	false
--squeeze	Squeeze alignment towards conserved secondary structure categories	false
--squeezePerc	Set minimal occurrence % of anchor element in MSA	80
--tree	Calculate phylogenetic tree from SIMSA with IQ-TREE2; add -B for ultrafast bootstrap or any other parameters	false	"-B 10000"
--reorder	Order final MSA by input file order	false
--convertMSA	Convert final MSA file from fasta to selected file format	false	"clustal"
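Several recommendations in this table are quoted because the value contains spaces or starts with a dash ("-B 10000", "--localpair --maxiterate 100"). The sketch below shows why the quotes matter: each such value has to reach the pipeline as a single argument. The script name `simsapiper.nf` is an assumption; the flags and values are taken from the table.

```python
import shlex

args = [
    "nextflow", "run", "simsapiper.nf",  # assumed script name
    "--dssp",
    "--squeeze",
    "--squeezePerc", "80",
    "--tree", "-B 10000",                # one argument, not two
    "--convertMSA", "clustal",
]

# shlex.join adds the shell quoting needed to keep "-B 10000" together.
command = shlex.join(args)
print(command)
# nextflow run simsapiper.nf --dssp --squeeze --squeezePerc 80 --tree '-B 10000' --convertMSA clustal
```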
