3 Available flags - Bio2Byte/simsapiper GitHub Wiki
| Flag | Function | Default | Recommendation |
|---|---|---|---|
| -resume | Retry the last run without rerunning completed jobs. Use -resume [hash] to retry a specific run | | |
| -profile standard | Local execution. Use multiple profiles: -profile server,withconda | | |
| -profile server | Linux server execution | ||
| -profile hpc | HPC execution using SLURM | ||
| -profile withdocker | Dependencies via docker container | ||
| -profile withapptainer | Dependencies via apptainer images | ||
| -profile withconda | Dependencies via conda (except T-Coffee) | ||
| --condaEnvPath | Full path to conda environment (if -profile withconda) | false | Create automatically with the .yml file for ARM Apple (-profile standard) / Linux (-profile server) |
| --apptainerPath | Full path to apptainer/singularity cache directory | "$(pwd)" | |

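
The execution flags above can be combined on one command line. The sketch below only prints the commands so they can be checked before launching. It assumes the pipeline entry point is a file called `simsapiper.nf` in the working directory, which is not stated on this page; adjust the name to your installation.

```shell
# Assumption: the pipeline script is simsapiper.nf in the current directory.
PIPELINE="simsapiper.nf"

# One execution profile plus one dependency profile, separated by a comma.
PROFILES="server,withconda"

# First attempt.
FIRST_RUN="nextflow run $PIPELINE -profile $PROFILES"

# Retry of the last run: completed jobs are not rerun.
RETRY="$FIRST_RUN -resume"

echo "$FIRST_RUN"
echo "$RETRY"
```

Copy a printed line into the terminal to launch it.
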
| Flag | Function | Default | Recommendation |
|---|---|---|---|
| --magic | Launch a run with recommended settings for all parameters | false | |
| --minimagic | Launch a run with recommended settings for small datasets (<50 sequences) | false | |
| --localmagic | Launch a run with recommended settings for local structure prediction | false | |

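
As a rough illustration of the "<50 sequences" rule for `--minimagic`, the sketch below counts FASTA records in one file and picks a preset. The helper `choose_preset` and the entry point `simsapiper.nf` are hypothetical names used only for this example, and the sketch assumes all sequences sit in a single FASTA file.

```shell
# Hypothetical helper: choose a preset flag from the number of FASTA records.
choose_preset() {
    # Each FASTA record starts with a ">" header line.
    n=$(grep -c '^>' "$1")
    if [ "$n" -lt 50 ]; then
        echo "--minimagic"
    else
        echo "--magic"
    fi
}

# Demo with a throwaway three-sequence file.
tmp=$(mktemp)
printf '>a\nMKV\n>b\nMKL\n>c\nMRV\n' > "$tmp"
echo "nextflow run simsapiper.nf $(choose_preset "$tmp")"
rm -f "$tmp"
```

`--localmagic` is not covered here because it depends on whether structures are predicted locally, not on dataset size.
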
| Flag | Function | Default | Recommendation |
|---|---|---|---|
| --data | Full path to data directory | $(pwd)/data | |
| --structures | Path to structure files directory | --data/structures | |
| --dsspPath | Path to dssp files directory | --data/dssp | |
| --seqs | Path to sequence files directory | --data/seqs | |
| --seqFormat | Input sequence format according to biopython formats | fasta | |
| --seqQC | Ignore sequences with % non-standard amino acids | 5 | |
| --seqLen | Ignore sequences shorter than X characters | 50 | |
| --dropSimilar | Collapse sequences with % sequence identity | false | 90 |
| --favoriteSeqs | Select sequence labels that need to stay in the alignment | false | "SeqLabel1,SeqLabel2" |
| --stopHyperconserved | Skip input file if it contains only identical sequences | false | |
| --outFolder | Set directory name and full path for output files | $(pwd)/results/simsa_time_of_execution | |
| --outName | Set final MSA file name | finalmsa | |
| --createSubsets | Creates subsets of maximally % sequence identity | false | 30 |
| --minSubsetID | Sets minimal % sequence identity for sequences to be in a subset | 20 | "min" to collate small CD-Hit clusters |
| --maxSubsetSize | Sets maximal number of sequences in a subset | true | <400AA: --maxSubsetSize 100, >400AA: --maxSubsetSize 50 |
| --useSubsets | User provides multiple sequence files corresponding to subsets. Provide sequences not fitting any subset in a file containing 'orphan' in filename | false | |

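
The `--maxSubsetSize` recommendation depends on sequence length. The sketch below is one way to apply it. It assumes that "400AA" refers to the mean sequence length of the input file, which this page does not spell out, and it treats exactly 400 as the longer case. `recommend_subset_size` and `simsapiper.nf` are hypothetical names.

```shell
# Hypothetical helper: suggest a --maxSubsetSize value from mean sequence length.
# Assumption: the 400AA threshold applies to the mean length of the input.
recommend_subset_size() {
    awk '/^>/ { n++; next }
         { total += length($0) }
         END { if (n > 0 && total / n < 400) print 100; else print 50 }' "$1"
}

# Demo with a throwaway file of two short sequences.
tmp=$(mktemp)
printf '>a\nMKVLAT\n>b\nMKLLSA\n' > "$tmp"
echo "nextflow run simsapiper.nf --createSubsets 30 --maxSubsetSize $(recommend_subset_size "$tmp")"
rm -f "$tmp"
```

An empty file falls through to the smaller value, 50.
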
| Flag | Function | Default | Recommendation |
|---|---|---|---|
| --retrieve | Retrieve protein structure models from AFDB | false | |
| --model | Predict protein structure models with ESM Atlas | false | |
| --localModel | Predict protein structure models with local ESMFold for n hours (GPUs needed!). Increase n by 1 for every 100 sequences to model | false | 1 |
| --strucQC | Maximal % of sequences not matched to a 3D structure | 5 | |

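
The time budget for `--localModel` can be estimated from the number of sequences to model. The sketch below uses one reading of the rule above: start at the recommended 1 hour and add 1 hour for every full 100 sequences. `local_model_hours` and `simsapiper.nf` are hypothetical names, and the right value may differ on your hardware.

```shell
# Hypothetical helper: hours to pass to --localModel.
# Assumed reading: 1 hour base, plus 1 hour per full 100 sequences.
local_model_hours() {
    echo $(( 1 + $1 / 100 ))
}

N_SEQS=250   # example: number of sequences without a structure model
echo "nextflow run simsapiper.nf --localModel $(local_model_hours "$N_SEQS")"
```
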
| Flag | Function | Default | Recommendation |
|---|---|---|---|
| --tcoffeeParams | Additional parameters for T-Coffee | false | "--help" |
| --mafftParams | Additional parameters for MAFFT | false | "--localpair --maxiterate 100" |
| --dssp | Map DSSP code to alignment | false | |
| --squeeze | Squeeze alignment towards conserved secondary structure categories | false | |
| --squeezePerc | Set minimal occurrence % of anchor element in MSA | 80 | |
| --tree | Calculate phylogenetic tree from SIMSA with IQ-TREE2. Add -B for ultrafast bootstrap or any other parameters | false | "-B 10000" |
| --reorder | Order final MSA by input file order | false | |
| --convertMSA | Convert final MSA file from fasta to selected file format | false | "clustal" |
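
Several recommended values above contain spaces or leading dashes, so they need quotes to reach the pipeline as a single value. The sketch below shows how the shell groups such arguments. `show_args` is a hypothetical helper that prints its arguments and runs nothing, and `simsapiper.nf` is an assumed entry point name.

```shell
# Hypothetical helper: print each argument on its own line, in brackets,
# to show how the shell groups them.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

show_args nextflow run simsapiper.nf \
    --mafftParams "--localpair --maxiterate 100" \
    --tree "-B 10000" \
    --convertMSA clustal
```

Each quoted value appears inside one pair of brackets. Without the quotes, `--localpair` and `-B` would be split off as separate arguments.
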