QuickStart - Bio2Byte/simsapiper GitHub Wiki
Install requirements
- Nextflow
- Singularity/Apptainer or Docker
- Sufficient scratch space and RAM (for reference, 300 sequences of 400 residues with 30% sequence identity need 30 GB of disk space and 32 GB of RAM)
- Copy of this repository
git clone https://github.com/Bio2Byte/simsapiper.git
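The requirements listed above can be checked before the first run. The following is a minimal sketch, not part of the repository: the helper names `have` and `check_requirements` are made up for illustration, and the script only tests whether the executables are on the PATH, not their versions or the available disk space and RAM.

```shell
# Report which of the listed requirements are available on PATH.
# Hypothetical helper script; it is not shipped with SIMSAPiper.

have() {
    # Succeeds if the named command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

check_requirements() {
    status=0

    if have nextflow; then
        echo "nextflow: found"
    else
        echo "nextflow: MISSING"
        status=1
    fi

    # Any one container runtime is enough.
    runtime=""
    for candidate in singularity apptainer docker; do
        if have "$candidate"; then
            runtime="$candidate"
            break
        fi
    done

    if [ -n "$runtime" ]; then
        echo "container runtime: $runtime"
    else
        echo "container runtime: MISSING (need singularity, apptainer or docker)"
        status=1
    fi

    return "$status"
}

check_requirements || echo "Install the missing tools before launching the pipeline."
```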
Prepare data
Use directory toy_example to test installation.
SIMSAPiper will automatically recognize directories called data if none is specified.
The directory contains:
- Subdirectory seqs with fasta-formatted protein sequences
- Optional: subdirectory structures with 3D protein structure models
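To prepare your own dataset with the same layout, something like the following sketch may help. The location `./my_project/data` and the file names in the comments are placeholders, and the `.pdb` extension for structure models is an assumption.

```shell
# Hypothetical project location; replace with your own.
DATA_DIR="${DATA_DIR:-./my_project/data}"

# seqs is required, structures is optional.
mkdir -p "$DATA_DIR/seqs" "$DATA_DIR/structures"

# Copy your own files in, for example (placeholder file names):
#   cp my_sequences.fasta "$DATA_DIR/seqs/"
#   cp my_models/*.pdb    "$DATA_DIR/structures/"

# Show the resulting layout.
find "$DATA_DIR" -type d | sort
```

The resulting directory can then be passed to the pipeline with --data.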
Launch pipeline using command line
Enable recommended settings using --magic
nextflow run simsapiper.nf -profile server,withsingularity --data $PWD/toy_example/data --magic
or use
chmod +x magic_align.sh
./magic_align.sh
This file can also be double-clicked to run the toy_example dataset.
Use absolute file paths (for example /Users/me/workspace/simsapiper/toy_example/data).
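If you only know the path relative to your current directory, it can be turned into an absolute one first. This is a small sketch; `abs_dir` is a hypothetical helper, and the command is only printed here so you can inspect it before running it yourself.

```shell
# Resolve a possibly relative directory to an absolute path.
abs_dir() {
    # cd inside a subshell so the caller's working directory is unchanged.
    (cd "$1" 2>/dev/null && pwd)
}

# Example: run from the repository root.
if DATA_DIR="$(abs_dir toy_example/data)"; then
    echo "nextflow run simsapiper.nf -profile server,withsingularity --data $DATA_DIR --magic"
else
    echo "toy_example/data not found; run this from the repository root." >&2
fi
```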
By default most flags are set to False. Adding a flag to the command line will set it to True and activate it. Some flags can carry additional information, such as percentages or filenames. The complete list can be found below.
The --magic flag is equivalent to:
nextflow run simsapiper.nf \
    -profile server,withsingularity \
    --seqFormat fasta \
    --seqQC 5 \
    --dropSimilar 90 \
    --outFolder $PWD/simsa_time_of_execution \
    --outName "magicMsa" \
    --minSubsetID "min" \
    --createSubsets 30 \
    --retrieve \
    --model \
    --strucQC 5 \
    --dssp \
    --squeeze "H,E" \
    --squeezePerc 80 \
    --reorder \
    --data $PWD/toy_example/data
Other presets:
--minimagic to align small datasets (<50 sequences)
Note that to align fewer than 10 sequences it is necessary to use this preset, as our preprocessing does not work for such small numbers of sequences.
nextflow run simsapiper.nf \
    -profile server,withsingularity \
    --seqFormat fasta \
    --seqQC 10 \
    --outFolder $PWD/simsa_time_of_execution \
    --outName "minimagicMSA" \
    --useSubsets \
    --retrieve \
    --model \
    --strucQC 5 \
    --dssp \
    --squeeze "H,E" \
    --squeezePerc 60 \
    --reorder \
    --data $PWD/toy_example/data
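Whether --magic or --minimagic fits a dataset depends on the number of sequences, which can be counted from the fasta headers. The sketch below is illustrative only: `count_seqs` and `suggest_preset` are hypothetical helpers, and the 50-sequence cut-off is taken from the --minimagic description above.

```shell
# Count sequences in one or more fasta files by counting header lines.
count_seqs() {
    cat "$@" | grep -c '^>'
}

# Suggest a preset flag from the sequence count.
suggest_preset() {
    n="$(count_seqs "$@")"
    if [ "$n" -lt 50 ]; then
        echo "--minimagic"
    else
        echo "--magic"
    fi
}

# Small demonstration with a temporary three-sequence file.
demo="$(mktemp)"
printf '>seq1\nMKT\n>seq2\nMKV\n>seq3\nMRT\n' > "$demo"
echo "$(count_seqs "$demo") sequences -> $(suggest_preset "$demo")"
# prints: 3 sequences -> --minimagic
rm -f "$demo"
```

For a real dataset, pass the fasta files inside the seqs subdirectory instead of the temporary file.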
--localmagic to align datasets while predicting 3D structures locally using ESMFold
This requires a GPU, and most likely an HPC system. Structures with more than 800 residues require >16 GB of GPU RAM.
nextflow run simsapiper.nf \
    -profile server,withsingularity \
    --seqFormat fasta \
    --seqQC 5 \
    --dropSimilar 90 \
    --outFolder $PWD/simsa_time_of_execution \
    --outName "magicMsa" \
    --minSubsetID "min" \
    --createSubsets 30 \
    --localModel 1 \
    --strucQC 5 \
    --dssp \
    --squeeze "H,E" \
    --squeezePerc 80 \
    --reorder \
    --data $PWD/toy_example/data
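Because sequences longer than 800 residues need more GPU memory, it can be useful to list them before choosing --localmagic. The following is a rough sketch: `long_seqs` is a hypothetical helper that is not part of SIMSAPiper, and it simply sums the residue characters under each fasta header.

```shell
# List sequences longer than a residue threshold (default 800).
# Usage: long_seqs FILE [LIMIT]
# Output: one line per long sequence, "<id><TAB><length>".
long_seqs() {
    awk -v limit="${2:-800}" '
        function report() {
            if (name != "" && len > limit) print name "\t" len
        }
        # Header line: flush the previous record, start a new one.
        /^>/ { report(); name = substr($1, 2); len = 0; next }
        # Sequence line: strip whitespace and add its length.
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { report() }
    ' "$1"
}

# Example (placeholder file name):
#   long_seqs my_sequences.fasta
#   long_seqs my_sequences.fasta 1000
```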