# Sample Data
A set of sample data exists at `/home/joshua.loecker/project/examples`. It contains the pipeline, two fast5 files, and the expected output under the `results` folder. If you would like to test your configuration against known output, this is the place to do it.
## Quick Analysis
A quick first check is to compare folder sizes. Below are the folder sizes for the example data located at `/project/brookings_minion/examples/results`. To get the size of one of your own folders, run `du -sh /path/to/folder`.
| Folder | Size |
| --- | --- |
| /results (total) | 361M |
| /results/alignment | 52K |
| /results/barcode | 20M |
| /results/basecall | 23M |
| /results/count_reads | 16K |
| /results/filter | 11M |
| /results/id_reads | 168K |
| /results/isONclust | 186M |
| /results/LowClusterReads | 31M |
| /results/spoa | 20K |
| /results/trim | 15M |
| /results/visuals | 35M |
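To produce the same listing for your own run in one pass, point `du` at the results folder and then at each of its subfolders (substitute your own results path):

```bash
# Total size of the results folder, then one line per subfolder
du -sh /project/brookings_minion/examples/results
du -sh /project/brookings_minion/examples/results/*
```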
## Running Your Own Tests
To test your own setup of the pipeline against a known outcome, perform the following steps; the full command sequence is recapped in the sketch after this list.

Note: once the run has started, it should take only ~15-20 minutes to complete, and usually less.
1. Create a temporary directory in the `90daydata` directory: `mkdir /90daydata/brookings_minion/${USER}_scratch`
2. Navigate to your MAPT directory.
3. Edit the following parameters in your configuration file:
   a. `results: "/90daydata/brookings_minion/${USER}_scratch"`
   b. `basecall_files: "/project/brookings_minion/examples/fast5"`
   c. `reference_database: "/project/brookings_minion/reference_databases/zymogen_reference.fasta"`
   d. The remaining parameters can remain as-is.
4. Request a GPU node with: `srun --pty --partition gpu-low --time 01:00:00 --ntasks 72 --nodes 1 /bin/bash`
   a. This gives us one hour with all available threads of the GPU node. Depending on availability, you may need to check back later for GPU access.
   b. `srun`: call srun
   c. `--pty`: bring all stdout/stderr to the terminal window once we enter the node
   d. `--partition gpu-low`: request the GPU partition. A list of nodes can be seen here
   e. `--time 01:00:00`: request one hour (format is hh:mm:ss)
   f. `--ntasks 72`: the number of threads per node to request
   g. `--nodes 1`: the number of nodes to request. The GPU partition has 2 nodes, with 36 threads each
   h. `/bin/bash`: the command to execute with `srun`. This is what gives us control of the node
5. Activate the conda environment: `conda activate /project/brookings_minion/conda-envs/mapt_pipeline`
6. Execute the pipeline: `snakemake --cores all --use-singularity --singularity-args="--nv"`
   a. `snakemake`: call Snakemake
   b. `--cores all`: use all cores available (the maximum is `--ntasks` × `--nodes` from Step 4)
   c. `--use-singularity`: use Singularity. This is required for Guppy
   d. `--singularity-args="--nv"`: allow Snakemake to pass the GPU into the Singularity container. This is required for Guppy's GPU basecalling
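Putting the steps together, a minimal interactive session might look like the sketch below. The `cd ~/MAPT` path is an assumption (use your own checkout location); everything else uses the paths and flags from the steps above.

```bash
# Step 1: create a scratch directory for the test results
mkdir -p /90daydata/brookings_minion/${USER}_scratch

# Step 2: move into your MAPT directory (hypothetical path; use your own)
cd ~/MAPT

# Step 3 is a manual edit of the configuration file, as described above.

# Step 4: request an interactive GPU node for one hour
srun --pty --partition gpu-low --time 01:00:00 --ntasks 72 --nodes 1 /bin/bash

# Steps 5-6: run these inside the shell that srun opens
conda activate /project/brookings_minion/conda-envs/mapt_pipeline
snakemake --cores all --use-singularity --singularity-args="--nv"
```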
A few example SLURM scripts exist under the `/project/brookings_minion/examples/slurm` directory. If you are unfamiliar with SLURM, this is a good opportunity to write a SLURM script and check it against a known file. Use the two scripts, Starting Snakemake in SLURM and Activate Conda in SLURM, to get started writing your own.
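If you would like a skeleton to start from before comparing against the known files, a minimal batch script might look like the following sketch. The job name, log file name, and the conda activation pattern are assumptions, not the contents of the example scripts.

```bash
#!/bin/bash
#SBATCH --job-name=mapt-test          # hypothetical job name
#SBATCH --partition=gpu-low           # same partition as the interactive example
#SBATCH --time=01:00:00               # one hour, matching Step 4
#SBATCH --ntasks=72
#SBATCH --nodes=1
#SBATCH --output=mapt-test-%j.log     # %j expands to the SLURM job ID

# Make `conda activate` work in a non-interactive shell
# (assumes conda is on PATH; the example scripts may use a different pattern)
eval "$(conda shell.bash hook)"
conda activate /project/brookings_minion/conda-envs/mapt_pipeline

snakemake --cores all --use-singularity --singularity-args="--nv"
```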