Sample Data - JoshLoecker/MAPT GitHub Wiki
A set of sample data exists at `/home/joshua.loecker/project/examples`. It contains the pipeline, 2 fast5 files, and the expected output under the `results` folder.
If you would like to test your configuration against a set of known outputs, this is the place to do it.
Quick Analysis
Quick comparisons can be made here. Below is a list of folder sizes from the data located at `/project/brookings_minion/examples/results`.
To get your own folder sizes, run `du -sh /path/to/folder`.
```
/results/...................361M
/results/alignment..........52K
/results/barcode............20M
/results/basecall...........23M
/results/count_reads........16K
/results/filter.............11M
/results/id_reads...........168K
/results/isONclust..........186M
/results/LowClusterReads....31M
/results/spoa...............20K
/results/trim...............15M
/results/visuals............35M
```
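A listing like the one above can be produced with a single `du` invocation over the subdirectories of a results folder. A minimal sketch (the throwaway tree below is only a stand-in for a real `results` directory; substitute your own path):

```shell
# Build a throwaway stand-in for a results tree (hypothetical names),
# then list per-folder sizes the same way as the listing above.
results=$(mktemp -d)
mkdir -p "$results/alignment" "$results/basecall"
head -c 1048576 /dev/zero > "$results/basecall/reads.bin"  # ~1M of dummy data

# -s: one summary line per argument, -h: human-readable sizes
du -sh "$results"/*/
```

On the real data, `du -sh /project/brookings_minion/examples/results/*/` prints one line per subfolder, which can be compared directly against the sizes above.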
Running Your Own Tests
To test your own setup of the pipeline against a known outcome, perform the following steps.
Note: once the run has started, it should only take ~15-20 minutes to complete (usually less than this).
1. Create a temporary directory in the `90daydata` directory: `mkdir /90daydata/brookings_minion/${USER}_scratch`
2. Navigate to your `MAPT` directory
3. Edit the following parameters in your configuration file:
   a. `results: "/90daydata/brookings_minion/${USER}_scratch"` (use the same directory created in Step 1; replace `${USER}` with your username if the configuration file is not shell-expanded)
   b. `basecall_files: "/project/brookings_minion/examples/fast5"`
   c. `reference_database: "/project/brookings_minion/reference_databases/zymogen_reference.fasta"`
   d. The remaining parameters can remain as-is
4. Request a GPU node with: `srun --pty --partition gpu-low --time 01:00:00 --ntasks 72 --nodes 1 /bin/bash`
   a. This gives us 1 hour with all available threads of the GPU. Depending on availability, you may need to check back later for GPU access
   b. `srun`: Call srun
   c. `--pty`: When we enter the node, bring all stdout/stderr to the terminal window
   d. `--partition gpu-low`: Request the GPU node. A list of nodes can be seen here
   e. `--time 01:00:00`: Request one hour (the format is hh:mm:ss)
   f. `--ntasks 72`: The number of threads per node to request
   g. `--nodes 1`: The number of nodes to request. The GPU has 2 nodes, at 36 threads each
   h. `/bin/bash`: The command to execute with `srun`. This is what gives us control of the node
5. Activate the conda environment: `conda activate /project/brookings_minion/conda-envs/mapt_pipeline`
6. Execute the pipeline: `snakemake --cores all --use-singularity --singularity-args="--nv"`
   a. `snakemake`: Call snakemake
   b. `--cores all`: Use all cores available (the maximum is `--ntasks` * `--nodes` from Step 4)
   c. `--use-singularity`: Use Singularity. This is required for Guppy
   d. `--singularity-args="--nv"`: Allow snakemake to pass the GPU into the Singularity container. This is required for Guppy's GPU basecalling
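Collected in one place, the configuration edits from Step 3 might look like the fragment below. This is a sketch only: the YAML-style layout is an assumption based on the parameter names above, and `${USER}` must match the directory you actually created in Step 1.

```yaml
# Hypothetical excerpt of the MAPT configuration file -- only the
# three parameters changed for this test run are shown.
results: "/90daydata/brookings_minion/${USER}_scratch"
basecall_files: "/project/brookings_minion/examples/fast5"
reference_database: "/project/brookings_minion/reference_databases/zymogen_reference.fasta"
```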
A few SLURM scripts exist under the `/project/brookings_minion/examples/slurm` directory. If you are unfamiliar with SLURM,
this may be an opportunity to write a SLURM script and check it against a known file.
Use the two scripts, Starting Snakemake in SLURM and Activate Conda in SLURM, to get started with writing your own.
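As a starting point for that exercise, a batch script chaining Steps 5 and 6 could look roughly like the template below. This is a sketch, not a known-good script: the `#SBATCH` values simply mirror the interactive `srun` flags from Step 4, and you should compare your result against the known files in `/project/brookings_minion/examples/slurm` rather than trusting this template.

```shell
#!/bin/bash
#SBATCH --partition=gpu-low   # same GPU partition as the srun example
#SBATCH --time=01:00:00       # one hour, hh:mm:ss
#SBATCH --ntasks=72
#SBATCH --nodes=1

# Step 5: activate the shared conda environment (conda must be on PATH;
# some clusters require loading a module or sourcing conda.sh first)
conda activate /project/brookings_minion/conda-envs/mapt_pipeline

# Step 6: run the pipeline with GPU passthrough for Guppy
snakemake --cores all --use-singularity --singularity-args="--nv"
```

Submit the script with `sbatch`, and check the job's stdout/stderr files against the expected output under the examples `results` folder.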