Sample Data - JoshLoecker/MAPT GitHub Wiki
A set of sample data exists at `/home/joshua.loecker/project/examples`. It contains the pipeline, 2 fast5 files, and the expected output under the `results` folder.
If you would like to test your configuration against a set of known outputs, this is the place to do it.
Quick Analysis
Quick comparisons can be made here. Below is a list of folder sizes from the data located at `/project/brookings_minion/examples/results`.
To get your own folder sizes, run `du -sh /path/to/folder`.
```
/results/...................361M
/results/alignment..........52K
/results/barcode............20M
/results/basecall...........23M
/results/count_reads........16K
/results/filter.............11M
/results/id_reads...........168K
/results/isONclust..........186M
/results/LowClusterReads....31M
/results/spoa...............20K
/results/trim...............15M
/results/visuals............35M
```
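A listing like the one above can be produced with a single `du` invocation over the subdirectories of a results folder. A minimal sketch (the throwaway tree below is only a stand-in for a real `results` directory; substitute your own path):

```shell
# Build a throwaway stand-in for a results tree (hypothetical names),
# then list per-folder sizes the same way as the listing above.
results=$(mktemp -d)
mkdir -p "$results/alignment" "$results/basecall"
head -c 1048576 /dev/zero > "$results/basecall/reads.bin"  # ~1M of dummy data

# -s: one summary line per argument, -h: human-readable sizes
du -sh "$results"/*/
```

On the real data, `du -sh /project/brookings_minion/examples/results/*/` prints one line per subfolder, which can be compared directly against the sizes above.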
Running Your Own Tests
To test your own setup of the pipeline against a known outcome, perform the following steps.
Note: once the run has started, it should only take ~15-20 minutes to complete (usually less than this).
1. Create a temporary directory in the `90daydata` directory: `mkdir /90daydata/brookings_minion/${USER}_scratch`
2. Navigate to your `MAPT` directory
3. Edit the following parameters in your configuration file:
   a. `results: "/90daydata/brookings_minion/${USER}_scratch"` (use the same directory created in Step 1; replace `${USER}` with your username if the configuration file is not shell-expanded)
   b. `basecall_files: "/project/brookings_minion/examples/fast5"`
   c. `reference_database: "/project/brookings_minion/reference_databases/zymogen_reference.fasta"`
   d. The remaining parameters can remain as-is
4. Request a GPU node with: `srun --pty --partition gpu-low --time 01:00:00 --ntasks 72 --nodes 1 /bin/bash`
   a. This gives us 1 hour with all available threads of the GPU. Depending on availability, you may need to check back later for GPU access
   b. `srun`: Call srun
   c. `--pty`: When we enter the node, bring all stdout/stderr to the terminal window
   d. `--partition gpu-low`: Request the GPU node. A list of nodes can be seen here
   e. `--time 01:00:00`: Request one hour (the format is hh:mm:ss)
   f. `--ntasks 72`: The number of threads per node to request
   g. `--nodes 1`: The number of nodes to request. The GPU has 2 nodes, at 36 threads each
   h. `/bin/bash`: The command to execute with `srun`. This is what gives us control of the node
5. Activate the conda environment: `conda activate /project/brookings_minion/conda-envs/mapt_pipeline`
6. Execute the pipeline: `snakemake --cores all --use-singularity --singularity-args="--nv"`
   a. `snakemake`: Call snakemake
   b. `--cores all`: Use all cores available (the maximum is `--ntasks` * `--nodes` from Step 4)
   c. `--use-singularity`: Use Singularity. This is required for Guppy
   d. `--singularity-args="--nv"`: Allow snakemake to pass the GPU into the Singularity container. This is required for Guppy's GPU basecalling
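Collected in one place, the configuration edits from Step 3 might look like the fragment below. This is a sketch only: the YAML-style layout is an assumption based on the parameter names above, and `${USER}` must match the directory you actually created in Step 1.

```yaml
# Hypothetical excerpt of the MAPT configuration file -- only the
# three parameters changed for this test run are shown.
results: "/90daydata/brookings_minion/${USER}_scratch"
basecall_files: "/project/brookings_minion/examples/fast5"
reference_database: "/project/brookings_minion/reference_databases/zymogen_reference.fasta"
```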
A few SLURM scripts exist under the `/project/brookings_minion/examples/slurm` directory. If you are unfamiliar with SLURM,
this may be an opportunity to write a SLURM script and check it against a known file.
Use the two scripts, Starting Snakemake in SLURM and Activate Conda in SLURM, to get started with writing your own.
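As a starting point for that exercise, a batch script chaining Steps 5 and 6 could look roughly like the template below. This is a sketch, not a known-good script: the `#SBATCH` values simply mirror the interactive `srun` flags from Step 4, and you should compare your result against the known files in `/project/brookings_minion/examples/slurm` rather than trusting this template.

```shell
#!/bin/bash
#SBATCH --partition=gpu-low   # same GPU partition as the srun example
#SBATCH --time=01:00:00       # one hour, hh:mm:ss
#SBATCH --ntasks=72
#SBATCH --nodes=1

# Step 5: activate the shared conda environment (conda must be on PATH;
# some clusters require loading a module or sourcing conda.sh first)
conda activate /project/brookings_minion/conda-envs/mapt_pipeline

# Step 6: run the pipeline with GPU passthrough for Guppy
snakemake --cores all --use-singularity --singularity-args="--nv"
```

Submit the script with `sbatch`, and check the job's stdout/stderr files against the expected output under the examples `results` folder.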