Sample Data - JoshLoecker/MAPT GitHub Wiki

A set of sample data exists at /home/joshua.loecker/project/examples. It contains the pipeline, 2 fast5 files, and the expected output under the results folder.
If you would like to test your configuration against a set of known outputs, this is the place to do it.

Additional Links

  1. Expected Outcomes
  2. Potential Errors

Quick Analysis

Quick comparisons can be made here. Below is a list of folder sizes from the data located at /project/brookings_minion/examples/results.
To get your own folder sizes, run du -sh /path/to/folder and compare the output against the list.

/results/...................361M
/results/alignment..........52K
/results/barcode............20M
/results/basecall...........23M
/results/count_reads........16K
/results/filter.............11M
/results/id_reads...........168K
/results/isONclust..........186M
/results/LowClusterReads....31M
/results/spoa...............20K
/results/trim...............15M
/results/visuals............35M
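
To produce a comparable listing for your own run in one call, something like the following may help. It is only a sketch: the function name folder_sizes is hypothetical and not part of MAPT, and it simply wraps du -sh for a results folder and each of its immediate subfolders.

```shell
# Hypothetical helper (not part of MAPT): print the size of a results
# folder, followed by the size of each immediate subfolder.
folder_sizes() {
    results_dir="$1"
    if [ ! -d "$results_dir" ]; then
        echo "No such directory: $results_dir" >&2
        return 1
    fi
    # Total size of the results folder
    du -sh "$results_dir"
    # Size of each subfolder (alignment, barcode, basecall, ...)
    for sub in "$results_dir"/*/; do
        if [ -d "$sub" ]; then
            du -sh "$sub"
        fi
    done
    return 0
}
```

Call it as folder_sizes /path/to/results. Small differences from the sizes listed above are possible, since du reports disk usage rather than exact byte counts.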

Running Your Own Tests

To test your own setup of the pipeline against a known outcome, perform the following steps.
Note: Once the run has started, it should take about 15-20 minutes to complete, and usually less.

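  1. Create a temporary directory in the 90daydata directory: mkdir /90daydata/brookings_minion/${USER}_scratch
     The braces are needed. Without them, the shell looks for a variable named USER_scratch, which is normally unset, and the path ends at brookings_minion/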

  2. Navigate to your MAPT directory

  3. Edit the following parameters in your configuration file
    a. results: "/90daydata/brookings_minion/$USER_scratch" (the directory created in Step 1, where $USER stands for your username)
    b. basecall_files: "/project/brookings_minion/examples/fast5"
    c. reference_database: "/project/brookings_minion/reference_databases/zymogen_reference.fasta"
    d. The remaining parameters can remain as-is

  4. Request a GPU node with: srun --pty --partition gpu-low --time 01:00:00 --ntasks 72 --nodes 1 /bin/bash
    a. This gives us 1 hour with all available threads of the GPU. Depending on availability, you may need to check back later for GPU access
    b. srun: Call srun
    c. --pty: Run in pseudo-terminal mode, so that stdout/stderr from the node appear in your terminal window
    d. --partition gpu-low: Request the GPU node. A list of nodes can be seen here
    e. --time 01:00:00: Request one hour (format is in hh:mm:ss)
    f. --ntasks 72: The number of tasks (threads) to request. With --nodes 1, all of them are on the same node
    g. --nodes 1: The number of nodes to request. The GPU has 2 nodes, at 36 threads each
    h. /bin/bash: The command to execute with srun. This is what gives us control of the node

  5. Activate the conda environment: conda activate /project/brookings_minion/conda-envs/mapt_pipeline

  6. Execute the pipeline: snakemake --cores all --use-singularity --singularity-args="--nv"
    a. snakemake: Call snakemake
    b. --cores all: Use all cores available (maximum is --ntasks * --nodes from Step 4)
    c. --use-singularity: Use singularity. This is required for Guppy
    d. --singularity-args="--nv": Allow snakemake to pass the GPU into the singularity container. This is required for Guppy's GPU basecalling
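
Because GPU time is limited, it can be worth confirming that the paths from Steps 1 and 3 exist before requesting a node. The sketch below is one possible pre-flight check. The function name check_paths and the variable SCRATCH_DIR are hypothetical and not part of MAPT; the paths are the ones given in the steps above.

```shell
# Build the scratch path with braces so the shell reads the variable as
# USER rather than USER_scratch. Falls back to `id -un` if USER is unset.
SCRATCH_DIR="/90daydata/brookings_minion/${USER:-$(id -un)}_scratch"

# Hypothetical helper: return 0 only if every argument is an existing
# path, and report each missing one on stderr.
check_paths() {
    missing=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "Missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the three locations referenced in the configuration file.
check_paths \
    "$SCRATCH_DIR" \
    "/project/brookings_minion/examples/fast5" \
    "/project/brookings_minion/reference_databases/zymogen_reference.fasta" \
    || echo "Fix the paths above before requesting a GPU node."
```

Running this on a login node costs nothing, whereas a wrong path found only after srun has started uses up part of the one-hour allocation.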

A few SLURM scripts exist under the /project/brookings_minion/examples/slurm directory. If you are unfamiliar with SLURM, this may be an opportunity to write your own SLURM script and check it against a known file. Use the two scripts, Starting Snakemake in SLURM and Activate Conda in SLURM, to get started.
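
As a starting point, Steps 4 through 6 could be combined into a batch script along the following lines. This is a sketch under stated assumptions, not a copy of the scripts in the slurm directory: the file name run_mapt_test.sh and the job name are made up for illustration, the conda initialization line assumes that conda info --base works on the compute node, and --cores all follows the description in Step 6b. The snippet only writes the script to disk and does not submit it.

```shell
# Write a hypothetical batch script. The file name is illustrative only.
cat > run_mapt_test.sh <<'EOF'
#!/bin/bash
# Same resources as the srun command in Step 4
#SBATCH --partition=gpu-low
#SBATCH --time=01:00:00
#SBATCH --ntasks=72
#SBATCH --nodes=1
#SBATCH --job-name=mapt_test

# Assumption: conda is set up so that "conda info --base" works here.
# Your cluster may need a different way of initializing conda.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate /project/brookings_minion/conda-envs/mapt_pipeline

# Assumption: the job was submitted from your MAPT directory, so
# snakemake finds the Snakefile in the current working directory.
snakemake --cores all --use-singularity --singularity-args="--nv"
EOF
```

From your MAPT directory, the script would then be submitted with sbatch run_mapt_test.sh. Comparing it with the two example scripts named above is a reasonable way to check your work.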

Return to Wiki Homepage
Continue to Expected Outcomes