full_pipeline - spiralgenetics/biograph GitHub Wiki
As outlined in the Quick Start, the biograph full_pipeline converts NGS reads into the BioGraph format and performs variant calling, genotyping, and filtering. This section describes all of the full_pipeline parameters in detail.
Essential parameters
- --biograph or -b: the path to the BioGraph to be created
- --reference or -r: the path to the BioGraph reference
- --tmp: path to temporary storage. Default: the value of $TMPDIR, or /tmp/ if unset
- --threads or -t: number of concurrent threads. Default: one thread per available CPU
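The two documented defaults can be reproduced in a wrapper script, for example to log the values a run would fall back to. This is a minimal sketch: the function names are hypothetical, and `nproc` (with `getconf` as a fallback) is only an assumed approximation of how full_pipeline counts available CPUs.

```shell
# Hypothetical helpers mirroring the documented defaults; not part of BioGraph.

# Temporary directory: the value of $TMPDIR, or /tmp/ if unset.
# Note: ${TMPDIR:-...} also falls back when TMPDIR is set but empty.
default_tmp() {
    printf '%s\n' "${TMPDIR:-/tmp/}"
}

# Thread count: one per available CPU (assumption: nproc approximates this).
default_threads() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

echo "tmp=$(default_tmp) threads=$(default_threads)"
```

Passing --tmp or --threads explicitly overrides these defaults.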
Step-specific parameters
- --model: the path to the BioGraph model file. This is only required if running the qual_classifier step.
- --reads: the input reads in BAM, CRAM, or FASTQ format. This is only required if running the create step. If your reads consist of multiple input files, you should consider using a custom pipeline script. To stream reads on STDIN, use --reads -. See Customizing the BioGraph Pipeline and create for more details.

  Are your input reads in CRAM format? If so, the reference that was used to create the CRAM must match the --reference parameter used for the create step. While the resulting BioGraph is not tied to this reference, the full_pipeline command will use it for discovery, coverage, and truvari.

  If you wish to use a different reference for analysis, create a custom pipeline script. See Customizing the BioGraph Pipeline for details.
- --create, --discovery, --coverage, --grm, --qual_classifier: These optional parameters are passed to their respective steps verbatim. They should be supplied as a single string. Common parameters such as --tmp or --reference are automatically passed to the steps that require them and don't need to be included here:
(bg7)$ biograph full_pipeline --biograph my.bg --ref hs37d5/ \
--reads /path/to/my_reads_1.fq.gz \
--model /path/to/biograph_model-7.0.0.ml \
--create "--pair /path/to/my_reads_2.fq.gz --max-mem 100" \
--discovery "--bed /path/to/my_regions.bed" \
--coverage "--min-insert 500" \
--grm "-k 60" \
--qual_classifier "--filter 0.15 --lowqual_sv 0.3"
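Because each step option takes a single string, quoting matters when the command is assembled in a script. The sketch below prints one argument per line so you can confirm that each step string arrives as exactly one argument. It does not run biograph; the function name build_cmd is hypothetical, and the paths are the placeholders from the example above.

```shell
# Hypothetical helper: assemble the command and print one argument per line.
# Nothing is executed; paths are placeholders from the example above.
build_cmd() {
    create_opts="--pair /path/to/my_reads_2.fq.gz --max-mem 100"
    discovery_opts="--bed /path/to/my_regions.bed"

    # Double quotes keep each step string together as a single argument.
    set -- biograph full_pipeline \
        --biograph my.bg --ref hs37d5/ \
        --reads /path/to/my_reads_1.fq.gz \
        --model /path/to/biograph_model-7.0.0.ml \
        --create "$create_opts" \
        --discovery "$discovery_opts"

    printf '%s\n' "$@"
}

build_cmd
```

To run the command instead of printing it, replace the printf line with "$@".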
Skip stages with --stop and --resume
You can instruct full_pipeline to stop at any stage. To convert reads to BioGraph format without running further analysis, use --stop create. Since the qual_classifier step won't be run, there is no need to specify a model:
(bg7)$ biograph full_pipeline --biograph my.bg --ref hs37d5/ \
--reads /path/to/my_reads_1.fq.gz --stop create
You can also resume from any stage. This can be useful for performing additional analysis on an existing BioGraph, or changing a parameter for subsequent analysis.
For example, BioGraph files are reference agnostic. While a reference is used to speed up the create process, no reference information is stored in the BioGraph itself. The output BioGraph will be identical for a set of reads whether using hs37d5, GRCh38, or any other reference for the create step.
If you create a BioGraph using one reference and later decide to run an analysis on a different reference, there is no need to create another BioGraph from reads. Simply supply the path to the existing BioGraph and --resume discovery:
(bg7)$ biograph full_pipeline --biograph my.bg --ref grch38/ \
--resume discovery \
--model /path/to/biograph_model-7.0.0.ml
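One way to picture --resume and --stop is as selecting a contiguous slice of the step list given in the help text (create, discovery, coverage, grm, qual_classifier). The sketch below illustrates that idea only; it is not the pipeline's actual implementation. steps_to_run is a hypothetical helper, and it assumes both options are inclusive.

```shell
# Hypothetical helper: print the steps that would run for a given
# --resume / --stop pair. Pass "" for an option that is not set.
steps_to_run() {
    resume="${1:-create}"          # no --resume: start at the first step
    stop="${2:-qual_classifier}"   # no --stop: run through the last step
    running=0
    for step in create discovery coverage grm qual_classifier; do
        if [ "$step" = "$resume" ]; then running=1; fi
        if [ "$running" -eq 1 ]; then printf '%s\n' "$step"; fi
        if [ "$step" = "$stop" ]; then break; fi
    done
}

# --stop create: only the create step runs, so no --model is needed.
steps_to_run "" create
# --resume discovery: create is skipped, so no --reads is needed.
steps_to_run discovery ""
```

Checking the slice this way shows which step-specific parameters (--reads for create, --model for qual_classifier) a given invocation requires.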
Additional full_pipeline parameters
- --keep: keep intermediary files. --keep vcf for VCFs, --keep jl for dataframes, or --keep all for everything.
- --dry-run: generate a step-by-step script on STDOUT. See Customizing the BioGraph Pipeline for details.
- --force: overwrite any existing intermediary files (or the BioGraph if running create).
- --help or -h: show all available options
(bg7)$ biograph full_pipeline --help
usage: full_pipeline [-h] -b BG -r REF [--reads READS] [-m ML] [--tmp TMP]
                     [-t THREADS] [--keep {all,jl,vcf}] [--dry-run] [--force]
                     [--resume RESUME] [--stop STOP] [--create CREATE]
                     [--discovery DISCOVERY] [--coverage COVERAGE] [--grm GRM]
                     [--qual_classifier QUAL_CLASSIFIER]

Run the standard BioGraph pipeline for a single sample: create, discovery,
coverage, grm, qual_classifier

optional arguments:
  -h, --help            show this help message and exit
  -b BG, --biograph BG  BioGraph file (will be created if running the create
                        step)
  -r REF, --reference REF
                        Reference genome folder, in BioGraph reference format
  --reads READS         Input reads for BioGraph create, if run (fastq, bam,
                        cram)
  -m ML, --model ML     BioGraph classifier model for qual_classifier, if run
  --tmp TMP             Temporary directory (/tmp)
  -t THREADS, --threads THREADS
                        Number of threads to use (32)
  --keep {all,jl,vcf}   Keep intermediate dataframes/VCFs or all
  --dry-run             Run a preflight check, print all steps to be run, and
                        exit
  --force               Overwrite any existing intermediary files

Pipeline Arguments:
  Control which section of the pipeline is run

  --resume RESUME       Step at which to resume the pipeline
  --stop STOP           Last step of the pipeline to run

Individual step arguments:
  Specify any additional parameters to be passed to steps. Must be a single
  "string"

  --create CREATE       create parameters
  --discovery DISCOVERY
                        discovery parameters
  --coverage COVERAGE   coverage parameters
  --grm GRM             truvari grm parameters
  --qual_classifier QUAL_CLASSIFIER
                        qual_classifier parameters
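Since --dry-run writes the step-by-step script to STDOUT, one convenient pattern is to redirect it to a file for review or customization before running anything. A minimal sketch, assuming a hypothetical wrapper named save_dry_run and a hypothetical BIOGRAPH variable that lets you point at a different executable:

```shell
# Hypothetical wrapper: save the --dry-run output instead of running the pipeline.
# BIOGRAPH defaults to the biograph executable on PATH.
save_dry_run() {
    out="$1"
    shift
    "${BIOGRAPH:-biograph}" full_pipeline "$@" --dry-run > "$out"
}

# Example (not executed here; pipeline.sh is a placeholder file name):
# save_dry_run pipeline.sh --biograph my.bg --ref hs37d5/ \
#     --reads /path/to/my_reads_1.fq.gz \
#     --model /path/to/biograph_model-7.0.0.ml
```

See Customizing the BioGraph Pipeline for how the generated script is meant to be edited and run.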