reference - spiralgenetics/biograph GitHub Wiki

Spiral Genetics provides a number of popular references already converted to the BioGraph format, including hs37d5 and grch38. You can download them directly from AWS S3 at s3://spiral-public/references/.
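If you have the AWS CLI installed, a minimal sketch like the following can list and fetch the prebuilt references. It assumes the bucket allows anonymous reads (hence --no-sign-request) and that each reference sits under its own prefix such as hs37d5/. Run list_references first to confirm the actual layout. The function names are illustrative and are not part of BioGraph.

```shell
#!/bin/sh
# Sketch: browse and download prebuilt BioGraph references from S3.
# Assumes the AWS CLI is installed and the bucket allows anonymous reads.
REFERENCE_PREFIX="s3://spiral-public/references/"

# Print the objects and prefixes available under the references prefix.
list_references() {
  aws s3 ls --no-sign-request "$REFERENCE_PREFIX"
}

# Recursively copy one reference to a local directory.
#   $1: a name shown by list_references, e.g. hs37d5/ (layout is an assumption)
#   $2: local destination directory
fetch_reference() {
  aws s3 cp --no-sign-request --recursive "${REFERENCE_PREFIX}$1" "$2"
}
```

For example, fetch_reference hs37d5/ ./hs37d5 would copy everything under that prefix into ./hs37d5, provided such a prefix exists in the bucket.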

To prepare your own FASTA reference for use with BioGraph, use the biograph reference command. It creates the reference directory you specify and writes the new reference and database files into it.

The FASTA should already be indexed by BWA. If it is not indexed, download BWA and run bwa index your_input.fasta before building the BioGraph reference.
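A small pre-flight check can decide whether bwa index still needs to run. bwa index writes five files next to the FASTA, with the extensions .amb, .ann, .bwt, .pac and .sa. The hypothetical helpers below only test that those files exist; they do not verify that the index matches the current FASTA.

```shell
#!/bin/sh
# Hypothetical helpers: check for a BWA index and build it if missing.

# Succeeds only if all five files written by `bwa index` exist.
# Existence only: a stale index from an older FASTA would still pass.
has_bwa_index() {
  local ext
  for ext in amb ann bwt pac sa; do
    [ -e "$1.$ext" ] || return 1
  done
  return 0
}

# Run `bwa index` on the FASTA unless an index is already present.
ensure_bwa_index() {
  if has_bwa_index "$1"; then
    echo "BWA index found for $1"
  else
    echo "Indexing $1 with bwa index"
    bwa index "$1"
  fi
}
```

Running ensure_bwa_index your_input.fasta before biograph reference skips the indexing step when the index files are already there.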

To convert your indexed FASTA to a BioGraph reference:

$ biograph reference --in /path/to/hs37d5.fasta --refdir hs37d5
Preparing source fasta
Building reference
[======================================================================] 100.00 %
Results saved to hs37d5
Cleaning up...
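When scripting the conversion, a hypothetical wrapper like this one checks for an existing reference directory up front and passes -f/--force only when you ask for it. It uses only the --in, --refdir and --force options from the --help output; the function name and its argument order are illustrative.

```shell
#!/bin/sh
# Hypothetical wrapper around `biograph reference`.
#   $1: input FASTA (or fasta.gz)
#   $2: output reference directory
#   $3: optional literal "--force" to allow overwriting an existing refdir
build_biograph_reference() {
  local fasta="$1" refdir="$2" force="${3:-}"
  # Refuse early rather than clobbering an existing build by accident.
  if [ -e "$refdir" ] && [ "$force" != "--force" ]; then
    echo "refusing to overwrite existing $refdir (pass --force to rebuild)" >&2
    return 1
  fi
  if [ "$force" = "--force" ]; then
    biograph reference --in "$fasta" --refdir "$refdir" --force
  else
    biograph reference --in "$fasta" --refdir "$refdir"
  fi
}
```

For example, build_biograph_reference /path/to/hs37d5.fasta hs37d5 reproduces the command above, and adding --force as a third argument rebuilds over an existing hs37d5 directory.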

The resulting reference directory contains the original FASTA file (renamed to source.fasta), a reference.fasta split into supercontigs, and various indices, including the BWA index files for source.fasta (.amb, .ann, .bwt, .pac and .sa).

$ ls hs37d5/
karyotype.json   reference.ref     source.fasta.ann  source.fasta.sa
reference.bwt    source.fasta      source.fasta.bwt
reference.fasta  source.fasta.amb  source.fasta.pac
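To sanity-check a finished build, you could compare the directory against the listing above. The sketch below assumes exactly the file set shown here; other BioGraph versions may produce different files, so treat the list as an example rather than a specification.

```shell
#!/bin/sh
# Hypothetical check: report any file from the example listing that is missing.
#   $1: reference directory produced by `biograph reference`
check_refdir() {
  local refdir="$1" missing=0 f
  for f in karyotype.json reference.ref reference.bwt reference.fasta \
           source.fasta source.fasta.amb source.fasta.ann source.fasta.bwt \
           source.fasta.pac source.fasta.sa; do
    if [ ! -e "$refdir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Non-zero exit status if anything was missing.
  return "$missing"
}
```

Running check_refdir hs37d5 prints nothing and succeeds when every listed file is present.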

Building a human reference takes about half an hour and uses little CPU, memory, or temporary disk space.

Use the --help switch to see all available options.

$ biograph reference --help
usage: reference [-h] --in IN --refdir REFDIR [-f] [--min-n-run MIN_N_RUN]

Prepare a fasta for use with BioGraph. The specified reference directory will
be created and will contain the new reference and database files.

optional arguments:
  -h, --help            show this help message and exit
  --in IN               Input reference fasta or fasta.gz
  --refdir REFDIR       Output reference directory
  -f, --force           Overwrite existing refdir
  --min-n-run MIN_N_RUN
                        Any runs of 'N's smaller than this long are replaced
                        with the preceding base (=50)
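The --min-n-run rule is easiest to see on a short sequence. The awk sketch below applies it to a single, unwrapped sequence string: runs of N shorter than the threshold (default 50) become copies of the base just before the run, and runs at least that long stay as N. It illustrates the documented rule and is not BioGraph's implementation. How BioGraph handles a run at the very start of a contig, lowercase n, or runs spanning wrapped FASTA lines is not stated here; the sketch leaves a leading run unchanged and matches only uppercase N.

```shell
#!/bin/sh
# Illustration of the --min-n-run rule on one unwrapped sequence string.
#   $1: sequence, e.g. ACGTNNA
#   $2: minimum N-run length to keep (defaults to 50, as in the help text)
collapse_short_n_runs() {
  printf '%s\n' "$1" | awk -v min="${2:-50}" '
  {
    out = ""; i = 1; n = length($0)
    while (i <= n) {
      c = substr($0, i, 1)
      if (c != "N") { out = out c; i++; continue }
      # Find the end of this run of Ns.
      j = i
      while (j <= n && substr($0, j, 1) == "N") j++
      runlen = j - i
      # Base preceding the run; empty if the run starts the sequence.
      prev = (i > 1) ? substr($0, i - 1, 1) : ""
      fill = (runlen < min && prev != "") ? prev : "N"
      for (k = 0; k < runlen; k++) out = out fill
      i = j
    }
    print out
  }'
}
```

For example, collapse_short_n_runs ACGTNNA 50 prints ACGTTTA, while collapse_short_n_runs ACNNNNG 3 leaves the four-base run as N. To use a different threshold for a real build, pass it on the command line, e.g. biograph reference --in ref.fasta --refdir myref --min-n-run 100.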