reference - spiralgenetics/biograph GitHub Wiki
Spiral Genetics provides a number of popular references already converted to the BioGraph format, including hs37d5 and grch38. You can download them directly from AWS S3 at s3://spiral-public/references/.
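One way to fetch a prebuilt reference is the AWS CLI. The minimal sketch below only assembles the `aws s3` command lines and does not run them. It makes two assumptions: that each reference sits under its own prefix named after it (for example `hs37d5/`), and that the bucket allows anonymous reads with `--no-sign-request`. List the bucket first to confirm the actual names. `list_cmd` and `download_cmd` are hypothetical helper names.

```python
import shlex

# Public bucket prefix given in the BioGraph documentation.
BUCKET_PREFIX = "s3://spiral-public/references/"


def list_cmd():
    """Hypothetical helper: command to list the available prebuilt references."""
    # ASSUMPTION: the bucket permits anonymous (unsigned) requests.
    return ["aws", "s3", "ls", "--no-sign-request", BUCKET_PREFIX]


def download_cmd(name, dest):
    """Hypothetical helper: command to copy one reference directory to `dest`.

    ASSUMPTION: each reference lives under its own prefix named `name`
    (e.g. "hs37d5/"); check the output of list_cmd() before relying on this.
    """
    src = BUCKET_PREFIX + name.strip("/") + "/"
    return ["aws", "s3", "cp", "--recursive", "--no-sign-request", src, dest]


if __name__ == "__main__":
    # Print shell-ready commands instead of executing them.
    print(shlex.join(list_cmd()))
    print(shlex.join(download_cmd("hs37d5", "hs37d5")))
```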
To prepare your own FASTA reference for use with BioGraph, use the biograph reference command. It creates the specified reference directory and writes the new reference and database files into it.
The FASTA should already be indexed by BWA. If it is not indexed, download BWA and run bwa index your_input.fasta before building the BioGraph reference.
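To check whether a FASTA is already indexed, you can look for the five files that `bwa index` writes next to it: the input path with `.amb`, `.ann`, `.bwt`, `.pac`, and `.sa` appended (the same suffixes appear on source.fasta in a finished reference directory). The sketch below is an assumption-level convenience, not part of BioGraph; `missing_bwa_index` and `ensure_bwa_index` are hypothetical helper names, and it assumes `bwa` is on your PATH when indexing is needed.

```python
import os
import shutil
import subprocess

# Files produced by `bwa index <fasta>`: the full FASTA path plus each suffix.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index(fasta):
    """Hypothetical helper: return BWA index suffixes not yet present for `fasta`."""
    return [s for s in BWA_SUFFIXES if not os.path.exists(fasta + s)]


def ensure_bwa_index(fasta):
    """Hypothetical helper: run `bwa index` only if some index files are missing.

    Returns True if indexing was run, False if the FASTA was already indexed.
    ASSUMPTION: the `bwa` executable is installed and on PATH.
    """
    if not missing_bwa_index(fasta):
        return False  # already indexed, nothing to do
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa not found on PATH; install BWA first")
    subprocess.run(["bwa", "index", fasta], check=True)  # raises on failure
    return True
```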
To convert your indexed FASTA to a BioGraph reference:
$ biograph reference --in /path/to/hs37d5.fasta --refdir hs37d5
Preparing source fasta
Building reference
[======================================================================] 100.00 %
Results saved to hs37d5
Cleaning up...
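If you build references from a script, a small wrapper can assemble the same command. This sketch uses only the flags listed in the `biograph reference --help` output (`--in`, `--refdir`, `--force`, `--min-n-run`); the function name `biograph_reference` and its `run` parameter are hypothetical, and it assumes the `biograph` executable is on your PATH when `run=True`.

```python
import subprocess


def biograph_reference(fasta, refdir, force=False, min_n_run=None, run=True):
    """Hypothetical wrapper: build (and optionally run) `biograph reference`.

    Only flags documented in `biograph reference --help` are used.
    Returns the argument list so it can be logged or inspected.
    """
    cmd = ["biograph", "reference", "--in", fasta, "--refdir", refdir]
    if force:
        cmd.append("--force")  # overwrite an existing refdir
    if min_n_run is not None:
        cmd += ["--min-n-run", str(min_n_run)]
    if run:
        # ASSUMPTION: `biograph` is on PATH; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `biograph_reference("/path/to/hs37d5.fasta", "hs37d5", run=False)` returns the argument list for the conversion command shown above without executing it.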
The resulting reference directory contains the original FASTA file (renamed to source.fasta) along with its BWA index files, a reference.fasta split into supercontigs, and various other indices.
$ ls hs37d5/
karyotype.json   reference.ref     source.fasta.ann  source.fasta.sa
reference.bwt    source.fasta      source.fasta.bwt
reference.fasta  source.fasta.amb  source.fasta.pac
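To sanity-check a finished reference directory, you can compare its contents with the listing above. This is a minimal sketch that assumes the same ten file names are produced for every reference and BioGraph version, which the documentation does not guarantee; `missing_reference_files` is a hypothetical helper name.

```python
import os

# ASSUMPTION: file names copied from the `ls hs37d5/` listing; other
# BioGraph versions may produce a different set.
EXPECTED = {
    "karyotype.json", "reference.bwt", "reference.fasta", "reference.ref",
    "source.fasta", "source.fasta.amb", "source.fasta.ann",
    "source.fasta.bwt", "source.fasta.pac", "source.fasta.sa",
}


def missing_reference_files(refdir):
    """Hypothetical helper: return expected files absent from `refdir`, sorted."""
    present = set(os.listdir(refdir)) if os.path.isdir(refdir) else set()
    return sorted(EXPECTED - present)
```

An empty list means every file from the listing is present; it does not verify that the files are valid.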
Creating a human reference takes about half an hour and does not require significant CPU, memory, or temporary disk space.
Use the --help switch to see all available options.
(bg7)$ biograph reference --help
usage: reference [-h] --in IN --refdir REFDIR [-f] [--min-n-run MIN_N_RUN]

Prepare a fasta for use with BioGraph. The specified reference directory will
be created and will contain the new reference and database files.

optional arguments:
  -h, --help            show this help message and exit
  --in IN               Input reference fasta or fasta.gz
  --refdir REFDIR       Output reference directory
  -f, --force           Overwrite existing refdir
  --min-n-run MIN_N_RUN
                        Any runs of 'N's smaller than this long are replaced
                        with the preceding base (=50)
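The `--min-n-run` rule can be illustrated on a single in-memory sequence string: every run of `N` shorter than the threshold (default 50) is overwritten with the base just before it, and longer runs are kept. This is a sketch of the rule as the help text states it, not BioGraph's implementation. Two behaviors are assumptions: a run at the very start of a sequence is left unchanged (it has no preceding base), and lowercase `n` is not treated as N. `fill_short_n_runs` is a hypothetical name.

```python
import re


def fill_short_n_runs(seq, min_n_run=50):
    """Hypothetical helper: replace runs of 'N' shorter than min_n_run
    with the preceding base, as described by `--min-n-run`."""

    def repl(match):
        run = match.group(0)
        start = match.start()
        # Long runs are kept; ASSUMPTION: a leading run is also kept,
        # since there is no preceding base to copy.
        if len(run) >= min_n_run or start == 0:
            return run
        # The regex matches maximal runs, so seq[start - 1] is never 'N'.
        return seq[start - 1] * len(run)

    return re.sub(r"N+", repl, seq)
```

For example, `fill_short_n_runs("ACGNNNT")` returns `"ACGGGGT"`, while a run of 50 or more N's passes through unchanged. For a multi-line FASTA record, the sequence lines would need to be joined per contig first.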