Reference Genome Preparation - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
Reference Genome Preparation
Before you can map reads, you need a FASTA file of your reference genome and an index for your chosen aligner.
- Obtain or build a FASTA reference
- Download a published genome (e.g. E. coli K-12 MG1655 RefSeq):
wget -O ref_genome.fa.gz \
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
gunzip ref_genome.fa.gz
- (Optional) Build your own reference by concatenating contigs:
cat contig1.fa contig2.fa … > ref_genome.fa
- Verify the FASTA header & sequence:
head -n5 ref_genome.fa
# >NC_000913.3 Escherichia coli K-12 MG1655, complete genome
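Beyond eyeballing the first few lines, a quick summary can catch problems such as duplicate sequence names after concatenating contigs. The following is a minimal sketch that uses only standard Unix tools; the function name `fasta_summary` is hypothetical, and it assumes an uncompressed FASTA whose records start with `>` header lines.

```bash
#!/usr/bin/env bash
# Sketch: summarise a FASTA file with grep/awk (no bioinformatics tools needed).
# fasta_summary is a hypothetical helper name, not part of any aligner.
fasta_summary() {
    local fasta="$1"
    local n_seqs n_bases dups

    # Number of records = number of header lines.
    n_seqs=$(grep -c '^>' "$fasta")

    # Total bases = combined length of all non-header lines,
    # ignoring spaces, tabs and Windows carriage returns.
    n_bases=$(awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
                   END { print total + 0 }' "$fasta")

    # Count sequence IDs (first word of each header) that occur more than once;
    # repeated IDs can cause trouble when indexing or in downstream tools.
    dups=$(grep '^>' "$fasta" | awk '{ print $1 }' | sort | uniq -d | wc -l | tr -d ' ')

    echo "sequences=$n_seqs bases=$n_bases duplicate_ids=$dups"
}
```

Calling `fasta_summary ref_genome.fa` prints a single line of counts; a non-zero `duplicate_ids` value suggests that some headers should be renamed before indexing.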
- Index the reference for fast mapping
Each aligner needs its own index format:
- BWA-MEM
bwa index ref_genome.fa
Creates .amb, .ann, .bwt, .pac, and .sa files.
- Bowtie2
bowtie2-build ref_genome.fa ref_bt2_index
Produces ref_bt2_index.* index files.
- Minimap2 (for long reads)
minimap2 -d ref_minimap2.mmi ref_genome.fa
Generates a single ref_minimap2.mmi index.
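Indexing a large genome can take a while, so it may be worth checking whether an index already exists before rebuilding it. The sketch below tests for the five BWA files listed above. `bwa_index_complete` is a hypothetical helper name, and the sketch assumes the index files sit next to the FASTA with each extension appended to its full name (e.g. `ref_genome.fa.bwt`), which is how `bwa index` names them by default.

```bash
#!/usr/bin/env bash
# Sketch: report whether all five BWA index files exist for a given FASTA.
# bwa_index_complete is a hypothetical helper, not part of BWA itself.
bwa_index_complete() {
    local fasta="$1"
    local ext
    for ext in amb ann bwt pac sa; do
        # -s is true only if the file exists and is non-empty.
        if [ ! -s "${fasta}.${ext}" ]; then
            echo "missing: ${fasta}.${ext}" >&2
            return 1
        fi
    done
    return 0
}
```

One way to use it would be `bwa_index_complete ref_genome.fa || bwa index ref_genome.fa`, which rebuilds the index only when a file is absent or empty. The same pattern could be adapted to the Bowtie2 and Minimap2 outputs by changing the list of expected file names.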
With the FASTA and its index in place, you’re ready to align your reads efficiently in the next step.