BAM Processing with samtools - igheyas/Bioinformatics GitHub Wiki

BAM Processing with samtools

Once you have your SAM file, you can use samtools to convert, sort, index and gather basic statistics on your alignments.

1. Convert SAM → BAM

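samtools view -bS aln.bwa.sam > aln.bwa.bam

-Converts the text SAM file to compressed binary BAM (-b). samtools 1.x detects the input format automatically and ignores -S, which is kept only for compatibility with older versions.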

2. Sort BAM

samtools sort aln.bwa.bam -o aln.bwa.sorted.bam

-Sorts alignments by reference sequence and leftmost coordinate. Coordinate order is required for indexing and expected by many downstream tools.

Or, to sort with multiple threads:

# -@ 8: use 8 threads; -o: output file
samtools sort -@ 8 -o aln.bwa.sorted.bam aln.bwa.bam
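
One way to confirm that the sort took effect is to look at the SO: tag on the @HD header line, which samtools sort sets to "coordinate". The sketch below parses a hypothetical header line so that it runs without a BAM file; the VN value is an assumption, and in practice the line would come from `samtools view -H aln.bwa.sorted.bam`.

```shell
#!/bin/sh
# Hypothetical @HD line for illustration; with a real file, read it via:
#   samtools view -H aln.bwa.sorted.bam
header=$(printf '@HD\tVN:1.6\tSO:coordinate')

# Header tags are tab-separated; find the SO: tag and strip its prefix.
sort_order=$(printf '%s\n' "$header" | awk -F'\t' '
    $1 == "@HD" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SO:/) print substr($i, 4)
    }')

echo "sort order: ${sort_order:-unknown}"
```

A value other than "coordinate" (for example "unsorted" or "queryname") suggests the file should be re-sorted before indexing.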

3. Index BAM

samtools index aln.bwa.sorted.bam

-Creates the index file aln.bwa.sorted.bam.bai, which allows fast random access by genomic coordinate. The BAM must already be coordinate-sorted.
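
Steps 1 to 3 can also be chained in a small script. The following is a minimal sketch, not a definitive pipeline: it assumes the file names used on this page, picks an arbitrary thread count, and relies on samtools sort accepting SAM input directly so that no unsorted intermediate BAM is written. The guard makes the script a no-op when samtools or the input file is missing.

```shell
#!/bin/sh
sam="aln.bwa.sam"                 # input name used on this page
sorted="${sam%.sam}.sorted.bam"   # aln.bwa.sorted.bam
index="${sorted}.bai"             # default name written by samtools index
threads=4                         # assumption: adjust to your machine

if command -v samtools >/dev/null 2>&1 && [ -f "$sam" ]; then
    # Sort straight from SAM, index the result, then sanity-check it.
    samtools sort -@ "$threads" -o "$sorted" "$sam" &&
    samtools index "$sorted" &&
    # quickcheck exits non-zero if the BAM is truncated or has no EOF block.
    samtools quickcheck "$sorted" &&
    echo "wrote $sorted and $index"
else
    echo "skipping: samtools or $sam not found" >&2
fi
```

Once the index exists, a region query such as `samtools view aln.bwa.sorted.bam chr1:1000-2000` reads only the relevant part of the file; the region name here is hypothetical and must match a reference name in your BAM header.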

4. Quick Statistics

-Flag statistics

samtools flagstat aln.bwa.sorted.bam

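Reports counts of alignment records by FLAG category (total, mapped, properly paired, duplicates, etc.). Each line shows the QC-passed count followed by the QC-failed count.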

Output:

500000 + 0 in total (QC-passed reads + QC-failed reads)
500000 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
500000 + 0 mapped (100.00% : N/A)
500000 + 0 primary mapped (100.00% : N/A)
500000 + 0 paired in sequencing
250000 + 0 read1
250000 + 0 read2
500000 + 0 properly paired (100.00% : N/A)
500000 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
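
For scripted QC it can help to reduce the flagstat report to a couple of numbers. The sketch below recomputes the mapped and properly-paired percentages with awk. It embeds a few lines copied from the sample output above so it runs on its own; the variable names are illustrative, and with a real file you would pipe `samtools flagstat aln.bwa.sorted.bam` into the same awk program.

```shell
#!/bin/sh
# Lines taken from the sample flagstat output on this page.
flagstat_sample='500000 + 0 in total (QC-passed reads + QC-failed reads)
500000 + 0 mapped (100.00% : N/A)
500000 + 0 primary mapped (100.00% : N/A)
500000 + 0 paired in sequencing
500000 + 0 properly paired (100.00% : N/A)'

# $1 is the QC-passed count on every line.
# The anchored pattern skips the "primary mapped" line.
summary=$(printf '%s\n' "$flagstat_sample" | awk '
    / in total /                    { total  = $1 }
    /^[0-9]+ \+ [0-9]+ mapped \(/   { mapped = $1 }
    / paired in sequencing/         { paired = $1 }
    / properly paired /             { proper = $1 }
    END {
        if (total > 0 && paired > 0)
            printf "mapped=%.2f%% properly_paired=%.2f%%\n",
                   100 * mapped / total, 100 * proper / paired
    }')

echo "$summary"
```

The properly-paired percentage is taken relative to reads paired in sequencing rather than to the total, which matches how flagstat reports it.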

-Index statistics

samtools idxstats aln.bwa.sorted.bam

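Prints one tab-separated line per reference sequence (chromosome/contig) with four columns: reference name, sequence length, number of mapped read segments and number of unmapped read segments. A final line with the name * counts reads that have no reference assigned.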
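
Because idxstats output is plain tab-separated text, it is easy to post-process. The sketch below computes mapped reads per kilobase for each contig. The input rows are made-up illustrative values, not results from this dataset; with a real file you would pipe `samtools idxstats aln.bwa.sorted.bam` into the same awk program.

```shell
#!/bin/sh
# Hypothetical idxstats rows: name, length, mapped, unmapped.
idxstats_sample=$(printf 'chr1\t1000000\t300000\t0\nchr2\t500000\t200000\t0\n*\t0\t0\t0\n')

# Skip the "*" row (no reference, length 0) and print reads per kb.
density=$(printf '%s\n' "$idxstats_sample" | awk -F'\t' '
    $1 != "*" && $2 > 0 { printf "%s\t%.1f\n", $1, $3 * 1000 / $2 }')

echo "$density"
```

Normalising by length makes contigs of different sizes comparable, which a raw read count does not.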
