BAM Processing with samtools
Once you have your SAM file, you can use samtools to convert, sort, and index your alignments and to gather basic statistics on them.
1. Convert SAM → BAM
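# -b: write BAM output; -S (input is SAM) is accepted but ignored by samtools 1.x, which auto-detects the input format
samtools view -bS aln.bwa.sam > aln.bwa.bam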
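Before moving on, it can be worth confirming that the BAM file was written completely. A minimal sketch, assuming a samtools version that provides the `quickcheck` subcommand; the helper name `check_bam` is illustrative and not part of samtools:

```shell
# Hypothetical helper: report whether a BAM file looks intact.
# Usage: check_bam <file.bam>
check_bam() {
    # quickcheck exits non-zero when the header or the end-of-file marker is missing
    if samtools quickcheck "$1"; then
        echo "ok: $1"
    else
        echo "truncated or corrupt: $1"
        return 1
    fi
}

# Run only when samtools and the file from step 1 are actually present.
if command -v samtools >/dev/null 2>&1 && [ -f aln.bwa.bam ]; then
    check_bam aln.bwa.bam
fi
```

quickcheck only inspects the start and end of the file, so it catches truncated writes but does not validate individual records.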
2. Sort BAM
samtools sort aln.bwa.bam -o aln.bwa.sorted.bam
-Sorts reads by reference position, which is required for indexing and many downstream tools.
OR
# -@ 8: use 8 threads; -o: output file
samtools sort -@ 8 -o aln.bwa.sorted.bam aln.bwa.bam
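Steps 1 and 2 can also be combined so that no intermediate unsorted BAM is written. A minimal sketch, assuming samtools 1.x, where `samtools sort` reads SAM input directly; the helper name `sam_to_sorted_bam` and the default thread count are illustrative choices:

```shell
# Hypothetical helper: convert a SAM file straight to a coordinate-sorted BAM.
# Usage: sam_to_sorted_bam <in.sam> <out.sorted.bam> [threads]
sam_to_sorted_bam() {
    # -@: number of threads (defaults to 4 here); -O bam: force BAM output; -o: output file
    samtools sort -@ "${3:-4}" -O bam -o "$2" "$1"
}

# Run only when samtools and the input SAM are actually present.
if command -v samtools >/dev/null 2>&1 && [ -f aln.bwa.sam ]; then
    sam_to_sorted_bam aln.bwa.sam aln.bwa.sorted.bam 8
fi
```

Skipping the unsorted BAM saves disk space; keep the two-step form if you need the unsorted file for something else.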
3. Index BAM
samtools index aln.bwa.sorted.bam
-Creates an index file (.bai) to allow fast random access (e.g. by genomic coordinate).
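With the index in place, alignments overlapping a region can be fetched without scanning the whole file. A sketch, assuming the BAM header contains a reference named `chr1` (a hypothetical name; substitute one from your own reference); `count_region` is an illustrative helper:

```shell
# Hypothetical helper: count alignments overlapping a region of an indexed BAM.
# Usage: count_region <sorted.bam> <region>
count_region() {
    # -c: print only the number of matching alignments instead of the records
    samtools view -c "$1" "$2"
}

# Example call (region name is an assumption, so it is left commented out):
# count_region aln.bwa.sorted.bam chr1:1-100000
```

Dropping `-c` prints the alignment records themselves instead of a count.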
4. Quick Statistics
-Flag statistics
samtools flagstat aln.bwa.sorted.bam
Reports counts of total, mapped, properly paired and duplicate reads, among others. Each count is printed as QC-passed + QC-failed.
Output:
500000 + 0 in total (QC-passed reads + QC-failed reads)
500000 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
500000 + 0 mapped (100.00% : N/A)
500000 + 0 primary mapped (100.00% : N/A)
500000 + 0 paired in sequencing
250000 + 0 read1
250000 + 0 read2
500000 + 0 properly paired (100.00% : N/A)
500000 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ≥5)
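For scripting, individual numbers can be pulled out of the flagstat report with standard text tools. A minimal sketch, assuming the one-record-per-line layout that flagstat prints (the exact set of lines varies between samtools versions); `mapped_reads` is an illustrative helper, not part of samtools:

```shell
# Hypothetical helper: read flagstat text on stdin and print the QC-passed mapped count.
mapped_reads() {
    # The fourth field is "mapped" only on the overall mapped line;
    # on "primary mapped" it is "primary", on the mate lines it is "with".
    awk '$4 == "mapped" { print $1 }'
}

# Demo on an excerpt of the report shown above.
mapped_reads <<'EOF'
500000 + 0 in total (QC-passed reads + QC-failed reads)
500000 + 0 primary
500000 + 0 mapped (100.00% : N/A)
500000 + 0 primary mapped (100.00% : N/A)
EOF
# prints 500000

# With a real file: samtools flagstat aln.bwa.sorted.bam | mapped_reads
```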
-Index statistics
samtools idxstats aln.bwa.sorted.bam
Prints one tab-separated line per reference (chromosome/contig): name, sequence length, mapped read count and unmapped read count.
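Because idxstats output is tab-separated, it is easy to post-process. A sketch that normalises mapped read counts by reference length, assuming the four-column layout (name, length, mapped, unmapped); `reads_per_kb` is an illustrative helper, and the demo input is made-up data, not output from a real run:

```shell
# Hypothetical helper: read idxstats text on stdin and print mapped reads
# per kilobase of reference for each sequence.
reads_per_kb() {
    # Skip zero-length entries such as the trailing "*" line for unplaced reads.
    awk -F '\t' '$2 > 0 { printf "%s\t%.2f\n", $1, $3 / ($2 / 1000) }'
}

# Demo on invented numbers (for illustration only).
printf 'chr1\t1000000\t300000\t0\nchr2\t500000\t200000\t0\n*\t0\t0\t0\n' | reads_per_kb
# prints:
# chr1	300.00
# chr2	400.00

# With a real file: samtools idxstats aln.bwa.sorted.bam | reads_per_kb
```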