Hands On Exercise - igheyas/Bioinformatics GitHub Wiki
Hands-On Exercise
In this exercise we’ll take your simulated (or real) paired-end FASTQ reads through mapping, BAM generation & indexing, and basic QC/coverage checks.
1. Map reads to the reference
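# build the BWA index once (skip if the ref_genome.fa.* index files already exist)
bwa index ref_genome.fa
# align with BWA-MEM (8 threads)
bwa mem -t 8 ref_genome.fa \
raw_reads/simulated_small_R1.fastq \
raw_reads/simulated_small_R2.fastq \
> aln.sam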
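Before converting the alignment, a quick sanity check on the SAM file can be sketched with standard tools. SAM is tab-delimited, header lines start with `@`, and column 2 holds the bitwise FLAG, in which bit 0x4 marks an unmapped read. The file `demo.sam` and its three records are made-up illustration data, not output from the run above; point the `awk` command at `aln.sam` to check a real run.

```shell
# Made-up three-record SAM file for illustration (not real aligner output).
printf '@SQ\tSN:chr1\tLN:1000\n' > demo.sam
printf 'r1\t99\tchr1\t10\t60\t5M\t=\t50\t45\tACGTA\tIIIII\n' >> demo.sam
printf 'r1\t147\tchr1\t50\t60\t5M\t=\t10\t-45\tTTGCA\tIIIII\n' >> demo.sam
printf 'r2\t77\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII\n' >> demo.sam

# Skip header lines (starting with "@") and test FLAG bit 0x4 (unmapped)
# with integer arithmetic, so this works in any POSIX awk.
sam_summary=$(awk -F'\t' '!/^@/ { n++; if (int($2 / 4) % 2 == 1) u++ }
  END { printf "records=%d unmapped=%d\n", n, u }' demo.sam)
echo "$sam_summary"
```

On a real run, the record count should be at least the total number of reads in the two FASTQ files, since the aligner may add supplementary records.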
2. Convert SAM → sorted BAM & index
# convert to BAM & sort by coordinate (-S is no longer needed: samtools 1.x auto-detects SAM input)
samtools view -b aln.sam \
| samtools sort -@ 8 -o aln.sorted.bam -
# build BAM index
samtools index aln.sorted.bam
3. Evaluate mapping rates & read counts
# overall flag statistics (mapped %, duplicates, etc.)
samtools flagstat aln.sorted.bam > flagstat.txt
# per-chromosome read counts
samtools idxstats aln.sorted.bam > idxstats.txt
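The idxstats table is easier to read as percentages. Each row holds the reference name, its length, the number of mapped reads, and the number of unmapped reads, with a final `*` row for reads that have no coordinates. The sketch below assumes that layout; `demo_idxstats.txt` and its numbers are made-up illustration data, so substitute `idxstats.txt` for a real run.

```shell
# Made-up idxstats-style table (tab-separated): name, length, mapped, unmapped.
printf 'chr1\t1000\t90\t0\n' > demo_idxstats.txt
printf 'chr2\t500\t10\t0\n' >> demo_idxstats.txt
printf '*\t0\t0\t5\n' >> demo_idxstats.txt

# Sum mapped reads over named references (skipping the "*" row), then
# print each reference's share of the mapped total.
idx_summary=$(awk -F'\t' '$1 != "*" { name[++n] = $1; mapped[n] = $3; total += $3 }
  END { for (i = 1; i <= n; i++)
          printf "%s\t%d\t%.1f%%\n", name[i], mapped[i], (total ? 100 * mapped[i] / total : 0) }' demo_idxstats.txt)
echo "$idx_summary"
```

A reference whose share is far from its share of the genome length can point to a mapping artifact or uneven simulation, and is worth a closer look.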
4. Compute coverage & depth
# per-base depth (-a also reports zero-coverage positions, so the mean below covers the whole reference)
samtools depth -a aln.sorted.bam > cov_per_base.txt
# inspect first 10 positions
head -n 10 cov_per_base.txt
# summary: mean & max depth over the positions listed in cov_per_base.txt
awk '{sum+=$3; if($3>max) max=$3} END {if (NR) print "mean:", sum/NR, "max:", max}' cov_per_base.txt
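Mean depth alone can hide gaps, so breadth of coverage (the fraction of positions at or above a depth threshold) is a common complement. The sketch below assumes the three-column depth format (reference, position, depth) with one line per reference position. The file `demo_depth.txt` and the threshold `min_depth=2` are hypothetical values chosen for illustration.

```shell
# Made-up per-base depth table (tab-separated): reference, 1-based position, depth.
printf 'chr1\t1\t0\nchr1\t2\t3\nchr1\t3\t5\nchr1\t4\t2\n' > demo_depth.txt

# Count positions whose depth meets the (hypothetical) threshold and report
# them as a percentage of all listed positions.
min_depth=2
breadth=$(awk -F'\t' -v min="$min_depth" '$3 >= min { ok++ }
  END { if (NR) printf "%d/%d positions (%.1f%%) at depth >= %d\n", ok, NR, 100 * ok / NR, min }' demo_depth.txt)
echo "$breadth"
```

If the depth file omits zero-coverage positions, the denominator shrinks and breadth is overestimated, so check how the file was produced before interpreting the percentage.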
Optional: if you have bedtools installed, you can also generate a bedGraph of coverage (-bg reports only covered intervals; use -bga to include zero-coverage regions):
bedtools genomecov -ibam aln.sorted.bam -bg \
> coverage.bedgraph
head -n 10 coverage.bedgraph
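A bedGraph stores one depth value per interval rather than per base, so a mean has to weight each interval by its length. The sketch below assumes the usual four columns (reference, 0-based start, exclusive end, depth). The file `demo.bedgraph` is made-up illustration data; substitute `coverage.bedgraph` for a real run.

```shell
# Made-up bedGraph intervals (tab-separated): reference, start, end, depth.
printf 'chr1\t0\t10\t2\nchr1\t10\t30\t5\nchr1\t40\t50\t1\n' > demo.bedgraph

# Weight each interval's depth by its length (end - start).
bg_mean=$(awk -F'\t' '{ len = $3 - $2; bases += len; sum += len * $4 }
  END { if (bases) printf "bases=%d mean=%.2f\n", bases, sum / bases }' demo.bedgraph)
echo "$bg_mean"
```

In this sample the uncovered gap between positions 30 and 40 is not counted, so the result is the mean over covered bases only.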