Hands On Exercise - igheyas/Bioinformatics GitHub Wiki

Hands-On Exercise

In this exercise we’ll take your simulated (or real) paired-end FASTQ reads through mapping, BAM generation & indexing, and basic QC/coverage checks.

1. Map reads to the reference

# align with BWA-MEM (8 threads)
bwa mem -t 8 ref_genome.fa \
    raw_reads/simulated_small_R1.fastq \
    raw_reads/simulated_small_R2.fastq \
> aln.sam

Output

2. Convert SAM → sorted BAM & index

# convert & sort
samtools view -bS aln.sam \
  | samtools sort -@ 8 -o aln.sorted.bam

Output

# build BAM index
samtools index aln.sorted.bam

3. Evaluate mapping rates & read counts

# overall flag statistics (mapped %, duplicates, etc.)
samtools flagstat aln.sorted.bam > flagstat.txt

# per-chromosome read counts
samtools idxstats aln.sorted.bam > idxstats.txt

4. Compute coverage & depth

# per-base depth
samtools depth aln.sorted.bam > cov_per_base.txt

# inspect first 10 positions
head -n 10 cov_per_base.txt

Output:

# summary: mean & max coverage
awk '{sum+=$3; if($3>max) max=$3} END {print "mean:", sum/NR, "max:", max}' cov_per_base.txt

Output:

Optional: if you have Bedtools installed, you can also generate a BedGraph of coverage

bedtools genomecov -ibam aln.sorted.bam -bg \
  > coverage.bedgraph

head -n 10 coverage.bedgraph