overview - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

  • Overview
    Read mapping is the process of taking millions of short (or long) sequencing reads and finding their best-matching location(s) in a reference genome. It lets us determine where each read originated, and forms the basis for virtually all downstream analyses—variant calling, expression quantification, coverage profiling, etc.

    The SAM (Sequence Alignment/Map) format is the community standard for storing those alignments in a human-readable, tab-delimited text file. Each line represents one read (or read pair) and records its mapping position, orientation, CIGAR string (how the read aligns to the reference), mapping quality, and optional tags (e.g. number of mismatches, sample metadata).

    BAM is simply the binary, compressed equivalent of SAM. By converting SAM → BAM you gain:

    • Much smaller files (10–20× smaller than plain text)
    • Fast random access (with a BAM index, you can quickly fetch alignments for any genomic region)
    • Robustness (binary checksums guard against corruption)

    Together, SAM/BAM form the backbone of read‐mapping workflows: you map your reads, generate a BAM, sort & index it, then feed it into variant callers, coverage tools, or genome browsers for visualization.