1.8 Open Reading Frame - alunga20/Concepts_of_Molecular_Biology GitHub Wiki

An open reading frame is a stretch of DNA that, when read as consecutive codons in a given reading frame, contains no stop codons.
Codons are read in groups of three bases, so a double-stranded DNA molecule can be read in any of six possible reading frames: three in the forward direction and three in the reverse. A long open reading frame is likely part of a gene.
These sequences, called open reading frames (ORFs), typically begin with a start codon and continue, uninterrupted by stop codons, until a terminating stop codon.
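The definition above can be turned into a simple check. The sketch below assumes the common start-to-stop convention (the ORF begins with ATG and ends at the first in-frame stop codon); the function name `is_orf` is hypothetical and only for illustration.

```python
# Stop codons and the start codon on the sense (coding) strand, written as DNA
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def is_orf(seq: str) -> bool:
    """Hypothetical helper: True if seq starts with ATG, ends with a stop
    codon, and contains no in-frame stop codon before the final one."""
    seq = seq.upper()
    # Must be a whole number of codons, with room for a start and a stop
    if len(seq) % 3 != 0 or len(seq) < 6:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != START_CODON or codons[-1] not in STOP_CODONS:
        return False
    # No stop codon may appear before the terminating one
    return not any(c in STOP_CODONS for c in codons[:-1])


print(is_orf("ATGGCCAAATAA"))  # ATG GCC AAA TAA: open, prints True
print(is_orf("ATGTAAGCCTGA"))  # ATG TAA GCC TGA: internal stop, prints False
```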
When screening for genes, a common cutoff is an ORF length of at least 100 codons (300 nucleotides).
While open reading frames may predict potential coding regions, they do not automatically guarantee the presence of a gene.
Some long, uninterrupted sequences of DNA may never be translated, while some short sequences may code for proteins.
Any particular stretch of DNA will have six reading frames that could potentially code for a functional protein.
mRNA is translated in codons (triplets of bases), meaning there are three potential reading frames on each strand of a DNA sequence.
DNA is double-stranded and either strand could include a gene, meaning there are six reading frames in total (2 × 3).
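The 2 × 3 count can be made concrete by splitting a sequence into codons in all six frames. In this sketch the other strand is obtained as the reverse complement, and the frame labels (+1 to +3 for offsets 0 to 2 on the given strand, -1 to -3 for the same offsets on the reverse complement) are an assumed convention; tools differ in how they number reverse frames.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """The opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Split seq into codons in all six reading frames.

    Frames +1, +2, +3 start at offsets 0, 1, 2 of the given strand;
    frames -1, -2, -3 start at the same offsets of the reverse complement.
    An incomplete codon at the end of a frame is dropped.
    """
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames


for label, codons in six_frames("ATGAAACCC").items():
    print(label, " ".join(codons))
```

For `ATGAAACCC`, frame +1 reads `ATG AAA CCC`, while frame -1 reads the reverse complement as `GGG TTT CAT`.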

To identify an open reading frame:

Locate a start codon, which determines the reading frame: on the sense (coding) strand this is ATG.
Read this sequence in base triplets until a stop codon is reached (TGA, TAG or TAA).
The longer the resulting ORF, the more likely it is to correspond to a real protein-coding gene.
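The manual steps above can be sketched on a single strand as follows. The function name `orfs_on_strand` is hypothetical; it reports every ATG that reaches an in-frame stop, so an ATG nested inside a longer ORF in the same frame yields a second, shorter ORF sharing the same stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orfs_on_strand(seq: str) -> list[tuple[int, int]]:
    """Hypothetical helper following the manual procedure on one strand:
    1. locate an ATG, which fixes the reading frame;
    2. read triplets from there until TAA, TAG or TGA is reached.
    Returns (start, end) positions; end is exclusive and includes the stop.
    """
    seq = seq.upper()
    found = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Step through codons in the frame set by this ATG
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                found.append((start, i + 3))
                break
        # An ATG with no downstream in-frame stop is not reported
    return found


print(orfs_on_strand("CCATGAAATGCTAGGG"))  # ATG AAA TGC TAG at positions 2 to 14
```

Ranking the returned ORFs by length (`end - start`) reflects the last step: longer ORFs are the better gene candidates.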

Bioinformatics programs such as NCBI ORFfinder can automatically identify potential ORFs in a candidate sequence.
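As a rough illustration of what such programs automate, the sketch below scans all six reading frames and keeps ORFs of at least 100 codons, using the cutoff mentioned earlier as an assumed default. It is not how any particular tool works: the function name `find_orfs`, the `(strand, offset, sequence)` output format, and the choice to keep only the most upstream ATG in each open stretch (so nested ORFs are not reported separately) are all assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, str]]:
    """Hypothetical six-frame scan. Returns (strand, offset, orf_seq) for each
    ORF of at least min_codons codons, counting the start and stop codons.
    Offsets on the "-" strand refer to positions in the reverse complement."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None  # first ATG in the current stop-free stretch
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        results.append((strand, offset, s[start:i + 3]))
                    start = None
            # An ATG still open at the end of the frame has no stop and is skipped
    return results


demo = "ATG" + "GCC" * 98 + "TAA"  # exactly 100 codons
print([(strand, offset, len(orf)) for strand, offset, orf in find_orfs(demo)])
```

Lowering `min_codons` finds shorter candidates at the cost of many more chance ORFs, which is the trade-off behind using a length cutoff at all.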

Protein-coding sequences tend to be conserved across related species, so an ORF whose sequence is present in multiple genomes is more likely to represent a real gene.