GenBank

The GenBank format (GenBank Flat File Format) consists of an annotation section and a sequence section. The annotation section starts with a line beginning with the word "LOCUS". The sequence section starts with a line beginning with the word "ORIGIN" and ends with a line containing "//".
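As a rough illustration of these section markers, the Python sketch below splits one record into its annotation lines and its sequence. The function name `split_genbank_record` and the toy record are invented for this example, and the code assumes a single record per string; for real work an established parser such as Biopython's `Bio.SeqIO` is the safer choice.

```python
def split_genbank_record(text):
    """Split one GenBank record into (annotation_lines, sequence_string)."""
    annotation, sequence_parts = [], []
    section = None  # None until LOCUS is seen, then "annotation" or "sequence"
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            section = "annotation"
        elif line.startswith("ORIGIN"):
            section = "sequence"
            continue  # the ORIGIN line itself carries no residues
        elif line.strip() == "//":
            break  # end of the record
        if section == "annotation":
            annotation.append(line)
        elif section == "sequence":
            # Keep letters only: drops the position number and block spacing.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
    return annotation, "".join(sequence_parts)


# Hypothetical toy record, not a real database entry.
toy_record = """LOCUS       TOY                       12 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical toy record.
ORIGIN
        1 gaattccggc ct
//
"""

annotation, sequence = split_genbank_record(toy_record)
print(len(annotation), sequence)
```

Keeping only alphabetic characters on sequence lines is a shortcut that works because the position numbers and spacing are the only non-letter content in that section.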

The sequence is expected to use the standard IUPAC amino acid or nucleic acid codes.
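A simple way to check this expectation is to compare the symbols in a sequence against an allowed alphabet. The sketch below is only an assumption about what counts as "standard": the nucleotide set includes the IUPAC ambiguity codes, and the amino acid set holds the 20 common residues plus B, Z and X. The names `IUPAC_NUCLEOTIDE`, `IUPAC_AMINO_ACID` and `invalid_symbols` are made up for this example, and the sets can be extended if needed.

```python
# Assumed alphabets for this sketch (upper case).
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")
IUPAC_AMINO_ACID = set("ACDEFGHIKLMNPQRSTVWY") | set("BZX")


def invalid_symbols(sequence, alphabet):
    """Return the sorted symbols of `sequence` that are not in `alphabet`."""
    return sorted(set(sequence.upper()) - alphabet)


print(invalid_symbols("gaattccggc", IUPAC_NUCLEOTIDE))  # []
print(invalid_symbols("mcddsydave", IUPAC_AMINO_ACID))  # []
print(invalid_symbols("gaatt!cgg1", IUPAC_NUCLEOTIDE))  # ['!', '1']
```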

Filename extensions ".gb" and ".genbank" are common for text files containing DNA sequences, while ".gp", ".genpept", and ".genpep" are common for text files containing protein sequences.
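Since the extension is only a convention, it can serve as a first hint about the content but not as proof. The following sketch maps the extensions listed above to a sequence type; the function name `guess_sequence_type` and the file names are hypothetical.

```python
from pathlib import Path

# Extensions as listed in the text above.
NUCLEOTIDE_EXTENSIONS = {".gb", ".genbank"}
PROTEIN_EXTENSIONS = {".gp", ".genpept", ".genpep"}


def guess_sequence_type(filename):
    """Guess 'nucleotide', 'protein' or 'unknown' from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in NUCLEOTIDE_EXTENSIONS:
        return "nucleotide"
    if suffix in PROTEIN_EXTENSIONS:
        return "protein"
    return "unknown"


for name in ("AF068625.gb", "DAD54807.genpept", "notes.txt"):
    print(name, guess_sequence_type(name))
```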

A DNA sequence:

LOCUS       AF068625                 200 bp    mRNA    linear   ROD 06-DEC-1999
DEFINITION  Mus musculus DNA cytosine-5 methyltransferase 3A (Dnmt3a) mRNA,
            complete cds.
ACCESSION   AF068625 REGION: 1..200
VERSION     AF068625.2  GI:6449467
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 200)
  AUTHORS   Okano,M., Xie,S. and Li,E.
  TITLE     Cloning and characterization of a family of novel mammalian DNA
            (cytosine-5) methyltransferases
  JOURNAL   Nat. Genet. 19 (3), 219-220 (1998)
   PUBMED   9662389
REFERENCE   2  (bases 1 to 200)
  AUTHORS   Xie,S., Okano,M. and Li,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-MAY-1998) CVRC, Mass. Gen. Hospital, 149 13th Street,
            Charlestown, MA 02129, USA
REFERENCE   3  (bases 1 to 200)
  AUTHORS   Okano,M., Chijiwa,T., Sasaki,H. and Li,E.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-NOV-1999) CVRC, Mass. Gen. Hospital, 149 13th Street,
            Charlestown, MA 02129, USA
  REMARK    Sequence update by submitter
COMMENT     On Nov 18, 1999 this sequence version replaced gi:3327977.
FEATURES             Location/Qualifiers
     source          1..200
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /chromosome="12"
                     /map="4.0 cM"
     gene            1..>200
                     /gene="Dnmt3a"
ORIGIN
        1 gaattccggc ctgctgccgg gccgcccgac ccgccgggcc acacggcaga gccgcctgaa
       61 gcccagcgct gaggctgcac ttttccgagg gcttgacatc agggtctatg tttaagtctt
      121 agctcttgct tacaaagacc acggcaattc cttctctgaa gccctcgcag ccccacagcg
      181 ccctcgcagc cccagcctgc
//
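The LOCUS line of the record above packs several fields into one line: name, length, unit, molecule type, topology, division and date. The sketch below pulls them apart by splitting on whitespace. This is a simplification chosen for the example; the function name `parse_locus_line` is hypothetical, and the code assumes the name contains no spaces and that the last two fields are always the division and the date.

```python
def parse_locus_line(line):
    """Parse a LOCUS line into a dict, splitting on whitespace (an assumption)."""
    tokens = line.split()
    if len(tokens) < 6 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    # Everything between the unit and the division: molecule type and topology.
    middle = tokens[4:-2]
    topology = [t for t in middle if t in ("linear", "circular")]
    molecule = [t for t in middle if t not in ("linear", "circular")]
    return {
        "name": tokens[1],
        "length": int(tokens[2]),
        "unit": tokens[3],  # "bp" for nucleotides, "aa" for proteins
        "molecule_type": molecule[0] if molecule else None,
        "topology": topology[0] if topology else None,
        "division": tokens[-2],
        "date": tokens[-1],
    }


locus = "LOCUS       AF068625                 200 bp    mRNA    linear   ROD 06-DEC-1999"
print(parse_locus_line(locus))
```

Protein records carry no molecule type on the LOCUS line, in which case this sketch returns `None` for that field.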

A protein sequence:

LOCUS       DAD54807                  78 aa            linear   PLN 27-JUN-2024
DEFINITION  TPA_inf: hypothetical protein YJR107C-A [Saccharomyces cerevisiae
            S288C].
ACCESSION   DAD54807
VERSION     DAD54807.1
DBLINK      BioProject: PRJNA43747
DBSOURCE    accession BK006943.2
KEYWORDS    Third Party Data; TPA.
SOURCE      Saccharomyces cerevisiae S288C
  ORGANISM  Saccharomyces cerevisiae S288C
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Saccharomycetaceae;
            Saccharomyces.
REFERENCE   1  (residues 1 to 78)
  AUTHORS   Galibert,F., Alexandraki,D., Baur,A., Boles,E., Chalwatzis,N.,
            Chuat,J.C., Coster,F., Cziepluch,C., De Haan,M., Domdey,H.,
            Durand,P., Entian,K.D., Gatius,M., Goffeau,A., Grivell,L.A.,
            Hennemann,A., Herbert,C.J., Heumann,K., Hilger,F., Hollenberg,C.P.,
            Huang,M.E., Jacq,C., Jauniaux,J.C., Katsoulou,C.,
            Karpfinger-Hartl,L. et al.
  TITLE     Complete nucleotide sequence of Saccharomyces cerevisiae chromosome
            X
  JOURNAL   EMBO J. 15 (9), 2031-2049 (1996)
   PUBMED   8641269
REFERENCE   2  (residues 1 to 78)
  AUTHORS   Goffeau,A., Barrell,B.G., Bussey,H., Davis,R.W., Dujon,B.,
            Feldmann,H., Galibert,F., Hoheisel,J.D., Jacq,C., Johnston,M.,
            Louis,E.J., Mewes,H.W., Murakami,Y., Philippsen,P., Tettelin,H. and
            Oliver,S.G.
  TITLE     Life with 6000 genes
  JOURNAL   Science 274 (5287), 546 (1996)
   PUBMED   8849441
REFERENCE   3  (residues 1 to 78)
  AUTHORS   Engel,S.R., Wong,E.D., Nash,R.S., Aleksander,S., Alexander,M.,
            Douglass,E., Karra,K., Miyasato,S.R., Simison,M., Skrzypek,M.S.,
            Weng,S. and Cherry,J.M.
  TITLE     New data and collaborations at the Saccharomyces Genome Database:
            updated reference genome, alleles, and the Alliance of Genome
            Resources
  JOURNAL   Genetics 220 (4) (2022)
   PUBMED   34897464
REFERENCE   4  (residues 1 to 78)
  CONSRTM   Saccharomyces Genome Database
  TITLE     Direct Submission
  JOURNAL   Submitted (14-DEC-2009) Department of Genetics, Stanford
            University, Stanford, CA 94305-5120, USA
REFERENCE   5  (residues 1 to 78)
  CONSRTM   Saccharomyces Genome Database
  TITLE     Direct Submission
  JOURNAL   Submitted (12-APR-2011) Department of Genetics, Stanford
            University, Stanford, CA 94305-5120, USA
  REMARK    Sequence update by submitter
REFERENCE   6  (residues 1 to 78)
  CONSRTM   Saccharomyces Genome Database
  TITLE     Direct Submission
  JOURNAL   Submitted (04-MAY-2012) Department of Genetics, Stanford
            University, Stanford, CA 94305-5120, USA
  REMARK    Protein update by submitter
COMMENT     Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..78
                     /organism="Saccharomyces cerevisiae S288C"
                     /strain="S288C"
                     /db_xref="taxon:559292"
                     /chromosome="X"
     Protein         1..78
                     /product="hypothetical protein"
     Region          1..76
                     /region_name="DUF5137"
                     /note="Protein of unknown function (DUF5137); pfam17220"
                     /db_xref="CDD:375058"
     CDS             1..78
                     /locus_tag="YJR107C-A"
                     /coded_by="complement(BK006943.2:628457..628693)"
                     /experiment="EXISTENCE:mutant phenotype:GO:0071470
                     cellular response to osmotic stress [PMID:26554900]"
                     /note="hypothetical protein; encodes new type of domain,
                     which ab initio modeling suggests is predominantly
                     alpha-helical; nonessential for growth, deletion increases
                     sensitivity to osmostress; expressed at moderate to high
                     abundance; ORF also present in strains EC1118, YJM789,
                     RM11-1a, and AWRI1631; Gln24 in S288C reference is
                     substituted with Arg in wine strain EC1118; predicted
                     S-palmitoylation site on Cys2, suggesting membrane
                     association; transcript previously mischaracterized as
                     SUT646"
                     /db_xref="SGD:S000303810"
ORIGIN      
        1 mcddsydave eyyfnksvag isgqenwnkq latqvysrsl qpeilptlkp lscnkerana
       61 gkrvseeeqi ngkrkrkd
//
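The feature table in the records above follows a fixed layout: each feature key starts after five spaces, and its location starts at column 22, while qualifier lines are indented further. The sketch below relies on that layout to list the keys and locations from a shortened excerpt of the protein record. The function name `feature_keys_and_locations` is made up for this example, and the column positions are an assumption that only holds for space-indented files.

```python
def feature_keys_and_locations(record_text):
    """Return (key, location) pairs from the FEATURES section of one record."""
    features = []
    in_features = False
    for line in record_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif in_features and line and not line[0].isspace():
            break  # next top-level keyword, for example ORIGIN
        elif (in_features and len(line) > 21
              and line[:5] == "     " and line[5] != " "):
            # Key occupies columns 6-21, location starts at column 22.
            features.append((line[5:21].strip(), line[21:].strip()))
    return features


# Shortened excerpt of the protein record shown above.
excerpt = """FEATURES             Location/Qualifiers
     source          1..78
                     /organism="Saccharomyces cerevisiae S288C"
     Protein         1..78
                     /product="hypothetical protein"
     Region          1..76
                     /region_name="DUF5137"
     CDS             1..78
                     /locus_tag="YJR107C-A"
ORIGIN
"""

features = feature_keys_and_locations(excerpt)
for key, location in features:
    print(key, location)
```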