Journal Entry: Annotation sources - bcb420-2022/Emiliya_Stolyarova GitHub Wiki

MetaCyc

1. What sort of data is it? What sort of information does it offer us?

MetaCyc is a database of metabolic pathways and the enzymes that catalyze them. It covers many organisms, including humans (Caspi et al., 2014). For a given enzyme, MetaCyc lists the reactions it performs and the metabolic pathways those reactions belong to, along with a pathway diagram (Caspi et al., 2020).

2. When and where was it published? Was it published?

The database was originally published in 2000 in the journal Nucleic Acids Research, in a paper that described it alongside the EcoCyc database (Karp et al., 2000). This was followed in 2002 by a second Nucleic Acids Research paper, titled "The MetaCyc Database" (Karp et al., 2002). The most recent publication was in 2020 (Caspi et al., 2020).

3. Is this annotation set updated regularly or is it a static source?

MetaCyc is updated regularly. The release history on the MetaCyc website indicates that the latest version is 25.5, released in December 2021.

4. Where can I find this data? (link to the download web address or ftp site or publication where it can be found)

The data can be accessed on the MetaCyc webpage. A license must be obtained in order to download the data files. Instructions on how to obtain a license can be found here.

5. How is the data formatted and released? Does it exist in some sort of standard file format?

MetaCyc data is distributed in several different file formats. Protein and DNA sequences are stored as FASTA files, and pathway information is stored as BioPAX files in OWL format. Other data is stored either in a tabular format (files with a .col extension) or in an attribute-value format (files with a .dat extension). A further description of all the file formats is provided here.
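The following minimal Python sketch parses the attribute-value (.dat) format. It assumes these conventions for the flat files:

- A line starting with "#" is a comment.
- A line containing only "//" ends a record.
- Each "ATTRIBUTE - value" line adds one value.
- A line starting with "/" continues the previous value.

These conventions are our reading of the format, not a confirmed specification. The function name parse_attribute_value and the sample record are hypothetical. Check the official file-format description before relying on this.

```python
from collections import defaultdict


def parse_attribute_value(lines):
    """Parse attribute-value text into a list of records (hypothetical helper).

    ASSUMED conventions (verify against the MetaCyc file-format docs):
      - lines starting with '#' are comments
      - a line containing only '//' ends a record
      - 'ATTRIBUTE - value' lines add a value to that attribute
      - lines starting with '/' continue the previous value
    Each record maps an attribute name to a list of values, because an
    attribute may appear more than once within a single record.
    """
    records = []
    current = defaultdict(list)
    last_attr = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "//":
            # End of record: store it and start a fresh one.
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            last_attr = None
            continue
        if line.startswith("/") and last_attr is not None:
            # Continuation line: join its text (minus the '/') onto the last value.
            current[last_attr][-1] += " " + line[1:]
            continue
        attr, sep, value = line.partition(" - ")
        if sep:
            current[attr].append(value)
            last_attr = attr
    if current:
        # Keep a trailing record even if the file lacks a final '//'.
        records.append(dict(current))
    return records


# Hypothetical excerpt for illustration only; not real MetaCyc data.
sample = """# hypothetical excerpt
UNIQUE-ID - EXAMPLE-PWY
TYPES - Example-Class-1
TYPES - Example-Class-2
COMMENT - first part of a comment
/second part of the comment
//
"""
records = parse_attribute_value(sample.splitlines())
print(records[0]["TYPES"])
```

Values are collected into lists because an attribute can repeat within a record. Any further conventions used by the real files are not handled by this sketch.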

6. What identifiers are associated with these annotations?

Each MetaCyc entry on the website lists several identifiers. In addition to its MetaCyc identifier and a UniProt ID, an entry may include identifiers from other databases, such as Ensembl and RefSeq IDs.
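Cross-references to other databases appear in the attribute-value files, as far as we understand, in a DBLINKS attribute. Its values look like (UNIPROT "P00000" ...). The minimal sketch below assumes that shape and groups the identifiers by database name. The regular expression, the function name dblinks_to_ids, and the example values are all hypothetical and should be checked against real MetaCyc files.

```python
import re

# ASSUMED shape of a DBLINKS value: '(' then a database name, then the
# identifier in double quotes, e.g. (UNIPROT "P00000" ...). Hypothetical.
DBLINK_PATTERN = re.compile(r'^\(\s*([A-Za-z0-9_-]+)\s+"([^"]+)"')


def dblinks_to_ids(dblink_values):
    """Group cross-reference identifiers by upper-cased database name."""
    ids = {}
    for value in dblink_values:
        match = DBLINK_PATTERN.match(value.strip())
        if match:  # values that do not fit the assumed shape are skipped
            db, identifier = match.groups()
            ids.setdefault(db.upper(), []).append(identifier)
    return ids


# Hypothetical values for illustration only; not real MetaCyc records.
example = [
    '(UNIPROT "P00000" NIL |curator| 1 NIL NIL)',
    '(REFSEQ "NP_000000" NIL |curator| 1 NIL NIL)',
    '(ENSEMBL "ENSG00000000000" NIL |curator| 1 NIL NIL)',
]
print(dblinks_to_ids(example))
```

Grouping by database name makes it straightforward to map an entry's UniProt ID to its Ensembl or RefSeq counterparts.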

References:

Caspi, R., Billington, R., Keseler, I. M., Kothari, A., Krummenacker, M., Midford, P. E., Ong, W. K., Paley, S., Subhraveti, P., & Karp, P. D. (2020). The MetaCyc database of metabolic pathways and enzymes - a 2019 update. Nucleic acids research, 48(D1), D445–D453. https://doi.org/10.1093/nar/gkz862

Caspi, R., Altman, T., Billington, R., Dreher, K., Foerster, H., Fulcher, C. A., Holland, T. A., Keseler, I. M., Kothari, A., Kubo, A., Krummenacker, M., Latendresse, M., Mueller, L. A., Ong, Q., Paley, S., Subhraveti, P., Weaver, D. S., Weerasinghe, D., Zhang, P., & Karp, P. D. (2014). The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic acids research, 42(Database issue), D459–D471. https://doi.org/10.1093/nar/gkt1103

Karp, P. D., Riley, M., Paley, S. M., & Pellegrini-Toole, A. (2002). The MetaCyc Database. Nucleic acids research, 30(1), 59–61. https://doi.org/10.1093/nar/30.1.59

Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Paley, S. M., & Pellegrini-Toole, A. (2000). The EcoCyc and MetaCyc databases. Nucleic acids research, 28(1), 56–59. https://doi.org/10.1093/nar/28.1.56