Journal Entry 3: Homework Assignment: Annotation sources - bcb420-2022/Sabbir_Hossain GitHub Wiki
Give the gist of, and answer questions about, the Consensus CDS (CCDS) protein set database, an annotation database/source provider.
Time est.: 30 mins
Time used: 0.5 h
Date started: 2022/04/21
Date completed: 2022/04/21
Find an annotation data set (excluding GO and Reactome which I have outlined below as an example) for human genes - any data set that adds functional, process, location, disease status ... to a set of genes.
Find out the following information:
The Consensus CDS (CCDS) project is a collaborative effort to find a core group of consistently annotated and high-quality human and mouse protein coding regions. The long-term objective is to encourage the adoption of a common set of gene annotations.
Latest release: 2019 for mouse, 2018 for human (see all releases).
The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D. Genome Res. 2009 Jul;19(7):1316-23. PubMed: PMID: 19498102
Tracking and coordinating an international curation effort for the CCDS Project. Harte RA, Farrell CM, Loveland JE, Suner MM, Wilming L, Aken B, Barrell D, Frankish A, Wallin C, Searle S, Diekhans M, Harrow J, Pruitt KD. Database 2012 Mar 20;2012:bas008. doi: 10.1093/database/bas008. PubMed: PMID: 22434842
Current status and new features of the Consensus Coding Sequence database. Farrell CM, O'Leary NA, Harte RA, Loveland JE, Wilming LG, Wallin C, Diekhans M, Barrell D, Searle SM, Aken B, Hiatt SM, Frankish A, Suner MM, Rajput B, Steward CA, Brown GR, Bennett R, Murphy M, Wu W, Kay MP, Hart J, Rajan J, Weber J, Snow C, Riddick LD, Hunt T, Webb D, Thomas M, Tamez P, Rangwala SH, McGarvey KM, Pujar S, Shkeda A, Mudge JM, Gonzalez JM, Gilbert JG, Trevanion SJ, Baertsch R, Harrow JL, Hubbard T, Ostell JM, Haussler D, Pruitt KD. Nucleic Acids Res. 2014 Jan 1;42(1):D865-72. doi: 10.1093/nar/gkt1059. PubMed: PMID: 24217909
Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Pujar S, O'Leary NA, Farrell CM, Loveland JE, Mudge JM, Wallin C, Girón CG, Diekhans M, Barnes I, Bennett R, Berry AE, Cox E, Davidson C, Goldfarb T, Gonzalez JM, Hunt T, Jackson J, Joardar V, Kay MP, Kodali VK, Martin FJ, McAndrews M, McGarvey KM, Murphy M, Rajput B, Rangwala SH, Riddick LD, Seal RL, Suner MM, Webb D, Zhu S, Aken BL, Bruford EA, Bult CJ, Frankish A, Murphy T, Pruitt KD. Nucleic Acids Res. 2018 Jan 4;46(D1):D221-D228. doi: 10.1093/nar/gkx1031. PubMed: PMID: 29126148 PubMed Central: PMCID: PMC5753299
The data set appears to be static.
Where can I find this data? (link to the download web address or ftp site or publication where it can be found)
Direct link to the FTP site, which holds all releases for both mouse and human.
The data are distributed as FTP archive releases. The CCDS collection contains full-length coding sequences (with a starting ATG and a valid stop codon) that can be translated from the genome without frameshifts. The Havana team at EMBL-EBI and the RefSeq annotation group at NCBI are the two primary curation groups.
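To make the "full-length" criterion concrete for myself, here is a minimal Python sketch of such a check. It is my own simplification and not the CCDS pipeline: the function name `is_full_length_cds` is hypothetical, it assumes the standard genetic code with only TAA, TAG and TGA as stop codons, and it ignores curated exceptions that a real annotation set may allow.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code only


def is_full_length_cds(seq: str) -> bool:
    """Return True if seq looks like a complete coding sequence.

    Checks: length is a multiple of 3, first codon is ATG, last codon
    is a stop codon, and no stop codon occurs in frame before the end.
    """
    seq = seq.upper()
    # Need at least a start codon and a stop codon, in whole codons.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # An internal in-frame stop would truncate the translation.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

For example, `is_full_length_cds("ATGAAATAA")` passes, while a sequence with a premature in-frame stop or a length that is not a multiple of three does not.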
The following is the general process flow for defining the CCDS gene set:
- Compare the genome annotation results from the collaborating groups.
- Identify annotated coding regions that have identical location coordinates on the genome, then evaluate their quality.
- Remove lower-quality CDSs from the core set pending further assessment by the collaborating groups.
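The coordinate-matching step in the flow above can be sketched in Python as follows. This is only an illustration under my own assumptions: each annotation set is a plain dict from a transcript ID to a CDS key, the helper names `cds_key` and `find_consensus` and the toy IDs are hypothetical, and the quality evaluation and review steps are not modelled.

```python
from typing import Dict, Iterable, List, Tuple

# A CDS is identified by chromosome, strand and its sorted exon intervals.
CdsKey = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def cds_key(chrom: str, strand: str, exons: Iterable[Tuple[int, int]]) -> CdsKey:
    """Build a hashable key so that exon order does not affect equality."""
    return (chrom, strand, tuple(sorted(exons)))


def find_consensus(
    annot_a: Dict[str, CdsKey], annot_b: Dict[str, CdsKey]
) -> Dict[CdsKey, Tuple[List[str], List[str]]]:
    """Return CDSs with identical coordinates in both annotation sets.

    Maps each shared CDS key to the transcript IDs carrying it in
    set A and in set B.
    """
    # Index set B by CDS key for constant-time lookup.
    by_key_b: Dict[CdsKey, List[str]] = {}
    for tid, key in annot_b.items():
        by_key_b.setdefault(key, []).append(tid)

    consensus: Dict[CdsKey, Tuple[List[str], List[str]]] = {}
    for tid, key in annot_a.items():
        if key in by_key_b:
            ids_a, _ = consensus.setdefault(key, ([], by_key_b[key]))
            ids_a.append(tid)
    return consensus


# Toy data: only the first CDS of each set has identical coordinates.
set_a = {
    "A1": cds_key("chr1", "+", [(100, 200), (300, 400)]),
    "A2": cds_key("chr1", "+", [(100, 200), (300, 450)]),
}
set_b = {
    "B1": cds_key("chr1", "+", [(300, 400), (100, 200)]),
    "B2": cds_key("chr2", "-", [(5, 50)]),
}
shared = find_consensus(set_a, set_b)
```

Matching on the full set of exon intervals, rather than only the outer start and end, means that two transcripts with the same span but a different internal splice site are not treated as the same CDS.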