Journal Entry ‐ Annotation Data Source - bcb420-2024/Anna_Lai GitHub Wiki
Data Annotation Resources for Bioinformaticians
Date: Feb 25
Source under investigation: The Cancer Genome Atlas (TCGA) dataset
Questions
- What sort of data is it? What sort of information does it offer us?
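It is a cancer genomics atlas of human tumour data: TCGA molecularly characterized more than 20,000 primary cancer and matched normal samples spanning 33 cancer types, producing genomic, epigenomic, transcriptomic, and proteomic data. The project was funded by the US government and supervised jointly by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).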
Reference: Outcomes & Impact of The Cancer Genome Atlas
- Was it published? If so, when and where?
Yes. The project's first major results, a comprehensive genomic characterization of glioblastoma, were published in the journal Nature in 2008.
- Is this annotation set updated regularly or is it a static source?
This annotation set is updated periodically as data are reprocessed and re-released to keep up with research findings. At the time of writing, it was last updated on June 28, 2021.
- Where can I find this data? (link to the download page, FTP site, or publication where it can be found)
Researchers can go to the Genomic Data Commons (GDC) Data Portal, search for the projects, cases, or genes they are interested in by keyword or with filters, and download the matching data files. Website: GDC Data Portal (https://portal.gdc.cancer.gov)
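Besides the web portal, the GDC offers a public REST API for programmatic access. The sketch below only builds a query URL for the API's `files` endpoint and, separately, shows how the request could be sent. It assumes the base URL `https://api.gdc.cancer.gov`, the `filters`/`fields`/`format`/`size` query parameters, and GDC field names such as `cases.project.project_id`, `data_category`, and `access`; the helper functions are hypothetical, so check the GDC API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public GDC REST API "files" endpoint.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id, data_category, size=10):
    """Build query parameters selecting open-access files of one data
    category (e.g. "Simple Nucleotide Variation") from one project
    (e.g. "TCGA-BRCA"). Hypothetical helper; field names are assumed.
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "data_category", "value": [data_category]}},
            {"op": "in", "content": {"field": "access", "value": ["open"]}},
        ],
    }
    return {
        "filters": json.dumps(filters),  # filter tree serialized as JSON
        "fields": "file_id,file_name,data_type,data_format",
        "format": "JSON",
        "size": str(size),  # maximum number of hits to return
    }


def build_files_url(project_id, data_category, size=10):
    """Append the URL-encoded query parameters to the endpoint."""
    params = build_files_query(project_id, data_category, size)
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


def fetch_file_hits(url):
    """Send the GET request (requires network access) and return the hits."""
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["data"]["hits"]
```

For instance, `fetch_file_hits(build_files_url("TCGA-BRCA", "Simple Nucleotide Variation"))` should list up to ten matching files; each returned `file_id` can then be downloaded through the portal or the GDC Data Transfer Tool. Building the URL separately from sending it keeps the query easy to inspect before any download starts.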
- How is the data formatted and released? Does it exist in some sort of standard file format?
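The data are released through the GDC Data Portal and its API in standard bioinformatics file formats, for example BAM for aligned sequencing reads, VCF and MAF for somatic mutation calls, and tab-separated text tables for gene expression quantification and copy-number segments. Open-access files can be downloaded directly, while controlled-access data such as raw sequencing reads require dbGaP authorization.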
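As an example of handling one standard format, the sketch below counts mutation records per gene in a MAF (Mutation Annotation Format) table, the tab-separated format used for somatic mutation calls. It assumes a header row containing `Hugo_Symbol` and `Variant_Classification` columns and leading comment lines that start with `#`; the sample rows are hypothetical toy data, not real TCGA calls, and a real MAF has many more columns.

```python
import csv
from collections import Counter


def count_mutations_per_gene(maf_text, skip_classes=("Silent",)):
    """Count mutation records per gene symbol in MAF-formatted text.

    Lines starting with '#' (version/metadata comments) are skipped, and
    rows whose Variant_Classification is in `skip_classes` (synonymous
    changes by default) are ignored.
    """
    data_lines = [line for line in maf_text.splitlines()
                  if line and not line.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["Variant_Classification"] in skip_classes:
            continue
        counts[row["Hugo_Symbol"]] += 1
    return counts


# Hypothetical toy MAF excerpt with only three columns.
TOY_MAF = """#version 2.4
Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode
TP53\tMissense_Mutation\tSAMPLE-A
TP53\tNonsense_Mutation\tSAMPLE-B
PIK3CA\tMissense_Mutation\tSAMPLE-A
TTN\tSilent\tSAMPLE-B
"""

print(count_mutations_per_gene(TOY_MAF).most_common())
# [('TP53', 2), ('PIK3CA', 1)]
```

Reading the header with `csv.DictReader` lets the code look columns up by name, so it does not depend on where `Hugo_Symbol` sits among the many columns of a full MAF file.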
- What identifiers are associated with these annotations?
This annotation source uses the most common identifier systems, including NCBI Entrez Gene, RefSeq, dbSNP, Ensembl, and UniProtKB.
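Because a single file (a MAF, for example) may carry several of these identifier systems side by side, it can help to recognize which system an ID belongs to before mapping between them. The sketch below uses simplified regular expressions that are assumptions, not official validators; only the UniProtKB pattern follows UniProt's documented accession format. The `classify_identifier` and `strip_version` helpers are hypothetical; the latter removes version suffixes such as the `.18` on versioned Ensembl or RefSeq IDs.

```python
import re

# Simplified patterns for common identifier formats (assumptions, not an
# exhaustive validator). Order matters: the first matching pattern wins.
ID_PATTERNS = [
    ("Ensembl gene", re.compile(r"^ENSG\d{11}(\.\d+)?$")),
    ("RefSeq", re.compile(r"^(NM|NR|XM|XR|NP|XP|NC|NG)_\d+(\.\d+)?$")),
    ("dbSNP", re.compile(r"^rs\d+$")),
    ("UniProtKB", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    ("Entrez Gene", re.compile(r"^\d+$")),  # Entrez Gene IDs are plain integers
]


def classify_identifier(identifier):
    """Return the first identifier system whose pattern matches, or 'unknown'."""
    for system, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return system
    return "unknown"


def strip_version(identifier):
    """Drop a trailing version suffix such as '.18' from Ensembl or RefSeq IDs."""
    return identifier.split(".", 1)[0]
```

For example, `classify_identifier("P04637")` returns `"UniProtKB"`, and `strip_version("ENSG00000141510.18")` returns `"ENSG00000141510"`, the unversioned Ensembl ID of TP53, which is the form most mapping tables expect.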