Journal Entry #2: Annotation - bcb420-2023/Helena_Jovic GitHub Wiki

Objective

  • Find a functional annotation dataset for human genes.
  • Gather information on the dataset, including its publication date, data type, update frequency, and available identifiers and formats.
  • Analyze how the dataset contributes to the functional annotation of genes, such as by providing information on gene ontology, pathway analysis, protein interactions, or other functional characteristics.

Time Management

Date Started: 2023-02-24
Date Completed: 2023-02-24
Estimated Time: 1.5 hours
Actual Time: 1.5 hours

Workflow and Results

1. What sort of data is it? What sort of information does it offer us?

DisGeNET is a comprehensive platform that integrates multiple types of data from various sources, including GWAS, animal models, expert-curated literature, and other databases. It offers information on gene-disease associations, including gene names, disease names, association types (e.g., genetic, functional), and evidence sources (e.g., literature, GWAS, animal models). Additionally, DisGeNET provides scores that indicate the level of evidence supporting each gene-disease association.

2. When and where was it published? Was it published?

Yes. The paper describing DisGeNET was published in the journal Database (Oxford) in 2015. Since then, DisGeNET has been regularly updated and maintained to include new data and improve the accuracy and completeness of existing information.

3. Is this annotation set updated regularly or is it a static source?

DisGeNET is not a static source. New data is added to the platform as it becomes available, and existing entries are maintained to improve their accuracy and completeness.

4. Where can I find this data? (link to the download web address or ftp site or publication where it can be found)

DisGeNET can be accessed through its website at https://www.disgenet.org/. The data can be downloaded in various formats, including CSV, TXT, and SQL, from the "Download" section of the website. In addition, the platform provides a RESTful API that enables programmatic access to the data.

5. How is the data formatted and released? Does it exist in some sort of standard file format?

The data in DisGeNET is standardized using controlled vocabularies for gene and disease names, association types, and evidence sources. It is released in several widely used file formats, including CSV, TXT, and SQL.
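As a minimal sketch of how a downloaded text file might be read, the Python below parses a tab-delimited table of gene-disease associations and keeps those above a score threshold. The column names (`geneId`, `geneSymbol`, `diseaseId`, `diseaseName`, `score`), the sample rows, and the 0.3 cutoff are all assumptions for illustration; check the header of the actual DisGeNET download before relying on them.

```python
import csv
import io

# Illustrative rows only: the column names and values are assumptions,
# not a copy of any real DisGeNET release.
SAMPLE_TSV = (
    "geneId\tgeneSymbol\tdiseaseId\tdiseaseName\tscore\n"
    "1001\tGENE_A\tC0000001\tExample disease 1\t0.70\n"
    "1002\tGENE_B\tC0000001\tExample disease 1\t0.10\n"
    "1001\tGENE_A\tC0000002\tExample disease 2\t0.35\n"
)


def load_associations(handle):
    """Read tab-delimited gene-disease rows into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        row["score"] = float(row["score"])  # scores are read as text
        rows.append(row)
    return rows


def filter_by_score(rows, min_score):
    """Keep associations whose evidence score is at least min_score."""
    return [row for row in rows if row["score"] >= min_score]


associations = load_associations(io.StringIO(SAMPLE_TSV))
strong = filter_by_score(associations, min_score=0.3)  # cutoff is arbitrary
for row in strong:
    print(row["geneSymbol"], row["diseaseId"], row["score"])
```

For a real download, `io.StringIO(SAMPLE_TSV)` would be replaced by an opened file handle, with the delimiter adjusted to match the chosen format.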

6. What identifiers are associated with these annotations?

DisGeNET uses several standard identifiers for genes and diseases, including Entrez Gene IDs, UniProt IDs, and OMIM IDs, among others. The platform also provides mappings between different identifier systems to facilitate data integration and analysis. This helps to ensure that data from different sources can be easily integrated and analyzed in a consistent manner.
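To illustrate why identifier mappings matter, here is a rough sketch of attaching UniProt IDs to associations keyed by Entrez Gene ID. The mapping rows, field names, and all identifiers are hypothetical placeholders rather than real DisGeNET content; the point is only that one gene can map to several proteins, and some genes may have no mapping at all.

```python
from collections import defaultdict

# Hypothetical mapping rows: (UniProt ID, Entrez Gene ID). Placeholders only.
MAPPING_ROWS = [
    ("UNIPROT_A", "1001"),
    ("UNIPROT_B", "1001"),
    ("UNIPROT_C", "1002"),
]

# Hypothetical associations keyed by Entrez Gene ID.
ASSOCIATIONS = [
    {"geneId": "1001", "diseaseId": "C0000001"},
    {"geneId": "1002", "diseaseId": "C0000002"},
    {"geneId": "1003", "diseaseId": "C0000002"},
]


def build_entrez_to_uniprot(mapping_rows):
    """Index UniProt IDs by Entrez Gene ID (one gene may have several)."""
    index = defaultdict(list)
    for uniprot_id, entrez_id in mapping_rows:
        index[entrez_id].append(uniprot_id)
    return dict(index)


def annotate_with_uniprot(associations, entrez_to_uniprot):
    """Return copies of the associations with a 'uniprotIds' list added."""
    annotated = []
    for assoc in associations:
        uniprot_ids = entrez_to_uniprot.get(assoc["geneId"], [])
        annotated.append({**assoc, "uniprotIds": uniprot_ids})
    return annotated


entrez_to_uniprot = build_entrez_to_uniprot(MAPPING_ROWS)
annotated = annotate_with_uniprot(ASSOCIATIONS, entrez_to_uniprot)
for row in annotated:
    print(row["geneId"], row["diseaseId"], row["uniprotIds"])
```

Keeping unmapped genes with an empty list, rather than dropping them, makes gaps in the mapping visible during later analysis.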

Conclusion

DisGeNET aims to cover all disease areas, with particular attention to the integration and standardization of data, and to provide open access to knowledge about genes associated with human diseases. An important goal of the DisGeNET toolkit is to support different types of users. The database includes gene-disease associations mined from MEDLINE via an NLP-based approach, because the scientific literature is a rich, up-to-date source of knowledge on disease genes. Overall, the DisGeNET toolkit aims to aid the exploration and interpretation of data on the genetic determinants of disease for personalized medicine.

References

  1. Piñero J, Saüch J, Sanz F, Furlong LI. The DisGeNET Cytoscape app: Exploring and Visualizing Disease Genomics Data. Computational and Structural Biotechnology Journal. 2021;19:2960–2967. doi:10.1016/j.csbj.2021.05.015
  2. Piñero J, Queralt-Rosinach N, Bravo À, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database (Oxford). 2015;2015:bav028. Published 2015 Apr 15. doi:10.1093/database/bav028
  3. DisGeNET. https://www.disgenet.org/