Annotation sources - bcb420-2024/Wenzhu_Ye GitHub Wiki

Date and Time

  • Feb.27.2024
  • Estimated duration: 2 Hours
  • Actual duration: 1 Hour

Objective

Find an annotation data set and retrieve relevant information

Data Set

The data set I chose is the Genotype-Tissue Expression (GTEx) Portal.

Questions

What sort of data is it? What sort of information does it offer us?

The GTEx project collects and provides two main types of human data:

  • Genomic Data:
    • Genotype Data: Information about the genetic variants present in each donor's DNA.
    • DNA Sequencing Data: Genome sequencing data used to identify genetic variants.
  • Transcriptomic Data:
    • Gene Expression Data: How strongly each gene is expressed in different tissues, measured by the level of messenger RNA (mRNA) it produces.
    • RNA Sequencing Data: RNA sequencing reads, which the project uses to quantify and analyze gene expression across tissues.

When and where was it published? Was it published?

The GTEx project was launched by the National Institutes of Health (NIH) in September 2010 as a two-year pilot project.

Is this annotation set updated regularly or is it a static source?

It is updated regularly rather than being a static source. The latest release is V8 (2020), and each release adds more samples and data.

Where can I find this data? (link to the download web address or ftp site or publication where it can be found)

The data can be found on the GTEx Portal website (https://gtexportal.org). It is available both as downloadable files and through an API.
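
As a rough illustration of API access, the sketch below only builds a query URL and does not send a request. The base URL, the endpoint path `expression/medianGeneExpression`, and the parameter names `gencodeId` and `datasetId` are assumptions that should be checked against the portal's API documentation; the gene ID is a placeholder.

```python
from urllib.parse import urlencode

# ASSUMPTION: base URL of the portal's API; verify against the API docs.
API_BASE = "https://gtexportal.org/api/v2"


def build_query_url(endpoint: str, **params: str) -> str:
    """Join an endpoint path and query parameters into one URL string."""
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{API_BASE}/{endpoint.strip('/')}?{query}"


# ASSUMPTION: endpoint path and parameter names are illustrative only.
url = build_query_url(
    "expression/medianGeneExpression",
    gencodeId="ENSG00000141510.16",  # placeholder versioned Ensembl gene ID
    datasetId="gtex_v8",
)
print(url)
```

The resulting string could then be passed to any HTTP client; keeping URL construction separate makes it easy to inspect the request before sending it.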

How is the data formatted and released? Does it exist in some sort of standard file format?

The data is released in the following formats: Variant Call Format (VCF), Binary Alignment Map (BAM), Sequence Alignment Map (SAM), tab-delimited text files, and metadata files.
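
To make the VCF format more concrete, here is a minimal sketch of reading the first five fixed columns (CHROM, POS, ID, REF, ALT) of a VCF data line using only the standard library. The example records are made-up placeholders, not real GTEx variants, and the helper names are hypothetical; real analyses would more likely use a dedicated VCF library.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    variant_id: str  # "." when the record has no identifier
    ref: str
    alt: List[str]   # one entry per alternate allele


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # VCF data lines are tab-delimited; the first five columns are fixed.
        chrom, pos, variant_id, ref, alt = line.split("\t")[:5]
        yield Variant(chrom, int(pos), variant_id, ref, alt.split(","))


# Made-up placeholder records for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\trs0000001\tA\tG\t.\tPASS\t.",
    "chr2\t67890\t.\tC\tT,G\t.\tPASS\t.",
]

variants = list(parse_vcf_lines(EXAMPLE_VCF))
for v in variants:
    print(v.chrom, v.pos, v.variant_id, v.ref, v.alt)
```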

What identifiers are associated with these annotations?

The identifiers associated with these annotations include Ensembl IDs, gene symbols, dbSNP IDs, Sample IDs, Tissue Codes or Names, and Ensembl transcript IDs.
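
The sketch below shows one way these identifier types could be told apart by their shape. The regular expressions are assumptions based on the usual look of each identifier (for example `ENSG` plus 11 digits with an optional `.version` suffix, or `rs` plus digits) and are not an official specification; the function names are hypothetical. Gene symbols have no fixed pattern, so they fall into the `other` bucket.

```python
import re

# ASSUMPTION: these patterns reflect the typical shape of each identifier type.
ID_PATTERNS = {
    "ensembl_gene": re.compile(r"^ENSG\d{11}(\.\d+)?$"),
    "ensembl_transcript": re.compile(r"^ENST\d{11}(\.\d+)?$"),
    "dbsnp": re.compile(r"^rs\d+$"),
    "gtex_sample": re.compile(r"^GTEX-[A-Z0-9]+(-[A-Za-z0-9]+)+$"),
}


def classify_identifier(identifier: str) -> str:
    """Return the name of the first matching pattern, or 'other'."""
    for name, pattern in ID_PATTERNS.items():
        if pattern.match(identifier):
            return name
    return "other"  # e.g. gene symbols or tissue names


def strip_version(ensembl_id: str) -> str:
    """Drop the '.N' version suffix from a versioned Ensembl ID."""
    return ensembl_id.split(".", 1)[0]


for example in ["ENSG00000141510.16", "ENST00000269305", "rs429358",
                "GTEX-1117F-0226-SM-5GZZ7", "TP53"]:
    print(example, "->", classify_identifier(example))
```

Stripping the version suffix with `strip_version` is useful when joining against resources that use unversioned Ensembl IDs.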

Reference

GTEx Consortium. (2020). The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 52(9), 1–9. https://doi.org/10.1038/s41588-020-0669-3