interpretation - ToolForVol/doc-synMall GitHub Wiki
SynMall is a one-stop synonymous mutation database that stores synonymous mutations across the entire human genome. It contains over 97 million synonymous mutations, corresponding to 25 million unique combinations of genome coordinate, reference base, and replacement base.
Field Name | Description |
---|---|
Variant38 | sSNV on GRCh38, formatted as {chromosome}_{position}_{reference allele}/{alternate allele} |
Chromosome | Chromosome of sSNV |
Position38 | Position of the sSNV on GRCh38 |
Reference Allele | The reference allele on the genome |
Alternate Allele | The alternate allele of sSNV |
Position19 | Position of the sSNV on GRCh37, lifted over with LiftOver (unmapped records are shown as -) |
Source | Source of the sSNV. G = generated from protein-coding transcripts; S = synVep; F = FavorAnnotator; C = CADD v1.7 |
Variant19 | sSNV on GRCh37, formatted as {chromosome}_{position}_{reference allele}/{alternate allele} (unmapped records are shown as -) |
ID | dbSNP rsID (build b156) |
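The variant identifier layout described in the table can be split back into its parts with a few lines of Python. This is a minimal sketch, assuming the `{chromosome}_{position}_{reference allele}/{alternate allele}` layout and the `-` placeholder for unmapped GRCh37 records. The names `Variant` and `parse_variant` and the example identifier are illustrative and not part of SynMall. Whether chromosomes carry a `chr` prefix is not stated here, so the chromosome is kept as a string.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_variant(variant_id: str) -> Optional[Variant]:
    """Split '{chromosome}_{position}_{ref}/{alt}' into its parts.

    Returns None for the '-' placeholder used for unmapped records.
    """
    if variant_id == "-":
        return None
    # Split from the right so only the last two underscores are consumed.
    chromosome, position, alleles = variant_id.rsplit("_", 2)
    ref, alt = alleles.split("/")
    return Variant(chromosome, int(position), ref, alt)


# Hypothetical identifier, used only to show the layout.
example = parse_variant("1_69091_A/G")
print(example)
```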
This section compiles pathogenicity prediction scores for mutations, measured using computational tools that are not limited to a specific type of mutation. The table below lists the names of these tools and the meanings of their fields.
Field Name | Description | Reference |
---|---|---|
CADD_RawScore | Raw score from the model. Higher values indicate that a variant is more likely to be "simulated" (not observed) and therefore more likely to be deleterious; lower values indicate that it is more likely to be "observed". | Rentzsch P, Witten D, Cooper G M, et al. CADD: predicting the deleteriousness of variants throughout the human genome[J]. Nucleic Acids Research, 2019, 47(D1): D886-D894. |
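CADD_PHRED | PHRED-scaled CADD score, derived from the rank of the raw score among all ~8.6 billion possible SNVs as -10·log10(rank/total). A score of 10 or greater places a variant in the top 10% most deleterious, and 20 or greater in the top 1%. | Same as above |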
DANN_score | DANN is a functional prediction score retrained on the CADD training data using a deep neural network. Scores range from 0 to 1. A larger number indicates a higher probability of being damaging. | Quang D, Chen Y, Xie X. DANN: a deep learning approach for annotating the pathogenicity of genetic variants[J]. Bioinformatics, 2015, 31(5): 761-763. |
Eigen_Score | A functional prediction score based on conservation, allele frequencies, and deleteriousness prediction | Ionita-Laza I, McCallum K, Xu B, et al. A spectral approach integrating functional genomic annotations for coding and noncoding variants[J]. Nature genetics, 2016, 48(2): 214-220. |
FATHMM-MKL_Score | Discriminates between pathogenic and benign variants. >0.5: deleterious; <=0.5: neutral or benign | Shihab H A, Rogers M F, Gough J, et al. An integrative approach to predicting the functional effects of non-coding and coding sequence variation[J]. Bioinformatics, 2015, 31(10): 1536-1543. |
FATHMM-XF_Score | Discriminates between pathogenic and benign variants. >0.5: deleterious; <=0.5: neutral or benign | Rogers M F, Shihab H A, Mort M, et al. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features[J]. Bioinformatics, 2018, 34(3): 511-513. |
CAPICE_Score | The higher the score, the more likely that the variant is pathogenic. | Li S, van der Velde K J, De Ridder D, et al. CAPICE: a computational method for consequence-agnostic pathogenicity interpretation of clinical exome variations[J]. Genome Medicine, 2020, 12: 1-11. |
TraP_Score | The probability that a variant is pathogenic; the higher the score, the more damaging the variant is predicted to be. 0.459 <= score < 0.93: possibly damaging; >=0.93: probably damaging | Gelfman S, Wang Q, McSweeney K M, et al. Annotating pathogenic non-coding variants in genic regions[J]. Nat Commun, 2017, 8(1): 236. |
PhD-SNPg_Score | A binary classifier for predicting pathogenic variants. Scores approaching 1: pathogenic; scores approaching 0: benign | Capriotti E, Fariselli P. PhD-SNPg: a webserver and lightweight tool for scoring single nucleotide variants[J]. Nucleic acids research, 2017, 45(W1): W247-W252. |
GPN-MSA_Score | Deleteriousness score for a position. Cutoff: -7 | Benegas G, Albors C, Aw A J, et al. GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction[J]. bioRxiv, 2023. |
CScape-somatic_Score | Discriminates between pathogenic and benign variants. >0.5: deleterious; <=0.5: neutral or benign | Rogers M F, Gaunt T R, Campbell C. CScape-somatic: distinguishing driver and passenger point mutations in the cancer genome[J]. Bioinformatics, 2020, 36(12): 3637-3644. |
CScape_Score | Discriminates between pathogenic and benign variants. >0.5: deleterious; <=0.5: neutral or benign | Rogers M F, Shihab H A, Gaunt T R, et al. CScape: a tool for predicting oncogenic single-point mutations in the cancer genome[J]. Sci Rep, 2017, 7(1): 11597. |
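Several rows in the table above give explicit cutoffs. The sketch below shows one way to turn those scores into labels. It assumes the 0.5 cutoff listed for FATHMM-MKL, FATHMM-XF, CScape and CScape-somatic, and it reads the TraP row as "0.459 <= score < 0.93: possibly damaging". The function names and the label used for TraP scores below 0.459 are illustrative, since the table does not name that range.

```python
def classify_half_cutoff(score: float) -> str:
    """Apply the >0.5 rule listed for FATHMM-MKL, FATHMM-XF, CScape and CScape-somatic."""
    return "deleterious" if score > 0.5 else "neutral or benign"


def classify_trap(score: float) -> str:
    """Apply the TraP cutoffs from the table (0.459 and 0.93)."""
    if score >= 0.93:
        return "probably damaging"
    if score >= 0.459:
        return "possibly damaging"
    # The table gives no label for this range; this wording is an assumption.
    return "below possibly-damaging cutoff"


print(classify_half_cutoff(0.72))
print(classify_trap(0.5))
```

Keeping each tool's rule in its own function avoids mixing cutoffs that are defined on different score scales.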
This section compiles pathogenicity prediction scores specifically designed for synonymous mutations, measured using computational tools. The table below lists the field names of these tools, their meanings, and the corresponding references.
Field Name | Description | Reference |
---|---|---|
EnDSM_Score | Detects deleterious sSNVs using an ensemble learning framework. | Cheng N, Wang H, Tang X, et al. An Ensemble Framework for Improving the Prediction of Deleterious Synonymous Mutation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(5): 2603-2611. |
frDSM_Score | Deleterious synonymous mutation prediction using logistic regression. | Wang H, Sun J, Liu M, et al. frDSM: An Ensemble Predictor With Effective Feature Representation for Deleterious Synonymous Mutation in Human Genome[J]. IEEE/ACM Trans Comput Biol Bioinform, 2023, 20(1): 371-377. |
PrDSM_Score | Predictive score of PrDSM for each synonymous mutation. Range: [0, 1]. >0.308: deleterious; <=0.308: benign | Cheng N, Li M, Zhao L, et al. Comparison and integration of computational methods for deleterious synonymous mutation prediction[J]. Brief Bioinform, 2020, 21(3): 970-981. |
usDSM_Score | A prediction score for deleterious synonymous mutations. The larger the score, the more likely the mutation is deleterious. | Tang X, Zhang T, Cheng N, et al. usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme[J]. Brief Bioinform, 2021, 22(5). |
usDSM_Class | The prediction of usDSM model, deleterious or benign. | Same as above. |
Syntool_Score | Intolerance score to single-nucleotide variation | Zhang T, Wu Y, Lan Z, et al. Syntool: A Novel Region-Based Intolerance Score to Single Nucleotide Substitution for Synonymous Mutations Predictions Based on 123,136 Individuals[J]. Biomed Res Int, 2017, 2017: 5096208. |
Syntool_Score_P | Percentile of the intolerance score to single-nucleotide variation | Same as above. |
SliVA_Score | Automated harmfulness prediction for synonymous (silent) mutations within the human genome. Range: [0, 1]. Scores approaching 1: harmful | Buske O J, Manickaraj A, Mital S, et al. Identification of deleterious synonymous variants in human genomes[J]. Bioinformatics, 2013, 29(15): 1843-1850. |
synVep_Score | Evaluates the effects of human synonymous variants on a per-transcript basis. | Zeng Z, Aptekmann A A, Bromberg Y. Decoding the effects of synonymous variants[J]. Nucleic acids research, 2021, 49(22): 12673-12691. |
This section compiles information on whether mutations have regulatory or functional effects based on computational tools, rather than necessarily being pathogenic. Many tools are designed for non-coding mutations/regions, but they also provide precomputed scores for the whole genome or regions including some synonymous mutations, making them applicable for annotating synonymous mutations. The table below lists the field names of these tools, their meanings, and the corresponding references.
Field Name | Description | Reference |
---|---|---|
MACIE01 | The estimated joint posterior probabilities of not evolutionarily conserved and regulatory functional | Li X, Yung G, Zhou H, et al. A multi-dimensional integrative scoring framework for predicting functional variants in the human genome[J]. The American Journal of Human Genetics, 2022, 109(3): 446-456. |
MACIE10 | The estimated joint posterior probabilities of evolutionarily conserved and not regulatory functional | Same as above |
MACIE00 | The estimated joint posterior probabilities of not evolutionarily conserved and not regulatory functional | Same as above |
MACIE11 | The estimated joint posterior probabilities of both evolutionarily conserved and regulatory functional | Same as above |
MACIE_conserved | The estimated posterior probability of evolutionarily conserved | Same as above |
MACIE_regulatory | The estimated posterior probability of regulatory functional | Same as above |
MACIE_anyclass | The estimated posterior probability of evolutionarily conserved or regulatory functional | Same as above |
FunSeq_Score | A flexible framework to prioritize regulatory mutations from cancer genome sequencing (integrative score). | Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013) |
GenoCanyon_Score | Predict the functional potential at each nucleotide. | Lu, Q., Hu, Y., Sun, J. et al. A Statistical Framework to Predict Functional Non-Coding Regions in the Human Genome Through Integrated Analysis of Annotation Data. Sci Rep 5, 10576 (2015). |
FIRE_Score | A score refers to the variant's potential to regulate the expression levels of nearby genes. | Ioannidis N M, Davis J R, DeGorter M K, et al. FIRE: functional inference of genetic variants that regulate gene expression[J]. Bioinformatics, 2017, 33(24): 3895-3901. |
CDTS_Score | Context-dependent tolerance score (CDTS). The lower the score, the more intolerant the region is to variation. | di Iulio, J. et al. The human noncoding genome defined by genetic diversity. Nat. Genet. 50, 333–337 (2018) |
CDTS_percentile | Genome-wide percentile of CDTS_Score. The lower the percentile, the more constrained the region is. | Same as above |
ReMM_Score | Scores positions in the human genome by their regulatory probability. Scores approaching 0: non-deleterious; scores approaching 1: deleterious | Smedley D, Schubach M, Jacobsen J O B, et al. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease[J]. The American Journal of Human Genetics, 2016, 99(3): 595-606. |
ALoFT_Score | ALoFT provides extensive annotations to putative loss-of-function variants (LoF) in protein-coding genes including functional, evolutionary and network features (integrative score). | Balasubramanian S, Fu Y, Pawashe M, et al. Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes[J]. Nature communications, 2017, 8(1): 1-11. |
ALoFT_Description | ALoFT annotation can predict the impact of premature stop variants and classify them as dominant disease-causing, recessive disease-causing and benign variants (integrative score). | Same as above |
LINSIGHT_Score | The LINSIGHT score (integrative score). A higher LINSIGHT score indicates more functionality. Range: [0.215, 0.995]. | Huang, Y.-F., Gulko, B. & Siepel, A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat. Genet. 49, 618–624 (2017) |
RegSeq0 | Regulatory sequence model HEK293T | Schubach M, Maass T, Nazaretyan L, et al. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions[J]. Nucleic Acids Research, 2024, 52(D1): D1143-D1154. |
RegSeq1 | Regulatory sequence model K562 | Same as above |
RegSeq2 | Regulatory sequence model HepG2 | Same as above |
RegSeq3 | Regulatory sequence model HeLa-S3 | Same as above |
RegSeq4 | Regulatory sequence model MCF-7 | Same as above |
RegSeq5 | Regulatory sequence model iPS DF 19.11 | Same as above |
RegSeq6 | Regulatory sequence model GM23338 | Same as above |
RegSeq7 | Regulatory sequence model GC-matched background | Same as above |
SpliceAI-acc-gain | Masked SpliceAI acceptor gain score (default: 0*) | Jaganathan, K. et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell 176, 535-548.e24 (2019). |
SpliceAI-acc-loss | Masked SpliceAI acceptor loss score (default: 0) | Same as above |
SpliceAI-don-gain | Masked SpliceAI donor gain score (default: 0) | Same as above |
SpliceAI-don-loss | Masked SpliceAI donor loss score (default: 0) | Same as above |
MMSp_acceptor | MMSplice acceptor score (default: 0) | Cheng J, Nguyen T Y D, Cygan K J, et al. MMSplice: modular modeling improves the predictions of genetic variant effects on splicing[J]. Genome biology, 2019, 20: 1-15. |
MMSp_exon | MMSplice exon score (default: 0) | Same as above |
MMSp_donor | MMSplice donor score (default: 0) | Same as above |
dbscSNV-ada_Score | Adaboost classifier score from dbscSNV (default: 0*) | Jian X, Boerwinkle E, Liu X. In silico prediction of splice-altering single nucleotide variants in the human genome[J]. Nucleic acids research, 2014, 42(22): 13534-13544. |
dbscSNV-rf_Score | Random forest classifier score from dbscSNV (default: 0*) | Same as above |
TargetScan_Score | TargetScan score (default: 0*) | Friedman, R. C., Farh, K. K.-H., Burge, C. B. & Bartel, D. P. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92–105 (2009). |
mirSVR-Score | mirSVR-Score (default: 0*) | Betel D, Koppal A, Agius P, et al. Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites[J]. Genome biology, 2010, 11: 1-14. |
mirSVR-E | mirSVR-E (default: 0) | Same as above |
mirSVR-Aln | mirSVR-Aln (default: 0) | Same as above |
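The seven MACIE fields in the table above are related to one another. If the four joint posteriors (MACIE00, MACIE01, MACIE10, MACIE11) form a complete distribution over the two binary labels, which is how the descriptions read, then the three marginal fields follow by summation. The sketch below illustrates that relationship under this assumption. The function name and the example values are hypothetical, and the stored SynMall values may differ slightly because of rounding.

```python
import math


def macie_marginals(p00: float, p01: float, p10: float, p11: float):
    """Derive marginal posteriors from the four joint posteriors.

    First digit: evolutionarily conserved (1) or not (0).
    Second digit: regulatory functional (1) or not (0).
    """
    conserved = p10 + p11    # corresponds to MACIE_conserved
    regulatory = p01 + p11   # corresponds to MACIE_regulatory
    anyclass = 1.0 - p00     # corresponds to MACIE_anyclass
    return conserved, regulatory, anyclass


# Hypothetical joint posteriors that sum to 1.
conserved, regulatory, anyclass = macie_marginals(0.1, 0.2, 0.3, 0.4)
print(conserved, regulatory, anyclass)
```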
- Reference: Stenson P D, Mort M, Ball E V, et al. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting[J]. Human genetics, 2020, 139: 1197-1207.
- Retrieve Source: Professional, 2023.3
- Brief Introduction: The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that are thought to underlie, or are closely associated with human inherited disease.
Field Name | Description |
---|---|
acc_num | The HGMD accession number for this mutation. Typically these are strings consisting of CM, CP, CX or HM etc, followed by a six digit integer, such as CM995289. Foreign key for the GENOMIC_COORDS and MUTNOMEN tables. |
chrom_hg19 | |
pos_hg19 | |
ref_hg19 | |
alt_hg19 | |
class_hg19 | |
mut_hg19 | |
chrom_hg38 | |
pos_hg38 | |
ref_hg38 | |
alt_hg38 | |
class_hg38 | |
mut_hg38 | |
diseasegene | |
chrom | If known, the number of the chromosome (including X and Y). DEPRECATED |
genename | A human readable, fully spelled out name for the gene. |
gdbid | Identifier for the GDB Genome Database. When a matching record has not been identified, the field contains NULL. Present for historical reasons, as GDB no longer exists. |
omimid | Identifier for the OMIM database, http://www.ncbi.nlm.nih.gov/omim. When a matching record has not been identified, the field contains NULL. |
amino | The amino acid change caused by the mutation, in triple-letter code. |
deletion | Deletions are presented in terms of the deleted bases in lower case plus, in upper case, 10 bp DNA sequence flanking both sides of the lesion. Intron/exon boundary information may be provided where identified (e.g. I12E13). The codon number in the CODON field represents the last whole codon preceding the deletion and is marked in the given sequence by the caret character (^). |
insertion | Insertions are presented in terms of the inserted bases in lower case plus, in upper case, 10 bp DNA sequence flanking both sides of the lesion. The numbered codon from the AMINO field is preceded in the given sequence by the caret character (^). |
codon | The number of the altered codon mapped to the HGMD cDNA sequence provided. |
codonAff | The codon affected by the mutation in question. |
descr | A textual description of the mutation. |
refseq | The NCBI mRNA reference sequence utilised by HGMD. |
hgvs | Composite HGVS cDNA based nomenclature for the mutation. |
hgvsAll | Composite HGVS nomenclature for fulltext indexing and searching purposes. |
dbsnp | Links the variants in HGMD to a corresponding dbSNP entry. |
chromosome | Strictly a number from 1-22, X or Y. |
startCoord | Number of the first nucleotide of the mutation (chromosomal coordinate). For deletions, the first deleted nucleotide, for insertions, the last nucleotide before the inserted sequence, for single nucleotide mutations, the number of the mutated nucleotide. |
endCoord | Number of the last nucleotide of the mutation (chromosomal coordinate). For deletions, the last deleted nucleotide; for insertions, the first nucleotide after the inserted sequence; for single nucleotide mutations, the number of the mutated nucleotide (should be identical to startCoord). |
expected_inheritance | Inheritance data curated from multiple literature sources (only where such data may be unequivocally assigned). |
gnomad_AC | Allele counts for HGMD variants exactly matching variants found in the Genome Aggregation Database |
gnomad_AF | Allele frequency from gnomAD. |
gnomad_AN | Total number of alleles sequenced by gnomAD at the matching locus. |
tag | This field categorizes mutations and polymorphisms. There are seven possible values, DM, DM?, DP, DFP, FP, FTV and R. |
dmsupport | Positive or negative score depending on the support (or lack of support) of the extra references for pathogenicity or functional alteration. Experimental. |
rankscore | Ranking score is a single score between 0 and 1, with 1 being most likely disease-causing. The score is computed using machine learning, and is based upon multiple lines of evidence, including HGMD literature support for pathogenicity, evolutionary conservation (100-way vertebrate alignment), variant allele frequency and in silico prediction. This feature is under ongoing development. |
mutype | Primary type of mutation logged in HGMD. (i.e. missense, initiation, nonsense, synonymous, noncoding, frameshift, inframe, gross, canonical-splice, exonic-splice, splice, nonstop, regulatory). |
author | Reference field. All the reference fields refer to the literature report that the corresponding mutation was obtained from. Last name of the first author |
title | |
fullname | Reference field. The approved Medline abbreviation for the journal. Foreign key for the base table JOURNAL.FULLNAME field |
allname | ALLNAME contains the name spelled out in its entirety. |
vol | Reference field. There are 6 possible values for this field. |
page | Reference field. Number of the first page of the article. |
year | Reference field. Year the article was published, in four digits. |
pmid | Reference field. There are 5 possible values, numeric, HGOL, LSDB, NO ID and ABST. |
pmidAll | This field contains all of the PubMed Ids from primary and additional references that are associated with that variant. |
reftag | The REFTAG field contains five values: APR for additional phenotype report, FCR for functional characterisation report, MCR for molecular characterisation report, ACR for additional case report (detailing an additional case of the mutation) and SAR for simple additional report. |
comments | Free text comments by the curator. |
new_date | The date when the mutation was added to the database. |
base | This field is specific to single base pair substitutions and contains the description of the nucleotide change. This is presented in terms of a triplet change. For example, TAC-TAT represents a change of the last nucleotide C in the triplet to a T. TGT-TAT represents a change of the middle nucleotide G to an A. |
clinvarID | |
clinvar_clnsig | |
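The `base` field above encodes a substitution as a triplet change such as `TAC-TAT`. A minimal sketch of how that string could be decoded into the affected codon position and the nucleotide change is shown here. The function name `describe_base_change` is illustrative and not part of HGMD or SynMall, and the sketch assumes exactly one differing nucleotide between the two triplets.

```python
def describe_base_change(base: str):
    """Decode an HGMD-style triplet change such as 'TAC-TAT'.

    Returns (position_in_codon, reference_base, alternate_base),
    with the position counted from 1.
    """
    ref_codon, alt_codon = base.split("-")
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError(f"expected two triplets, got {base!r}")
    # Collect the indices where the two triplets differ.
    diffs = [i for i, (r, a) in enumerate(zip(ref_codon, alt_codon)) if r != a]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one changed base in {base!r}")
    i = diffs[0]
    return i + 1, ref_codon[i], alt_codon[i]


print(describe_base_change("TAC-TAT"))  # last nucleotide C changed to T
print(describe_base_change("TGT-TAT"))  # middle nucleotide G changed to A
```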
- Reference: Landrum M J, Chitipiralla S, Brown G R, et al. ClinVar: improvements to accessing data[J]. Nucleic acids research, 2020, 48(D1): D835-D844.
- Retrieve Source: https://www.ncbi.nlm.nih.gov/clinvar/ , 2024-06-11
- Brief Introduction: ClinVar is a freely accessible public archive maintained by the NIH that aggregates and provides interpretations of the relationships between human genetic variants and diseases.
Field Name | Description |
---|---|
AF_ESP | allele frequencies from GO-ESP |
AF_EXAC | allele frequencies from ExAC |
AF_TGP | allele frequencies from TGP |
ALLELEID | the ClinVar Allele ID |
CLNDN | ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB |
CLNDNINCL | For included variant: ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB |
CLNDISDB | Tag-value pairs of disease database name and identifier submitted for germline classifications, e.g. OMIM:NNNNNN |
CLNDISDBINCL | For included Variant: Tag-value pairs of disease database name and identifier for germline classifications, e.g. OMIM:NNNNNN |
CLNHGVS | Top-level (primary assembly, alt, or patch) HGVS expression |
CLNREVSTAT | ClinVar review status of germline classification for the Variation ID |
CLNSIG | Aggregate germline classification for this single variant; multiple values are separated by a vertical bar |
CLNSIGCONF | Conflicting germline classification for this single variant; multiple values are separated by a vertical bar |
CLNSIGINCL | Germline classification for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:classification; multiple values are separated by a vertical bar |
CLNVC | Variant type |
CLNVCSO | Sequence Ontology id for variant type |
CLNVI | the variant's clinical sources reported as tag-value pairs of database and variant identifier |
DBVARID | nsv accessions from dbVar for the variant |
GENEINFO | Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (\|) |
MC | Comma-separated list of molecular consequences in the form of Sequence Ontology ID\|molecular_consequence |
ONCDN | ClinVar's preferred disease name for the concept specified by disease identifiers in ONCDISDB |
ONCDNINCL | For included variant: ClinVar's preferred disease name for the concept specified by disease identifiers in ONCDISDBINCL |
ONCDISDB | Tag-value pairs of disease database name and identifier submitted for oncogenicity classifications, e.g. MedGen:NNNNNN |
ONCDISDBINCL | For included variant: Tag-value pairs of disease database name and identifier for oncogenicity classifications, e.g. OMIM:NNNNNN |
ONC | Aggregate oncogenicity classification for this single variant; multiple values are separated by a vertical bar |
ONCINCL | Oncogenicity classification for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:classification; multiple values are separated by a vertical bar |
ONCREVSTAT | ClinVar review status of oncogenicity classification for the Variation ID |
ONCCONF | Conflicting oncogenicity classification for this single variant; multiple values are separated by a vertical bar |
ORIGIN | Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other |
RS | dbSNP ID (i.e. rs number) |
SCIDN | ClinVar's preferred disease name for the concept specified by disease identifiers in SCIDISDB |
SCIDNINCL | For included variant: ClinVar's preferred disease name for the concept specified by disease identifiers in SCIDISDBINCL |
SCIDISDB | Tag-value pairs of disease database name and identifier submitted for somatic clinical impact classifications, e.g. MedGen:NNNNNN |
SCIDISDBINCL | For included variant: Tag-value pairs of disease database name and identifier for somatic clinical impact classifications, e.g. OMIM:NNNNNN |
SCIREVSTAT | ClinVar review status of somatic clinical impact for the Variation ID |
SCI | Aggregate somatic clinical impact for this single variant; multiple values are separated by a vertical bar |
SCIINCL | Somatic clinical impact classification for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:classification; multiple values are separated by a vertical bar |
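The ORIGIN field above is a sum of power-of-two flags, so a single integer can carry several allele origins at once. The sketch below decodes such a value into labels using the codes listed in the table. The names `ORIGIN_FLAGS` and `decode_origin` are illustrative, not part of ClinVar or SynMall.

```python
from typing import List

# Flag values as listed in the ORIGIN row of the table.
ORIGIN_FLAGS = {
    1: "germline",
    2: "somatic",
    4: "inherited",
    8: "paternal",
    16: "maternal",
    32: "de-novo",
    64: "biparental",
    128: "uniparental",
    256: "not-tested",
    512: "tested-inconclusive",
    1073741824: "other",
}


def decode_origin(value: int) -> List[str]:
    """Expand a summed ORIGIN value into its individual origin labels."""
    if value == 0:
        return ["unknown"]
    # Test each flag bit; dict order keeps the labels in ascending flag order.
    return [name for bit, name in ORIGIN_FLAGS.items() if value & bit]


print(decode_origin(3))   # germline and somatic flags both set
print(decode_origin(0))
```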
- Reference: Tate J G, Bamford S, Jubb H C, et al. COSMIC: the catalogue of somatic mutations in cancer[J]. Nucleic acids research, 2019, 47(D1): D941-D947.
- Retrieve Source: https://cancer.sanger.ac.uk/cosmic/ , v100
- Brief Introduction: COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
Field Name | Description |
---|---|
COSMIC_MUTATION_ID | Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. |
GENE | Gene name |
TRANSCRIPT | Transcript accession |
STRAND | Gene strand |
LEGACY_ID | Legacy Mutation ID |
CDS | CDS annotation |
AA | Peptide annotation |
HGVSC | HGVS cds syntax |
HGVSP | HGVS peptide syntax |
HGVSG | HGVS genomic syntax |
SAMPLE_COUNT | Number of genome-screen samples that carry this mutation |
IS_CANONICAL | Whether the transcript is the Ensembl Canonical transcript, a single representative transcript identified at every locus |
TIER | Indicates to which tier of the Cancer Gene Census the gene belongs (1/2) |
SO_TERM | SO term for this mutation |
COMISC_SOURCE | This record comes from TARGETED_SCREEN or GENOME_SCREEN. GENOME_SCREEN: Coding point mutations from genome wide screens (including whole exome sequencing) from the current release; TARGETED_SCREEN: Complete curated COSMIC dataset (targeted screens) from the current release. |
- Reference: Sollis E, Mosaku A, Abid A, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource[J]. Nucleic acids research, 2023, 51(D1): D977-D985.
- Retrieve Source: https://www.ebi.ac.uk/gwas/home , v1.0.2
- Brief Introduction: The NHGRI-EBI GWAS Catalog is a FAIR knowledgebase providing standardized GWAS data, containing variant-trait associations and metadata for over 45,000 published GWAS, with expanded data types and improved interoperability, curated from publications or prepublication author submissions.
Field Name | Description |
---|---|
DATE ADDED TO CATALOG | Date a study is published in the catalog |
PUBMEDID | PubMed identification number |
FIRST AUTHOR | Last name and initials of first author |
DATE | Publication date (online (epub) date if available) |
JOURNAL | Abbreviated journal name |
LINK | PubMed URL |
STUDY | Title of paper |
DISEASE/TRAIT | Disease or trait examined in study |
INITIAL SAMPLE SIZE | Sample size and ancestry description for stage 1 of GWAS (summing across multiple Stage 1 populations, if applicable) |
REPLICATION SAMPLE SIZE | Sample size and ancestry description for subsequent replication(s) (summing across multiple populations, if applicable) |
REGION | Cytogenetic region associated with rs number |
CHR_ID | Chromosome number associated with rs number |
CHR_POS | Chromosomal position associated with rs number |
REPORTED GENE(S) | Gene(s) reported by author |
MAPPED_GENE | Gene(s) mapped to the strongest SNP. If the SNP is located within a gene, that gene is listed. If the SNP is located within multiple genes, these genes are listed separated by commas. If the SNP is intergenic, the upstream and downstream genes are listed, separated by a hyphen. |
UPSTREAM_GENE_ID | Entrez Gene ID for nearest upstream gene to rs number, if not within gene |
DOWNSTREAM_GENE_ID | Entrez Gene ID for nearest downstream gene to rs number, if not within gene |
SNP_GENE_IDS | Entrez Gene ID, if rs number within gene; multiple genes denotes overlapping transcripts |
UPSTREAM_GENE_DISTANCE | Distance in kb for nearest upstream gene to rs number, if not within gene |
DOWNSTREAM_GENE_DISTANCE | Distance in kb for nearest downstream gene to rs number, if not within gene |
STRONGEST SNP-RISK ALLELE | SNP(s) most strongly associated with trait + risk allele (? for unknown risk allele). May also refer to a haplotype |
SNPS | Strongest SNP; if a haplotype it may include more than one rs number (multiple SNPs comprising the haplotype) |
MERGED | Denotes whether the SNP has been merged into a subsequent rs record (0 = no; 1 = yes) |
SNP_ID_CURRENT | Current rs number (will differ from strongest SNP when merged = 1) |
CONTEXT | Provides information on a variant’s predicted most severe functional effect from Ensembl |
INTERGENIC | Denotes whether SNP is in intergenic region (0 = no; 1 = yes) |
RISK ALLELE FREQUENCY | Reported risk/effect allele frequency associated with strongest SNP in controls (if not available among all controls, among the control group with the largest sample size). If the associated locus is a haplotype the haplotype frequency will be extracted. |
P-VALUE | Reported p-value for strongest SNP risk allele (linked to dbGaP Association Browser). Note that p-values are rounded to 1 significant digit (for example, a published p-value of 4.8 × 10^-7 is rounded to 5 × 10^-7). |
PVALUE_MLOG | -log(p-value) |
P-VALUE (TEXT) | Information describing context of p-value (e.g. females, smokers). |
OR or BETA | Reported odds ratio or beta-coefficient associated with strongest SNP risk allele. Note that prior to 2021, any OR <1 was inverted, along with the reported allele, so that all ORs included in the Catalog were >1. This is no longer done, meaning that associations added after 2021 may have OR <1. Appropriate unit and increase/decrease are included for beta coefficients. |
95% CI (TEXT) | Reported 95% confidence interval associated with strongest SNP risk allele, along with unit in the case of beta-coefficients. If 95% CIs are not published, we estimate these using the standard error, where available. |
PLATFORM [SNPS PASSING QC] | Genotyping platform manufacturer used in Stage 1; also includes notation of pooled DNA study design or imputation of SNPs, where applicable |
CNV | Study of copy number variation (yes/no) |
MAPPED_TRAIT | Mapped Experimental Factor Ontology trait for this study |
MAPPED_TRAIT_URI | URI of the EFO trait |
STUDY ACCESSION | Accession ID allocated to a GWAS Catalog study |
GENOTYPING TECHNOLOGY | Genotyping technology/ies used in this study, with additional array information (ex. Immunochip or Exome array) in brackets. |
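The P-VALUE and PVALUE_MLOG rows above describe two small transformations: rounding to one significant digit and taking the negative logarithm. The sketch below reproduces both in Python. It assumes the logarithm is base 10, which the table does not state explicitly, and the function names are illustrative rather than part of the GWAS Catalog.

```python
import math


def round_one_significant_digit(p: float) -> float:
    """Round a p-value to 1 significant digit, e.g. 4.8e-7 becomes 5e-7."""
    # Scientific notation with zero decimals keeps exactly one significant digit.
    return float(f"{p:.0e}")


def pvalue_mlog(p: float) -> float:
    """Negative log of a p-value, assuming base 10."""
    return -math.log10(p)


rounded = round_one_significant_digit(4.8e-7)
print(rounded)
print(pvalue_mlog(rounded))
```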
- Reference: Leslie R, O'Donnell C J, Johnson A D. GRASP: analysis of genotype–phenotype results from 1390 genome-wide association studies and corresponding open access database[J]. Bioinformatics, 2014, 30(12): i185-i194.
- Retrieve Source: https://sites.google.com/site/jpopgen/wgsa , v2
- Brief Introduction: GRASP contains over 6.2 million SNP-phenotype associations from 1390 GWAS studies, re-annotated with 16 diverse sources including RNA editing sites, lincRNAs, and PTMs.
Field Name | Description |
---|---|
rs | Latest SNP ID from dbSNP; it can differ from the original SNP entry in the database because of SNP merges (merged = 1) |
PMID | PubMed identifier for paper from which the SNP association originates |
p-value | P-value for SNP-phenotype association |
phenotype | Phenotype description of SNP-phenotype entry |
ancestry | Ethnodemographic description of the paper population(s) (e.g., European, Mixed) |
platform | Description of genotyping and/or imputation platform(s) and number of SNP markers (specified or approximated) included in post-QC analyses |
- Reference: Piñero J, Ramírez-Anguita J M, Saüch-Pitarch J, et al. The DisGeNET knowledge platform for disease genomics: 2019 update[J]. Nucleic acids research, 2020, 48(D1): D845-D855.
- Retrieve Source: https://www.disgenet.org/home/ , 2020.3
- Brief Introduction: DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated to human diseases.
Field Name | Description |
---|---|
snpId | dbSNP variant Identifier |
class | type of variant |
chromosome | Chromosome of the variant |
position | Position in chromosome |
DSI | The Disease Specificity Index for the variant |
DPI | The Disease Pleiotropy Index for the variant |
NofDiseases | Number of diseases associated to the variant |
NofPmids | Total number of publications reporting the Variant-Disease association |
The ClinGen website does not provide detailed descriptions of the fields.
- Reference: Rehm H L, Berg J S, Brooks L D, et al. ClinGen—the clinical genome resource[J]. New England Journal of Medicine, 2015, 372(23): 2235-2242.
- Retrieve Source: https://clinicalgenome.org/ , 2024-03-27
- Brief Introduction: ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.
Field Name | Description |
---|---|
#Variation | |
ClinVar Variation Id | |
Allele Registry Id | ClinGen canonical allele identifier, example: CA200893 |
HGVS Expressions | |
HGNC Gene Symbol | |
Disease | |
Mondo Id | MonDO IDs are required for describing the disease entity in the ClinGen Gene and Variant Curation Interfaces |
Mode of Inheritance | a gene may also be associated with multiple inheritance patterns |
Assertion | Clinical Validity Classification |
Applied Evidence Codes (Met) | |
Applied Evidence Codes (Not Met) | |
Summary of interpretation | |
PubMed Articles | |
Expert Panel | |
Guideline | |
Approval Date | |
Published Date | |
Retracted | |
Evidence Repo Link | |
Uuid | |
HGVSg |
- Reference: Schaafsma G C P, Vihinen M. VariSNP, a benchmark database for variations from dbSNP[J]. Human mutation, 2015, 36(2): 161-166.
- Retrieve Source: https://lap676.srv.lu.se/VariSNP/index.php , 2017-02-16
- Brief Introduction: VariSNP is a benchmark database suite comprising variation datasets that can be used for developing and testing the performance of variant effect prediction tools. VariSNP contains datasets selected from dbSNP that were filtered to remove disease-related variants found in ClinVar, Swiss-Prot and PhenCode, so the remaining variations are considered neutral or non-pathogenic.
Field Name | Description |
---|---|
dbSNP_id | dbSNP RefSNP cluster ID number (rs#) |
heterozygosity | Estimated average heterozygosity from allele frequencies of this RefSNP. Values between 0 and 1. You can find a document describing the computation of average heterozygosity and standard error for dbSNP RefSNP clusters at NCBI |
heterozygosity_standard_error | Standard error of heterozygosity estimate. |
creation_date | Date when the RefSNP cluster was instantiated |
creation_build | Build number (NCBI release) when the RefSNP cluster was instantiated |
update_date | Most recent date the RefSNP cluster was updated (member added or deleted) |
update_build | Build number (NCBI release) when the RefSNP cluster was updated |
observed_alleles | Observed variation alleles. All allele(s) observed at this position in the reference. Can be something like A/C or A/C/G/T or -/ACC |
asn_from | Start position of the SNP on the contig, counting from 0. This position is always measured from the beginning of the contig, regardless of the SNP orientation to the contig and regardless of the contig orientation to the chromosome |
asn_to | End position of snp on contig |
reference_allele | Reference allele(s), this can be a '-' in the case of an insertion |
orientation | Orientation of RefSNP sequence to contig sequence. Values are 'forward' or 'reverse' |
minor_allele_frequency | Global minor allele frequency. dbSNP is reporting the minor allele frequency for each rs included in a default global population. Since this is being provided to distinguish common polymorphism from rare variants, the MAF is actually the second most frequent allele value. In other words, if there are 3 alleles, with frequencies of 0.50, 0.49, and 0.01, the MAF will be reported as 0.49. The current default global population is 1000Genome phase 1 genotype data from 1094 worldwide individuals, released in the May 2011 dataset. Values from 0 to 0.50 |
minor_allele | Minor allele |
sample_size | Sample size, which is the number of chromosomes in the sample population |
validation | Validation method, type of evidence used to confirm the variation. Present values can be byHapMap; byOtherPop; byFrequency; by1000G; by2Hit2Allele; byCluster |
hgvs_names | Description(s) of the variation according to HGVS recommendations |
allele_origin | Genetic origin of the allele, e.g. germline, somatic, inherited, maternal |
clinical_significance | Clinical significance. Assertions of clinical significance for alleles of human sequence variations are reported as provided by the submitter and not interpreted by NCBI. Submissions based on processing data from OMIM® were assigned the value of probable-pathogenic. If there is a published authoritative guideline about the pathogenicity of any allele, that is included in the report. The supported values are: unknown, untested, non-pathogenic, probable-non-pathogenic, probable-pathogenic, pathogenic, drug-response, histocompatibility, other |
functional_class | Variation functional class. Variations are assigned functional classes, which report whether a variation is located in a locus region, in a transcript, or in a coding region. This column contains one or more functional classes (fxnClass); values can be cds-indel, downstream-variant-500B, frameshift-variant, intron-variant, missense, nc-transcript-variant, reference, splice-acceptor-variant, splice-donor-variant, stop-gained, stop-lost, synonymous-codon, upstream-variant-2KB, utr-variant-3-prime. This column can also hold the Sequence Ontology term (soTerm) corresponding to the functional class, the mRNA accession (mrnaAcc) and version (mrnaVer), the gene symbol (symbol) and the Entrez gene ID (geneid) |
ncbi_gi | NCBI gi number. |
ncbi_accession | NCBI accession and version number of reference sequence, e.g. NG_01234.5 |
gene_symbol | Gene symbol (provided by HGNC). |
refseq_start_description | Description relative to transcription start on reference sequence |
coding_dna_description | Coding DNA variant description according to HGVS recommendations |
protein_description | Protein variant description according to HGVS recommendations |
coding_reference | NCBI RefSeq accession and version number (mRNA), e.g. NM_01234.5 |
protein_reference | NCBI RefSeq accession and version number (protein), e.g. NP_01234.5 |
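
The `minor_allele_frequency` rule in the table above (report the second most frequent allele, not the rarest one) can be illustrated with a short sketch. The function name and the input dictionary are hypothetical and are not part of VariSNP or dbSNP; the frequencies are the ones used in the field description.

```python
def second_most_frequent(allele_freqs):
    """Return (allele, frequency) for the second most frequent allele.

    allele_freqs: dict mapping allele -> frequency (hypothetical input layout).
    This mirrors the rule stated for minor_allele_frequency: with three
    alleles, the value reported is the second-highest frequency.
    """
    if len(allele_freqs) < 2:
        raise ValueError("at least two alleles are required")
    # Sort alleles from most to least frequent and take the second entry.
    ranked = sorted(allele_freqs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[1]


# Frequencies taken from the example in the field description.
allele, maf = second_most_frequent({"A": 0.50, "C": 0.49, "G": 0.01})
print(allele, maf)
```

With the three frequencies from the description, the sketch selects the allele at 0.49 rather than the one at 0.01.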
- Reference: Wen P, Xiao P, Xia J. dbDSM: a manually curated database for deleterious synonymous mutations[J]. Bioinformatics, 2016, 32(12): 1914-1916.
- Retrieve Source: http://www.xialab.info:8080/dbDSM/index.jsp , v2
- Brief Introduction: dbDSM (Database of Deleterious Synonymous Mutation) is an integrated database that collects data from multiple sources related to deleterious synonymous mutations.
Field Name | Description |
---|---|
dbDSM Number | The access number of a variant in dbDSM |
Disease | The main phenotype of the patient |
DOID | The identifier of a disease linked to OMIM database |
Gene | Gene name |
GeneID | The unique identifier for a gene |
MIM | The identifier of a gene linked to OMIM database |
Map Location | The map location for this gene |
Protein | A protein reference level representation of the variant |
cDNA | A coding reference level representation of the variant |
SNPID | dbSNP identifier of the variant. If there is no rs id this field is “n/a“ |
Refseq Transcript | Refseq Transcript that the variant resides on |
P-value | P-value in GWAS |
Strand | Whether the variant occurs on the forward strand (+) or the reverse strand (-) |
GRCh38 Position | The position of variant on GRCh38 |
GRCh37 Position | The position of variant on GRCh37 |
Ref | Reference allele |
Alt | Alternate allele |
Year | Published time of an article |
PMID | Pubmed ID for an article |
Classification | Deleterious mechanism of a variant |
Strength of Evidence | Clinical classification of a variant |
Key Sentence | Deleterious evidence of a variant extracted from the article |
Source | The source of a variant |
Score | dbDSM score of a variant, derived from the SilVA, DDIG-SN, FATHMM-MKL, TraP and CADD scores. A voting method is used: the dbDSM score increases by one for each tool whose score is above its threshold value. |
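
The voting scheme described for the `Score` field can be sketched as follows. The threshold values below are placeholders chosen only for illustration; the actual per-tool cut-offs used by dbDSM are not given in this document, and the function name is hypothetical.

```python
# Placeholder thresholds: NOT the real dbDSM cut-offs, which are not stated here.
HYPOTHETICAL_THRESHOLDS = {
    "SilVA": 0.5,
    "DDIG-SN": 0.5,
    "FATHMM-MKL": 0.5,
    "TraP": 0.5,
    "CADD": 15.0,
}


def voting_score(tool_scores, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Count how many tools score a variant above their threshold.

    tool_scores: dict mapping tool name -> score, or None when the tool
    has no prediction for the variant (treated as no vote, an assumption).
    """
    votes = 0
    for tool, threshold in thresholds.items():
        score = tool_scores.get(tool)
        if score is not None and score > threshold:
            votes += 1
    return votes


example = {"SilVA": 0.9, "DDIG-SN": 0.2, "FATHMM-MKL": 0.7, "TraP": None, "CADD": 20.0}
print(voting_score(example))
```

Under this reading the score is an integer between 0 and 5, one point per tool that votes "deleterious".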
- Reference: Gong L, Whirl‐Carrillo M, Klein T E. PharmGKB, an integrated resource of pharmacogenomic knowledge[J]. Current protocols, 2021, 1(8): e226.
- Retrieve Source: https://www.pharmgkb.org/ , 2024-03-06
- Brief Introduction: The Pharmacogenomics Knowledgebase (PharmGKB) is an integrated online knowledge resource for the understanding of how genetic variation contributes to variation in drug response.
- var_pheno_ann.tsv: Contains associations in which the variant affects a phenotype, with or without drug information.
- var_drug_ann.tsv: Contains associations in which the variant affects a drug dose, response, metabolism, etc.
- var_fa_ann.tsv: Contains in vitro and functional analysis-type associations.
Field Name | Description |
---|---|
Variant Annotation ID | Unique ID number for each variant/drug annotation. |
Variant/Haplotypes | dbSNP rsID or haplotype(s) involved in the association. In some cases, an association is based on a gene phenotype group such as "poor metabolizers" or "intermediate activity". In these cases, the gene phenotype is found in this field. |
Gene | HGNC symbol for the gene involved in the association. Typically the variants will be within the gene boundaries, but occasionally this will not be true. E.g. the variant in the annotation may be upstream of the gene but is reported to affect the gene's expression or otherwise associated with the gene. |
Drug(s) | The drug(s) involved in the association. If there is more than one drug listed, the association may apply to each drug individually or the combination of the drugs together. The field "Multiple drugs And/or" will designate "or" - meaning that it applies to each drug - or "and" - meaning that the association is for the combination. |
PMID | PubMed identifier for the article supporting the annotation. |
Phenotype Category | Options are "efficacy", "toxicity", "dosage", "metabolism/PK", "PD", "other". |
Significance | The significance of the association as stated by the author; options are [yes, no, not stated]. |
Notes | Free text field for notes added by the curator. |
Sentence | The structured annotation sentence generated by the variant annotation tool based on the information entered by the curator. |
Alleles | The basis for comparison in the annotation. In this field, there may be a variant, one or more haplotypes grouped together, one or more genotypes grouped together or one or more diplotypes grouped together. If there is a gene phenotype in the "Variant/Haplotypes" field (described above), this field will be blank |
Specialty Population | Any special populations this annotation is relevant to (e.g. pediatric). |
Assay Type | Information about the type of assay performed. |
- Relationship
Field Name | Description |
---|---|
Entity1_id | Diseases, genes and drugs are designated by their PharmGKB IDs. |
Entity1_type | Disease, Drug, Gene, VariantLocation or Haplotype. |
Entity2_id | Diseases, genes and drugs are designated by their PharmGKB IDs. |
Entity2_type | Disease, Drug, Gene, VariantLocation or Haplotype. |
Evidence | VIP, VariantAnnotation, ClinicalAnnotation, DosingGuideline, DrugLabel or Pathway. Comma separated list because the evidence for a relationship could come from multiple sources in PharmGKB. |
Association | Possible values: “associated”, “not associated” or “ambiguous”. |
PK | PK stands for “Pharmacokinetic”. Relationships are marked as PK if the pair of entities was found in a pharmacokinetic pathway on PharmGKB, or if the Variant Annotation or VIP was annotated with PK in some manner |
PD | PD stands for “Pharmacodynamic”. Relationships are marked as PD if the pair of entities was found in a pharmacodynamic pathway on PharmGKB, or if the Variant Annotation or VIP was annotated with PD in some manner. |
PMIDs | PubMed IDs that were used to support the listed relationship. Semi-colon delimited list. |
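
Two of the relationship fields above are delimited lists: `Evidence` is comma separated and `PMIDs` is semicolon delimited. A minimal sketch of splitting them is shown below. The function name and the sample row are hypothetical; the IDs in the row are made up for illustration and do not refer to real PharmGKB records.

```python
def parse_relationship(row):
    """Split the list-valued fields of one relationship row.

    row: dict keyed by the field names in the table above.
    Returns a copy in which Evidence and PMIDs are lists of strings.
    """
    evidence = [e.strip() for e in row.get("Evidence", "").split(",") if e.strip()]
    pmids = [p.strip() for p in row.get("PMIDs", "").split(";") if p.strip()]
    return {**row, "Evidence": evidence, "PMIDs": pmids}


# Made-up example row; identifiers are placeholders.
row = {
    "Entity1_id": "PA0000001",
    "Entity1_type": "Gene",
    "Entity2_id": "PA0000002",
    "Entity2_type": "Drug",
    "Evidence": "VariantAnnotation,ClinicalAnnotation",
    "Association": "associated",
    "PMIDs": "12345678;23456789",
}
parsed = parse_relationship(row)
print(parsed["Evidence"], parsed["PMIDs"])
```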
- Clinical
Field Name | Description |
---|---|
variant | name or symbol of the variant |
gene | HGNC ID of the gene |
type | category or categories that the annotation falls in |
level of evidence | strength of evidence for the annotation |
chemicals | drug(s) associated with the variant in the annotation; from the PharmGKB drug vocabulary |
phenotypes | associated disease phenotype(s), where applicable |
- Variant
Field Name | Description |
---|---|
Variant ID | The PharmGKB identifier for this variant |
Variant Name | The PharmGKB name for this variant |
Gene IDs | The PharmGKB identifiers for genes associated with this variant |
Gene Symbols | The HGNC symbols for genes associated with this variant |
Location | The location of this variation on a reference sequence (either RefSeq or GenBank), if available. HGVS format when applicable |
Variant Annotation count | The count of Variant Annotations done on this variant |
Clinical Annotation count | The count of all Clinical Annotations done on this variant |
Level 1/2 Clinical Annotation count | The count of Level 1 or Level 2 ("top") Clinical Annotations done on this variant |
Guideline Annotation count | The count of Dosing Guideline Annotations of which this variant is a part |
Label Annotation count | The count of Drug Label Annotations in which this variant is mentioned |
Synonyms | A comma-separated list of synonyms for this variant. Includes HGVS names, retired RSIDs, and other names |
- Reference: Davis C A, Hitz B C, Sloan C A, et al. The Encyclopedia of DNA elements (ENCODE): data portal update[J]. Nucleic acids research, 2018, 46(D1): D794-D801.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction: Chemical modifications (e.g., methylation and acetylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription.
Field Name | Description |
---|---|
EncodeH3K4me1-sum | Sum of Encode H3K4me1 levels (from 13 cell lines) (default: 0.76) |
EncodeH3K4me1-max | Maximum Encode H3K4me1 level (from 13 cell lines) (default: 0.37) |
EncodeH3K4me2-sum | Sum of Encode H3K4me2 levels (from 14 cell lines) (default: 0.73) |
EncodeH3K4me2-max | Maximum Encode H3K4me2 level (from 14 cell lines) (default: 0.37) |
EncodeH3K4me3-sum | Sum of Encode H3K4me3 levels (from 14 cell lines) (default: 0.81) |
EncodeH3K4me3-max | Maximum Encode H3K4me3 level (from 14 cell lines) (default: 0.38) |
EncodeH3K9ac-sum | Sum of Encode H3K9ac levels (from 13 cell lines) (default: 0.82) |
EncodeH3K9ac-max | Maximum Encode H3K9ac level (from 13 cell lines) (default: 0.41) |
EncodeH3K9me3-sum | Sum of Encode H3K9me3 levels (from 14 cell lines) (default: 0.81) |
EncodeH3K9me3-max | Maximum Encode H3K9me3 level (from 14 cell lines) (default: 0.38) |
EncodeH3K27ac-sum | Sum of Encode H3K27ac levels (from 14 cell lines) (default: 0.74) |
EncodeH3K27ac-max | Maximum Encode H3K27ac level (from 14 cell lines) (default: 0.36) |
EncodeH3K27me3-sum | Sum of Encode H3K27me3 levels (from 14 cell lines) (default: 0.93) |
EncodeH3K27me3-max | Maximum Encode H3K27me3 level (from 14 cell lines) (default: 0.47) |
EncodeH3K36me3-sum | Sum of Encode H3K36me3 levels (from 10 cell lines) (default: 0.71) |
EncodeH3K36me3-max | Maximum Encode H3K36me3 level (from 10 cell lines) (default: 0.39) |
EncodeH3K79me2-sum | Sum of Encode H3K79me2 levels (from 13 cell lines) (default: 0.64) |
EncodeH3K79me2-max | Maximum Encode H3K79me2 level (from 13 cell lines) (default: 0.34) |
EncodeH4K20me1-sum | Sum of Encode H4K20me1 levels (from 11 cell lines) (default: 0.88) |
EncodeH4K20me1-max | Maximum Encode H4K20me1 level (from 11 cell lines) (default: 0.47) |
EncodeH2AFZ-sum | Sum of Encode H2AFZ levels (from 13 cell lines) (default: 0.9) |
EncodeH2AFZ-max | Maximum Encode H2AFZ level (from 13 cell lines) (default: 0.42) |
EncodeDNase-sum | Sum of Encode DNase-seq levels (from 12 cell lines) (default: 0.0) |
EncodeDNase-max | Maximum Encode DNase-seq level (from 12 cell lines) (default: 0.0) |
EncodetotalRNA-sum | Sum of Encode totalRNA-seq levels (from 10 cell lines, each with minus and plus strand) (default: 0.0) |
EncodetotalRNA-max | Maximum Encode totalRNA-seq level (from 10 cell lines, minus and plus strand separately) (default: 0.0) |
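
Each assay in the table above appears as a `-sum` and a `-max` column computed over a set of cell lines. The sketch below shows that aggregation on toy numbers. The function name, the cell-line labels and the values are hypothetical; how CADD normalises the underlying signal before aggregating is not described in this document.

```python
def summarize_signal(levels_by_cell_line):
    """Aggregate per-cell-line signal levels into sum and max.

    levels_by_cell_line: dict mapping a cell-line label -> signal level.
    Returns None when no cell line has a value.
    """
    levels = list(levels_by_cell_line.values())
    if not levels:
        return None
    return {"sum": sum(levels), "max": max(levels)}


# Toy values for three hypothetical cell lines.
print(summarize_signal({"cell_line_a": 0.25, "cell_line_b": 0.5, "cell_line_c": 0.125}))
```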
- Reference: Ernst J, Kellis M. Chromatin-state discovery and genome annotation with ChromHMM[J]. Nature protocols, 2017, 12(12): 2478-2492.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction: ChromHMM annotates the noncoding genome using epigenomic data across multiple cell types by employing a multivariate hidden Markov model to infer chromatin-state signatures, generating genome-wide annotations and facilitating functional interpretations through automated enrichment analysis.
Field Name | Description |
---|---|
cHmm_E1 | Number of 48 cell types in chromHMM state E1_poised (default: 1.92) |
cHmm_E2 | Number of 48 cell types in chromHMM state E2_repressed (default: 1.92) |
cHmm_E3 | Number of 48 cell types in chromHMM state E3_dead (default: 1.92) |
cHmm_E4 | Number of 48 cell types in chromHMM state E4_dead (default: 1.92) |
cHmm_E5 | Number of 48 cell types in chromHMM state E5_repressed (default: 1.92) |
cHmm_E6 | Number of 48 cell types in chromHMM state E6_repressed (default: 1.92) |
cHmm_E7 | Number of 48 cell types in chromHMM state E7_weak (default: 1.92) |
cHmm_E8 | Number of 48 cell types in chromHMM state E8_gene (default: 1.92) |
cHmm_E9 | Number of 48 cell types in chromHMM state E9_gene (default: 1.92) |
cHmm_E10 | Number of 48 cell types in chromHMM state E10_gene (default: 1.92) |
cHmm_E11 | Number of 48 cell types in chromHMM state E11_gene (default: 1.92) |
cHmm_E12 | Number of 48 cell types in chromHMM state E12_distal (default: 1.92) |
cHmm_E13 | Number of 48 cell types in chromHMM state E13_distal (default: 1.92) |
cHmm_E14 | Number of 48 cell types in chromHMM state E14_distal (default: 1.92) |
cHmm_E15 | Number of 48 cell types in chromHMM state E15_weak (default: 1.92) |
cHmm_E16 | Number of 48 cell types in chromHMM state E16_tss (default: 1.92) |
cHmm_E17 | Number of 48 cell types in chromHMM state E17_proximal (default: 1.92) |
cHmm_E18 | Number of 48 cell types in chromHMM state E18_proximal (default: 1.92) |
cHmm_E19 | Number of 48 cell types in chromHMM state E19_tss (default: 1.92) |
cHmm_E20 | Number of 48 cell types in chromHMM state E20_poised (default: 1.92) |
cHmm_E21 | Number of 48 cell types in chromHMM state E21_dead (default: 1.92) |
cHmm_E22 | Number of 48 cell types in chromHMM state E22_repressed (default: 1.92) |
cHmm_E23 | Number of 48 cell types in chromHMM state E23_weak (default: 1.92) |
cHmm_E24 | Number of 48 cell types in chromHMM state E24_distal (default: 1.92) |
cHmm_E25 | Number of 48 cell types in chromHMM state E25_distal (default: 1.92) |
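
Each `cHmm_E*` column counts how many of the 48 cell types are in a given state at a site. A minimal sketch of that tally is shown below, using a toy input of four cell types. The function name and input layout are hypothetical. As a side observation, the listed default of 1.92 equals 48/25, the value an even spread of 48 cell types over 25 states would give; that reading is an inference and is not stated in the source.

```python
from collections import Counter

# The 25 state labels E1..E25, matching the column suffixes above.
STATES = [f"E{i}" for i in range(1, 26)]


def state_counts(state_by_cell_type):
    """Count cell types per chromHMM state at one site.

    state_by_cell_type: dict mapping a cell-type label -> state label ("E1".."E25").
    Returns a dict keyed by column name (cHmm_E1 .. cHmm_E25).
    """
    counts = Counter(state_by_cell_type.values())
    return {f"cHmm_{state}": counts.get(state, 0) for state in STATES}


# Toy input with four hypothetical cell types instead of 48.
toy = {"ct1": "E1", "ct2": "E1", "ct3": "E16", "ct4": "E25"}
columns = state_counts(toy)
print(columns["cHmm_E1"], columns["cHmm_E16"], columns["cHmm_E2"])
```

With real data, the 25 counts for a site would add up to 48.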
The ORegAnno website is no longer available, and WGSA, which provides this data, only describes the following two fields.
- Reference: Lesurf, R. et al. ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic Acids Res. 44, D126-132 (2016).
- Retrieve Source: https://sites.google.com/site/jpopgen/wgsa
- Brief Introduction: The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements.
Field Name | Description |
---|---|
#Chrom | |
Start | |
End | |
ORegAnno_ID | |
Species | |
Outcome | |
Type | The type of regulatory region by ORegAnno |
Gene_Symbol | |
Gene_ID | |
Gene_Source | |
Regulatory_Element_Symbol | |
Regulatory_Element_ID | |
Regulatory_Element_Source | |
dbSNP_ID | |
PMID | The PMID of the paper describing the regulation |
Dataset | |
Build | |
Strand |
- Reference: Schmiedel B J, Singh D, Madrigal A, et al. Impact of genetic polymorphisms on human immune cell gene expression[J]. Cell, 2018, 175(6): 1701-1715.e16.
- Retrieve Source: https://dice-database.org/downloads , 2.23.2022
- Brief Introduction: The DICE project aims to elucidate the role of common genetic variations in human disease by creating reference transcriptomic and epigenomic maps of immune cells, identifying functional SNPs affecting gene expression, and investigating regulatory mechanisms and cell-type specific effects, including those influenced by sex, to reveal how disease risk-associated polymorphisms impact pathogenesis.
Field Name | Description |
---|---|
DICE_rs_ID | dbSNP rsID |
DICE_FILTER | Filter status |
DICE_Cell_Type | Different cell type reported in DICE |
DICE_Gene | Ensembl ID |
DICE_GeneSymbol | Gene symbol |
DICE_Pvalue | P-value |
DICE_Beta | The beta value indicates if expression for the alt allele is higher (if beta is positive) or lower (if beta is negative) |
- Reference: Lappalainen T, Sammeth M, Friedländer M R, et al. Transcriptome and genome sequencing uncovers functional variation in humans[J]. Nature, 2013, 501(7468): 506-511.
- Retrieve Source: https://sites.google.com/site/jpopgen/wgsa
- Brief Introduction: Geuvadis is the first uniformly processed RNA-seq data from 462 individuals across multiple populations, revealing extensive genetic variation in gene regulation and providing insights into causal regulatory mechanisms and disease-associated loci.
Field Name | Description |
---|---|
Geuvadis_eQTL_target_gene | Ensembl ID of the gene that the eQTL is associated with, from the Geuvadis project |
- Reference: Lonsdale, J., Thomas, J., Salvatore, M. et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet 45, 580–585 (2013).
- Retrieve Source: https://storage.googleapis.com/adult-gtex/bulk-qtl/v8/single-tissue-cis-qtl/GTEx_Analysis_v8_eQTL.tar
- Brief Introduction: The Genotype-Tissue Expression (GTEx) project aims to create a resource database and tissue bank to study the relationship between genetic variation and gene expression in human tissues.
Field Name | Description |
---|---|
variant_id | variant ID in the format {chr}_{pos_first_ref_base}_{ref_seq}/{alt_seq} |
gene_id | GENCODE/Ensembl gene ID |
tss_distance | distance between variant and transcription start site. Positive when variant is downstream of the TSS, negative otherwise |
ma_samples | number of samples carrying the minor allele |
ma_count | total number of minor alleles across individuals |
maf | minor allele frequency observed in the set of donors for a given tissue |
pval_nominal | nominal p-value of the variant-gene association |
slope | regression slope |
slope_se | standard error of the regression slope |
pval_nominal_threshold | nominal p-value threshold for calling a variant-gene pair significant for the gene |
min_pval_nominal | smallest nominal p-value for the gene |
pval_beta | beta-approximated permutation p-value for the gene |
tissue_type | Different human tissues in GTEx |
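
The `pval_nominal` and `pval_nominal_threshold` fields are typically read together: a variant-gene pair is treated as significant when its nominal p-value does not exceed the gene-level threshold. The sketch below applies that comparison to rows held as dictionaries. The function name and sample rows are hypothetical, and the use of `<=` rather than `<` is an assumption to check against the GTEx release documentation.

```python
def significant_pairs(rows):
    """Keep variant-gene pairs whose nominal p-value passes the gene threshold.

    rows: iterable of dicts with at least pval_nominal and
    pval_nominal_threshold (values may be strings as read from a TSV).
    """
    return [
        row for row in rows
        if float(row["pval_nominal"]) <= float(row["pval_nominal_threshold"])
    ]


# Made-up rows for illustration; gene and variant IDs are placeholders.
rows = [
    {"variant_id": "variant_a", "gene_id": "gene_1", "pval_nominal": "1e-8", "pval_nominal_threshold": "1e-5"},
    {"variant_id": "variant_b", "gene_id": "gene_1", "pval_nominal": "0.02", "pval_nominal_threshold": "1e-5"},
]
print([r["variant_id"] for r in significant_pairs(rows)])
```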
- Reference: Rentzsch P, Witten D, Cooper G M, et al. CADD: predicting the deleteriousness of variants throughout the human genome[J]. Nucleic Acids Research, 2019, 47(D1): D886-D894.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction: Transcription-factor-related information retrieved from CADD v1.7.
Field Name | Description |
---|---|
RemapOverlapTF | Remap number of different transcription factors binding (default: -0.5) |
RemapOverlapCL | Remap number of different transcription factor - cell line combinations binding (default: -0.5) |
- Reference: Fishilevich S, Nudel R, Rappaport N, et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards[J]. Database, 2017, 2017: bax028.
- Retrieve Source: https://favor.genohub.org/
- Brief Introduction: GeneHancer predictions are fully integrated in the widely used GeneCards Suite, whereby candidate enhancers and their annotations are displayed on every relevant GeneCard.
Field Name | Description |
---|---|
GeneHancer | Predicted human enhancer sites from the GeneHancer database. |
- Reference: Hnisz D, Abraham B J, Lee T I, et al. Super-enhancers in the control of cell identity and disease[J]. Cell, 2013, 155(4): 934-947.
- Retrieve Source: https://favor.genohub.org/
- Brief Introduction: This study produced a catalog of super-enhancers in a broad range of human cell types and found that super-enhancers are associated with genes that control and define the biology of these cells.
Field Name | Description |
---|---|
Super Enhancer | Predicted super-enhancer sites and targets in a range of human cell types. |
- Reference: Erwin G D, Oksenberg N, Truty R M, et al. Integrating diverse datasets improves developmental enhancer prediction[J]. PLoS computational biology, 2014, 10(6): e1003677.
- Retrieve Source: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003677#references
- Brief Introduction: EnhancerFinder integrates DNA sequence motifs, evolutionary patterns, and functional genomics data to predict developmental enhancers and their tissue specificity, which outperforms single-data approaches, identifies 84,301 enhancers genome-wide, and provides functional annotations enriched near relevant genes and GWAS lead SNPs, with predictions validated in vivo and available as a UCSC Genome Browser track.
Field Name | Description |
---|---|
Enhancer_Finder_General_Prediction_MKL_Scores | Whether the site is within a predicted general developmental enhancer, along with MKL scores. |
Enhancer_Finder_General_Prediction_H3K27ac_H3K4me1_Contexts | The H3K27ac and H3K4me1 marks from the feature data overlapping each predicted enhancer. |
Enhancer_Finder_Limb_MKL_Scores | Whether the site is within a predicted limb-specific enhancer, along with MKL scores. |
Enhancer_Finder_Brain_MKL_Scores | Whether the site is within a predicted brain-specific enhancer, along with MKL scores. |
Enhancer_Finder_Heart_MKL_Scores | Whether the site is within a predicted heart-specific enhancer, along with MKL scores. |
- Reference: The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas[J]. Nature, 2014, 507(7493): 462-470.
- Retrieve Source: https://favor.genohub.org/
- Brief Introduction: Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) in human and mouse cells, revealing few 'housekeeping' genes, many composite promoters with cell-type-specific TSSs, and differing evolutionary rates for TSSs, linking key transcription factors to cell states, with the FANTOM5 project providing comprehensive mammalian cell-type-specific transcriptome profiles for biomedical research.
Field Name | Description |
---|---|
cage_promoter | CAGE defined promoter sites from Fantom 5 |
cage_tc | CAGE tag cluster |
- Reference: Andersson R, Gebhard C, Miguel-Escalada I, et al. An atlas of active enhancers across human cell types and tissues[J]. Nature, 2014, 507(7493): 455-461.
- Retrieve Source: https://favor.genohub.org/
- Brief Introduction: CAGE Enhancer utilizes the FANTOM5 panel of samples, covering the majority of human tissues and cell types, to produce an atlas of active, in vivo-transcribed enhancers.
Field Name | Description |
---|---|
cage_enhancer | CAGE defined permissive Enhancer sites from Fantom 5 |
- Reference:
- miRBase: Kozomara, A. & Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73 (2014).
- snoRNABase: Lestrade, L. & Weber, M. J. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 34, D158-162 (2006).
- Retrieve Source: https://sites.google.com/site/jpopgen/wgsa
- Brief Introduction:
- miRBase: The miRBase database contains 24,521 microRNA loci from 206 species and includes a high-confidence subset based on deep sequencing data.
- snoRNABase: The snoRNA-LBME-db is an online database containing experimentally verified and predicted human C/D box and H/ACA box snoRNAs, and scaRNAs, which guide RNA modifications and maturation, providing detailed annotations, predicted base pairings.
Field Name | Description |
---|---|
sno_miRNA_name | The name of snoRNA or miRNA if the site is located within (from miRBase/snoRNABase) |
sno_miRNA_type | the type of snoRNA or miRNA (from miRBase/snoRNABase) |
- Reference: Chen S, Francioli L C, Goodrich J K, et al. A genomic mutational constraint map using variation in 76,156 human genomes[J]. Nature, 2024, 625(7993): 92-100.
- Retrieve Source: https://gnomad.broadinstitute.org/news/2023-11-gnomad-v4-0/
- Brief Introduction: The gnomAD database is composed of exome and genome sequences from around the world. We have removed cohorts that were recruited for pediatric disease, except for a small number of diverse cohorts where we have included unaffected relatives.
Field Name | Description |
---|---|
exomes_AF | Exomes Alternate allele frequency |
exomes_AF_XX | Exomes Alternate allele frequency in XX samples |
exomes_AF_XY | Exomes Alternate allele frequency in XY samples |
exomes_AF_afr_XX | Exomes Alternate allele frequency in XX samples of African/African-American ancestry |
exomes_AF_afr_XY | Exomes Alternate allele frequency in XY samples of African/African-American ancestry |
exomes_AF_afr | Exomes Alternate allele frequency in samples of African/African-American ancestry |
exomes_AF_amr_XX | Exomes Alternate allele frequency in XX samples of Latino ancestry |
exomes_AF_amr_XY | Exomes Alternate allele frequency in XY samples of Latino ancestry |
exomes_AF_amr | Exomes Alternate allele frequency in samples of Latino ancestry |
exomes_AF_asj_XX | Exomes Alternate allele frequency in XX samples of Ashkenazi Jewish ancestry |
exomes_AF_asj_XY | Exomes Alternate allele frequency in XY samples of Ashkenazi Jewish ancestry |
exomes_AF_asj | Exomes Alternate allele frequency in samples of Ashkenazi Jewish ancestry |
exomes_AF_eas_XX | Exomes Alternate allele frequency in XX samples of East Asian ancestry |
exomes_AF_eas_XY | Exomes Alternate allele frequency in XY samples of East Asian ancestry |
exomes_AF_eas | Exomes Alternate allele frequency in samples of East Asian ancestry |
exomes_AF_fin_XX | Exomes Alternate allele frequency in XX samples of Finnish ancestry |
exomes_AF_fin_XY | Exomes Alternate allele frequency in XY samples of Finnish ancestry |
exomes_AF_fin | Exomes Alternate allele frequency in samples of Finnish ancestry |
exomes_AF_mid_XX | Exomes Alternate allele frequency in XX samples of Middle Eastern ancestry |
exomes_AF_mid_XY | Exomes Alternate allele frequency in XY samples of Middle Eastern ancestry |
exomes_AF_mid | Exomes Alternate allele frequency in samples of Middle Eastern ancestry |
exomes_AF_nfe_XX | Exomes Alternate allele frequency in XX samples of Non-Finnish European ancestry |
exomes_AF_nfe_XY | Exomes Alternate allele frequency in XY samples of Non-Finnish European ancestry |
exomes_AF_nfe | Exomes Alternate allele frequency in samples of Non-Finnish European ancestry |
genomes_AF | Genomes Alternate allele frequency |
genomes_AF_XX | Genomes Alternate allele frequency in XX samples |
genomes_AF_XY | Genomes Alternate allele frequency in XY samples |
genomes_AF_afr_XX | Genomes Alternate allele frequency in XX samples of African/African-American ancestry |
genomes_AF_afr_XY | Genomes Alternate allele frequency in XY samples of African/African-American ancestry |
genomes_AF_afr | Genomes Alternate allele frequency in samples of African/African-American ancestry |
genomes_AF_amr_XX | Genomes Alternate allele frequency in XX samples of Latino ancestry |
genomes_AF_amr_XY | Genomes Alternate allele frequency in XY samples of Latino ancestry |
genomes_AF_amr | Genomes Alternate allele frequency in samples of Latino ancestry |
genomes_AF_asj_XX | Genomes Alternate allele frequency in XX samples of Ashkenazi Jewish ancestry |
genomes_AF_asj_XY | Genomes Alternate allele frequency in XY samples of Ashkenazi Jewish ancestry |
genomes_AF_asj | Genomes Alternate allele frequency in samples of Ashkenazi Jewish ancestry |
genomes_AF_eas_XX | Genomes Alternate allele frequency in XX samples of East Asian ancestry |
genomes_AF_eas_XY | Genomes Alternate allele frequency in XY samples of East Asian ancestry |
genomes_AF_eas | Genomes Alternate allele frequency in samples of East Asian ancestry |
genomes_AF_fin_XX | Genomes Alternate allele frequency in XX samples of Finnish ancestry |
genomes_AF_fin_XY | Genomes Alternate allele frequency in XY samples of Finnish ancestry |
genomes_AF_fin | Genomes Alternate allele frequency in samples of Finnish ancestry |
genomes_AF_mid_XX | Genomes Alternate allele frequency in XX samples of Middle Eastern ancestry |
genomes_AF_mid_XY | Genomes Alternate allele frequency in XY samples of Middle Eastern ancestry |
genomes_AF_mid | Genomes Alternate allele frequency in samples of Middle Eastern ancestry |
genomes_AF_nfe_XX | Genomes Alternate allele frequency in XX samples of Non-Finnish European ancestry |
genomes_AF_nfe_XY | Genomes Alternate allele frequency in XY samples of Non-Finnish European ancestry |
genomes_AF_nfe | Genomes Alternate allele frequency in samples of Non-Finnish European ancestry |
- Reference: UK10K Consortium. The UK10K project identifies rare variants in health and disease[J]. Nature, 2015, 526(7571): 82-90.
- Retrieve Source: https://sites.google.com/site/jpopgen/wgsa
- Brief Introduction: The UK10K project will enable researchers in the UK and beyond to better understand the link between low-frequency and rare genetic changes, and human disease caused by harmful changes to the proteins the body makes.
Field Name | Description |
---|---|
RS_ID | dbSNP ID. |
DP | - |
VQSLOD | - |
AC | Alternative allele count in called genotypes in UK10K cohorts. |
AN | Total allele count in called genotypes in UK10K cohorts. |
AF | Alternative allele frequency in called genotypes in UK10K cohorts. |
AC_TWINSUK | Alternative allele count in called genotypes in UK10K TWINSUK cohort. |
AN_TWINSUK | Total allele count in called genotypes in UK10K TWINSUK cohort. |
AF_TWINSUK | Alternative allele frequency in called genotypes in UK10K TWINSUK cohort. |
AC_ALSPAC | Alternative allele count in called genotypes in UK10K ALSPAC cohort. |
AN_ALSPAC | Total allele count in called genotypes in UK10K ALSPAC cohort. |
AF_ALSPAC | Alternative allele frequency in called genotypes in UK10K ALSPAC cohort. |
AF_AFR | - |
AF_AMR | - |
AF_ASN | - |
AF_EUR | - |
AF_MAX | - |
ESP_MAF | - |
CSQ | Consequence of the given variant, e.g. ENST00000342066:SAMD11:synonymous_variant:21:7:Q>Q |
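
The `CSQ` example in the table packs several values into one colon-separated string. The following is a minimal Python sketch for unpacking it, assuming every value has exactly six colon-separated parts as in that example. The function name `parse_uk10k_csq` and the tuple field names are illustrative, not part of SynMall; the meaning of the two numeric parts is not documented here, so they are kept as raw strings.

```python
from typing import NamedTuple, Tuple


class Uk10kCsq(NamedTuple):
    transcript: str          # e.g. ENST00000342066
    gene: str                # e.g. SAMD11
    consequence: str         # e.g. synonymous_variant
    extra: Tuple[str, str]   # two numeric parts, undocumented above (kept raw)
    aa_change: str           # e.g. Q>Q


def parse_uk10k_csq(value: str) -> Uk10kCsq:
    """Split a UK10K CSQ string of the assumed six-part form."""
    parts = value.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated parts, got {len(parts)}")
    transcript, gene, consequence, n1, n2, aa_change = parts
    return Uk10kCsq(transcript, gene, consequence, (n1, n2), aa_change)


record = parse_uk10k_csq("ENST00000342066:SAMD11:synonymous_variant:21:7:Q>Q")
print(record.gene, record.consequence, record.aa_change)
```

Rejecting strings with an unexpected number of parts is a deliberate choice here: it surfaces format differences early rather than silently misaligning fields.
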
- Reference: Karczewski K J, Weisburd B, Thomas B, et al. The ExAC browser: displaying reference data information from over 60 000 exomes[J]. Nucleic acids research, 2017, 45(D1): D840-D845.
- Retrieve Source:
  - ExAC and ExAC_nonpsych are retrieved from https://annovar.openbioinformatics.org/en/latest/
  - ExAC_nonTCGA is retrieved from https://sites.google.com/site/jpopgen/wgsa
- Brief Introduction:
Field Name | Description |
---|---|
ExAC_ALL | Allele frequency in total ExAC samples |
ExAC_AFR | Allele frequency in African & African American ExAC samples |
ExAC_AMR | Allele frequency in American ExAC samples |
ExAC_EAS | Allele frequency in East Asian ExAC samples |
ExAC_FIN | Allele frequency in Finnish ExAC samples |
ExAC_NFE | Allele frequency in Non-Finnish European ExAC samples |
ExAC_OTH | Allele frequency in other ExAC samples |
ExAC_SAS | Allele frequency in South Asian ExAC samples |
ExAC_nonpsych_ALL | Allele frequency in total ExAC samples excluding psychiatric cohorts |
ExAC_nonpsych_AFR | Allele frequency in African & African American ExAC samples excluding psychiatric cohorts |
ExAC_nonpsych_AMR | Allele frequency in American ExAC samples excluding psychiatric cohorts |
ExAC_nonpsych_EAS | Allele frequency in East Asian ExAC samples excluding psychiatric cohorts |
ExAC_nonpsych_FIN | Allele frequency in Finnish ExAC samples excluding psychiatric cohorts |
ExAC_nonpsych_NFE | Allele frequency in Non-Finnish European ExAC samples excluding psychiatric cohorts |
ExAC_nonpsych_OTH | Allele frequency in other ExAC samples excluding psychiatric cohorts |
ExAC_nonpsych_SAS | Allele frequency in South Asian ExAC samples excluding psychiatric cohorts |
ExAC_nonTCGA_QUAL | Phred-scaled quality score for the assertion made in ALT |
ExAC_nonTCGA_FILTER | PASS if this position has passed all filters |
ExAC_nonTCGA_ALL | Allele frequency in total ExAC samples excluding TCGA cohorts |
ExAC_nonTCGA_AFR | Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in African & African American ExAC samples excluding TCGA cohorts |
ExAC_nonTCGA_AMR | Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in American ExAC samples excluding TCGA cohorts |
ExAC_nonTCGA_EAS | Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in East Asian ExAC samples excluding TCGA cohorts |
ExAC_nonTCGA_FIN | Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Finnish ExAC samples excluding TCGA cohorts |
ExAC_nonTCGA_NFE | Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC samples excluding TCGA cohorts |
ExAC_nonTCGA_Adj | Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in total ExAC samples excluding TCGA cohorts |
- Reference: Glusman G, Caballero J, Mauldin D E, et al. Kaviar: an accessible system for testing SNV novelty[J]. Bioinformatics, 2011, 27(22): 3216-3217.
- Retrieve Source: https://annovar.openbioinformatics.org/en/latest/
- Brief Introduction:
Field Name | Description |
---|---|
Kaviar_AF | Allele frequency in Kaviar |
Kaviar_AC | Allele count in Kaviar |
Kaviar_AN | Total allele number in Kaviar |
- Reference: Scott E M, Halees A, Itan Y, et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery[J]. Nature genetics, 2016, 48(9): 1071-1076.
- Retrieve Source: https://annovar.openbioinformatics.org/en/latest/
- Brief Introduction:
Field Name | Description |
---|---|
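GME_AF | Allele frequency in all GME samples |
GME_NWA | Allele frequency in Northwest African samples |
GME_NEA | Allele frequency in Northeast African samples |
GME_AP | Allele frequency in Arabian Peninsula samples |
GME_Israel | Allele frequency in Israel samples |
GME_SD | Allele frequency in Syrian Desert samples |
GME_TP | Allele frequency in Turkish Peninsula samples |
GME_CA | Allele frequency in Central Asian samples |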
- Reference: Reinhold W C, Varma S, Sousa F, et al. NCI-60 whole exome sequencing and pharmacological CellMiner analyses[J]. PloS one, 2014, 9(7): e101670.
- Retrieve Source: https://annovar.openbioinformatics.org/en/latest/
- Brief Introduction:
Field Name | Description |
---|---|
NCI60_AF | Allele frequency in the NCI-60 cell line panel |
- Reference: Naslavsky M S, Yamamoto G L, de Almeida T F, et al. Exomic variants of an elderly cohort of Brazilians in the ABraOM database[J]. Human mutation, 2017, 38(7): 751-763.
- Retrieve Source: https://annovar.openbioinformatics.org/en/latest/
- Brief Introduction:
Field Name | Description |
---|---|
ABRAOM_AF | Allele frequency in the ABraOM cohort |
ABRAOM_Filter | |
ABRAOM_Cegh_Filter | |
- Reference: Fu W, O'Connor T D, Jun G, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants[J]. Nature, 2013, 493(7431): 216-220.
- Retrieve Source: https://annovar.openbioinformatics.org/en/latest/
- Brief Introduction:
Field Name | Description |
---|---|
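esp6500siv2_all | Allele frequency in all ESP6500 samples |
esp6500siv2_aa | Allele frequency in African American ESP6500 samples |
esp6500siv2_ea | Allele frequency in European American ESP6500 samples |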
- Reference: Taliun D, Harris D N, Kessler M D, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program[J]. Nature, 2021, 590(7845): 290-299.
- Retrieve Source: https://favor.genohub.org/
- Brief Introduction:
Field Name | Description |
---|---|
bravo_an | TOPMed Bravo Genome Allele Number. |
bravo_af | TOPMed Bravo Genome Allele Frequency. |
filter_status | TOPMed QC status of the given variant. |
Allele frequencies from five primate species, as utilized in AlphaMissense. We have mapped them to GRCh38.
Field Name | Reference | Retrieve Source |
---|---|---|
Bonobos_AF | Genetic variation in Pan species is shaped by demographic history and harbors lineage-specific functions[J]. Genome biology and evolution, 2019, 11(4): 1178-1191. | https://figshare.com/articles/dataset/Han_etal_Data_tsv_gz/7855850 |
Gorilla_AF | Great ape genetic diversity and population history[J]. Nature, 2013, 499(7459): 471-475. | https://eichlerlab.gs.washington.edu/greatape/data/VCFs/SNPs/Gorilla.vcf.gz |
Pan_troglodytes_AF | Same as above | https://eichlerlab.gs.washington.edu/greatape/data/VCFs/SNPs/Pan_troglodytes.vcf.gz |
Pongo_pygmaeus_AF | Same as above | https://eichlerlab.gs.washington.edu/greatape/data/VCFs/SNPs/Pongo_pygmaeus.vcf.gz |
Pongo_abelii_AF | Same as above | https://eichlerlab.gs.washington.edu/greatape/data/VCFs/SNPs/Pongo_abelii.vcf.gz |
- Reference: Garber M, Guttman M, Clamp M, et al. Identifying novel constrained elements by exploiting biased substitution patterns[J]. Bioinformatics, 2009, 25(12): i54-i62.
- Retrieve Source: https://sites.google.com/site/jpopgen/wgsa
- Brief Introduction: siPhy leverages deeply sequenced clades to identify evolutionary selection by detecting both rate-based conservation and substitution patterns indicative of natural selection, employing a statistical method for biased nucleotide substitutions, a learning algorithm to infer site-specific biases from sequence alignments, and a hidden Markov model to detect constrained elements.
Field Name | Description |
---|---|
siPhy_rankscore | The rank of the SiPhy_29way_logOdds score among all SiPhy_29way_logOdds scores in genome |
- Reference: McVicker G, Gordon D, Davis C, et al. Widespread genomic signatures of natural selection in hominid evolution[J]. PLoS genetics, 2009, 5(5): e1000471.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction: Selection on genomic functional elements can be detected by its effects on population diversity at linked neutral sites, as shown by the authors' analysis of human polymorphisms and sequence differences among five primate species relative to conserved sequence features.
Field Name | Description |
---|---|
bStatistic | Background selection (B) value estimate. Ranges from 0 to 1000. It estimates the expected fraction (scaled by 1000) of neutral diversity present at a site. Values close to 0 represent near-complete removal of diversity as a result of background selection, and values near 1000 indicate the absence of background selection. |
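
Because `bStatistic` is reported on a 0 to 1000 scale, it is often convenient to convert it back to a plain fraction of retained neutral diversity. A minimal Python sketch follows; the helper name `b_to_fraction` is illustrative, and the range check simply mirrors the range stated in the table.

```python
def b_to_fraction(b_statistic: float) -> float:
    """Convert a B value on the 0-1000 scale to a fraction in [0, 1]."""
    if not 0 <= b_statistic <= 1000:
        raise ValueError(f"bStatistic out of range [0, 1000]: {b_statistic}")
    return b_statistic / 1000.0


# A B value of 250 means roughly a quarter of neutral diversity is expected to remain.
print(b_to_fraction(250))
```
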
- Reference: Gulko B, Hubisz M J, Gronau I, et al. Probabilities of fitness consequences for point mutations across the human genome[J]. Nature genetics, 2015, 47(3): 276-283.
- Retrieve Source: https://sites.google.com/site/jpopgen/wgsa
- Brief Introduction: FitCons, a novel computational method, estimates the probability that a point mutation at each genome position will influence fitness, using high-throughput functional genomic data to cluster genomic positions and assess fitness consequences.
Field Name | Description |
---|---|
integrated_fitCons_score | FitCons scores (i6) based on function evidence from multiple cell types, the higher the score the more potential for interesting genomic function |
integrated_confidence_value | Confidence value for the integrated_fitCons_score: 0 - High confidence values (p<~.003), 1 - Likely Significant (p<.05), 2 - Likely Informative (p<.25), 3 - Best estimate (p>=.25) |
GM12878_fitCons_score | FitCons scores (gm) based on function evidence from the GM12878 cell type, the higher the score the more potential for interesting genomic function |
GM12878_confidence_value | Confidence value for the GM12878_fitCons_score: 0 - High confidence values (p<~.003), 1 - Likely Significant (p<.05), 2 - Likely Informative (p<.25), 3 - Best estimate (p>=.25) |
H1-hESC_fitCons_score | FitCons scores (h1) based on function evidence from the H1-hESC cell type, the higher the score the more potential for interesting genomic function |
H1-hESC_confidence_value | Confidence value for the H1-hESC_fitCons_score: 0 - High confidence values (p<~.003), 1 - Likely Significant (p<.05), 2 - Likely Informative (p<.25), 3 - Best estimate (p>=.25) |
HUVEC_fitCons_score | FitCons scores (hu) based on function evidence from the HUVEC cell type, the higher the score the more potential for interesting genomic function |
HUVEC_confidence_value | Confidence value for the HUVEC_fitCons_score: 0 - High confidence values (p<~.003), 1 - Likely Significant (p<.05), 2 - Likely Informative (p<.25), 3 - Best estimate (p>=.25) |
integrated_fitCons_score_rankscore | Rank of the integrated_fitCons_score among all integrated_fitCons_scores in genome |
GM12878_fitCons_score_rankscore | Rank of the GM12878_fitCons_score among all GM12878_fitCons_scores in genome |
H1-hESC_fitCons_score_rankscore | Rank of the H1-hESC_fitCons_score among all H1-hESC_fitCons_scores in genome |
HUVEC_fitCons_score_rankscore | Rank of the HUVEC_fitCons_score among all HUVEC_fitCons_scores in genome |
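
The `*_confidence_value` fields use the integer codes 0 to 3 described in the table. A minimal Python sketch for turning those codes into readable labels is shown below. The names `FITCONS_CONFIDENCE` and `describe_confidence` are illustrative, and the sketch assumes the value arrives as an integer or an integer-like string such as `"2"`.

```python
# Labels copied from the confidence-value descriptions in the table above.
FITCONS_CONFIDENCE = {
    0: "High confidence (p < ~0.003)",
    1: "Likely significant (p < 0.05)",
    2: "Likely informative (p < 0.25)",
    3: "Best estimate (p >= 0.25)",
}


def describe_confidence(value) -> str:
    """Map a fitCons confidence code (0-3) to its label."""
    code = int(value)  # raises ValueError for non-integer strings
    if code not in FITCONS_CONFIDENCE:
        raise ValueError(f"unknown fitCons confidence code: {value!r}")
    return FITCONS_CONFIDENCE[code]


print(describe_confidence("1"))
```

Note that lower codes mean higher confidence, so filtering for reliable scores means keeping small values, not large ones.
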
- Reference: Siepel A, Bejerano G, Pedersen J S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes[J]. Genome research, 2005, 15(8): 1034-1050.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction: PhastCons, a program based on a two-state phylogenetic hidden Markov model, was used to conduct a comprehensive search for conserved elements across vertebrate genomes, utilizing genome-wide alignments of five vertebrate species, four insect species, two Caenorhabditis species, and seven Saccharomyces species.
Field Name | Description |
---|---|
priPhCons | Primate PhastCons conservation score (excl. human) (default: 0.0) |
mamPhCons | Mammalian PhastCons conservation score (excl. human) (default: 0.0) |
verPhCons | Vertebrate PhastCons conservation score (excl. human) (default: 0.0) |
- Reference: Pollard K S, Hubisz M J, Rosenbloom K R, et al. Detection of nonneutral substitution rates on mammalian phylogenies[J]. Genome research, 2010, 20(1): 110-121.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction: PhyloP addresses the broader problem of detecting departures from neutral nucleotide substitution rates in either direction, potentially in a clade-specific manner, using four statistical tests (likelihood ratio, score, exact distributions, GERP).
Field Name | Description |
---|---|
priPhyloP | Primate PhyloP score (excl. human) (default: -0.029) |
mamPhyloP | Mammalian PhyloP score (excl. human) (default: -0.005) |
verPhyloP | Vertebrate PhyloP score (excl. human) (default: 0.042) |
- Reference: Davydov E V, Goode D L, Sirota M, et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++[J]. PLoS computational biology, 2010, 6(12): e1001025.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction: GERP++ uses maximum likelihood evolutionary rate estimation for position-specific scoring. In contrast to previous bottom-up methods, it employs a novel dynamic programming approach to subsequently define constrained elements.
Field Name | Description |
---|---|
GerpRS | Gerp element score (default: 0) |
GerpRSpval | Gerp element p-Value (default: 0) |
GerpN | Neutral evolution score defined by GERP++ (default: 3.0) |
GerpS | Rejected Substitution score defined by GERP++ (default: -0.2) |
- Reference: Christmas M J, Kaplow I M, Genereux D P, et al. Evolutionary constraint and innovation across hundreds of placental mammals[J]. Science, 2023, 380(6643): eabn3943.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction: Zoonomia, the largest comparative genomics resource for mammals, aligns genomes of 240 species to identify bases likely affecting fitness and disease risk, revealing 332 million evolutionarily constrained bases in the human genome, with many outside protein-coding exons, and associating changes in genes and regulatory elements with unique mammalian traits that could inform therapeutic development.
Field Name | Description |
---|---|
ZooPriPhyloP | Zoonomia Primate PhyloP conservation score (43 genomes) (default: 0.005) |
ZooVerPhyloP | Zoonomia Vertebrate PhyloP conservation score (241 vertebrate genomes) (default: -0.1460) |
ZooRoCC | Zoonomia Runs of Contiguous Constraint (default: 0) |
ZooUCE | Zoonomia UltraConserved Elements (default: 0) |
De novo mutations (DNMs) are variants observed in an individual but not in either parent. Variants of this type have been reported to play prominent roles in several genetic diseases.
Neither the Gene4Denovo website nor its publication describes the field names below; this annotation comes from ANNOVAR.
- Reference: Zhao G, Li K, Li B, et al. Gene4Denovo: an integrated database and analytic platform for de novo mutations in humans[J]. Nucleic acids research, 2020, 48(D1): D913-D926.
- Retrieve Source: https://annovar.openbioinformatics.org/en/latest/
- Brief Introduction: Gene4Denovo integrated 580 799 DNMs, including 30 060 coding DNMs detected by WES/WGS from 23 951 individuals across 24 phenotypes and prioritized a list of candidate genes with different degrees of statistical evidence, including 346 genes with false discovery rates <0.05.
Field Name | Description |
---|---|
DN ID | The variant identifier in Gene4Denovo, such as dn65354. |
Patient ID | |
Phenotype | Annotated information about gene function according to OMIM, ClinVar, denovo-db, MGI, HPO. |
Platform | |
Study | |
Pubmed ID | |
- Reference: Turner T N, Yi Q, Krumm N, et al. denovo-db: A compendium of human de novo variants[J]. Nucleic acids research, 2017, 45(D1): D804-D811.
- Retrieve Source: https://denovo-db.gs.washington.edu/denovo-db/ (only non-SSC samples were retrieved, due to the denovo-db terms of use).
- Brief Introduction: denovo-db contained 40 different studies and 32,991 de novo variants from 23,098 trios.
Field Name | Description |
---|---|
SAMPLE_CT | Observed Sample Count |
NumProbands | The total number of probands involved in the study. |
SampleIDs | If some type of sample identifier is given in the study we use that exactly. If there is no sample identifier we use the name of the study and start numbering such that every variant has a unique sample identifier. |
SequenceType | The sequence type used in the study. |
Validation | The validation status describes the result of some orthogonal validation method (for example Sanger sequencing). The values are either yes or unknown meaning either valid or not known, respectively. Any variants that are not valid are removed early in the pipeline and are not represented in denovo-db. |
PrimaryPhenotype | The primary phenotype is the main phenotype of the patient for inclusion in the study. |
StudyName | This is the name of the study. |
PubmedId | Pubmed ID for the study publication. |
NumControls | The total number of controls involved in the study. |
- Reference: Gazal S, Finucane H, Furlotte N, et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection[J]. Nature genetics, 2017, 49(10): 1421-1427.
- Retrieve Source: https://favor.genohub.org/
- Brief Introduction:
Field Name | Description |
---|---|
nucdiv | Nucleotide diversity measures how likely the region is to diversify. Range: [0.05, 60.25] (default: 0). |
recombination_rate | Recombination rate measures how likely the region is to undergo recombination. Range: [0, 54.96] |
- Reference: Karimzadeh M, Ernst C, Kundaje A, et al. Umap and Bismap: quantifying genome and methylome mappability[J]. Nucleic acids research, 2018, 46(20): e120.
- Retrieve Source: https://favor.genohub.org/
- Brief Introduction:
Field Name | Description |
---|---|
k*_bismap | Mappability of the bisulfite-converted genome. Bisulfite sequencing approaches used to identify DNA methylation introduce large numbers of reads that map to multiple regions. This annotation identifies mappability of the bisulfite-converted genome. Range: [0, 1] (default: 0). |
k*_umap | Mappability of unconverted genome. It measures the extent to which a position can be uniquely mapped by sequence reads. Lower mappability means the estimates of genomic and epigenomic characteristics from sequencing assays are less reliable, and the region has increased susceptibility to spurious mapping from reads from other regions of the genome with sequencing errors or unexpected genetic variation. Range: [0, 1] (default: 0). |
- Reference: Rentzsch P, Witten D, Cooper G M, et al. CADD: predicting the deleteriousness of variants throughout the human genome[J]. Nucleic Acids Research, 2019, 47(D1): D886-D894.
- Retrieve Source: https://cadd.gs.washington.edu/download
- Brief Introduction:
Field Name | Description |
---|---|
Dist2Mutation | Distance between the closest BRAVO SNV up and downstream (position itself excluded) (default: 0*) |
Freq100bp | Number of frequent (MAF > 0.05) BRAVO SNV in 100 bp window nearby (default: 0) |
Rare100bp | Number of rare (MAF < 0.05) BRAVO SNV in 100 bp window nearby (default: 0) |
Sngl100bp | Number of single occurrence BRAVO SNV in 100 bp window nearby (default: 0) |
Freq1000bp | Number of frequent (MAF > 0.05) BRAVO SNV in 1000 bp window nearby (default: 0) |
Rare1000bp | Number of rare (MAF < 0.05) BRAVO SNV in 1000 bp window nearby (default: 0) |
Sngl1000bp | Number of single occurrence BRAVO SNV in 1000 bp window nearby (default: 0) |
Freq10000bp | Number of frequent (MAF > 0.05) BRAVO SNV in 10000 bp window nearby (default: 0) |
Rare10000bp | Number of rare (MAF < 0.05) BRAVO SNV in 10000 bp window nearby (default: 0) |
Sngl10000bp | Number of single occurrence BRAVO SNV in 10000 bp window nearby (default: 0) |
We further collected sSNVs from Ensembl Variation 112 and mapped them to GRCh38.
Field Name | Description |
---|---|
species_chromosome | Chromosome of this variant |
species_position | Position of this variant |
rs_id | dbSNP rsID |
reference_allele | Reference allele of this variant |
alternate_allele | Alternate allele of this variant |
evidence_status | Support evidence of this variant, see details in https://www.ensembl.org/info/genome/variation/prediction/variant_quality.html#evidence_status |
original_source | The original source this variant comes from. |
RefPep | Amino acid translated with reference allele. |
VarPep | Variant peptide that is translated as a result of a missense variant. Format=Index|Amino_acid|Feature_id. The index identifies the missense variant. The amino acid translated with the missense variant. The feature id for the feature overlapping the variant. |
VE | Variant effect of a variant overlapping a sequence feature, as computed by the Ensembl variant effect pipeline. Format=Consequence|Index|Feature_type|Feature_id. The index identifies the variant sequence for which the effect is described. |
CSQ | Consequence annotations from Ensembl's Variant Effect Pipeline. Format=Allele|Consequence|Feature_type|Feature|Amino_acids|SIFT |
ensembl_transcript_id | Transcript information of this variant. |
species_variant | Variant id, format as {chrom}_{position}_{ref}/{alt} |
hg38_variant | The GRCh38 coordinate of the human synonymous mutation to which this species' variant maps. |
reference_genome | Reference genome of this variant. |
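
Two fields in this table are structured strings: `species_variant` follows `{chrom}_{position}_{ref}/{alt}`, and `CSQ` follows `Allele|Consequence|Feature_type|Feature|Amino_acids|SIFT`. A minimal Python sketch for unpacking both is shown below. The function names and the sample values are hypothetical, and the sketch assumes one annotation per `CSQ` string.

```python
CSQ_KEYS = ["Allele", "Consequence", "Feature_type", "Feature", "Amino_acids", "SIFT"]


def parse_variant_id(variant_id: str):
    """Split '{chrom}_{position}_{ref}/{alt}' into its four parts."""
    # Split from the right so chromosome or scaffold names containing '_' survive.
    chrom, position, alleles = variant_id.rsplit("_", 2)
    ref, alt = alleles.split("/")
    return chrom, int(position), ref, alt


def parse_csq(value: str) -> dict:
    """Pair each pipe-separated CSQ value with its documented key."""
    parts = value.split("|")
    if len(parts) != len(CSQ_KEYS):
        raise ValueError(f"expected {len(CSQ_KEYS)} fields, got {len(parts)}")
    return dict(zip(CSQ_KEYS, parts))


# Hypothetical sample values, for illustration only.
print(parse_variant_id("1_925946_C/T"))
print(parse_csq("T|synonymous_variant|Transcript|ENST00000342066|Q|"))
```

Splitting the variant identifier from the right is a design choice for non-human assemblies, where sequence names may themselves contain underscores.
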