Disease enrichment analysis
1. About
The DOSE package (Yu et al. 2015) was developed to support disease research. It provides five methods for measuring semantic similarity among DO terms and gene products, together with a hypergeometric model and gene set enrichment analysis (GSEA) for associating diseases with gene lists and extracting disease-association insight from genome-wide expression profiles.
DOSE supports enrichment analysis against the Disease Ontology (DO) (Schriml et al. 2011), the Network of Cancer Gene (NCG) (A. et al. 2016) and the Disease Gene Network (DisGeNET) (Janet et al. 2015). In addition, several visualization methods are provided by the enrichplot package to help interpret the semantic and enrichment results.
2. Installation
The package is installed from within R, as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DOSE")
3. Enrichment analysis for Disease Ontology
Note that DOSE supports only Entrez Gene IDs, so identifiers of any other type must be converted to this format first. DOSE is designed mainly for human data.
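As a rough sketch of the kind of input this section works with, the snippet below builds a named numeric vector keyed by Entrez Gene ID from a small made-up table and then picks the genes of interest. The data frame `de`, its column names and its values are hypothetical, sorting the vector is a convenience assumed here, and conversion from other identifier types is not shown. Only base R is used.

```r
# Hypothetical differential-expression table; IDs and values are made up.
de <- data.frame(
  entrez = c("4312", "8318", "10874", "55143", "991"),
  logFC  = c(4.57, -2.10, 1.62, 0.30, -1.75),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are fold changes, names are Entrez Gene IDs.
my_gene_list <- setNames(de$logFC, de$entrez)
my_gene_list <- sort(my_gene_list, decreasing = TRUE)

# Genes of interest: absolute fold change above 1.5.
my_genes <- names(my_gene_list)[abs(my_gene_list) > 1.5]
print(my_genes)
```

The names of the full vector play the role of the background universe, and `my_genes` plays the role of the input gene list.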
library(DOSE)
data(geneList)
# select genes with |FC| > 1.5; in practice this would be your own input list,
# with the full set of genes as the universe (see the universe parameter below)
gene <- names(geneList)[abs(geneList) > 1.5]
head(gene)
## [1] "4312" "8318" "10874" "55143" "55388" "991"
x <- enrichDO(gene          = gene,
              ont           = "DO",
              pvalueCutoff  = 0.05,
              pAdjustMethod = "BH",
              universe      = names(geneList),
              minGSSize     = 5,
              maxGSSize     = 500,
              qvalueCutoff  = 0.05,
              readable      = FALSE)
head(x)
##                    ID                    Description GeneRatio  BgRatio
## DOID:170     DOID:170         endocrine gland cancer    48/331 472/6268
## DOID:10283 DOID:10283                prostate cancer    40/331 394/6268
## DOID:3459   DOID:3459               breast carcinoma    37/331 357/6268
## DOID:3856   DOID:3856 male reproductive organ cancer    40/331 404/6268
## DOID:824     DOID:824                  periodontitis    16/331 109/6268
## DOID:3905   DOID:3905                 lung carcinoma    43/331 465/6268
##                  pvalue    p.adjust      qvalue
## DOID:170   5.662129e-06 0.004784499 0.003826407
## DOID:10283 3.859157e-05 0.013921739 0.011133923
## DOID:3459  4.942629e-05 0.013921739 0.011133923
## DOID:3856  6.821467e-05 0.014410349 0.011524689
## DOID:824   1.699304e-04 0.018859464 0.015082872
## DOID:3905  1.749754e-04 0.018859464 0.015082872
## geneID
## DOID:170 10874/7153/1381/6241/11065/10232/332/6286/2146/10112/891/9232/4171/993/5347/4318/3576/1515/4821/8836/3159/7980/5888/333/898/9768/4288/3551/2152/9590/185/7043/3357/2952/5327/3667/1634/1287/4582/7122/3479/4680/6424/80310/652/8839/9547/1524
## DOID:10283 4312/6280/6279/597/3627/332/6286/2146/4321/4521/891/5347/4102/4318/701/3576/79852/10321/6352/4288/3551/2152/247/2952/3487/367/3667/4128/4582/563/3679/4117/7031/3479/6424/10451/80310/652/4036/10551
## DOID:3459 4312/6280/6279/7153/4751/890/4085/332/6286/6790/891/9232/10855/4171/5347/4318/701/2633/3576/9636/898/8792/4288/2952/4982/4128/4582/7031/3479/771/4250/2066/3169/10647/5304/5241/10551
## DOID:3856 4312/6280/6279/597/3627/332/6286/2146/4321/4521/891/5347/4102/4318/701/3576/79852/10321/6352/4288/3551/2152/247/2952/3487/367/3667/4128/4582/563/3679/4117/7031/3479/6424/10451/80310/652/4036/10551
## DOID:824 4312/6279/820/7850/4321/3595/4318/4069/3576/1493/6352/8842/185/2952/5327/4982
## DOID:3905 4312/6280/2305/9133/6279/7153/6278/6241/55165/11065/8140/10232/332/6286/3002/9212/4521/891/4171/9928/8061/4318/3576/1978/1894/7980/7083/898/6352/8842/4288/2152/2697/2952/3572/4582/7049/563/3479/1846/3117/2532/2922
##            Count
## DOID:170      48
## DOID:10283    40
## DOID:3459     37
## DOID:3856     40
## DOID:824      16
## DOID:3905     43
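The geneID column packs the matching Entrez Gene IDs into a single string separated by "/". A minimal base R sketch of unpacking it, using the DOID:824 row copied from the output above (the variable names are illustrative only):

```r
# geneID string copied from the DOID:824 row of the output above.
gene_id <- "4312/6279/820/7850/4321/3595/4318/4069/3576/1493/6352/8842/185/2952/5327/4982"

# Split on "/" to recover the individual Entrez Gene IDs.
genes_in_term <- strsplit(gene_id, "/", fixed = TRUE)[[1]]

# The number of IDs should agree with the Count column for this row.
n_genes <- length(genes_in_term)
print(n_genes)
print(head(genes_in_term))
```

The same split can be applied to every row of the result after converting it to a data frame, for example to intersect the genes behind two related terms.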
The terms in the results can be looked up on the Disease Ontology website by searching for the ID, for example DOID:10283.
The ont parameter can be "DO" or "DOLite". DOLite (Du et al. 2009) was built to aggregate redundant DO terms. The DOLite data are no longer updated, so ont = "DO" is recommended. pvalueCutoff sets the cutoff applied to the p-value. pAdjustMethod selects the p-value correction method: Bonferroni ("bonferroni"), Holm ("holm"), Hochberg ("hochberg"), Hommel ("hommel"), Benjamini & Hochberg ("BH") or Benjamini & Yekutieli ("BY"). qvalueCutoff sets the cutoff applied to the q-values.
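To get a feel for how the correction methods differ, the sketch below applies each of them to a small vector of made-up p-values with base R's p.adjust(), which accepts the same method names. The p-values are hypothetical and are not taken from the output above; whether DOSE calls p.adjust() internally is an assumption here.

```r
# Hypothetical raw p-values for four tested terms.
p_raw <- c(0.01, 0.02, 0.03, 0.04)

# Apply each correction method named in the text.
methods <- c("bonferroni", "holm", "hochberg", "hommel", "BH", "BY")
adjusted <- sapply(methods, function(m) p.adjust(p_raw, method = m))
rownames(adjusted) <- paste0("term", seq_along(p_raw))
print(round(adjusted, 4))

# Terms that would pass a 0.05 cutoff on the BH-adjusted p-value.
passing <- names(which(adjusted[, "BH"] <= 0.05))
print(passing)
```

An adjusted p-value is never smaller than the raw one, so a stricter method can only shorten the list of terms that pass a given cutoff.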
The universe parameter is the background set of genes used for the test. If it is not set explicitly, enrichDO() uses all human genes that have a DO annotation.
minGSSize (and maxGSSize) mean that only DO terms with more than minGSSize (and fewer than maxGSSize) annotated genes are tested.
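The hypergeometric model mentioned in the About section can be sketched in base R from the GeneRatio and BgRatio columns. The snippet below uses the counts of the DOID:170 row and assumes a one-sided over-representation test; the reading of the two denominators as "genes with a DO annotation" is also an assumption. The result can be compared with the pvalue column for that row.

```r
# Counts read from the DOID:170 row of the enrichDO output.
k <- 48    # input genes annotated to the term (GeneRatio numerator)
n <- 331   # input genes considered (GeneRatio denominator)
M <- 472   # universe genes annotated to the term (BgRatio numerator)
N <- 6268  # universe genes considered (BgRatio denominator)

# Probability of seeing k or more annotated genes in n draws
# without replacement from the universe.
p_over <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(p_over)

# The same upper-tail probability written as an explicit sum.
p_sum <- sum(dhyper(k:min(n, M), M, N - M, n))
print(p_sum)
```

Changing the universe changes M and N, and therefore the p-value, which is why the background should be the set of genes that could actually have been detected in the experiment.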
4. Enrichment analysis for the Network of Cancer Gene
Network of Cancer Gene (NCG) (A. et al. 2016) is a manually curated repository of cancer genes. NCG version 5.0 (August 2015) collects 1,571 cancer genes from 175 published studies. DOSE supports analyzing a gene list to determine whether it is enriched in genes known to be mutated in a given cancer type.
# genes with |FC| below 3
gene2 <- names(geneList)[abs(geneList) < 3]
ncg <- enrichNCG(gene2)
head(ncg)
##                                                                      ID
## pan-cancer_paediatric                             pan-cancer_paediatric
## triple_negative_breast_cancer             triple_negative_breast_cancer
## breast_cancer                                             breast_cancer
## soft_tissue_sarcoma                                 soft_tissue_sarcoma
## paediatric_high-grade_glioma               paediatric_high-grade_glioma
## pancreatic_cancer_(all_histologies) pancreatic_cancer_(all_histologies)
##                                                             Description
## pan-cancer_paediatric                             pan-cancer_paediatric
## triple_negative_breast_cancer             triple_negative_breast_cancer
## breast_cancer                                             breast_cancer
## soft_tissue_sarcoma                                 soft_tissue_sarcoma
## paediatric_high-grade_glioma               paediatric_high-grade_glioma
## pancreatic_cancer_(all_histologies) pancreatic_cancer_(all_histologies)
##                                     GeneRatio  BgRatio       pvalue
## pan-cancer_paediatric                161/1782 182/2372 2.748816e-06
## triple_negative_breast_cancer         71/1782  75/2372 6.564667e-06
## breast_cancer                        146/1782 171/2372 5.249102e-04
## soft_tissue_sarcoma                   26/1782  26/2372 5.633144e-04
## paediatric_high-grade_glioma          25/1782  25/2372 7.524752e-04
## pancreatic_cancer_(all_histologies)   39/1782  41/2372 7.825494e-04
##                                         p.adjust       qvalue
## pan-cancer_paediatric               0.0002226541 0.0001504615
## triple_negative_breast_cancer       0.0002658690 0.0001796646
## breast_cancer                       0.0105644165 0.0071390469
## soft_tissue_sarcoma                 0.0105644165 0.0071390469
## paediatric_high-grade_glioma        0.0105644165 0.0071390469
## pancreatic_cancer_(all_histologies) 0.0105644165 0.0071390469
## geneID
## pan-cancer_paediatric 2146/55353/4609/1029/3575/22806/3418/3066/2120/30012/867/7468/7545/3195/865/64109/4613/613/11177/7490/238/10736/10054/5771/4893/140885/1785/9760/3417/6597/6476/9126/4869/10320/7307/80204/1050/8028/2312/6608/896/894/2196/4849/7023/5093/5079/5293/5727/55181/171017/51322/5781/3718/55294/60/673/8085/5897/4851/51176/1108/7764/10664/6098/2332/2201/6495/3845/7015/1441/2782/64919/4298/23512/8239/29102/6929/8021/6134/6598/4209/5290/22941/8726/207/3717/2033/10716/4928/6932/694/5156/10019/6886/9968/7080/2623/7874/1654/4149/3020/23219/55252/55729/10735/5728/4853/23451/51341/387/3206/6146/79718/2624/63035/3815/171023/23269/25/9839/23592/5896/7403/2260/54880/3716/9203/57178/6777/5789/4297/29072/90/546/120/25836/8289/4345/9611/5925/4763/1997/1499/7157/3399/5295/1387/4602/51564/1027/4005/2322/2078/678/6403/55709/1277/7494/64061/2625
## triple_negative_breast_cancer 6790/898/4609/1029/1789/4436/2120/867/7128/1788/1030/7490/2271/238/675/2047/4914/1316/5291/5293/5781/55294/8085/4851/4170/3845/355/1616/4854/5290/207/2033/4233/29110/2903/5979/5728/4853/2624/3815/10000/7403/2260/55193/472/5789/4297/2065/4286/8626/8405/8289/10499/55164/5925/4763/23405/1499/4921/7157/5295/1387/2078/324/7248/7048/22894/3480/2045/2066/2625
## breast_cancer 4751/701/898/639/29028/4609/7399/1029/1520/4436/83990/11200/10849/2072/4771/865/999/1788/26191/1030/10801/83737/6262/1956/672/8590/675/4893/6597/8202/2778/208/51412/896/2132/677/4849/4221/65220/2854/55294/673/4193/8085/4851/57127/841/3265/7764/10664/9721/3845/3956/868/9175/6602/11174/8239/9860/6954/5290/1523/207/2033/2334/3782/8312/9514/5156/186/54897/71/79728/545/143/2064/4089/8471/8314/91/5289/1021/10735/5979/5728/4853/23451/9439/6738/387/55770/79718/4301/171023/23013/51135/80243/4292/149076/10983/6103/7403/54880/4916/55193/9203/1635/1495/2309/472/5076/2909/5789/4297/2065/29072/2263/546/8289/2874/9611/5925/6416/4763/7157/4088/23152/5295/6794/1387/4602/1027/5737/324/595/7188/4681/4214/7494/2099/3480/4485/2891/6926/3169/2625
## soft_tissue_sarcoma 999/6850/4914/4342/2185/55294/2041/4851/2044/4058/5290/4486/5297/5728/3815/2324/7403/546/5925/4763/1499/7157/5159/2045/3667/2066
## paediatric_high-grade_glioma 4609/1029/1019/4613/1030/1956/4914/896/894/673/8493/5290/4233/5156/1021/63035/54880/4916/90/546/4763/7157/5295/595/4915
## pancreatic_cancer_(all_histologies) 1029/4771/8997/7159/2011/6597/7307/3710/6710/55294/7091/3845/23654/7046/3096/4089/91/8241/54549/92/23451/63035/7403/55193/23309/472/800/29072/23077/23499/8289/54894/6416/7157/4088/182/7048/2199/26960
##                                     Count
## pan-cancer_paediatric                 161
## triple_negative_breast_cancer          71
## breast_cancer                         146
## soft_tissue_sarcoma                    26
## paediatric_high-grade_glioma           25
## pancreatic_cancer_(all_histologies)    39
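GeneRatio and BgRatio are returned as "k/n" strings, so they need to be parsed before they can be compared numerically. The sketch below does this for the first three rows of the NCG output and derives a ratio of the two proportions. The helper `ratio_to_num` and the `fold` column are illustrative names introduced here, not something DOSE returns in this output.

```r
# Convert "k/n" ratio strings to numbers.
ratio_to_num <- function(x) {
  vapply(strsplit(x, "/", fixed = TRUE),
         function(p) as.numeric(p[1]) / as.numeric(p[2]),
         numeric(1))
}

# Values copied from the first three rows of the enrichNCG output.
ncg_tab <- data.frame(
  ID        = c("pan-cancer_paediatric",
                "triple_negative_breast_cancer",
                "breast_cancer"),
  GeneRatio = c("161/1782", "71/1782", "146/1782"),
  BgRatio   = c("182/2372", "75/2372", "171/2372"),
  stringsAsFactors = FALSE
)

# Ratio of the two proportions; values above 1 mean the term is
# over-represented in the input list relative to the background.
ncg_tab$fold <- ratio_to_num(ncg_tab$GeneRatio) / ratio_to_num(ncg_tab$BgRatio)
print(ncg_tab[, c("ID", "fold")])
```

Because the |FC| < 3 filter keeps 1782 of the 2372 background genes, these ratios stay close to 1 even for the terms with the smallest p-values.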
Bibliography
Yu G, Wang L, Yan G, He Q (2015). “DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis.” Bioinformatics, 31(4), 608-609. doi: 10.1093/bioinformatics/btu684, http://bioinformatics.oxfordjournals.org/content/31/4/608.
A., Omer, Giovanni M. D., Thanos P. M., and Francesca D. C. 2016. “NCG 5.0: Updates of a Manually Curated Repository of Cancer Genes and Associated Properties from Cancer Mutational Screenings.” Nucleic Acids Research 44 (D1): D992–99. https://doi.org/10.1093/nar/gkv1123.
Du, Pan, Gang Feng, Jared Flatow, Jie Song, Michelle Holko, Warren A. Kibbe, and Simon M. Lin. 2009. “From Disease Ontology to Disease-Ontology Lite: Statistical Methods to Adapt a General-Purpose Ontology for the Test of Gene-Ontology Associations.” Bioinformatics 25 (12): i63–68. https://doi.org/10.1093/bioinformatics/btp193.
The DOSE reference manual is available from the package's Bioconductor page.