Disease enrichment analysis - lmigueel/Bioinformatica GitHub Wiki

1. About

The DOSE package (Yu et al. 2015) supports disease research. It provides five methods for measuring semantic similarity among DO terms and among gene products, together with a hypergeometric model and gene set enrichment analysis (GSEA) for associating diseases with a gene list and for extracting disease-association insight from genome-wide expression profiles.

DOSE supports enrichment analysis for the Disease Ontology (DO) (Schriml et al. 2011), the Network of Cancer Gene (NCG) (A. et al. 2016), and the Disease Gene Network (DisGeNET) (Janet et al. 2015). In addition, several visualization methods are provided by the enrichplot package to help interpret the semantic and enrichment results.

2. Installation

The package is installed from within R:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DOSE")

3. Enrichment analysis for the Disease Ontology

Note that DOSE supports only Entrez Gene IDs, so identifiers in any other format must be converted first. DOSE is applied mainly to human data.
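One possible way to do the conversion is sketched below. It assumes gene symbols as the starting identifiers and uses `bitr()` from the clusterProfiler package with the `org.Hs.eg.db` annotation database. Neither package is mentioned on this page, so treat both, and the example symbols, as assumptions. The block only runs the conversion if both packages are installed.

```r
# Hypothetical input: three human gene symbols to convert to Entrez Gene IDs.
symbols <- c("TP53", "BRCA1", "EGFR")

if (requireNamespace("clusterProfiler", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # bitr() returns a data frame with one column per identifier type.
  mapping <- clusterProfiler::bitr(symbols,
                                   fromType = "SYMBOL",
                                   toType   = "ENTREZID",
                                   OrgDb    = "org.Hs.eg.db")
  entrez_ids <- mapping$ENTREZID
} else {
  # Fallback so the script still runs when the packages are missing.
  message("clusterProfiler and org.Hs.eg.db are required for the conversion")
  entrez_ids <- character(0)
}

print(entrez_ids)
```

The resulting character vector can then be passed as the `gene` argument of the enrichment functions.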

library(DOSE)
data(geneList)

# Select genes with an absolute fold change above 1.5. In practice this would be
# your own input list, and the full set of measured genes would be the universe
# (see the universe parameter below).
gene <- names(geneList)[abs(geneList) > 1.5]
head(gene)

## [1] "4312"  "8318"  "10874" "55143" "55388" "991"

x <- enrichDO(gene          = gene,
              ont           = "DO",
              pvalueCutoff  = 0.05,
              pAdjustMethod = "BH",
              universe      = names(geneList),
              minGSSize     = 5,
              maxGSSize     = 500,
              qvalueCutoff  = 0.05,
              readable      = FALSE)
head(x)

##                    ID                    Description GeneRatio  BgRatio
## DOID:170     DOID:170         endocrine gland cancer    48/331 472/6268
## DOID:10283 DOID:10283                prostate cancer    40/331 394/6268
## DOID:3459   DOID:3459               breast carcinoma    37/331 357/6268
## DOID:3856   DOID:3856 male reproductive organ cancer    40/331 404/6268
## DOID:824     DOID:824                  periodontitis    16/331 109/6268
## DOID:3905   DOID:3905                 lung carcinoma    43/331 465/6268
##                  pvalue    p.adjust      qvalue
## DOID:170   5.662129e-06 0.004784499 0.003826407
## DOID:10283 3.859157e-05 0.013921739 0.011133923
## DOID:3459  4.942629e-05 0.013921739 0.011133923
## DOID:3856  6.821467e-05 0.014410349 0.011524689
## DOID:824   1.699304e-04 0.018859464 0.015082872
## DOID:3905  1.749754e-04 0.018859464 0.015082872
##                                                                                                                                                                                                                                                   geneID
## DOID:170   10874/7153/1381/6241/11065/10232/332/6286/2146/10112/891/9232/4171/993/5347/4318/3576/1515/4821/8836/3159/7980/5888/333/898/9768/4288/3551/2152/9590/185/7043/3357/2952/5327/3667/1634/1287/4582/7122/3479/4680/6424/80310/652/8839/9547/1524
## DOID:10283                                          4312/6280/6279/597/3627/332/6286/2146/4321/4521/891/5347/4102/4318/701/3576/79852/10321/6352/4288/3551/2152/247/2952/3487/367/3667/4128/4582/563/3679/4117/7031/3479/6424/10451/80310/652/4036/10551
## DOID:3459                                                          4312/6280/6279/7153/4751/890/4085/332/6286/6790/891/9232/10855/4171/5347/4318/701/2633/3576/9636/898/8792/4288/2952/4982/4128/4582/7031/3479/771/4250/2066/3169/10647/5304/5241/10551
## DOID:3856                                           4312/6280/6279/597/3627/332/6286/2146/4321/4521/891/5347/4102/4318/701/3576/79852/10321/6352/4288/3551/2152/247/2952/3487/367/3667/4128/4582/563/3679/4117/7031/3479/6424/10451/80310/652/4036/10551
## DOID:824                                                                                                                                                                   4312/6279/820/7850/4321/3595/4318/4069/3576/1493/6352/8842/185/2952/5327/4982
## DOID:3905                          4312/6280/2305/9133/6279/7153/6278/6241/55165/11065/8140/10232/332/6286/3002/9212/4521/891/4171/9928/8061/4318/3576/1978/1894/7980/7083/898/6352/8842/4288/2152/2697/2952/3572/4582/7049/563/3479/1846/3117/2532/2922
##            Count
## DOID:170      48
## DOID:10283    40
## DOID:3459     37
## DOID:3856     40
## DOID:824      16
## DOID:3905     43

The resulting terms can be looked up in the Disease Ontology database. Just search for the ID, for example DOID:10283.

The ont parameter can be "DO" or "DOLite". DOLite (Du et al. 2009) was built to aggregate redundant DO terms. The DOLite data are no longer updated, so ont = "DO" is recommended. The pvalueCutoff parameter sets the p-value cutoff. The pAdjustMethod parameter selects the p-value correction method: Bonferroni ("bonferroni"), Holm ("holm"), Hochberg ("hochberg"), Hommel ("hommel"), Benjamini & Hochberg ("BH"), or Benjamini & Yekutieli ("BY"). The qvalueCutoff parameter sets the cutoff applied to the q-values.
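To get a feel for how the correction methods differ, the sketch below applies base R's `p.adjust()` to the six raw p-values shown in the output above. This is only an illustration: in a real run the correction is applied over all tested terms, so the adjusted values here will not match the p.adjust column of the table.

```r
# Raw p-values copied from the first six rows of the enrichDO output.
pvals <- c(5.662129e-06, 3.859157e-05, 4.942629e-05,
           6.821467e-05, 1.699304e-04, 1.749754e-04)

# The six correction methods named in the text.
methods <- c("bonferroni", "holm", "hochberg", "hommel", "BH", "BY")

# One column of adjusted p-values per method.
adjusted <- sapply(methods, function(m) p.adjust(pvals, method = m))
rownames(adjusted) <- c("DOID:170", "DOID:10283", "DOID:3459",
                        "DOID:3856", "DOID:824", "DOID:3905")

print(signif(adjusted, 3))
```

Bonferroni is the most conservative of these methods, while "BH" controls the false discovery rate and typically keeps more terms.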

The universe parameter is the background set of genes used for the test. If it is not set explicitly, enrichDO() uses all human genes that have a DO annotation as the universe.
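The role of the universe can be illustrated with a one-sided hypergeometric test in base R, using the counts from the first row of the output above (GeneRatio 48/331, BgRatio 472/6268). This is a minimal sketch that assumes the standard over-representation formulation; it is not taken from the DOSE source code, and the variable names are illustrative.

```r
k <- 48    # input genes annotated to the term (GeneRatio numerator)
n <- 331   # input genes with any annotation (GeneRatio denominator)
M <- 472   # universe genes annotated to the term (BgRatio numerator)
N <- 6268  # annotated genes in the universe (BgRatio denominator)

# Probability of drawing k or more annotated genes in a sample of size n.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

print(p_value)
```

The result can be compared with the pvalue column for DOID:170. Changing N, that is, choosing a different universe, changes the p-value, which is why the background set should reflect the genes that could actually have been detected.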

The minGSSize and maxGSSize parameters restrict the test to DO terms that have more than minGSSize and fewer than maxGSSize annotated genes.
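The size filter can be pictured with a toy example in base R. The gene sets and their names below are made up, and the bounds are treated as inclusive, which is an assumption about the exact comparison used by the package.

```r
# Hypothetical gene sets of different sizes.
gene_sets <- list(
  termA = paste0("g", 1:3),    # too small
  termB = paste0("g", 1:20),   # within bounds
  termC = paste0("g", 1:600)   # too large
)

min_size <- 5
max_size <- 500

# Keep only the sets whose size falls inside the bounds.
sizes <- lengths(gene_sets)
kept  <- gene_sets[sizes >= min_size & sizes <= max_size]

print(names(kept))
```

Very small sets tend to give unstable statistics and very large sets are usually too general to be informative, which is the usual motivation for such bounds.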

4. Enrichment analysis for the Network of Cancer Gene

The Network of Cancer Gene (NCG) (A. et al. 2016) is a manually curated repository of cancer genes. NCG version 5.0 (August 2015) collects 1,571 cancer genes from 175 published studies. DOSE supports analyzing a gene list to determine whether it is enriched in genes known to be mutated in a given cancer type.

# Keep genes with an absolute fold change below 3
gene2 <- names(geneList)[abs(geneList) < 3]

ncg <- enrichNCG(gene2) 

head(ncg)

##                                                                      ID
## pan-cancer_paediatric                             pan-cancer_paediatric
## triple_negative_breast_cancer             triple_negative_breast_cancer
## breast_cancer                                             breast_cancer
## soft_tissue_sarcoma                                 soft_tissue_sarcoma
## paediatric_high-grade_glioma               paediatric_high-grade_glioma
## pancreatic_cancer_(all_histologies) pancreatic_cancer_(all_histologies)
##                                                             Description
## pan-cancer_paediatric                             pan-cancer_paediatric
## triple_negative_breast_cancer             triple_negative_breast_cancer
## breast_cancer                                             breast_cancer
## soft_tissue_sarcoma                                 soft_tissue_sarcoma
## paediatric_high-grade_glioma               paediatric_high-grade_glioma
## pancreatic_cancer_(all_histologies) pancreatic_cancer_(all_histologies)
##                                     GeneRatio  BgRatio       pvalue
## pan-cancer_paediatric                161/1782 182/2372 2.748816e-06
## triple_negative_breast_cancer         71/1782  75/2372 6.564667e-06
## breast_cancer                        146/1782 171/2372 5.249102e-04
## soft_tissue_sarcoma                   26/1782  26/2372 5.633144e-04
## paediatric_high-grade_glioma          25/1782  25/2372 7.524752e-04
## pancreatic_cancer_(all_histologies)   39/1782  41/2372 7.825494e-04
##                                         p.adjust       qvalue
## pan-cancer_paediatric               0.0002226541 0.0001504615
## triple_negative_breast_cancer       0.0002658690 0.0001796646
## breast_cancer                       0.0105644165 0.0071390469
## soft_tissue_sarcoma                 0.0105644165 0.0071390469
## paediatric_high-grade_glioma        0.0105644165 0.0071390469
## pancreatic_cancer_(all_histologies) 0.0105644165 0.0071390469
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           geneID
## pan-cancer_paediatric               2146/55353/4609/1029/3575/22806/3418/3066/2120/30012/867/7468/7545/3195/865/64109/4613/613/11177/7490/238/10736/10054/5771/4893/140885/1785/9760/3417/6597/6476/9126/4869/10320/7307/80204/1050/8028/2312/6608/896/894/2196/4849/7023/5093/5079/5293/5727/55181/171017/51322/5781/3718/55294/60/673/8085/5897/4851/51176/1108/7764/10664/6098/2332/2201/6495/3845/7015/1441/2782/64919/4298/23512/8239/29102/6929/8021/6134/6598/4209/5290/22941/8726/207/3717/2033/10716/4928/6932/694/5156/10019/6886/9968/7080/2623/7874/1654/4149/3020/23219/55252/55729/10735/5728/4853/23451/51341/387/3206/6146/79718/2624/63035/3815/171023/23269/25/9839/23592/5896/7403/2260/54880/3716/9203/57178/6777/5789/4297/29072/90/546/120/25836/8289/4345/9611/5925/4763/1997/1499/7157/3399/5295/1387/4602/51564/1027/4005/2322/2078/678/6403/55709/1277/7494/64061/2625
## triple_negative_breast_cancer                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 6790/898/4609/1029/1789/4436/2120/867/7128/1788/1030/7490/2271/238/675/2047/4914/1316/5291/5293/5781/55294/8085/4851/4170/3845/355/1616/4854/5290/207/2033/4233/29110/2903/5979/5728/4853/2624/3815/10000/7403/2260/55193/472/5789/4297/2065/4286/8626/8405/8289/10499/55164/5925/4763/23405/1499/4921/7157/5295/1387/2078/324/7248/7048/22894/3480/2045/2066/2625
## breast_cancer                                                                                                                     4751/701/898/639/29028/4609/7399/1029/1520/4436/83990/11200/10849/2072/4771/865/999/1788/26191/1030/10801/83737/6262/1956/672/8590/675/4893/6597/8202/2778/208/51412/896/2132/677/4849/4221/65220/2854/55294/673/4193/8085/4851/57127/841/3265/7764/10664/9721/3845/3956/868/9175/6602/11174/8239/9860/6954/5290/1523/207/2033/2334/3782/8312/9514/5156/186/54897/71/79728/545/143/2064/4089/8471/8314/91/5289/1021/10735/5979/5728/4853/23451/9439/6738/387/55770/79718/4301/171023/23013/51135/80243/4292/149076/10983/6103/7403/54880/4916/55193/9203/1635/1495/2309/472/5076/2909/5789/4297/2065/29072/2263/546/8289/2874/9611/5925/6416/4763/7157/4088/23152/5295/6794/1387/4602/1027/5737/324/595/7188/4681/4214/7494/2099/3480/4485/2891/6926/3169/2625
## soft_tissue_sarcoma                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             999/6850/4914/4342/2185/55294/2041/4851/2044/4058/5290/4486/5297/5728/3815/2324/7403/546/5925/4763/1499/7157/5159/2045/3667/2066
## paediatric_high-grade_glioma                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             4609/1029/1019/4613/1030/1956/4914/896/894/673/8493/5290/4233/5156/1021/63035/54880/4916/90/546/4763/7157/5295/595/4915
## pancreatic_cancer_(all_histologies)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      1029/4771/8997/7159/2011/6597/7307/3710/6710/55294/7091/3845/23654/7046/3096/4089/91/8241/54549/92/23451/63035/7403/55193/23309/472/800/29072/23077/23499/8289/54894/6416/7157/4088/182/7048/2199/26960
##                                     Count
## pan-cancer_paediatric                 161
## triple_negative_breast_cancer          71
## breast_cancer                         146
## soft_tissue_sarcoma                    26
## paediatric_high-grade_glioma           25
## pancreatic_cancer_(all_histologies)    39
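The GeneRatio and BgRatio columns are printed as text fractions, so they need to be parsed before they can be compared numerically. The sketch below does this in base R for the first three rows of the output above and derives a fold enrichment (GeneRatio divided by BgRatio). The helper `parse_ratio()` is a hypothetical name introduced here, not a DOSE function.

```r
# Convert strings such as "161/1782" into numeric fractions.
parse_ratio <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

# Values copied from the first three rows of the enrichNCG output.
gene_ratio <- c("161/1782", "71/1782", "146/1782")
bg_ratio   <- c("182/2372", "75/2372", "171/2372")

fold_enrichment <- parse_ratio(gene_ratio) / parse_ratio(bg_ratio)
names(fold_enrichment) <- c("pan-cancer_paediatric",
                            "triple_negative_breast_cancer",
                            "breast_cancer")

print(round(fold_enrichment, 2))
```

A value above 1 means the term is more frequent in the input list than in the background.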

References

Yu G, Wang L, Yan G, He Q (2015). “DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis.” Bioinformatics, 31(4), 608-609. doi: 10.1093/bioinformatics/btu684, http://bioinformatics.oxfordjournals.org/content/31/4/608.

A., Omer, Giovanni M. D., Thanos P. M., and Francesca D. C. 2016. “NCG 5.0: Updates of a Manually Curated Repository of Cancer Genes and Associated Properties from Cancer Mutational Screenings.” Nucleic Acids Research 44 (D1): D992–99. https://doi.org/10.1093/nar/gkv1123.

Du, Pan, Gang Feng, Jared Flatow, Jie Song, Michelle Holko, Warren A. Kibbe, and Simon M. Lin. 2009. “From Disease Ontology to Disease-Ontology Lite: Statistical Methods to Adapt a General-Purpose Ontology for the Test of Gene-Ontology Associations.” Bioinformatics 25 (12): i63–68. https://doi.org/10.1093/bioinformatics/btp193.

The DOSE manual is available from the package's Bioconductor page.