3.3 Post Analysis and Dark Matter - bcb420-2022/Evgeniya_Gorobets GitHub Wiki
Objective
Learn how to conduct post/signature set analysis and how to detect dark matter.
Duration
Estimated Duration: 1hr15min
Actual Time Spent: 1hr
Procedure
- Listen to Week 12 lectures and take notes.
Notes
Post-Analysis
- Adds new gene sets ("signature sets"), which were not used in the enrichment analysis, to an existing enrichment map
- more specialized gene sets, from DrugBank (drug targets), miRBase (regulators), disease signatures (OMIM), etc.
- Add these gene sets to your network graph and show that they're from post-analysis by using different shapes/colours
- The enrichment map must already exist before you add signature sets; use the "Add Signature Gene Sets" button in EnrichmentMap
- 2 types of post/signature analyses:
- Exploratory -- search the entire DB (e.g. Human Phenotype DB, miRBase), compute all edges and see which signature sets have the most significant overlap (slow to compute)
- Known signature -- search a subset of the DB that contains the genes or signature sets of interest and find significant edges (fast to compute)
- must provide a GMT file with the signature sets
- Post-analysis statistics
- Mann-Whitney (Wilcoxon rank-sum test; unlike the other tests, it uses the gene ranks)
- two-sided --> DEFAULT
- one-sided greater
- one-sided less
- Hypergeometric
- Overlap
- number of genes
- percentage of signature set genes
- percentage of EM gene set genes
- Click on a post-analysis node to see the gene overlap
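The hypergeometric test and the overlap cutoffs listed above can be illustrated with a minimal Python sketch comparing one signature set against one enrichment-map gene set. This is an illustration under assumptions, not EnrichmentMap's implementation: the helper overlap_stats, the toy gene IDs, and the choice of universe (whatever genes are passed as universe, e.g. every gene in the expression set) are all hypothetical, and the tool's own default universe may differ. The p-value is the one-sided upper tail P(X >= k), i.e. the chance of sharing at least the observed number of genes if the signature genes had been drawn at random.

```python
from math import comb


def overlap_stats(signature, em_gene_set, universe):
    """Overlap metrics and a one-sided hypergeometric p-value for one
    signature set vs. one enrichment-map gene set (hypothetical helper)."""
    universe = set(universe)
    sig = set(signature) & universe      # restrict both sets to the universe
    em = set(em_gene_set) & universe
    k = len(sig & em)                    # genes shared by the two sets
    n, K, N = len(sig), len(em), len(universe)
    # P(X >= k) for X ~ Hypergeometric(N, K, n): the chance of sharing at
    # least k genes if n genes were drawn at random from the universe.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return {
        "overlap": k,                                    # number of genes
        "pct_signature": 100 * k / n if n else 0.0,      # % of signature set genes
        "pct_em_gene_set": 100 * k / K if K else 0.0,    # % of EM gene set genes
        "hypergeom_p": tail / comb(N, n),
    }


if __name__ == "__main__":
    # Toy data: a 20-gene universe, an 8-gene EM gene set, a 6-gene signature set.
    universe = [f"G{i}" for i in range(1, 21)]
    em_set = [f"G{i}" for i in range(1, 9)]        # G1..G8
    signature = [f"G{i}" for i in range(5, 11)]    # G5..G10
    print(overlap_stats(signature, em_set, universe))
```

With this toy data, 4 of the 6 signature genes fall in the 8-gene set, so the overlap is 66.7% of the signature set genes and 50% of the EM gene set genes, with a hypergeometric p-value of 7/51 (about 0.137), far too weak to draw a signature edge.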
Dark Matter
- Dark matter = DEGs that don't show up in enriched pathways, either because they have no annotations or because they belong to gene sets that are not significantly enriched
- Dark matter analysis
- Load the GMT file containing all gene sets used in the analysis using GSA.read.gmt() and get all genes in all gene sets.
- Load the expression file and rank file and get all genes in the expression set.
- Load the GSEA results (gsea_report_positive.xls, gsea_report_negative.xls) and get all genes belonging to the enriched gene sets they list.
- Use setdiff() to find which genes are in the expression file but not in any gene set, or not in any enriched gene set. This is your dark matter.
- Get the ranks of the dark-matter genes and investigate them further with heat maps; show how many were highly ranked, how many were significant, etc.
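The dark matter steps above can be sketched in Python, with set difference (the - operator) standing in for R's setdiff(). This is a minimal sketch under stated assumptions: read_gmt, find_dark_matter and rank_genes are hypothetical helpers (read_gmt is not GSA.read.gmt()), the GMT lines and rank scores are toy data, the names of the significantly enriched gene sets are assumed to have already been pulled out of the GSEA report files (parsing those reports is not shown), and "highly ranked" is taken to mean a large absolute rank score.

```python
def read_gmt(lines):
    """Parse GMT lines (name<TAB>description<TAB>gene1<TAB>gene2...) into
    {gene set name: set of genes}. Hypothetical helper, not GSA.read.gmt()."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def find_dark_matter(gene_sets, expressed_genes, enriched_set_names):
    """Return (expressed genes in no gene set, expressed genes in no enriched set)."""
    expressed = set(expressed_genes)
    # Every gene annotated to at least one gene set in the GMT file.
    annotated = set().union(*gene_sets.values())
    # Every gene belonging to at least one (assumed) significantly enriched set.
    enriched = set().union(*(gene_sets[name] for name in enriched_set_names
                             if name in gene_sets))
    # Set difference plays the role of R's setdiff().
    return expressed - annotated, expressed - enriched


def rank_genes(genes, rank_scores):
    """Sort genes by absolute rank score, strongest first (unscored genes are dropped)."""
    scored = [(g, rank_scores[g]) for g in genes if g in rank_scores]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    gmt_lines = [
        "SET_A\tdesc\tG1\tG2\tG3\n",
        "SET_B\tdesc\tG3\tG4\n",
        "SET_C\tdesc\tG5\tG6\n",
    ]
    expressed = [f"G{i}" for i in range(1, 9)]          # G1..G8
    ranks = {"G1": 5.2, "G2": -1.0, "G3": 0.4, "G4": -3.1,
             "G5": 2.0, "G6": 0.1, "G7": -4.5, "G8": 0.9}
    gene_sets = read_gmt(gmt_lines)
    no_annotation, not_enriched = find_dark_matter(gene_sets, expressed, ["SET_A"])
    print("No annotation:", rank_genes(no_annotation, ranks))
    print("Not in enriched sets:", rank_genes(not_enriched, ranks))
```

With the toy data, G7 and G8 have no annotation at all, G4 to G8 fall outside the single enriched set, and G7 (score -4.5) is the strongest dark-matter gene in both lists, exactly the kind of gene worth a closer look in a heat map.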
Conclusion
EnrichmentMap offers a very easy way to add signature sets to your network map to see how specialized gene sets relate to your significantly enriched gene sets. Dark matter is not very difficult to identify, but it is extremely important to investigate and analyze.
Outlook
The journal entry for the next unit is 3.4 GSEA on Ovarian Cancer Subtypes