Insights! - bcb420-2023/Helena_Jovic GitHub Wiki
All things interesting, relevant or useful.
Gene Set Enrichment Analysis Best Practices
Link to Paper
Takeaways
- The GSEABenchmarkeR R/Bioconductor package supports reproducible benchmarking of enrichment methods.
- Enrichment methods developed for microarray data can be applied to RNA-seq data after a variance-stabilizing transformation (VST).
- The type of null hypothesis tested (self-contained or competitive) affects which gene sets are identified as enriched.
- Self-contained methods can identify a gene set as enriched even if it contains only a single differentially expressed gene; ROAST and GSVA are recommended.
- Competitive methods test for excess differential expression in the gene set compared to the background level and tend to rank relevant gene sets systematically higher; ORA and PADOG are recommended.
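As a rough illustration of the competitive idea behind ORA, the sketch below computes an upper-tail hypergeometric p-value: the probability of seeing at least the observed overlap between a gene set and a list of differentially expressed genes if the list were drawn at random from the background. The function name, argument names, and all counts are hypothetical and chosen only for illustration; this is not the implementation used by any particular tool.

```python
from math import comb


def ora_p_value(n_universe, n_set, n_de, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes in the background (assumed input)
    n_set:      number of genes in the gene set
    n_de:       number of differentially expressed genes
    n_overlap:  number of differentially expressed genes inside the set
    """
    upper = min(n_set, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, a 100-gene set,
# 500 differentially expressed genes, 10 of them in the set.
p = ora_p_value(20000, 100, 500, 10)
print(f"ORA p-value: {p:.2e}")
```

With these made-up counts the expected overlap under the null is 2.5 genes, so an overlap of 10 gives a small p-value. The result depends directly on the chosen background, which is why the choice of reference gene list matters for ORA.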
Outdated Gene Annotation Nature
Link to Paper
Takeaways
- The success of pathway analysis depends on the quality of gene annotations.
- The use of outdated resources strongly affects practical genomic analysis and recent literature.
- Many software tools interpret gene lists using functional information that has not been updated for years.
- In a survey of 25 web-based pathway enrichment tools and citations of these tools in over 3,800 publications, most tools were found to be outdated by several years.
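One simple way to act on this is to record the release date of the annotation resource used in an analysis and flag it when it is older than some threshold. The sketch below assumes the release date is known; the function names, the dates, and the one-year default threshold are hypothetical and not taken from the paper.

```python
from datetime import date


def annotation_age_years(release, today):
    """Approximate age of an annotation release in years."""
    return (today - release).days / 365.25


def is_outdated(release, today, max_age_years=1.0):
    """True if the annotation release is older than the allowed age."""
    return annotation_age_years(release, today) > max_age_years


# Hypothetical example: annotations released in 2010, checked in 2016.
release_date = date(2010, 1, 1)
check_date = date(2016, 1, 1)
print(is_outdated(release_date, check_date, max_age_years=5))
```

Reporting the annotation release date alongside enrichment results makes it easier for readers to judge how current the analysis is.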
Reproducibility Nature Methods
Link to Paper
Takeaways
- High-throughput technologies have generated massive amounts of biological data, making reproducibility of analysis workflows a key issue in computational biology.
- Workflow managers (e.g. Snakemake, Nextflow, Cromwell) automate data processing and analysis, enabling transparency, code sharing, and long-term reproducibility through data provenance.
- Workflow managers provide detailed information on input parameters, execution environment, software version, resource usage, and pipeline steps, and the workflow itself can be archived and made citable.
- Portability is achieved through package managers (e.g. Conda) and containerization software (e.g. Docker), allowing for platform-independent software installation and distribution.
- Scalability is achieved through efficient resource management and the ability to handle input data of any size and quantity.
- Parallelization and adaptive scheduling are commonly used to manage resources and run independent workflow steps concurrently.
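The parallelization idea can be sketched in a few lines of Python: steps are described as a dependency graph, and any step whose prerequisites have finished is submitted to a worker pool. This is a toy illustration using only the standard library (`graphlib` requires Python 3.9+), not how Snakemake, Nextflow, or Cromwell are implemented; the step names and the `run_workflow` helper are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(steps, deps, max_workers=4):
    """Run steps in dependency order; independent steps run in parallel.

    steps: dict mapping step name -> zero-argument callable
    deps:  dict mapping step name -> set of prerequisite step names
    Returns the step names in the order they finished.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in steps})
    sorter.prepare()  # raises CycleError if the graph has a cycle
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every step whose prerequisites are all done.
            for name in sorter.get_ready():
                running[pool.submit(steps[name])] = name
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any error from the step
                finished.append(name)
                sorter.done(name)
    return finished


# Hypothetical three-step workflow: two independent steps, then a merge.
log = []
steps = {
    "trim_a": lambda: log.append("trim_a"),
    "trim_b": lambda: log.append("trim_b"),
    "merge": lambda: log.append("merge"),
}
deps = {"merge": {"trim_a", "trim_b"}}
order = run_workflow(steps, deps)
print(order)
```

Here `trim_a` and `trim_b` may run in either order or at the same time, while `merge` always runs last. Real workflow managers add what this sketch omits, such as provenance tracking, resource-aware scheduling, and resuming after failures.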