Omics Data Analysis Workshop ‐ MUHAS Tanzania - omicsEye/Workshop GitHub Wiki
Welcome to the wiki of the Omics Data Analysis Workshop, MUHAS Tanzania 2025!
Advancing Omics Data Science: Methods and Applications
Organized by the George Washington University and Muhimbili University of Health and Allied Sciences (MUHAS).
Abstract
Methodological advances, paired with multi-omics data measured using high-throughput technologies, make it possible to capture a comprehensive snapshot of distinct biological entities. In particular, low-cost, culture-independent omics profiling has made omics surveys of human health, other hosts, and the environment feasible at an unprecedented scale. The resulting data have stimulated the development of new statistical and computational approaches to analyze and integrate omics data, including human gene expression, microbial gene products, metabolites, and proteins, among others.
Metabolomics data generated on different platforms are often analyzed individually. We aim to combine metabolite profiles and feed them into generic downstream analysis software while properly accounting for the data's statistical properties. This yields more statistically powerful analyses and better-supported biological inferences. In addition, the collection of downstream analysis software is vast, and selecting the most appropriate tool can be difficult for untrained researchers and non-specialists.
We also present a high-level introduction to computational multi-omics, highlighting the state of the art in the field and outstanding challenges, with a focus on downstream analysis methods. The workshop will cover formulating biological hypotheses and identifying the statistical methods currently available to test them. The workshop is project-focused and takes a hands-on approach. Participants are encouraged to attend with a specific study or project in mind so they can apply the workshop content in the short term. The exercises will use real data.
Rationale for Workshop
This workshop is a joint effort between George Washington University and Merck Research Laboratories, with open and FAIR resources available on GitHub. Researchers from industry and academia will share diverse perspectives on the topic from both drug discovery and basic science angles. This will help attendees gain a holistic view of multi-omics and clinical data integration through state-of-the-art tools applied to motivating examples and use cases. We will begin with an overview of the statistical challenges inherent in analyzing the high-dimensional data typical of multi-omics studies. Introductory lectures will cover: 1) the challenges of precisely testing for multivariable associations in population-scale meta-omics studies; 2) challenges and advances in pathway enrichment analysis, including techniques for characterizing omics features; and 3) meta-analysis of metabolomics datasets for high-sensitivity discovery and for integration with other data types, such as metagenomics data.
Learning Objectives
- Pattern Discovery in Multi-Omics Data
  Attendees will explore tools for multi-omics analysis and work through multi-omics analysis tutorials interspersed with lecture content. Tools include:
  - omeClust: Omics community detection using multi-resolution clustering.
  - Tweedieverse (tutorial and examples): A unified statistical framework for differential analysis of multi-omics data.
  - omePath: Omics pathway enrichment analysis.
  - deepBreaks: Genotype-phenotype association testing.
  - waveome: Longitudinal omics data analysis.
- Metabolomics Meta-Analysis
  - Attendees will use massSight to perform metabolomics meta-analysis through multi-study data scaling, integration, and harmonization.
- Visualization and Interpretation
  - Attendees will practice generating publication-quality figures and visualizing results effectively.
Learning Outcomes for Participants
Participants will:
- Be able to apply novel techniques (such as massSight) to combine metabolite profiles and perform meta-analysis of metabolomics data.
- Understand statistical properties of metabolomics data and challenges for multivariable association testing in population-scale meta-omics studies.
- Understand how to apply pathway enrichment analysis to metabolomics data using the range of statistical methods implemented in omePath.
- Be able to perform a meta-analysis of metabolomics datasets by combining data from multiple studies, and perform pairwise association testing with other omics profiles in population-scale datasets.
Preparation
Preparation tasks are optional. However, they help the organizers focus on scientific discussion rather than on troubleshooting technical issues.
- Install the latest versions of R and RStudio on your local computer
- Install the software listed in the learning objectives
- Try running the demos for each software package
- Bring your own data to apply these techniques
Tips
- On Windows, please run Command Prompt with administrator privileges
Agenda
Tuesday
9 AM - 4 PM – Welcome and Introduction to Multi-Omics
- Multivariable Association Testing: Challenges and Techniques Using Tweedieverse and Maaslin3
- Variable Selection and Omics Community Detection using omeClust
Wednesday
9 AM - 4 PM – Pathway Enrichment Analysis and Meta-Analysis
- omePath for Pathway Enrichment Analysis
- massSight for Integrating Metabolomics Data
- Maaslin3 for Meta-Analysis
Thursday
9 AM - 2 PM – Genotype-Phenotype Association and Longitudinal Analysis
- deepBreaks for Genotype-Phenotype Association
- waveome for Longitudinal Data Analysis
- Tips for Visualization of Results
- Q&A and Wrap-up
Friday
10 AM - 12 PM – Office Hours
- Discuss participants' projects and suitable analyses
- Troubleshooting R package installations and other software issues
Materials
The workshop materials are available here.
Organizers
Ali Rahnavard: George Washington University (Organizer, Instructor)
Emily Smith: George Washington University (Organizer, Instructor)
Sabina Mugusi: Muhimbili University of Health and Allied Sciences (MUHAS) (Organizer)
Acknowledgement
This material is based upon work supported by the National Science Foundation under Grant Number 2109688 and by the Bill & Melinda Gates Foundation under Investment Number 016930.