# Welcome to BaseBuddy! 🧬 Simulate with Confidence
BaseBuddy is a user-friendly command-line toolkit designed to simplify and standardize the simulation of Next-Generation Sequencing (NGS) data and the manipulation of existing sequencing files. Whether you're testing bioinformatics pipelines, generating datasets for machine learning, or creating teaching materials, BaseBuddy aims to make the process robust, reproducible, and easy to manage.
## Motivation
Bioinformatics work often requires synthetic data to benchmark tools, understand error profiles, or explore an algorithm's behavior under specific conditions. Existing simulation tools are powerful, but they can have steep learning curves, inconsistent output formats, or cryptic error messages.
BaseBuddy was created to:
* Provide a **unified and intuitive interface** for common simulation tasks.
* Offer **robust error handling and pre-flight checks** to catch issues early.
* **Standardize outputs** into well-organized directories with manifests for better traceability.
* Facilitate **easy downstream integration**, for example, with visualization tools like IGV.
* Streamline common preparatory steps like **reference genome downloading and indexing**.
## Core Uses & Capabilities
With BaseBuddy, you can:
* **Simulate Short Reads**: Generate realistic Illumina NGS reads using **ART** with fine-grained control over depth, read length, fragment size, and error profiles.
* **Spike-in Variants**: Introduce known variants from a VCF file into existing BAM files (via a wrapper around a tool such as `addsnv.py`; this integration is still conceptual).
* **Manage Reference Genomes**: Download publicly available reference genomes, verify their integrity via checksums, and automatically index them.
* **Organize Your Work**: All outputs are placed in a structured root directory, with each run getting its own subdirectory, a `manifest.json` detailing parameters and output files, and automatically generated IGV session files where applicable.
* **Inspect Outputs Easily**: Use the `basebuddy list-outputs` command to quickly find and understand the results of your simulation runs.
* **Improve Reproducibility**: Capture run parameters and standardize processes so that any run can be repeated and audited later.
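
To make the "one subdirectory plus `manifest.json` per run" idea concrete, here is a minimal sketch of how such a manifest could be written. It is an illustration only: the function name `write_manifest` and every field name in the JSON are assumptions, not BaseBuddy's actual schema or API.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def write_manifest(root, run_name, params, outputs):
    """Create <root>/<run_name>/ and write a manifest.json inside it.

    Hypothetical helper: the field names are illustrative and are not
    taken from BaseBuddy itself.
    """
    run_dir = Path(root) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)

    manifest = {
        "run_name": run_name,
        # The command line that started this process.
        "command": " ".join(sys.argv),
        "created_utc": datetime.now(timezone.utc).isoformat(),
        # Parameters must be JSON-serializable (str, int, float, list, dict).
        "parameters": params,
        # Output paths are stored exactly as the caller passed them.
        "outputs": list(outputs),
    }

    manifest_path = run_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```

Calling `write_manifest("basebuddy_outputs", "demo_run", {"depth": 30}, ["reads_1.fq"])` would create `basebuddy_outputs/demo_run/manifest.json`. Keeping parameters and outputs in one file is what lets a later command, such as `basebuddy list-outputs`, describe a run without re-deriving anything.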
## Key Features ✨
* **User-Friendly CLI**: Clear commands and helpful messages.
* **Automated Indexing**: Runs `samtools faidx` on FASTA files and `samtools index` on BAM files when their indexes are missing.
* **Standardized Output Structure**: Configurable root output directory with dedicated, named subdirectories for each run.
* **Run Manifests**: Each run generates a `manifest.json` file detailing the command, parameters, and output file paths.
* **IGV Integration**: Automatic generation of IGV session XML files for easy visualization of BAMs and VCFs against the reference.
* **Robust Error Handling**: Pre-flight checks for tools and inputs, and clearer error messages from wrapped tools.
* **Checksum Verification**: Ensures integrity of downloaded reference files.
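
Two of the features above, automated indexing and checksum verification, can be sketched in a few lines. The helper names `missing_index_command` and `sha256_matches` are hypothetical, and the choice of SHA-256 is an assumption, since this page does not say which checksum algorithm BaseBuddy uses. The sketch only builds the `samtools` command; it does not run it, and it does not handle compressed FASTA or `.csi` indexes.

```python
import hashlib
from pathlib import Path


def missing_index_command(path):
    """Return the samtools command that would build a missing index, or None.

    Hypothetical helper, not BaseBuddy's real implementation.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in {".fa", ".fasta", ".fna"}:
        # samtools faidx writes <fasta>.fai next to the FASTA file.
        if not Path(str(p) + ".fai").exists():
            return ["samtools", "faidx", str(p)]
    elif suffix == ".bam":
        # Accept either reads.bam.bai or reads.bai as an existing BAM index.
        has_index = Path(str(p) + ".bai").exists() or p.with_suffix(".bai").exists()
        if not has_index:
            return ["samtools", "index", str(p)]
    return None


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Stream a file through SHA-256 and compare with the expected hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large reference genomes are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

A caller could pass the returned list to `subprocess.run(cmd, check=True)` when it is not `None`, and refuse to use a downloaded reference when `sha256_matches` returns `False`.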
## Current Limitations
While BaseBuddy aims to be comprehensive, some areas are still under development or come with caveats:
* **Panel/Region-Only Simulation**: Currently, to simulate reads from specific genomic regions (e.g., a gene panel), users need to provide a FASTA file that *already contains only* those regions of interest. Direct subsetting of a large FASTA using a BED file by BaseBuddy is a planned future enhancement.
* **External Tool Dependencies**: BaseBuddy orchestrates several powerful external bioinformatics tools (see the Dependencies page). These must be installed and accessible in your system's PATH.
* **Long Read Simulation**: Support for long-read simulators (e.g., NanoSim, Badread) is still conceptual; it would require dedicated wrappers similar to the ART integration.
* **Composability/Chaining**: Direct chaining of multiple BaseBuddy commands into a single pipeline is a planned advanced feature.
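
Until BED-based subsetting is built in, a panel FASTA can be prepared by hand. The sketch below is one possible workaround, not part of BaseBuddy: it converts BED intervals into region strings for `samtools faidx`, which prints the requested subsequences as FASTA when regions are given after the reference path. The helper names are hypothetical, and the tool list passed to `missing_tools` is only an example of a PATH check.

```python
import shutil


def bed_to_regions(bed_lines):
    """Convert BED intervals (0-based, half-open) into samtools region
    strings (1-based, inclusive), e.g. 'chr1 100 200' -> 'chr1:101-200'."""
    regions = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append(f"{chrom}:{int(start) + 1}-{int(end)}")
    return regions


def missing_tools(tools):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def panel_fasta_command(reference, bed_lines):
    """Build (but do not run) a samtools faidx command that extracts the
    BED regions from an indexed reference FASTA."""
    return ["samtools", "faidx", reference, *bed_to_regions(bed_lines)]
```

For example, `panel_fasta_command("ref.fa", ["chr1\t100\t200"])` yields `["samtools", "faidx", "ref.fa", "chr1:101-200"]`; running it with standard output redirected to `panel.fa` gives a FASTA containing only the panel regions. Note that the extracted records are named after their region strings, so reads simulated from `panel.fa` carry coordinates relative to each extracted region rather than to the full chromosome. Checking `missing_tools(["samtools", "art_illumina"])` first gives a clear message when a dependency is absent.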
## Getting Started
Ready to dive in?
1. Head over to the **[Installation, Troubleshooting & FAQ](/ChromatinCloud/SeqForge/wiki/Installation,-Troubleshooting-&-FAQ)** page to get BaseBuddy set up.
2. Explore the **[Capabilities](/ChromatinCloud/SeqForge/wiki/Capabilities)** page for detailed command usage.
3. Check the **[Dependencies](/ChromatinCloud/SeqForge/wiki/Dependencies)** page to ensure you have the necessary external tools.