BaseBuddy Dependencies - ChromatinCloud/SeqForge GitHub Wiki
BaseBuddy orchestrates several external command-line tools and relies on a Python environment. Ensure these dependencies are met for BaseBuddy to function correctly.
1. Core External Tools
These tools must be installed on your system and accessible via your system's `PATH` environment variable. BaseBuddy will perform checks for these tools and report an error if they are not found.
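A minimal sketch of such a check in Python, using the standard-library `shutil.which`. The helper names and the tool list are hypothetical and are not taken from BaseBuddy's source:

```python
import shutil
from typing import Dict, List, Optional

# Hypothetical tool list; adjust it to the commands you actually run.
REQUIRED_TOOLS = ["art_illumina", "samtools", "curl"]


def find_missing_tools(tools: List[str]) -> List[str]:
    """Return the names in `tools` that are not executable on PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [tool for tool in tools if shutil.which(tool) is None]


def resolve_tools(tools: List[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its absolute path, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}
```

For example, `find_missing_tools(REQUIRED_TOOLS)` returns an empty list when everything is installed. A caller could otherwise stop with a message naming each missing tool, which is more helpful than failing later inside a subprocess call.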
- ART (Alignment/Read Tool)
  - Used by: `basebuddy short` command, for simulating Illumina short reads. You'll typically need the `art_illumina` binary.
  - Purpose: Generates synthetic NGS reads from a reference FASTA sequence, emulating different sequencing platforms and error models.
  - Installation: Download the binaries, or the source code to compile, from the official ART website:
  - Key ART Dependencies (ART itself may require these):
    - GNU Scientific Library (GSL), often required when compiling ART from source.
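To illustrate how a wrapper might call ART, the sketch below assembles an `art_illumina` command line. The flags it uses (`-ss` built-in platform profile, `-i` input FASTA, `-l` read length, `-f` fold coverage, `-o` output prefix, and `-p`/`-m`/`-s` for paired-end fragment length and its standard deviation) are standard `art_illumina` options. The helper names and default values are assumptions, and BaseBuddy's actual invocation may differ:

```python
import subprocess
from pathlib import Path
from typing import List


def build_art_command(
    reference: Path,
    out_prefix: Path,
    read_length: int = 150,
    coverage: float = 30.0,
    paired: bool = True,
    mean_fragment: int = 300,  # hypothetical default
    fragment_sd: int = 30,     # hypothetical default
    platform: str = "HS25",    # HiSeq 2500 built-in profile
) -> List[str]:
    """Assemble an art_illumina command line (hypothetical helper)."""
    cmd = [
        "art_illumina",
        "-ss", platform,         # built-in sequencing-system error profile
        "-i", str(reference),    # input reference FASTA
        "-l", str(read_length),  # read length in bp
        "-f", str(coverage),     # fold coverage
        "-o", str(out_prefix),   # prefix for output files
    ]
    if paired:
        # Paired-end mode needs a mean fragment length and its standard deviation.
        cmd += ["-p", "-m", str(mean_fragment), "-s", str(fragment_sd)]
    return cmd


def run_art(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if art_illumina exits non-zero.
    subprocess.run(cmd, check=True)
```

Building the command as a list, rather than a single shell string, avoids quoting problems with paths that contain spaces.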
- Samtools
  - Used by: Various BaseBuddy operations, including:
    - FASTA indexing (`samtools faidx`): run automatically if the index is missing.
    - BAM indexing (`samtools index`): run automatically if the index is missing, both for input BAMs and for BAMs generated by internal steps (such as sorting after `addsnv.py`).
    - BAM sorting (`samtools sort`): used internally after variant spiking if the spiker tool produces an unsorted BAM.
  - Purpose: A suite of utilities for interacting with and processing high-throughput sequencing data formats such as SAM, BAM, and CRAM, and for manipulating reference FASTA files.
  - Installation:
    - Official Website (compile from source): HTSlib and Samtools
    - Conda/Mamba: `mamba install -c bioconda samtools` or `conda install -c bioconda samtools`
    - Package managers: `apt-get install samtools` (Debian/Ubuntu), `brew install samtools` (macOS).
  - Key Samtools Dependencies:
    - HTSlib (usually bundled or installed alongside Samtools).
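The "index only if missing" behaviour described above can be sketched as follows. This assumes the default samtools output names (`<fasta>.fai` from `samtools faidx`, `<bam>.bai` from `samtools index`). The helper names, the injectable `run` callback, and the `force` flag are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# A "runner" executes one command; it is injectable so the logic can be
# exercised without samtools installed (a hypothetical design choice).
Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


def ensure_fasta_index(fasta: Path, run: Runner = run_checked) -> Path:
    """Create <fasta>.fai with `samtools faidx` only if it is missing."""
    fai = fasta.with_name(fasta.name + ".fai")
    if not fai.exists():
        run(["samtools", "faidx", str(fasta)])
    return fai


def ensure_bam_index(bam: Path, run: Runner = run_checked, force: bool = False) -> Path:
    """Create <bam>.bai with `samtools index` if missing, or always when force=True.

    Pass force=True for a BAM that was just (re)sorted, so a stale index is never reused.
    """
    bai = bam.with_name(bam.name + ".bai")
    if force or not bai.exists():
        run(["samtools", "index", str(bam)])
    return bai
```

Checking only for existence matches the "if missing" rule on this page; a stricter variant could also rebuild the index when it is older than the FASTA or BAM it describes.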
- `addsnv.py` (or a similar variant-spiking tool; conceptual)
  - Used by: `basebuddy spike` command.
  - Purpose: A placeholder for a user-provided or specific third-party script/tool capable of introducing SNVs (and potentially indels) from a VCF file into reads within a BAM file.
  - Installation: You are responsible for ensuring that this script (e.g., named `addsnv.py`, or configured if BaseBuddy allows specifying the tool path) is:
    - Available on your system.
    - Executable.
    - Present in your system's `PATH`, or its path is explicitly provided if BaseBuddy supports that.
  - Note: If you are using a specific, known tool for this, replace this section with details for that tool.
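Because the spiker's location is user-managed, a wrapper has to verify it before use. A minimal sketch, assuming a hypothetical explicit-path override that takes precedence over a `PATH` lookup (BaseBuddy may not offer such an option):

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def resolve_spiker(explicit_path: Optional[str] = None, name: str = "addsnv.py") -> Path:
    """Locate the variant-spiking script (hypothetical helper).

    An explicit path, if given, wins; otherwise the script must be on PATH.
    """
    if explicit_path is not None:
        path = Path(explicit_path).expanduser()
        if not path.is_file():
            raise FileNotFoundError(f"Spiker script not found: {path}")
        # os.access with X_OK checks whether this user may execute the file.
        if not os.access(path, os.X_OK):
            raise PermissionError(f"Spiker script is not executable: {path} (try chmod +x)")
        return path.resolve()
    found = shutil.which(name)  # only matches executable files on PATH
    if found is None:
        raise FileNotFoundError(f"'{name}' not found on PATH")
    return Path(found)
```

Raising distinct errors for "missing" and "not executable" points the user straight at the fix (install the script, or `chmod +x` it).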
- curl
  - Used by: `basebuddy download-ref` command.
  - Purpose: A command-line tool for transferring data with URLs, used here for downloading files.
  - Installation:
    - Usually pre-installed on most Linux distributions and macOS.
    - Verify with `which curl`. If it is missing, install it via your system's package manager (e.g., `apt-get install curl`, `yum install curl`).
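A sketch of a download step that shells out to curl and then verifies the file with `hashlib`. The curl flags (`-f` to fail on HTTP errors, `-L` to follow redirects, `-o` for the output file) are standard. The choice of SHA-256, the helper names, and where the expected checksum comes from are assumptions, not BaseBuddy's documented behaviour:

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List


def build_curl_command(url: str, dest: Path) -> List[str]:
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to dest.
    return ["curl", "-f", "-L", "-o", str(dest), url]


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB references never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_verify(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download with curl, then compare the file's SHA-256 with the expected value."""
    subprocess.run(build_curl_command(url, dest), check=True)
    actual = sha256sum(dest)
    if actual != expected_sha256.lower():
        raise ValueError(f"Checksum mismatch for {dest}: expected {expected_sha256}, got {actual}")
    return dest
```

Hashing in fixed-size chunks keeps memory use constant regardless of reference size.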
2. Python Environment
- Python Version: Python 3.8 or newer is recommended.
- Core Python Libraries Used by BaseBuddy:
  - `argparse`: For command-line argument parsing.
  - `pathlib`: For object-oriented filesystem paths.
  - `logging`: For application logging.
  - `json`: For reading/writing manifest files.
  - `subprocess`: For running external tools.
  - `hashlib`: For checksum verification.
  - `shutil`: For utilities such as finding tool paths (`shutil.which`).
  - `datetime`: For timestamps.
  - `xml.etree.ElementTree`: For generating IGV session XML files.
  - `copy`: For deep-copying objects (such as `args` for the manifest).
  - All of these are part of the Python Standard Library.
- Environment Management (Recommended):
  - Use Conda or Mamba to create an isolated environment for BaseBuddy. If an `environment.yml` file is provided with the BaseBuddy source code, use it:
    ```bash
    mamba env create -f environment.yml
    conda activate <env_name_in_yml>
    ```
  - Alternatively, if using `pip` with a `pyproject.toml` or `requirements.txt`:
    ```bash
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt  # or: pip install .
    ```
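After activating either environment, you can confirm that the interpreter meets the recommended Python version. A minimal sketch (the helper name is hypothetical):

```python
import sys

MIN_PYTHON = (3, 8)  # recommended minimum stated on this page


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    # Compare (major, minor) tuples lexicographically, e.g. (3, 10) >= (3, 8).
    return tuple(version_info[:2]) >= minimum
```

To see which interpreter the active environment actually provides, run `python -c "import sys; print(sys.executable, sys.version)"`.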
3. System Requirements
- Operating System: Primarily Linux and macOS. Windows Subsystem for Linux (WSL) might work but is generally less tested for many bioinformatics tools.
- Disk Space:
  - Reference genomes can be large (e.g., the human reference is roughly 3 GB as uncompressed FASTA and around 1 GB gzip-compressed).
  - Simulated FASTQ/BAM files can also consume significant disk space, especially at high depths or for large genomes. Ensure you have adequate free space.
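For a rough sense of scale, uncompressed FASTQ size can be estimated from genome size, coverage, and read length: the number of reads is coverage × genome size ÷ read length, and each four-line record holds a read-name line, the sequence, a `+` separator line, and one quality character per base. The sketch below assumes an average header length of 40 bytes (a guess; real read names vary) and uses `shutil.disk_usage` to compare the estimate against free space. The helper names and the 20% safety margin are hypothetical:

```python
import shutil
from pathlib import Path


def estimate_fastq_bytes(genome_size_bp: int, coverage: float, read_length: int,
                         header_len: int = 40) -> int:
    """Rough uncompressed FASTQ size for all reads (both mates, if paired).

    header_len is an assumed average length of the '@' read-name line.
    """
    n_reads = coverage * genome_size_bp / read_length
    # Per record: header + '\n', sequence + '\n', '+\n', qualities + '\n'.
    record_bytes = header_len + 1 + read_length + 1 + 2 + read_length + 1
    return int(n_reads * record_bytes)


def has_free_space(directory: Path, needed_bytes: int, margin: float = 1.2) -> bool:
    """Check free space on the filesystem holding `directory`, with a safety margin."""
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes * margin
```

Under these assumptions, 30× coverage of a 3.1 Gbp genome with 150 bp reads gives about 6.2 × 10^8 reads of roughly 345 bytes each, on the order of 214 GB of uncompressed FASTQ before any BAM or compressed copies.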
- Memory (RAM):
  - Indexing large genomes with `samtools faidx` is generally not memory-intensive.
  - ART simulation memory usage depends on genome size and parameters.
  - Aligning reads (if part of a downstream pipeline) or sorting large BAM files with `samtools sort` can be memory-intensive.
  - BaseBuddy itself is lightweight, but the tools it calls may have higher requirements.
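`samtools sort` exposes its memory use through `-m` (approximate maximum memory per thread) and `-@` (number of threads), so peak usage grows roughly with their product. The sketch below builds a capped sort command and estimates that bound. The helper names and the `1G` default are hypothetical, and actual usage can differ from this rough estimate:

```python
from typing import List

# Binary multipliers for K/M/G size suffixes.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_mem(mem: str) -> int:
    """Parse a size such as '768M' or '2G' into bytes; bare numbers are bytes."""
    suffix = mem[-1].upper()
    if suffix in _UNITS:
        return int(float(mem[:-1]) * _UNITS[suffix])
    return int(mem)


def build_sort_command(in_bam: str, out_bam: str, threads: int = 4,
                       mem_per_thread: str = "1G") -> List[str]:
    # -@ sets sorting/compression threads; -m caps memory per thread.
    return ["samtools", "sort", "-@", str(threads), "-m", mem_per_thread,
            "-o", out_bam, in_bam]


def approx_sort_memory(threads: int, mem_per_thread: str) -> int:
    """Rough upper bound: per-thread limit times thread count (at least one)."""
    return max(threads, 1) * parse_mem(mem_per_thread)
```

Lowering either `-m` or `-@` reduces peak memory; with a smaller `-m`, samtools writes more temporary files to disk instead.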