BaseBuddy Dependencies - ChromatinCloud/SeqForge GitHub Wiki

BaseBuddy orchestrates several external command-line tools and relies on a Python environment. Ensure these dependencies are met for BaseBuddy to function correctly.

1. Core External Tools

These tools must be installed and reachable through your PATH environment variable. BaseBuddy checks for each tool and reports an error if it is not found.

  • ART (next-generation sequencing read simulator)

    • Used by: basebuddy short command for simulating Illumina short reads. You'll typically need art_illumina.
    • Purpose: Generates synthetic NGS reads from a reference FASTA sequence, emulating different sequencing platforms and error models.
    • Installation: Download prebuilt binaries or compile from source; both are available from the official ART website.
    • Key ART Dependencies (ART itself may require these):
      • GNU Scientific Library (GSL) - often required if compiling ART from source.
  • Samtools

    • Used by: Various BaseBuddy operations, including:
      • FASTA indexing (samtools faidx) - automatically run if index is missing.
      • BAM indexing (samtools index) - automatically run if index is missing for input BAMs or for BAMs generated by internal steps (like sorting after addsnv.py).
      • BAM sorting (samtools sort) - used internally after variant spiking if the spiker tool produces an unsorted BAM.
    • Purpose: A suite of utilities for interacting with and processing high-throughput sequencing data formats like SAM, BAM, and CRAM, and for reference FASTA manipulation.
    • Installation:
      • Official Website (compile from source): HTSlib and Samtools
      • Conda/Mamba: mamba install -c bioconda samtools or conda install -c bioconda samtools
      • Package managers: apt-get install samtools (Debian/Ubuntu), brew install samtools (macOS).
    • Key Samtools Dependencies:
      • HTSlib (usually bundled or installed alongside Samtools).
  • addsnv.py (or similar variant spiking tool - Conceptual)

    • Used by: basebuddy spike command.
    • Purpose: This is a placeholder for a user-provided or specific third-party script/tool capable of introducing SNVs (and potentially indels) from a VCF file into reads within a BAM file.
    • Installation: You are responsible for making sure this script (e.g., addsnv.py, or another tool if BaseBuddy allows specifying the tool path) is:
      1. Available on your system.
      2. Executable.
      3. On your system's PATH, or passed to BaseBuddy by explicit path if BaseBuddy supports that.
    • Note: If you are using a specific, known tool for this, replace this section with details for that tool.
  • curl

    • Used by: basebuddy download-ref command.
    • Purpose: A command-line tool for transferring data with URLs, used here for downloading files.
    • Installation:
      • Usually pre-installed on most Linux distributions and macOS.
      • Verify with which curl. If missing, install via your system's package manager (e.g., apt-get install curl, yum install curl).
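
The two behaviours described in this section (checking that tools are on PATH, and creating FASTA/BAM indexes only when they are missing) can be sketched as follows. This is an illustration, not BaseBuddy's actual code: the function names, the tool list, and the assumption that indexes sit next to the data file as <file>.fai and <file>.bai are all hypothetical.

```python
import shutil
from pathlib import Path

# Assumed executable names; addsnv.py stands in for whichever spiking tool you supply.
REQUIRED_TOOLS = ["art_illumina", "samtools", "addsnv.py", "curl"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


def index_commands(fasta=None, bam=None):
    """Return the samtools commands needed for any index that does not exist yet.

    Assumes the default naming: <fasta>.fai for FASTA and <bam>.bai for BAM.
    """
    commands = []
    if fasta is not None and not Path(f"{fasta}.fai").exists():
        commands.append(["samtools", "faidx", str(fasta)])
    if bam is not None and not Path(f"{bam}.bai").exists():
        commands.append(["samtools", "index", str(bam)])
    return commands


# Report which tools are missing on this machine (an empty list means all were found).
print("Missing tools:", find_missing_tools())
```

Each command list returned by index_commands can be passed to subprocess.run(cmd, check=True). Returning the commands instead of running them keeps the check easy to test on a machine without samtools.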

2. Python Environment

  • Python Version: Python 3.8 or newer is recommended.
  • Core Python Libraries Used by BaseBuddy:
    • argparse: For command-line argument parsing.
    • pathlib: For object-oriented filesystem paths.
    • logging: For application logging.
    • json: For reading/writing manifest files.
    • subprocess: For running external tools.
    • hashlib: For checksum verification.
    • shutil: For utilities like finding tool paths (shutil.which).
    • datetime: For timestamps.
    • xml.etree.ElementTree: For generating IGV session XML files.
    • copy: For deepcopying objects (like args for manifest).
    • All of these are part of the Python Standard Library, so they need no separate installation.
  • Environment Management (Recommended):
    • Use Conda or Mamba to create an isolated environment for BaseBuddy. If an environment.yml file is provided with the BaseBuddy source code, use it:
      mamba env create -f environment.yml
      conda activate <env_name_in_yml>
      
    • Alternatively, if using pip with a pyproject.toml or requirements.txt:
      python -m venv .venv
      source .venv/bin/activate
      pip install -r requirements.txt # or pip install .
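
Once the environment is active, a quick sanity check can confirm that the interpreter meets the recommended version and that the standard-library modules listed above are importable. The sketch below is a hypothetical helper, not part of BaseBuddy; the function names are illustrative.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # recommended minimum from this page

# Standard-library modules listed in this section.
STDLIB_MODULES = [
    "argparse", "pathlib", "logging", "json", "subprocess",
    "hashlib", "shutil", "datetime", "xml.etree.ElementTree", "copy",
]


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


def unavailable_modules(modules=STDLIB_MODULES):
    """Return the module names for which no import spec can be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


print("Python version OK:", python_is_supported())
print("Unavailable modules:", unavailable_modules())
```

On a standard CPython 3.8+ installation the second list should be empty. A non-empty list usually points to a stripped-down or broken Python installation.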
      

3. System Requirements

  • Operating System: Primarily Linux and macOS. Windows Subsystem for Linux (WSL) might work but is generally less tested for many bioinformatics tools.
  • Disk Space:
    • Reference genomes can be large (e.g., the human genome is roughly 3 GB as uncompressed FASTA and somewhat under 1 GB when gzip-compressed).
    • Simulated FASTQ/BAM files can also consume significant disk space, especially at high depths or for large genomes. Ensure you have adequate free space.
  • Memory (RAM):
    • Indexing large genomes with samtools faidx is generally not memory intensive.
    • ART simulation memory usage depends on genome size and parameters.
    • Sorting large BAM files with samtools sort can be memory-intensive (its -m option caps memory per thread), as can read alignment if your pipeline includes it.
    • Running BaseBuddy itself is lightweight, but the tools it calls might have higher requirements.
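
Because simulated FASTQ/BAM output can fill a disk quickly, it can help to check free space before starting a large run. The sketch below uses shutil.disk_usage from the standard library. The helper names and the 50 GiB threshold are assumptions chosen for illustration, not values taken from BaseBuddy.

```python
import shutil


def free_gib(path="."):
    """Return the free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(required_gib, path=".", free_fn=free_gib):
    """Return True if at least required_gib GiB are free at path."""
    return free_fn(path) >= required_gib


# Hypothetical threshold: adjust to your genome size and target depth.
REQUIRED_GIB = 50
if not has_enough_space(REQUIRED_GIB):
    print(f"Warning: less than {REQUIRED_GIB} GiB free in the current directory.")
```

Point path at the output directory, not the current directory, when results are written to a different filesystem.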