Installation and Setup Guide - ChromatinCloud/SeqForge GitHub Wiki
This guide provides step-by-step instructions for installing BaseBuddy and preparing the necessary reference genome data.
1. Overview
A correct installation is the foundation for using BaseBuddy. We offer two primary installation methods to suit different user needs:
Python with Conda (Recommended): This is the best method for most users on Linux or macOS. Conda installs Python, all required Python libraries, and all external command-line tools (e.g., SAMtools, ART) into a dedicated environment, so they do not conflict with other software on your system.
Docker: This method provides a fully self-contained, pre-configured environment. It is the most reproducible option and avoids installation problems on complex systems or on Windows (via WSL2).
2. Installation via Python (Conda)
Prerequisites:
A working Conda installation (Miniconda or Anaconda). We recommend using Mamba, a much faster drop-in replacement for Conda, if possible.
Git, for cloning the software repository.
Steps:
Clone the Repository:
Open a terminal, clone the BaseBuddy source code from GitHub, and change into the new directory.
```bash
git clone https://github.com/yourusername/BaseBuddy.git
cd BaseBuddy
```
Create the Conda Environment: The environment.yml file in the repository lists all dependencies. Create the environment from this file.
```bash
# Using Mamba (much faster)
mamba env create -f environment.yml

# If you don't have Mamba, use Conda instead (run only one of these two commands)
conda env create -f environment.yml
```
Either command installs every dependency listed in environment.yml: samtools, art, bamsurgeon, fastqc, and all required Python packages.
Activate the Environment: You must activate the environment in each new shell session before using BaseBuddy.
```bash
mamba activate basebuddy
# or
conda activate basebuddy
```
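If you call BaseBuddy from shell scripts, a small guard can stop a script early when the environment is not active. This is a minimal sketch, not part of BaseBuddy: the function name `require_basebuddy_env` is hypothetical, and it relies on the `CONDA_DEFAULT_ENV` variable, which Conda's activation scripts set to the name of the active environment.

```bash
# Hypothetical helper (not part of BaseBuddy): fail fast if the
# "basebuddy" Conda environment is not the active one.
require_basebuddy_env() {
  # CONDA_DEFAULT_ENV holds the name of the currently active environment;
  # ${VAR:-} avoids an error under `set -u` when no environment is active.
  if [ "${CONDA_DEFAULT_ENV:-}" != "basebuddy" ]; then
    echo "basebuddy environment is not active; run: conda activate basebuddy" >&2
    return 1
  fi
}
```

Place `require_basebuddy_env || exit 1` near the top of a script that runs BaseBuddy commands.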
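Install BaseBuddy: From the repository root, install the package in editable mode. This links your installation to the cloned source code, so changes to the code take effect without reinstalling.
```bash
pip install -e .
```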
Verify the Installation: Check that the command-line tool is working.
```bash
basebuddy version
# Expected output: BaseBuddy 0.1.0
```
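`basebuddy version` only confirms that the Python entry point works. To also confirm that the external tools from environment.yml are on your PATH, you can loop over `command -v`. The sketch below is a hypothetical helper, not a BaseBuddy command, and the executable names you pass to it are assumptions: ART's Illumina simulator, for example, is normally installed as `art_illumina` rather than `art`.

```bash
# Hypothetical helper: report which of the given executables are on PATH.
# Returns a non-zero status if any of them is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "MISSING  $tool"
      missing=$((missing + 1))
    fi
  done
  # The function's exit status is the result of this test.
  [ "$missing" -eq 0 ]
}
```

For example, `check_tools samtools art_illumina fastqc basebuddy` prints one line per tool and fails if any of them cannot be found.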
3. Installation via Docker
Prerequisites:
A working Docker installation.
Git.
Steps:
Clone the Repository:
```bash
git clone https://github.com/yourusername/BaseBuddy.git
cd BaseBuddy
```
Build the Docker Image: From the repository root (where the Dockerfile is), run the build command.
```bash
DOCKER_BUILDKIT=1 docker build -t basebuddy:latest .
```
Verify the Installation: Run the version command inside a temporary container.
```bash
docker run --rm basebuddy:latest version
# Expected output: BaseBuddy 0.1.0
```
Running Commands with Docker: The container cannot see your files unless you mount your local data directory into it with the -v flag. Paths passed to BaseBuddy must then refer to locations inside the container.
```bash
# Example: run a short-read simulation.
# This mounts the current directory at /data inside the container.
docker run --rm -v "$(pwd):/data" basebuddy:latest \
  short /data/my_ref.fa --outdir /data/sim_output
```
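Typing the full `docker run` line for every command gets tedious, so one option is a small shell function that adds the mount for you. This is a sketch under two assumptions: that the image's entrypoint is the `basebuddy` command (as the `docker run --rm basebuddy:latest version` example suggests), and that BaseBuddy resolves relative paths against its working directory. The function name `basebuddy_docker` is hypothetical. Setting the container's working directory to /data with `-w` lets you pass paths relative to your current directory.

```bash
# Hypothetical wrapper (not shipped with BaseBuddy): run the basebuddy:latest
# image with the current directory mounted at /data, and use /data as the
# working directory so relative paths resolve to your files.
basebuddy_docker() {
  docker run --rm \
    -v "$(pwd):/data" \
    -w /data \
    basebuddy:latest "$@"
}
```

With this function defined, `basebuddy_docker short my_ref.fa --outdir sim_output` is intended to match the example above. On Linux, files written by the container are usually owned by root; adding `--user "$(id -u):$(id -g)"` to the `docker run` line avoids this, provided the image works when run as a non-root user.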
Note for macOS/Windows Users: Ensure your project directory is included in Docker Desktop's list of approved directories for file sharing (in Preferences/Settings).
4. Preparing a Reference FASTA
Nearly every function in BaseBuddy requires a reference genome in FASTA format, which must be indexed.
Steps:
Download a Reference:
Obtain a standard reference genome from a public repository such as NCBI, Ensembl, or UCSC.
```bash
# Example: GRCh38 human genome
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GRCh38_latest_genomic.fna.gz
gunzip GRCh38_latest_genomic.fna.gz
```
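Large downloads are sometimes truncated without an obvious error. Before indexing, you can have gzip test the archive, then decompress it and count the sequence headers. The helper name `safe_gunzip_fasta` is hypothetical; `gzip -t` (integrity test) and `gzip -dc` (decompress to standard output) are standard gzip options.

```bash
# Hypothetical helper: decompress a downloaded FASTA only if the gzip
# archive passes its integrity check, then report the number of sequences.
safe_gunzip_fasta() {
  local gz="$1"
  case "$gz" in
    *.gz) ;;
    *) echo "expected a .gz file: $gz" >&2; return 1 ;;
  esac
  local fa="${gz%.gz}"            # e.g. GRCh38_latest_genomic.fna
  if ! gzip -t "$gz" 2>/dev/null; then
    echo "corrupt or incomplete download: $gz" >&2
    return 1
  fi
  gzip -dc "$gz" > "$fa" || return 1
  # Each FASTA record starts with a '>' header line.
  echo "$fa: $(grep -c '^>' "$fa") sequences"
}
```

`safe_gunzip_fasta GRCh38_latest_genomic.fna.gz` can stand in for the plain `gunzip` call; unlike `gunzip`, it keeps the original .gz file.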
Index the FASTA: This step creates a .fai index file (here GRCh38_latest_genomic.fna.fai) that lets tools jump directly to any genomic location instead of reading through the whole file. This step is not optional.
```bash
samtools faidx GRCh38_latest_genomic.fna
```
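To see why the index makes lookups fast: each line of a .fai file has five tab-separated columns (sequence name, length in bases, byte offset of the first base, bases per line, and bytes per line including the newline). From these, the byte position of any base follows by arithmetic, so tools can seek straight to it. The function below is an illustrative sketch of that calculation, and its name `fai_byte_offset` is hypothetical. It assumes every sequence line except the last has the same length, which samtools faidx requires anyway.

```bash
# Hypothetical illustration: compute the byte offset of a 1-based position
# in a FASTA file from its .fai entry.
# .fai columns: NAME  LENGTH  OFFSET  LINEBASES  LINEWIDTH
fai_byte_offset() {
  # $1 = .fai file, $2 = sequence name, $3 = 1-based position
  awk -F '\t' -v name="$2" -v pos="$3" '
    $1 == name {
      p = pos - 1                                  # 0-based position
      # OFFSET + (full lines skipped * LINEWIDTH) + bases into the current line
      printf "%.0f\n", $3 + int(p / $4) * $5 + (p % $4)
      found = 1
      exit
    }
    END { if (!found) exit 1 }
  ' "$1"
}
```

For example, `fai_byte_offset GRCh38_latest_genomic.fna.fai NC_000010.11 1000000` prints where base 1,000,000 of chromosome 10 is stored in the file. Similarly, `cut -f1,2 GRCh38_latest_genomic.fna.fai | head` lists the first few sequence names and lengths, which is a quick way to check how chromosomes are named in your reference.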
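(Optional) Create a Locus-Specific FASTA: For testing, it is much faster to work with a small genomic region. Extract a locus with samtools faidx. The sequence name in the region must match a name in your FASTA: NCBI RefSeq files such as GRCh38_latest_genomic.fna name chromosome 10 NC_000010.11 rather than chr10, so adjust the region accordingly. Gene coordinates also differ between assembly builds (e.g., GRCh37 and GRCh38), so confirm the gene's location in your reference before extracting.
```bash
# Extract a 300 kb region around the FGFR2 gene
samtools faidx GRCh38_latest_genomic.fna chr10:122950000-123250000 > fgfr2_locus.fa

# IMPORTANT: you must index the new, smaller FASTA file too!
samtools faidx fgfr2_locus.fa
```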
Using fgfr2_locus.fa (about 300 kb) instead of the full genome (roughly 3 Gb) makes your commands run much faster, because tools have far less sequence to read and process.
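Because samtools cannot extract a region whose sequence name is absent from the FASTA, it can help to validate a region against the .fai before extracting it. The following is a hypothetical helper, not a samtools or BaseBuddy feature, and it handles only the plain NAME:START-END form (no commas in the numbers).

```bash
# Hypothetical pre-flight check: confirm that NAME:START-END names a sequence
# present in the .fai and lies within that sequence's length.
check_region() {
  local fai="$1" region="$2"
  local name="${region%:*}"       # text before the last ':'
  local range="${region##*:}"     # text after the last ':'
  local start="${range%-*}"       # START
  local end="${range#*-}"         # END
  awk -F '\t' -v n="$name" -v s="$start" -v e="$end" '
    $1 == n { found = 1; len = $2 }
    END {
      if (!found) {
        print "unknown sequence: " n > "/dev/stderr"
        exit 1
      }
      if (s + 0 < 1 || e + 0 > len + 0 || s + 0 > e + 0) {
        print "invalid range " s "-" e " for " n " (length " len ")" > "/dev/stderr"
        exit 1
      }
    }
  ' "$fai"
}
```

For example, `check_region GRCh38_latest_genomic.fna.fai chr10:122950000-123250000` reports "unknown sequence: chr10" on a reference whose chromosome 10 is named NC_000010.11.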