Troubleshooting & FAQ - ChromatinCloud/SeqForge GitHub Wiki

This article collects solutions to common errors and answers to frequently asked questions about using BaseBuddy.

Dependency & Installation Issues

Q: I get RuntimeError: art_illumina not found in PATH (or similar for samtools, addsnv.py, etc.). A: This is the most common error. It means a required command-line tool is not installed or not in your system's PATH.

Solution 1: Activate the Environment. The tools are installed inside the Conda environment. You must activate it first: mamba activate basebuddy or conda activate basebuddy.
Solution 2: Use Conda. If you installed BaseBuddy without Conda, you must install every external tool yourself and add it to your PATH. Creating the environment from the provided environment.yml is strongly recommended because it avoids this problem.
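Before debugging further, it can help to confirm which tools are actually visible from the current shell. The following is a minimal sketch and not part of BaseBuddy: check_tools is a hypothetical helper name, and the tool names in the usage comment are simply the ones quoted in the error message above.

```shell
#!/usr/bin/env bash
# Report which command-line tools are reachable through PATH.
# Returns 0 when every tool is found, 1 if at least one is missing.
check_tools() {
    local tool
    local missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool -> $(command -v "$tool")"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (hypothetical tool list, adjust to your workflow):
#   check_tools art_illumina samtools addsnv.py
```

If the environment is set up correctly, running the check before and after activating the basebuddy environment should show the tools switching from MISSING to ok.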

Q: I see a Python ImportError for a package like SigProfilerSimulatorFunc. A: This suggests a Python package is missing or has the wrong version. The most reliable solution is to create a clean Conda environment from the environment.yml file, which specifies the exact, tested versions of all dependencies.
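One way to get back to a clean state is to remove the existing environment and recreate it from environment.yml. This is a sketch under assumptions: the environment is named basebuddy (as in the activation command above), environment.yml is in the current directory unless another path is given, and rebuild_env, run, and DRY_RUN are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Recreate the Conda environment from an environment file.
#   $1: path to the environment file (default: environment.yml)
#   $2: environment name (default: basebuddy, an assumption)
rebuild_env() {
    local env_file="${1:-environment.yml}"
    local env_name="${2:-basebuddy}"
    if [ ! -f "$env_file" ]; then
        echo "error: $env_file not found" >&2
        return 1
    fi
    # Remove the old (possibly broken) environment; ignore "not found".
    run conda env remove -n "$env_name" -y || true
    # Create a fresh environment with the pinned dependency versions.
    run conda env create -n "$env_name" -f "$env_file"
}

# Preview the commands without changing anything:
#   DRY_RUN=1 rebuild_env environment.yml
```

After a real run, activate the environment again with conda activate basebuddy before retrying the import.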

Q: How can I run BaseBuddy on Windows? A: The toolkit is designed for a Linux environment, so on Windows it needs to run inside one. There are two options:

Recommended Method: Docker. The Docker container runs on any OS that supports Docker and is the simplest, most reliable way to use BaseBuddy on Windows.
Advanced Method: WSL2. The Windows Subsystem for Linux provides a full Linux environment on Windows where you can follow the standard Conda installation.

Input File & Reference Genome Issues

Q: A command fails, complaining about a missing .fai file or "no SQ lines". A: Your reference FASTA file has not been indexed. Most tools that read from a reference genome, samtools included, need this index, so indexing is effectively mandatory.

Solution: Run samtools faidx /path/to/your/reference.fa. This creates the required reference.fa.fai index file in the same directory.
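The indexing step can be wrapped so that it only runs when needed. This is a minimal sketch: needs_faidx and ensure_faidx are hypothetical helper names, and treating an index older than its FASTA as stale is an extra precaution added here, not something the source requires.

```shell
#!/usr/bin/env bash
# Return 0 if the .fai index is missing, empty, or older than the FASTA.
needs_faidx() {
    local fasta="$1"
    [ ! -s "${fasta}.fai" ] || [ "$fasta" -nt "${fasta}.fai" ]
}

# Build the index with samtools faidx only when it is needed.
ensure_faidx() {
    local fasta="$1"
    if [ ! -f "$fasta" ]; then
        echo "error: $fasta not found" >&2
        return 1
    fi
    if needs_faidx "$fasta"; then
        samtools faidx "$fasta"
    fi
}

# Example call:
#   ensure_faidx /path/to/your/reference.fa
```

The index is written next to the FASTA, so the directory holding the reference must be writable.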

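Q: The spike command fails with a "contig mismatch" error. A: The chromosome names in your input files are inconsistent. For example, your BAM may use chr1 while your VCF uses 1. All input files (FASTA, BAM, VCF) must use exactly the same chromosome naming scheme. To see which names each file uses, compare the first column of reference.fa.fai with the @SQ lines printed by samtools view -H on the BAM and the CHROM column of the VCF.

Docker-Specific Issues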

Q: On my Mac, I get a docker: Error... Mounts denied error. A: This is a Docker Desktop security feature on macOS: containers can only mount directories that appear in its file-sharing list.

Solution: Open Docker Desktop, go to Preferences > Resources > File Sharing (the menu is called Settings in recent versions), and add your project's parent directory (e.g., /Users/your_name/work) to the list of allowed locations.

Q: How do I use my local files with the Docker container? A: You must map a local directory to a directory inside the container with the -v flag. The standard convention is to begin the command with docker run --rm -v "$(pwd):/data", followed by the image name and the command to run. This makes your current directory available at /data inside the container, so you refer to your files as /data/my_file.fa.

General FAQ

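Q: My simulation is using gigabytes of disk space and taking hours. Is that normal? A: Yes. Simulating deep coverage of a large genome is computationally intensive. A 30x coverage simulation of the human genome (about 3.1 billion bases) produces roughly 90 billion bases of reads, which is on the order of ~90 GB of data; the exact size on disk depends on the output format and compression.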
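The estimate above is just genome size multiplied by coverage, which can be checked before starting a run. This is a rough sketch: estimate_bases and to_gigabases are hypothetical helper names, and 3100000000 is an approximate human genome size, not a value taken from BaseBuddy.

```shell
#!/usr/bin/env bash
# Total simulated bases = genome size (bp) * coverage depth.
estimate_bases() {
    local genome_bp="$1"
    local coverage="$2"
    echo $(( genome_bp * coverage ))
}

# Convert a base count to whole gigabases (10^9 bases), rounding down.
to_gigabases() {
    echo $(( $1 / 1000000000 ))
}

# Example: approximate human genome at 30x coverage.
#   to_gigabases "$(estimate_bases 3100000000 30)"
```

At roughly one byte per base this gives a lower bound on the uncompressed sequence; formats that also store quality scores need more space, and compressed outputs need less.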

Best Practice: Always test your commands on a small subset of the genome first (e.g., a single chromosome or gene extracted to its own FASTA file). This provides results in seconds or minutes and allows you to confirm your parameters are correct before launching a large-scale run.
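A small test reference can be made by copying one contig into its own FASTA file. With an indexed reference, samtools faidx reference.fa chr21 > chr21.fa does this directly. The sketch below uses awk instead, so it runs even where samtools is not yet available; extract_contig is a hypothetical helper name and chr21 is only an example contig.

```shell
#!/usr/bin/env bash
# Print one contig (header line plus sequence lines) from a FASTA file.
#   $1: path to the FASTA file
#   $2: contig name, matched against the first word of each header
extract_contig() {
    local fasta="$1"
    local contig="$2"
    awk -v name="$contig" '
        # On each header, decide whether the following lines belong
        # to the requested contig.
        /^>/ { split(substr($0, 2), f, "[ \t]"); keep = (f[1] == name) }
        # Print the header and sequence lines of the selected contig.
        keep
    ' "$fasta"
}

# Example call:
#   extract_contig reference.fa chr21 > chr21.fa
```

Remember that the new file needs its own index (samtools faidx chr21.fa) before it is used as a reference.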

Q: What's the difference between basebuddy signature and the GUI's "Apply Signature to FASTA"? A: They have different outputs for different use cases.

basebuddy signature: Generates a VCF file listing the simulated mutations. This is used to benchmark variant callers and signature analysis software.
Apply Signature to FASTA: Directly edits a FASTA file to contain the mutations. This is used to create a new, mutated reference genome, from which you can then simulate reads using the short or long commands.