troubleshooting - skoriche/NGIAB-Calibration-DevCon25 GitHub Wiki

Troubleshooting Guide

This page contains solutions to common issues you might encounter during the workshop.

Installation Issues

Docker Permission Issues (Linux)

If you get permission denied errors when running Docker:

sudo usermod -aG docker $USER
# Then log out and back in, or run:
newgrp docker
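
To confirm the group change has taken effect in your current session, a small check like the following can help. This is a minimal sketch; the helper name in_group is made up for illustration, and it relies only on the standard id -nG command, which lists the groups of the current session.

```shell
# Return 0 if group $1 appears in a space-separated group list.
# With no second argument, the list comes from `id -nG`, i.e. the groups
# active in the current session (hypothetical helper, not part of Docker).
in_group() {
    local group="$1" list="${2-$(id -nG)}" g
    for g in $list; do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Usage:
#   in_group docker && echo "docker group active" || echo "log out and back in"
```

If the check still fails after running usermod, the session has not picked up the new group yet, which is why logging out and back in (or newgrp) is needed.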

UV Command Not Found

If the uvx command is not recognized:

# Add $HOME/.local/bin to your PATH: either restart your shell or run one of:
source $HOME/.local/bin/env        # sh, bash, zsh
source $HOME/.local/bin/env.fish   # fish
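
To see whether the directory is already on your PATH before sourcing anything, a check along these lines may be useful. It is a sketch only; path_contains is a hypothetical helper name, not something uv provides.

```shell
# Return 0 if directory $1 is an entry in the colon-separated list $2
# (defaults to the current PATH). Hypothetical helper for illustration.
path_contains() {
    local dir="$1" list="${2-$PATH}"
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   path_contains "$HOME/.local/bin" || export PATH="$HOME/.local/bin:$PATH"
#   command -v uvx    # prints the path to uvx once it can be found
```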

Docker Not Starting

Windows/Mac:

  • Ensure Docker Desktop is running (check system tray)
  • Restart Docker Desktop from the menu

Linux:

sudo systemctl start docker
sudo systemctl enable docker

Calibration Issues

Calibration Fails to Start

  1. Check Docker is running:

    docker ps
    

    If this fails, Docker isn't running properly.

  2. Verify data structure:

    ls -la provo-10154200/
    # Should show: calibration/ config/ forcings/ outputs/
    
  3. Check error logs:

    tail -n 50 provo-10154200/calibration/Output/Calibration_Run/ngen_*/ngen.log
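
The data structure check in step 2 can also be scripted so that a missing folder is named explicitly. This is a rough sketch; missing_subdirs is a hypothetical helper, and the four folder names are taken from the expected listing above.

```shell
# Print each expected subdirectory that is missing from data folder $1.
# Returns 0 when all four are present, 1 otherwise.
missing_subdirs() {
    local root="$1" missing=0 sub
    for sub in calibration config forcings outputs; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   missing_subdirs provo-10154200 && echo "data structure looks complete"
```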
    

High Computational Time

  • Start with fewer iterations for testing (-i 4 or -i 10)
  • Check Docker resource allocation:
    • Docker Desktop: Settings (Preferences in older versions) → Resources → Increase CPU/Memory
    • Linux: Check available system resources with htop or free -h
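
To see what Docker itself has been allocated, docker info can report the CPU count and total memory visible to the daemon. The sketch below assumes a running daemon for the usage lines; bytes_to_gib is a hypothetical helper that only converts the byte count into something readable.

```shell
# Convert a byte count to GiB with one decimal place
# (hypothetical helper for illustration).
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Usage (requires a running Docker daemon):
#   docker info --format '{{.NCPU}}'                          # CPUs visible to Docker
#   bytes_to_gib "$(docker info --format '{{.MemTotal}}')"    # memory in GiB
```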

Poor Calibration Results

Symptoms: Objective function not improving, parameters not converging

Solutions:

  1. Increase iterations: Try -i 200 or more
  2. Check parameter bounds: Edit calibration/ngen_cal_conf.yaml
  3. Verify observation data quality:
    head -20 calibration/obs_hourly_discharge.csv
    
  4. Adjust calibration period: Use different start/end dates
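
Beyond looking at the first 20 lines, a quick summary of the observation file can reveal gaps or sentinel values. The sketch below is illustrative only: summarize_obs is a hypothetical helper, and it assumes the file has one header line and the discharge value in the last comma-separated column, which may not match your file.

```shell
# Summarise a CSV: number of data rows, and how many rows have an empty or
# negative value in the last column.
# ASSUMPTION: one header line, discharge in the last comma-separated field.
summarize_obs() {
    awk -F',' '
        NR == 1 { next }                  # skip the header line
        { rows++ }
        $NF == "" { empty++; next }       # missing observation
        ($NF + 0) < 0 { negative++ }      # e.g. a sentinel such as -9999
        END { printf "rows=%d empty=%d negative=%d\n", rows, empty, negative }
    ' "$1"
}

# Usage:
#   summarize_obs calibration/obs_hourly_discharge.csv
```

Many empty or negative entries inside the calibration period would be a reason to try a different period, as suggested in step 4.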

File and Permission Issues

File Permission Errors

NGIAB runs as root and ngiab-cal runs as user 1000:1000 by default, so files created by one can end up owned by a different user than the one that later needs to modify them.

Fix existing permission issues:

# Fix ownership of data directory
sudo chown -R $USER:$USER provo-10154200/

# Or more specifically for calibration outputs
sudo chown -R $USER:$USER provo-10154200/calibration/Output/

Prevent permission issues:

# When running Docker manually, add user flag
docker run --user $(id -u):$(id -g) ...
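
Before changing ownership, it can help to see which files are actually affected. A minimal sketch using the standard find command; foreign_files is a hypothetical helper name.

```shell
# List everything under directory $1 that is not owned by the current user.
# `find ... -user` accepts a numeric user ID, so `id -u` is enough here.
foreign_files() {
    find "$1" ! -user "$(id -u)"
}

# Usage:
#   foreign_files provo-10154200/ | head
```

An empty listing means ownership is not the cause of the error.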

Cannot Access Files

If you can't read/write files in the mounted directories:

  • Ensure the path is absolute when mounting volumes in Docker
  • Check that the directory exists before running
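
Both checks above can be combined in one helper that refuses a missing directory and otherwise prints its absolute path. This is a sketch; abs_dir is a hypothetical name, and <container_path> in the usage line is a placeholder, not a real mount point.

```shell
# Print the absolute, symlink-resolved path of directory $1,
# or fail with a message if it does not exist.
abs_dir() {
    [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
    (cd "$1" && pwd -P)
}

# Usage:
#   docker run -v "$(abs_dir provo-10154200):<container_path>" ...
```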

Docker Issues

Docker Memory Issues

Symptoms: Container killed, out of memory errors

Solutions:

  • Docker Desktop: Increase memory allocation in Settings (Preferences in older versions) → Resources
  • Linux: Check available memory with free -h
  • Reduce the simulation period or catchment size

Docker Image Download Fails

Symptoms: Timeout or connection errors when pulling images

Solutions:

  1. Check internet connection
  2. Try pulling manually:
    docker pull awiciroh/ngiab-cal
    docker pull awiciroh/ciroh-ngen-image
    
  3. Use a different network or VPN if behind a firewall
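
On a flaky connection, retrying the pull a few times with a growing pause is often enough. The following is a generic sketch, not a Docker feature: retry is a hypothetical helper, and RETRY_DELAY is an assumed environment variable for the initial wait in seconds.

```shell
# Run a command up to $1 times, doubling the wait between attempts.
# RETRY_DELAY (assumed name) sets the first wait in seconds; default 2.
retry() {
    local attempts="$1" delay="${RETRY_DELAY:-2}" n=1
    shift
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        n=$((n + 1))
    done
}

# Usage:
#   retry 3 docker pull awiciroh/ngiab-cal
```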

Container Exits Immediately

Check the container logs:

docker logs <container_id>

Common causes:

  • Missing required files
  • Incorrect mount paths
  • Configuration syntax errors
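
The container's exit code often narrows down the cause before you read the logs. The sketch below maps a few well-known codes to plain descriptions; explain_exit_code is a hypothetical helper, and the list is not exhaustive.

```shell
# Describe a container exit code (hypothetical helper, partial list).
explain_exit_code() {
    case "$1" in
        0)   echo "exited normally" ;;
        125) echo "docker run itself failed (bad flag or daemon error)" ;;
        126) echo "container command could not be invoked" ;;
        127) echo "container command not found" ;;
        137) echo "killed by SIGKILL (often the out-of-memory killer)" ;;
        139) echo "segmentation fault (SIGSEGV)" ;;
        *)   echo "application-specific error; read the container logs" ;;
    esac
}

# Usage (requires Docker):
#   docker ps -a    # stopped containers are listed here with their IDs
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' <container_id>)"
```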

Data Issues

Missing or Corrupt Data Files

Symptoms: Errors about missing files, unexpected EOF

Solutions:

  1. Re-download the data:

    wget https://communityhydrofabric.s3.us-east-1.amazonaws.com/example_data/provo-10154200.tar.gz
    tar -xzf provo-10154200.tar.gz
    
  2. Verify file integrity:

    # Check if files exist and have content
    ls -lah provo-10154200/forcings/
    ls -lah provo-10154200/config/
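
An "unexpected EOF" error usually points to a truncated download, which can be checked on the archive itself before extracting. A minimal sketch using standard gzip and tar options; check_archive is a hypothetical helper name.

```shell
# Verify that a .tar.gz file is a complete gzip stream and a readable tar
# archive, without extracting anything.
check_archive() {
    gzip -t "$1" 2>/dev/null || { echo "corrupt gzip stream: $1"; return 1; }
    tar -tzf "$1" >/dev/null 2>&1 || { echo "corrupt tar archive: $1"; return 1; }
    echo "ok: $1"
}

# Usage:
#   check_archive provo-10154200.tar.gz
```

If the check fails, delete the archive and download it again rather than extracting a partial copy.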
    

Jetstream VM Issues

Cannot Connect to VM

  1. Verify IP address is correct
  2. Try verbose SSH:
    ssh -v exouser@IP-ADDRESS
    

Common Error Messages

"No such file or directory"

  • Check you're in the correct directory
  • Verify paths in commands are correct
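  • Ensure data has been downloaded/extracted

"Permission denied"

  • For Docker commands, see Docker Permission Issues (Linux)
  • For files in the data directory, see File Permission Errors

"Command not found"

  • For uvx, see UV Command Not Found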

Getting Additional Help

If you're still experiencing issues:

  1. Check existing solutions: Search GitHub Issues
  2. Create a new issue: Include:
    • Error message (full text)
    • Command you ran
    • System information (OS, Docker version)
    • Steps to reproduce
  3. Contact the workshop instructors
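
For the system information requested when creating an issue, a short snippet can collect the basics in one go. This is a sketch; system_report is a hypothetical helper, and it only calls uname and docker --version.

```shell
# Print OS and Docker version details suitable for pasting into an issue.
system_report() {
    echo "OS: $(uname -srm)"
    if command -v docker >/dev/null 2>&1; then
        echo "Docker: $(docker --version)"
    else
        echo "Docker: not found on PATH"
    fi
}

# Usage:
#   system_report
```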
