Troubleshooting - skoriche/NGIAB-Calibration-DevCon25 GitHub Wiki
This page contains solutions to common issues you might encounter during the workshop.
If you get permission denied errors when running Docker:
sudo usermod -aG docker $USER
# Then log out and back in, or run:
newgrp docker
If the uvx command is not recognized:
# To add $HOME/.local/bin to your PATH, either restart your shell or run:
source $HOME/.local/bin/env        # sh, bash, zsh
source $HOME/.local/bin/env.fish   # fish
Windows/Mac:
- Ensure Docker Desktop is running (check system tray)
- Restart Docker Desktop from the menu
Linux:
sudo systemctl start docker
sudo systemctl enable docker
- Check Docker is running:
  docker ps
  If this fails, Docker isn't running properly.
- Verify data structure:
  ls -la provo-10154200/ # Should show: calibration/ config/ forcings/ outputs/
- Check error logs:
  tail -n 50 provo-10154200/calibration/Output/Calibration_Run/ngen_*/ngen.log
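The data structure check above can also be scripted so that a missing directory is reported by name. This is a minimal sketch: the function name check_layout is hypothetical, and the four directory names are taken from the ls comment above.

```shell
# Report which of the expected subdirectories are missing from a data directory.
# check_layout is a hypothetical helper, not part of ngiab-cal.
check_layout() {
    data_dir=$1
    missing=0
    for sub in calibration config forcings outputs; do
        if [ ! -d "$data_dir/$sub" ]; then
            echo "missing: $data_dir/$sub"
            missing=1
        fi
    done
    # Exit status 0 when all four directories exist, 1 otherwise
    return "$missing"
}

# Usage: check_layout provo-10154200 && echo "layout OK"
```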
- Start with fewer iterations for testing (-i 4 or -i 10)
- Check Docker resource allocation:
  - Docker Desktop: Preferences → Resources → Increase CPU/Memory
  - Linux: Check available system resources with htop or free -h
Symptoms: Objective function not improving, parameters not converging
Solutions:
- Increase iterations: Try -i 200 or more
- Check parameter bounds: Edit calibration/ngen_cal_conf.yaml
- Verify observation data quality:
  head -20 calibration/obs_hourly_discharge.csv
- Adjust calibration period: Use different start/end dates
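Looking at the first 20 rows only catches problems near the top of the observation file. The sketch below counts suspicious rows across the whole file. It assumes the CSV has a single header row and that the discharge value is the last comma-separated column; neither is confirmed here, so check against the head output first. The function name count_bad_obs is hypothetical.

```shell
# Count data rows whose last field is empty or not a plain number (e.g. NaN).
# ASSUMPTION: one header row, discharge value in the last column.
count_bad_obs() {
    awk -F',' '
        NR > 1 && NF > 0 {
            v = $NF
            gsub(/[ \t\r]/, "", v)   # ignore stray whitespace and CR line endings
            if (v !~ /^-?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/) bad++
        }
        END { print bad + 0 }
    ' "$1"
}

# Usage: count_bad_obs calibration/obs_hourly_discharge.csv
```

A non-zero count suggests gaps or placeholder values in the observations that may be worth inspecting before increasing the iteration count.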
NGIAB runs as root and ngiab-cal runs as user 1000:1000 by default, which can cause permission issues.
Fix existing permission issues:
# Fix ownership of data directory
sudo chown -R $USER:$USER provo-10154200/
# Or more specifically for calibration outputs
sudo chown -R $USER:$USER provo-10154200/calibration/Output/
Prevent permission issues:
# When running Docker manually, add user flag
docker run --user $(id -u):$(id -g) ...
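To see which files are affected before or after running chown, the files not owned by the current user can be listed. This is a small sketch using standard find and id options; the function name list_foreign_files is hypothetical.

```shell
# List everything under a directory that is not owned by the current user.
# list_foreign_files is a hypothetical helper name.
list_foreign_files() {
    find "$1" ! -user "$(id -u)" -print
}

# Usage: list_foreign_files provo-10154200/ | head
```

Empty output means every file already belongs to the current user, so no ownership fix is needed.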
If you can't read/write files in the mounted directories:
- Ensure the path is absolute when mounting volumes in Docker
- Check that the directory exists before running
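Both mount checks above can be done in one step before starting a container. The following is a sketch only; check_mount_path is a hypothetical helper and is not part of Docker or ngiab-cal.

```shell
# Verify that a path is absolute and points at an existing directory.
# check_mount_path is a hypothetical helper name.
check_mount_path() {
    case $1 in
        /*) ;;   # starts with a slash: absolute
        *) echo "not an absolute path: $1"; return 1 ;;
    esac
    if [ ! -d "$1" ]; then
        echo "directory does not exist: $1"
        return 1
    fi
}

# Usage: check_mount_path "$(pwd)/provo-10154200" && echo "ok to mount"
```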
Symptoms: Container killed, out of memory errors
Solutions:
- Docker Desktop: Increase memory allocation in Preferences → Resources
- Linux: Check available memory with free -h
- Reduce the simulation period or catchment size
Symptoms: Timeout or connection errors when pulling images
Solutions:
- Check internet connection
- Try pulling manually:
  docker pull awiciroh/ngiab-cal
  docker pull awiciroh/ciroh-ngen-image
- Use a different network or VPN if behind a firewall
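On a flaky connection, a manual pull may succeed on a second or third try. The sketch below wraps any command in a simple retry loop. The retry function is a hypothetical helper written for this page, not a Docker feature, and the attempt count and delay are arbitrary examples.

```shell
# Run a command up to N times, sleeping between failed attempts.
# retry is a hypothetical helper: retry <attempts> <delay_seconds> <command...>
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Usage: retry 3 10 docker pull awiciroh/ngiab-cal
```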
Check the container logs:
docker logs <container_id>
Common causes:
- Missing required files
- Incorrect mount paths
- Configuration syntax errors
Symptoms: Errors about missing files, unexpected EOF
Solutions:
- Re-download the data:
  wget https://communityhydrofabric.s3.us-east-1.amazonaws.com/example_data/provo-10154200.tar.gz
  tar -xzf provo-10154200.tar.gz
- Verify file integrity:
  # Check if files exist and have content
  ls -lah provo-10154200/forcings/
  ls -lah provo-10154200/config/
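The integrity checks above can be made more targeted. An "unexpected EOF" often points to a truncated download, which gzip -t can detect without extracting, and zero-byte files can be found with find instead of scanning ls output by eye. This is a sketch; the function names check_archive and list_empty_files are hypothetical.

```shell
# Test a .tar.gz (or any gzip file) for truncation or corruption without extracting it.
# check_archive is a hypothetical helper name.
check_archive() {
    gzip -t "$1"
}

# List zero-byte regular files under a directory.
# list_empty_files is a hypothetical helper name.
list_empty_files() {
    find "$1" -type f -size 0 -print
}

# Usage:
#   check_archive provo-10154200.tar.gz && echo "archive OK"
#   list_empty_files provo-10154200/
```

If check_archive fails, re-downloading is likely needed. If list_empty_files prints anything, those files were probably not written completely.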
- Verify IP address is correct
- Try verbose SSH:
ssh -v exouser@IP-ADDRESS
- Check you're in the correct directory
- Verify paths in commands are correct
- Ensure data has been downloaded/extracted
- See File Permission Errors above
- Check Docker permissions
- For uvx: See UV Command Not Found
- For docker: Ensure Docker is installed and in PATH
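To confirm whether a command is actually visible on PATH, the shell built-in command -v can be used. The wrapper below is a sketch, and check_cmd is a hypothetical helper name.

```shell
# Print where a command resolves on PATH, or report that it is missing.
# check_cmd is a hypothetical helper name.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Usage: check_cmd uvx; check_cmd docker
```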
If you're still experiencing issues:
- Check existing solutions: Search GitHub Issues
- Create a new issue: Include:
  - Error message (full text)
  - Command you ran
  - System information (OS, Docker version)
  - Steps to reproduce
- Contact instructors:
  - During workshop: Ask in person or via chat
  - Email: [email protected], [email protected]
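The system information requested for a new issue can be gathered in one step and pasted into the report. This is a minimal sketch using uname and docker --version; the function name collect_sysinfo is hypothetical, and the output labels are only a suggestion.

```shell
# Print the OS and Docker version in a form that can be pasted into an issue.
# collect_sysinfo is a hypothetical helper name.
collect_sysinfo() {
    echo "OS: $(uname -srm)"
    if command -v docker >/dev/null 2>&1; then
        echo "Docker: $(docker --version 2>&1)"
    else
        echo "Docker: not found on PATH"
    fi
}

# Usage: collect_sysinfo
```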