Common Issues - ericfitz/tmi GitHub Wiki

Common Issues

This page provides solutions to frequently encountered problems with TMI deployment and operation.

Authentication Problems

Owner Shown as Read-Only When Creating Threat Models

Symptoms:

  • User creates a new threat model but sees read-only access
  • Permission shown as null or reader instead of owner
  • Cannot edit threat model immediately after creation

Root Cause: Backend bug where owner.provider_id contains the user's email address instead of the OAuth provider ID.

The authorization service compares the threat model owner against the current user using provider and provider_id. When the backend stores email in provider_id:

// What the backend returns:
owner: {
  provider: "google",
  provider_id: "user@example.com"  // EMAIL (incorrect)
}

// What the frontend expects:
currentUser: {
  provider: "google",
  provider_id: "101155414856250184779"  // OAUTH PROVIDER ID
}

This mismatch causes permission calculation to fail.

Frontend Workaround: The ThreatModelAuthorizationService includes an email fallback:

  1. Primary match: Compares provider and provider_id (OAuth ID)
  2. Fallback match: If provider_id doesn't match, compares the owner's provider_id against the user's email
  3. Warning log: When fallback is used, logs a warning about the backend bug

The relevant code is in src/app/pages/tm/services/threat-model-authorization.service.ts in the calculateUserPermission method.
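
The three workaround steps above can be sketched as follows. This is a minimal TypeScript illustration, not the actual calculateUserPermission code: the Principal shape, the OwnerMatch type, and the matchOwner function are hypothetical names, and the sketch assumes the email fallback is only tried when the provider already matches.

```typescript
// Hypothetical shapes for illustration; the real TMI models may differ.
interface Principal {
  provider: string;
  provider_id: string;
  email?: string;
}

type OwnerMatch = "primary" | "email-fallback" | "none";

// Compares a threat model owner against the current user.
// Assumption: the provider must match before either comparison is tried.
function matchOwner(owner: Principal, currentUser: Principal): OwnerMatch {
  if (owner.provider !== currentUser.provider) {
    return "none";
  }
  // Primary match: provider_id holds the OAuth provider ID on both sides.
  if (owner.provider_id === currentUser.provider_id) {
    return "primary";
  }
  // Fallback match: the backend stored the email in owner.provider_id.
  if (currentUser.email !== undefined && owner.provider_id === currentUser.email) {
    return "email-fallback";
  }
  return "none";
}
```

A caller would grant owner access for both "primary" and "email-fallback", and log the warning only in the fallback case.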

Identifying This Issue:

  • Check browser console for the warning: "Owner matched via email fallback - backend is storing email in provider_id field"
  • Enable debug logging to see detailed permission calculation

Backend Fix Required: The backend should store the actual OAuth provider ID in the provider_id field, not the email address. The email should only be in the email field.

Related: See Principal-Based-Identity-Migration for the Principal-based identity architecture.

OAuth Login Fails

Symptoms:

  • Redirect to OAuth provider fails
  • "Invalid client" error from provider
  • Callback fails with 400/500 error

Common Causes:

  1. Incorrect Client ID/Secret: Credentials don't match provider configuration
  2. Wrong Callback URL: Callback URL mismatch between TMI config and provider
  3. Provider Not Enabled: OAuth provider not configured in TMI
  4. Network Issues: Cannot reach OAuth provider

Solutions:

# Verify OAuth configuration
grep -i oauth config-*.yml

# Check environment variables (provider-specific, e.g., for GitHub)
env | grep OAUTH_PROVIDERS_GITHUB

# Test provider reachability
curl -I https://github.com/login/oauth/authorize

# Check TMI OAuth endpoints
curl http://localhost:8080/oauth2/providers

Configuration Fix:

# Set OAuth provider via environment variables
export OAUTH_PROVIDERS_GITHUB_ENABLED=true
export OAUTH_PROVIDERS_GITHUB_CLIENT_ID="your_client_id_here"
export OAUTH_PROVIDERS_GITHUB_CLIENT_SECRET="your_client_secret_here"
# Callback URL defaults to <base_url>/oauth2/callback

See Setting-Up-Authentication for complete OAuth setup.

OAuth Token Exchange Succeeds but New User Creation Returns 502

As of TMI v1.4.0, this failure mode is caught at startup before it can affect any user: the server refuses to start if a provider lacks the required subject_claim mapping. If you upgraded from an earlier version and your prod env vars are missing claim mappings, your server will fail to start with a clear error (see Setting-Up-Authentication#startup-validation).

The runtime symptom below now returns HTTP 502 (not 500) with a non-leaky body; operators get the full diagnostic in server logs. This still happens if config is hot-deployed without a server restart, or if a provider's userinfo response is unexpectedly incomplete for a single user.

Symptoms:

  • User completes the OAuth flow at the provider (Google/GitHub/Microsoft consent screen accepted)
  • POST /oauth2/token returns HTTP 502 (or HTTP 500 on TMI < 1.4.0)
  • Response body: {"error": "provider_response_invalid", "error_description": "Authentication provider returned incomplete profile data..."}
  • Affects only first-time logins for a given provider; existing users are unaffected

Diagnostic Log Pattern:

In the server logs (or heroku logs --app tmi-server), the failing request shows token exchange succeeded but claims extraction returned an empty user_id:

INFO Token exchange successful provider_id=github ...
INFO Default claim mappings applied applied_count=2 total_mappings=2
INFO Claims extraction completed user_id= email=user@example.com ...
ERROR Failed to create new user: ... error=provider and provider_user_id are required

The smoking gun is user_id= (empty) on the "Claims extraction completed" line.

Root Cause: The provider is non-OIDC (e.g., GitHub returns id, Microsoft Graph returns id), but no explicit SUBJECT_CLAIM mapping is configured. The system falls back to the OIDC default sub, which the provider does not return.

Fix: Set the missing claim mapping for that provider, e.g.:

heroku config:set --app=tmi-server \
  OAUTH_PROVIDERS_GITHUB_USERINFO_CLAIMS_SUBJECT_CLAIM=id \
  OAUTH_PROVIDERS_GITHUB_USERINFO_CLAIMS_NAME_CLAIM=name \
  OAUTH_PROVIDERS_GITHUB_USERINFO_SECONDARY_URL=https://api.github.com/user/emails \
  OAUTH_PROVIDERS_GITHUB_USERINFO_SECONDARY_CLAIMS_EMAIL_CLAIM='[0].email' \
  OAUTH_PROVIDERS_GITHUB_USERINFO_SECONDARY_CLAIMS_EMAIL_VERIFIED_CLAIM='[0].verified'

See Setting-Up-Authentication#claim-mappings for the complete env-var reference and per-provider examples (GitHub, Microsoft).
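
To illustrate why the missing mapping produces an empty user_id, here is a minimal TypeScript sketch of claim lookup with an OIDC default. It is not the server's implementation: extractSubject and the UserInfo type are hypothetical, and the sample userinfo object only mimics the shape of a GitHub-style response that carries id but no sub.

```typescript
// A userinfo response treated as an open key/value map (hypothetical type).
type UserInfo = Record<string, unknown>;

// Reads the subject from a userinfo response.
// When no explicit subject claim is configured, the OIDC default "sub" is used.
function extractSubject(userInfo: UserInfo, subjectClaim?: string): string {
  const claim = subjectClaim ?? "sub";
  const value = userInfo[claim];
  if (typeof value === "string") {
    return value;
  }
  if (typeof value === "number") {
    return String(value); // numeric IDs are normalized to strings
  }
  return ""; // claim missing: this is the empty user_id seen in the logs
}

// Example data only; a non-OIDC provider returns "id" rather than "sub".
const sampleUserInfo: UserInfo = { id: 583231, login: "octocat" };
const withoutMapping = extractSubject(sampleUserInfo);     // ""
const withMapping = extractSubject(sampleUserInfo, "id");  // "583231"
```

With the default, the lookup finds nothing and user creation fails; with the explicit mapping set to id, the same response yields a usable subject.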

JWT Token Invalid or Expired

Symptoms:

  • API returns 401 Unauthorized
  • "Token expired" error message
  • Need to re-authenticate frequently

Common Causes:

  1. Token has expired (default: 24 hours)
  2. JWT secret changed on server
  3. Token not included in request
  4. Malformed Authorization header

Solutions:

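# Check token expiration (the JWT payload is base64url-encoded without padding)
# Decode JWT at https://jwt.io or:
payload=$(echo "YOUR_TOKEN" | cut -d'.' -f2 | tr '_-' '/+')
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
echo "$payload" | base64 -d | jq .exp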

# Verify Authorization header format
curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:8080/api/v1/threat-models

# Re-authenticate to get new token
curl "http://localhost:8080/oauth2/authorize?idp=github"

Prevention:

  • Implement token refresh logic in clients
  • Increase token lifetime if appropriate (trade-off with security)
  • Store tokens securely and check expiration before use
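
As one way to check expiration before use, a client can read the exp claim from the token payload. The following TypeScript sketch is an assumption-laden example rather than TMI client code: decodeBase64Url, tokenExpiry, and isTokenExpired are hypothetical helpers, the 30-second skew is an arbitrary choice, and the token is only decoded, not verified.

```typescript
const BASE64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Decodes base64url text into a string, one byte per character.
// Enough for numeric claims such as exp; non-ASCII claim values would
// need a real UTF-8 decoder.
function decodeBase64Url(input: string): string {
  let buffer = 0;
  let bitCount = 0;
  let output = "";
  for (const ch of input.replace(/=+$/, "")) {
    const value = BASE64URL_ALPHABET.indexOf(ch);
    if (value < 0) {
      throw new Error("invalid base64url character: " + ch);
    }
    buffer = ((buffer << 6) | value) & 0xffff;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      output += String.fromCharCode((buffer >> bitCount) & 0xff);
    }
  }
  return output;
}

// Returns the exp claim (seconds since the Unix epoch), or undefined if absent.
function tokenExpiry(token: string): number | undefined {
  const segments = token.split(".");
  if (segments.length !== 3) {
    throw new Error("expected a JWT with three dot-separated segments");
  }
  const [, payloadSegment = ""] = segments;
  const payload: unknown = JSON.parse(decodeBase64Url(payloadSegment));
  if (typeof payload === "object" && payload !== null && "exp" in payload) {
    const exp = (payload as { exp: unknown }).exp;
    return typeof exp === "number" ? exp : undefined;
  }
  return undefined;
}

// True when the token expires within skewSeconds of nowSeconds.
// Assumption: a token without exp is treated as not expired.
function isTokenExpired(token: string, nowSeconds: number, skewSeconds = 30): boolean {
  const exp = tokenExpiry(token);
  return exp !== undefined && exp <= nowSeconds + skewSeconds;
}
```

A client could call isTokenExpired(token, Date.now() / 1000) before each request and re-authenticate or refresh when it returns true.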

JWT Secret Mismatch

Symptoms:

  • All authentication fails after server restart
  • "Invalid signature" errors
  • Previously valid tokens rejected

Cause: TMI_JWT_SECRET changed or not set consistently

Solution:

# Ensure TMI_JWT_SECRET is set consistently
export TMI_JWT_SECRET="your-long-random-secret-256-bits"

# Or set in config YAML under auth.jwt.secret
# Restart server
./bin/tmiserver

Important: Changing TMI_JWT_SECRET invalidates all existing tokens. Users must re-authenticate.

Connection Issues

Cannot Connect to Database

Symptoms:

  • "connection refused" errors
  • "password authentication failed"
  • Server fails to start with database errors

Diagnosis:

# Check if PostgreSQL is running
pg_isready -h localhost -p 5432

# Test connection with psql
psql -h localhost -U postgres -d tmi

# Check TMI database configuration
echo $TMI_DATABASE_URL

# View database logs
docker logs tmi-postgresql  # if using Docker
tail -f /var/log/postgresql/postgresql-*.log

Common Causes & Solutions:

  1. PostgreSQL not running:

    # Start PostgreSQL
    make start-database
    # Or with Docker
    docker start tmi-postgresql
  2. Wrong credentials or connection string:

    # Verify database URL environment variable
    echo $TMI_DATABASE_URL
    
    # Update configuration (12-factor app pattern)
    export TMI_DATABASE_URL="postgres://user:pass@localhost:5432/tmi?sslmode=disable"
  3. Wrong host/port:

    # Check database is listening
    netstat -an | grep 5432
    
    # Update the host/port in TMI_DATABASE_URL
    export TMI_DATABASE_URL="postgres://user:pass@localhost:5432/tmi?sslmode=disable"
  4. Database doesn't exist:

    # Create database
    createdb -U postgres tmi
    
    # Run migrations
    ./bin/migrate up

See Database-Setup for database configuration details.
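
When checking a connection string for wrong credentials, host, or port, it helps to split it into parts and print it without the password. The TypeScript sketch below is illustrative only: parseDatabaseUrl and redactDatabaseUrl are hypothetical helpers, the regular expression covers only the common postgres://user:pass@host:port/db?params form, and it assumes reserved characters in the password are percent-encoded.

```typescript
interface DatabaseUrlParts {
  scheme: string;
  user: string | undefined;
  hasPassword: boolean;
  host: string;
  port: number;
  database: string;
  params: string;
}

// scheme://[user[:password]@]host[:port]/database[?params]
const DATABASE_URL_PATTERN =
  /^(postgres(?:ql)?):\/\/(?:([^:@\/]+)(?::([^@\/]*))?@)?([^:\/?]+)(?::(\d+))?\/([^?]+)(?:\?(.*))?$/;

// Splits a PostgreSQL URL into its parts; returns undefined if it does not match.
function parseDatabaseUrl(url: string): DatabaseUrlParts | undefined {
  const match = DATABASE_URL_PATTERN.exec(url);
  if (match === null) {
    return undefined;
  }
  const [, scheme = "", user, password, host = "", port, database = "", params = ""] = match;
  return {
    scheme,
    user,
    hasPassword: password !== undefined,
    host,
    port: port !== undefined ? Number(port) : 5432, // PostgreSQL default port
    database,
    params,
  };
}

// Rebuilds the URL with the password masked, suitable for logs or bug reports.
function redactDatabaseUrl(url: string): string {
  const parts = parseDatabaseUrl(url);
  if (parts === undefined) {
    return "<unparseable database URL>";
  }
  const auth =
    parts.user === undefined ? "" : parts.user + (parts.hasPassword ? ":***" : "") + "@";
  const query = parts.params === "" ? "" : "?" + parts.params;
  return `${parts.scheme}://${auth}${parts.host}:${parts.port}/${parts.database}${query}`;
}
```

For the example URL on this page, redactDatabaseUrl returns postgres://user:***@localhost:5432/tmi?sslmode=disable, which makes the host, port, and database name easy to compare against the running PostgreSQL instance.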

Cannot Connect to Redis

Symptoms:

  • "connection refused" to Redis
  • WebSocket features not working
  • Real-time collaboration fails

Diagnosis:

# Check if Redis is running
redis-cli ping

# Test connection
redis-cli -h localhost -p 6379 INFO

# Check TMI Redis configuration
grep -i redis config-*.yml

Solutions:

  1. Redis not running:

    # Start Redis
    make start-redis
    # Or with Docker
    docker start tmi-redis
  2. Wrong host/port:

    # Update configuration via URL (preferred)
    export TMI_REDIS_URL="redis://localhost:6379"
    
    # Or via individual environment variables
    export TMI_REDIS_HOST=localhost
    export TMI_REDIS_PORT=6379
  3. Redis requires password:

    # Test with password
    redis-cli -a yourpassword ping
    
    # Configure in TMI via URL (preferred)
    export TMI_REDIS_URL="redis://:yourpassword@localhost:6379"
    
    # Or via individual environment variable
    export TMI_REDIS_PASSWORD=yourpassword

See Configuration-Reference for Redis configuration options.

Port Already in Use

Symptoms:

  • "address already in use" error
  • Server fails to start
  • Cannot bind to port 8080

Diagnosis:

# Check what's using the port
lsof -i :8080
netstat -an | grep 8080

# Find process ID
ps aux | grep tmiserver

Solutions:

  1. Kill conflicting process:

    # Find and kill process
    lsof -ti :8080 | xargs kill -9
  2. Change TMI port:

    # Use different port
    export TMI_SERVER_PORT=9090
    ./bin/tmiserver
  3. Check for duplicate instances:

    # List all tmiserver processes
    ps aux | grep tmiserver
    
    # Kill all instances
    pkill tmiserver

WebSocket Problems

WebSocket Connection Fails

Symptoms:

  • Real-time updates not working
  • "WebSocket connection failed" in browser console
  • Diagram edits not synchronizing

Common Causes:

  1. TLS Mismatch:

    • Using ws:// with HTTPS site (should be wss://)
    • Using wss:// with HTTP site (should be ws://)
  2. Firewall/Proxy Blocking:

    • Corporate firewall blocks WebSocket
    • Proxy doesn't support WebSocket upgrade
  3. Redis Connection Issues:

    • Redis not running
    • Redis connection dropped

Solutions:

  1. Fix TLS mismatch:

    // In web client configuration
    const wsUrl = window.location.protocol === 'https:'
      ? 'wss://api.example.com/ws'
      : 'ws://api.example.com/ws';
  2. Check browser console:

    • Open Developer Tools (F12)
    • Check Console tab for WebSocket errors
    • Check Network tab for WebSocket handshake
  3. Test WebSocket directly:

    # Use websocat or wscat
    websocat wss://api.example.com/ws
    
    # Check server logs
    grep -i websocket logs/server.log
  4. Verify Redis:

    # Check Redis is running
    redis-cli ping
    
    # Check Redis connections
    redis-cli CLIENT LIST | grep websocket

See Debugging-Guide#debugging-websocket-issues for detailed WebSocket troubleshooting.

WebSocket Disconnects Frequently

Symptoms:

  • Connection drops every few minutes
  • "Connection lost" messages
  • Need to refresh frequently

Common Causes:

  1. Proxy timeout
  2. Load balancer timeout
  3. Redis connection issues
  4. Network instability

Solutions:

  1. Increase proxy timeout:

    # Nginx example
    location /ws {
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
  2. Implement reconnection logic:

    // Client-side reconnection
    socket.onclose = function() {
        setTimeout(connectWebSocket, 1000);
    };
  3. Check Redis stability:

    # Monitor Redis
    redis-cli MONITOR
    
    # Check Redis memory
    redis-cli INFO memory
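
The fixed one-second retry shown in the reconnection step can hammer a server that is restarting. A common alternative is capped exponential backoff with jitter, sketched below in TypeScript. The function name, the default base and cap, and the injectable random source are all assumptions made for this example, not TMI client settings.

```typescript
// Computes the delay before reconnect attempt number `attempt` (0-based).
// The delay doubles with each attempt up to maxMs. Half of the delay is fixed
// and the other half is randomized so that clients do not reconnect in lockstep.
function reconnectDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
  random: () => number = Math.random
): number {
  const capped = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.round(capped / 2 + random() * (capped / 2));
}

// Sketch of use inside a close handler (connectWebSocket is the
// reconnect function from the example above):
//
//   let attempt = 0;
//   socket.onopen = () => { attempt = 0; };
//   socket.onclose = () => {
//     setTimeout(connectWebSocket, reconnectDelayMs(attempt));
//     attempt += 1;
//   };
```

Resetting the attempt counter on a successful open keeps a single short drop from inheriting a long delay.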

Database Issues

Migration Fails

Symptoms:

  • "migration failed" error
  • Database schema out of sync
  • Missing tables or columns

Diagnosis:

# Check migration status
./bin/migrate version

# List available migrations
ls -la auth/migrations/

# Check database schema
psql -d tmi -c "\dt"

Solutions:

  1. Run migrations:

    # Apply all pending migrations
    ./bin/migrate up
    
    # Apply specific migration
    ./bin/migrate up 005
  2. Migration fails midway:

    # Check migration state
    psql -d tmi -c "SELECT * FROM schema_migrations"
    
    # Force migration version (use carefully!)
    ./bin/migrate force <version>
    
    # Try again
    ./bin/migrate up
  3. Start fresh (development only):

    # WARNING: Destroys all data!
    ./bin/migrate drop
    ./bin/migrate up

See Database-Operations#migrations for migration procedures.

Database Performance Issues

Symptoms:

  • Slow query responses
  • API timeouts
  • High database CPU/memory usage

Diagnosis:

-- Find long-running active queries
SELECT pid, query, query_start, now() - query_start AS duration
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;

-- Check table sizes
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Find high-cardinality columns (candidates for indexes; this does not list missing indexes directly)
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public' AND n_distinct > 100
ORDER BY n_distinct DESC;

Solutions:

  • Add indexes on frequently queried columns
  • Optimize queries with EXPLAIN ANALYZE
  • Increase database connection pool size
  • Scale database resources

See Performance-Troubleshooting#database-performance for detailed optimization.

Performance Problems

High Memory Usage

Symptoms:

  • Server using excessive RAM
  • Out of memory errors
  • Slow performance

Diagnosis:

# Check memory usage
free -h
top -o %MEM   # Linux (procps) syntax; macOS top uses: top -o mem

# Check process memory
ps aux --sort=-%mem | head -10

# Check Redis memory
redis-cli INFO memory

Solutions:

  1. Redis memory issues:

    # Check Redis keys
    redis-cli DBSIZE
    
    # Set maxmemory limit
    redis-cli CONFIG SET maxmemory 2gb
    redis-cli CONFIG SET maxmemory-policy allkeys-lru
    
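    # WARNING: this deletes ALL keys in the current database, not only expired
    # ones (Redis removes expired keys on its own); use it only to wipe a cache
    redis-cli --scan --pattern "*" | xargs redis-cli DEL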
  2. Application memory leaks:

    • Check for goroutine leaks: /debug/pprof/goroutine
    • Profile memory: /debug/pprof/heap
    • Restart server to clear
  3. Database connection pool:

    # Reduce connection pool size
    export TMI_DB_MAX_OPEN_CONNS=10
    export TMI_DB_MAX_IDLE_CONNS=5

See Performance-Troubleshooting for comprehensive performance tuning.

Slow API Responses

Symptoms:

  • API requests take multiple seconds
  • Timeouts
  • Poor user experience

Diagnosis:

# Test API response time
time curl http://localhost:8080/api/v1/threat-models

# Check server logs for slow requests (duration of 500 or more; assumes an integer duration= field)
grep -E "duration=([5-9][0-9]{2}|[0-9]{4,})" logs/server.log

# Profile API endpoints
curl -o profile.out "http://localhost:8080/debug/pprof/profile?seconds=30"
go tool pprof profile.out

Solutions:

  • Optimize database queries
  • Add caching with Redis
  • Increase server resources
  • Scale horizontally with load balancer

See Performance-and-Scaling for scaling strategies.

Configuration Issues

Configuration Not Loading

Symptoms:

  • Server uses default values
  • Environment variables ignored
  • YAML config not applied

Diagnosis:

# Check config file location
ls -la config-*.yml

# Verify environment variables (most use the TMI_ prefix; the OAuth provider settings on this page use OAUTH_PROVIDERS_)
env | grep -E '^(TMI_|OAUTH_PROVIDERS_)'

# Check server startup logs
grep -i "loading configuration\|config" logs/server.log

Solutions:

  1. Specify config file:

    # Use --config flag
    ./bin/tmiserver --config=config-production.yml
  2. Environment variable precedence:

    • Environment variables override YAML config
    • Use unset VAR to remove env var
    • Check for typos in variable names
  3. YAML syntax errors:

    # Validate YAML
    yamllint config-production.yml
    
    # Or use Python
    python -c "import yaml; yaml.safe_load(open('config-production.yml'))"
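
The precedence rule above (environment variables override YAML, which overrides built-in defaults) can be pictured with a small lookup that also reports where a value came from. This TypeScript sketch is a simplification and not how the server loads configuration: resolveSetting is hypothetical, it uses one normalized key per setting instead of mapping names such as TMI_SERVER_PORT to YAML paths, and it assumes an empty environment value still counts as set.

```typescript
type SettingSource = "env" | "yaml" | "default";

interface ResolvedSetting {
  value: string;
  source: SettingSource;
}

// Resolves one setting using the order: environment, then YAML, then default.
function resolveSetting(
  key: string,
  env: Record<string, string | undefined>,
  yaml: Record<string, string | undefined>,
  defaults: Record<string, string | undefined>
): ResolvedSetting | undefined {
  const fromEnv = env[key];
  if (fromEnv !== undefined) {
    return { value: fromEnv, source: "env" };
  }
  const fromYaml = yaml[key];
  if (fromYaml !== undefined) {
    return { value: fromYaml, source: "yaml" };
  }
  const fallback = defaults[key];
  if (fallback !== undefined) {
    return { value: fallback, source: "default" };
  }
  return undefined; // unknown key, for example a typo in the variable name
}
```

If a setting unexpectedly resolves from "default", the usual suspects are a misspelled variable name or a config file that was never loaded; if it resolves from "env" when the YAML value was expected, unset the variable.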

See Configuration-Reference for complete configuration guide.

TLS/HTTPS Not Working

Symptoms:

  • Browser shows "Not Secure"
  • Certificate errors
  • Unable to connect via HTTPS

Diagnosis:

# Check TLS configuration
grep -i tls config-*.yml
echo $TMI_SERVER_TLS_ENABLED
echo $TMI_SERVER_TLS_CERT_FILE

# Verify certificate files exist
ls -la /path/to/cert.pem
ls -la /path/to/key.pem

# Test certificate
openssl x509 -in /path/to/cert.pem -text -noout

# Test HTTPS connection
curl -v https://localhost:8443

Solutions:

  1. Enable TLS:

    export TMI_SERVER_TLS_ENABLED=true
    export TMI_SERVER_TLS_CERT_FILE=/etc/tmi/cert.pem
    export TMI_SERVER_TLS_KEY_FILE=/etc/tmi/key.pem
  2. Certificate issues:

    # Generate self-signed cert (development)
    openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
    
    # Use Let's Encrypt (production)
    certbot certonly --standalone -d api.example.com
  3. Certificate permissions:

    # Fix file permissions
    chmod 600 /etc/tmi/key.pem
    chmod 644 /etc/tmi/cert.pem
    chown tmi:tmi /etc/tmi/*.pem

See Configuration-Reference#tls-configuration for TLS setup.

Docker/Container Issues

Container Won't Start

Symptoms:

  • docker run fails immediately
  • Container status is "Exited"
  • No logs produced

Diagnosis:

# Check container status
docker ps -a

# View container logs
docker logs tmi-server

# Inspect container
docker inspect tmi-server

# Check for port conflicts
docker ps | grep 8080

Solutions:

  1. Port conflict:

    # Use different port
    docker run -p 9090:8080 tmi-server
  2. Environment variables:

    # Pass environment variables
    docker run -e TMI_DATABASE_URL="postgres://user:pass@host.docker.internal:5432/tmi" \
               -e TMI_REDIS_URL="redis://host.docker.internal:6379" \
               tmi-server
  3. Volume mount issues:

    # Fix volume permissions
    chmod -R 755 /path/to/data
    
    # Mount volume
    docker run -v /path/to/data:/data tmi-server

Container Network Issues

Symptoms:

  • Container cannot reach database
  • Container cannot reach Redis
  • Services cannot communicate

Solutions:

  1. Use Docker network:

    # Create network
    docker network create tmi-network
    
    # Run containers on same network
    docker run -d --network tmi-network --name postgres \
      -e POSTGRES_USER=user -e POSTGRES_PASSWORD=pass -e POSTGRES_DB=tmi postgres:15
    docker run -d --network tmi-network --name redis redis:7
    docker run --network tmi-network \
      -e TMI_DATABASE_URL="postgres://user:pass@postgres:5432/tmi" \
      -e TMI_REDIS_URL="redis://redis:6379" \
      tmi-server
  2. Use host networking (Linux only):

    docker run --network host tmi-server
  3. Use host.docker.internal (Mac/Windows):

    docker run -e TMI_DATABASE_URL="postgres://user:pass@host.docker.internal:5432/tmi" \
               -e TMI_REDIS_URL="redis://host.docker.internal:6379" \
               tmi-server

Quick Reference

Health Check Commands

# Overall system health
make status

# Database connectivity
make db-ping

# Redis connectivity
redis-cli ping

# Server health endpoint (root path, not /health)
curl http://localhost:8080/

# Check all services
docker ps

Log Locations

# Server logs
tail -f logs/server.log

# PostgreSQL logs
tail -f /var/log/postgresql/postgresql-*.log

# Redis logs
tail -f /var/log/redis/redis-server.log

# Docker logs
docker logs tmi-server
docker logs tmi-postgresql
docker logs tmi-redis

Common Fix Commands

# Restart all services
make restart

# Reset database (development)
make db-reset

# Clear Redis cache
redis-cli FLUSHDB

# Rebuild containers (uses Chainguard base images)
make build-containers

# Or rebuild individual containers
make build-container-tmi     # TMI server only
make build-container-db      # PostgreSQL only
make build-container-redis   # Redis only

# Run migrations
./bin/migrate up
