Post Deployment - ericfitz/tmi GitHub Wiki

Post-Deployment

This guide covers verification, testing, and next steps after deploying TMI.

Overview

After deploying all TMI components, complete the following steps:

1. Verify that all services are running
2. Perform smoke tests
3. Test critical user flows
4. Set up monitoring and alerts
5. Document your deployment
6. Train users (optional)

Deployment Verification

Check All Services

Verify each component is running and accessible.

Web Application

# Check if web app is accessible
curl -I https://tmi.example.com

# Expected: HTTP/2 200 (HTTP/2 responses carry no "OK" reason phrase; HTTP/1.1 shows "200 OK")

# Verify static assets load
curl -I https://tmi.example.com/main.js

# Expected: HTTP/2 200 with Cache-Control headers

In browser:

Navigate to https://tmi.example.com
Verify page loads without errors
Open DevTools Console - should have no errors
Check Network tab - all resources should load (200 OK)

API Server

# Health check (root endpoint serves as health/info endpoint)
curl https://api.tmi.example.com/

# Expected response (JSON when called without Accept: text/html):
{
  "status": {
    "code": "ok",
    "time": "2025-11-12T00:00:00Z"
  },
  "service": {
    "name": "TMI",
    "build": "1.3.2-abc1234 (production)"
  },
  "api": {
    "specification": "https://github.com/ericfitz/tmi/blob/main/api-schema/tmi-openapi.json",
    "version": "1.3.2"
  }
}
# Note: status.code is "ok", "degraded", or "error".
# When "degraded", a "health" object with database/redis details is included.
# An optional "operator" object (name, contact) may also appear if configured.

# Check OAuth providers
curl https://api.tmi.example.com/oauth2/providers

# Expected response with configured providers
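
For scripted checks, the `status.code` field can be extracted from the health response and mapped to an exit status. The following is a minimal sketch, not part of TMI: the function names `tmi_status_code` and `tmi_health_exit` are hypothetical, and the grep/sed parsing assumes the JSON shape shown above. A real JSON parser is more robust where one is available.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with TMI).

# Print the first "code" value found in JSON read from stdin.
tmi_status_code() {
    grep -o '"code"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
        sed 's/.*"\([^"]*\)"$/\1/'
}

# Map a status code to an exit status: ok -> 0, degraded -> 1, anything else -> 2.
tmi_health_exit() {
    case "$1" in
        ok) return 0 ;;
        degraded) return 1 ;;
        *) return 2 ;;
    esac
}

# Usage (URL as used in this guide):
#   code=$(curl -fsS https://api.tmi.example.com/ | tmi_status_code)
#   tmi_health_exit "$code" || echo "TMI status: ${code:-unreachable}"
```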

PostgreSQL

# Connection test
psql -h postgres-host -U tmi_user -d tmi -c "SELECT version();"

# Check tables exist
psql -h postgres-host -U tmi_user -d tmi -c "\dt"

# Expected: List of TMI tables (users, threat_models, diagrams, etc.)

# Check for data
psql -h postgres-host -U tmi_user -d tmi -c "SELECT count(*) FROM users;"

Redis

# Connection test
redis-cli -h redis-host -p 6379 -a password ping

# Expected: PONG

# Check memory usage
redis-cli -h redis-host -a password info memory | grep used_memory_human

# Check if keys exist
redis-cli -h redis-host -a password DBSIZE

Smoke Tests

Perform basic functionality tests to verify the system works correctly.

Test 1: User Registration via OAuth

1. Navigate to app: https://tmi.example.com
2. Click "Login" or appropriate login button
3. Select OAuth provider (Google, GitHub, or Microsoft)
4. Authenticate with provider
5. Verify redirect back to application
6. Check authentication state:
   - User should be logged in
   - User name/email should display
   - No console errors

Expected result: User successfully logs in and sees the authenticated view.

Test 2: Create Threat Model

After logging in:

1. Click "New Threat Model" or similar button
2. Fill in details:
   - Name: "Test Threat Model"
   - Description: "Smoke test"
   - Framework: STRIDE (or other)
3. Submit
4. Verify creation:
   - Threat model appears in list
   - Can navigate to threat model detail page
   - No errors in console

Expected result: Threat model is created and accessible.

Test 3: Create Data Flow Diagram

From threat model detail page:

1. Click "New Diagram" or similar
2. Name: "Test Diagram"
3. Open diagram editor
4. Add a few shapes:
   - External entity
   - Process
   - Data store
5. Save
6. Verify:
   - Diagram appears in list
   - Can reopen diagram
   - Shapes are persisted

Expected result: Diagram is created and editable.

Test 4: Real-Time Collaboration

Test WebSocket functionality (requires two browsers/users):

User 1:

Open threat model
Open diagram in edit mode
Add a shape

User 2:

Open same diagram
Verify shape appears in real-time

Expected result: Changes appear instantly for both users.

Test 5: Create Threat

From threat model page:

1. Click "Add Threat"
2. Fill in details:
   - Title: "Test Threat"
   - Description: "Testing threat creation"
   - Category: Spoofing (or other)
   - Severity: High
3. Save
4. Verify:
   - Threat appears in list
   - Can view threat details
   - Can edit threat

Expected result: Threat is created and manageable.

Integration Tests

Test integration between components.

OAuth Token Exchange

# This test simulates the OAuth flow programmatically

# Step 1: Get OAuth authorization URL
curl https://api.tmi.example.com/oauth2/providers

# Step 2: Manually authenticate and get authorization code
# (Done in browser, copy code from callback URL)

# Step 3: Exchange code for tokens
curl -X POST "https://api.tmi.example.com/oauth2/token?idp=google" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "AUTHORIZATION_CODE",
    "state": "RANDOM_STATE",
    "redirect_uri": "https://tmi.example.com/oauth2/callback"
  }'

# Expected: JWT access and refresh tokens
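
To confirm that a returned token really is a JWT and to inspect its claims (for example the standard `exp` expiry claim), the payload segment can be decoded locally. This is a sketch under assumptions: `jwt_payload` is a hypothetical helper, it expects a `base64` that accepts `-d` (GNU coreutils), and it does not verify the signature.

```shell
#!/bin/sh
# Hypothetical helper: print the decoded payload (claims) of a JWT passed as $1.
# The signature is NOT verified; this is for inspection only.
jwt_payload() {
    # The payload is the second dot-separated segment, base64url-encoded.
    seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # base64url omits padding; restore it so base64 -d accepts the input.
    case $(( ${#seg} % 4 )) in
        2) seg="${seg}==" ;;
        3) seg="${seg}=" ;;
    esac
    printf '%s' "$seg" | base64 -d
}

# Usage:
#   jwt_payload "$ACCESS_TOKEN"
```

Which claims appear depends on the server configuration, so treat the decoded output as something to read rather than a fixed schema.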

API Authentication

# Test authenticated API call
ACCESS_TOKEN="your-access-token-from-above"

curl https://api.tmi.example.com/threat_models \
  -H "Authorization: Bearer $ACCESS_TOKEN"

# Expected: List of threat models (or empty array)

Database Connectivity

# Verify server can connect to database
# Check server logs for successful connection

# Docker deployment:
docker logs tmi-api-prod 2>&1 | grep -i database

# Or systemd deployment:
journalctl -u tmi -n 100 | grep -i database

# Should show successful connection and migration status

Redis Caching

# Check if cache is working
# After using application, check Redis for cached data
redis-cli -h redis-host -a password --scan --pattern "cache:*"

# Should show various cache keys

Performance Testing

Load Test API Endpoints

Use ab (Apache Bench) or similar:

# Test root/health endpoint
ab -n 1000 -c 10 https://api.tmi.example.com/

# Expected:
# - All requests succeed (200 OK)
# - Average response time < 100ms
# - No errors
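
The expectations above can be checked automatically by reading the text report that ab prints. The sketch below is an assumption-laden illustration: `ab_check` is a hypothetical helper that relies on the `Failed requests:`, `Non-2xx responses:`, and first `Time per request:` lines of ab's report.

```shell
#!/bin/sh
# Hypothetical helper: read an ab report from stdin and succeed only if there
# were no failed or non-2xx requests and the mean time per request is below
# $1 milliseconds.
ab_check() {
    awk -v max="$1" '
        /^Failed requests:/           { failed = $3 }
        /^Non-2xx responses:/         { non2xx = $3 }
        /^Time per request:/ && !seen { mean = $4; seen = 1 }  # first match is the per-request mean
        END { exit !(seen && failed + 0 == 0 && non2xx + 0 == 0 && mean + 0 < max + 0) }
    '
}

# Usage:
#   ab -n 1000 -c 10 https://api.tmi.example.com/ | ab_check 100
```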

Test Database Query Performance

-- Check slow queries (requires the pg_stat_statements extension; times are in milliseconds)
SELECT
    query,
    calls,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 10
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Expected: Few or no slow queries

Test Redis Performance

# Sample Redis latency in milliseconds (prints one summary per 15-second window; stop with Ctrl-C)
redis-cli -h redis-host -a password --latency-history

# Expected:
# - Average latency < 1ms
# - No significant spikes

Security Verification

SSL/TLS Configuration

# Check SSL certificate
echo | openssl s_client -servername tmi.example.com \
  -connect tmi.example.com:443 2>/dev/null | \
  openssl x509 -noout -dates

# Verify:
# - Certificate is valid (not expired)
# - Valid for your domain

# Check SSL configuration
curl -I https://tmi.example.com | grep -i "strict-transport-security"

# Expected: HSTS header present
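
For the "certificate expiring soon" case, the `notAfter` date can be turned into a number of remaining days. A minimal sketch, assuming GNU `date` (which can parse the date format openssl prints); `cert_days_left` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: whole days between "now" and a certificate's notAfter date.
# $1: the notAfter line as printed by `openssl x509 -noout -enddate`
# $2: optional current time in epoch seconds (defaults to the system clock)
cert_days_left() {
    end=$(date -u -d "${1#notAfter=}" +%s) || return 2
    now=${2:-$(date -u +%s)}
    echo $(( (end - now) / 86400 ))
}

# Usage:
#   enddate=$(echo | openssl s_client -servername tmi.example.com \
#       -connect tmi.example.com:443 2>/dev/null | openssl x509 -noout -enddate)
#   [ "$(cert_days_left "$enddate")" -gt 14 ] || echo "Certificate expires soon"
```

The 14-day threshold in the usage comment is only an example value.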

Security Headers

# Check security headers
curl -I https://tmi.example.com

# Should include:
# - Strict-Transport-Security
# - X-Frame-Options
# - X-Content-Type-Options
# - X-XSS-Protection (legacy; ignored by current versions of Chrome, Edge, and Firefox, so treat it as optional)
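
Rather than reading the header dump by eye, a small filter can report which expected headers are absent. This is a sketch: `missing_headers` is a hypothetical helper, and the header names are passed in by the caller, so the list is easy to adjust.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers from stdin and print each
# header name given as an argument that is absent (case-insensitive match).
# Returns 0 only if nothing is missing.
missing_headers() {
    headers=$(tr -d '\r')
    missing=0
    for name in "$@"; do
        if ! printf '%s\n' "$headers" | grep -qi "^$name:"; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   curl -sI https://tmi.example.com | missing_headers \
#       Strict-Transport-Security X-Frame-Options X-Content-Type-Options
```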

OAuth Security

Verify OAuth configuration:

Redirect URI validation: Try using wrong redirect URI - should fail
State parameter validation: Try reusing state - should fail
Token expiration: Wait for token to expire, verify refresh works
Logout: Verify tokens are invalidated after logout

Database Security

# Verify database is not publicly accessible
nmap -p 5432 your-public-ip

# Expected: Port 5432 should be filtered or closed

# Try connection from unauthorized host
psql -h postgres-host -U tmi_user -d tmi

# Expected: Connection should be rejected if not from allowed IP

Monitoring Setup

Application Monitoring

Set up health check monitoring:

# Create health check script
cat > /usr/local/bin/tmi-health-check.sh << 'EOF'
#!/bin/bash
ERRORS=0

# Check web app
if ! curl -f -s -o /dev/null https://tmi.example.com; then
    echo "Web app failed"
    ERRORS=$((ERRORS + 1))
fi

# Check API (root endpoint is the health/info endpoint)
if ! curl -f -s -o /dev/null https://api.tmi.example.com/; then
    echo "API failed"
    ERRORS=$((ERRORS + 1))
fi

# Check database (-w never prompts; supply the password via ~/.pgpass or PGPASSWORD)
if ! psql -w -h postgres-host -U tmi_user -d tmi -c "SELECT 1;" > /dev/null 2>&1; then
    echo "Database failed"
    ERRORS=$((ERRORS + 1))
fi

# Check Redis
if ! redis-cli -h redis-host -a password ping > /dev/null 2>&1; then
    echo "Redis failed"
    ERRORS=$((ERRORS + 1))
fi

if [ $ERRORS -gt 0 ]; then
    exit 1
else
    echo "All services healthy"
    exit 0
fi
EOF

chmod +x /usr/local/bin/tmi-health-check.sh

Schedule with cron:

# Add to crontab - run every 5 minutes
*/5 * * * * /usr/local/bin/tmi-health-check.sh || mail -s "TMI Health Check Failed" [email protected]
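
A single failed probe is often transient, so it can help to retry each check a few times before the script reports a failure. A minimal sketch of such a wrapper follows; `retry` is a hypothetical helper, not something the script above already defines.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; succeed as soon as one attempt succeeds.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Usage inside the health check script:
#   retry 3 5 curl -f -s -o /dev/null https://api.tmi.example.com/
```

With three attempts five seconds apart, a check adds at most about ten seconds of waiting, which still fits comfortably inside the five-minute cron interval.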

Log Aggregation

Set up log collection:

# Check logs are being generated

# Docker deployment:
docker logs -f tmi-api-prod

# Or systemd deployment:
sudo journalctl -u tmi -f

# Web server:
sudo tail -f /var/log/nginx/access.log

# Consider setting up centralized logging:
# - ELK Stack (Elasticsearch, Logstash, Kibana)
# - Loki + Grafana
# - Splunk
# - Datadog
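
Until centralized logging is in place, a quick view of the server error rate can come straight from the nginx access log. The sketch below assumes nginx's default "combined" log format, where the status code is the ninth whitespace-separated field; `count_5xx` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count 5xx responses in nginx access log lines read
# from stdin. Assumes the default "combined" log format.
count_5xx() {
    awk '$9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }'
}

# Usage:
#   sudo tail -n 1000 /var/log/nginx/access.log | count_5xx
```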

Metrics Collection

Consider deploying metrics collection:

Prometheus: Collect metrics from TMI server
Grafana: Visualize metrics with dashboards
Node Exporter: System metrics
PostgreSQL Exporter: Database metrics
Redis Exporter: Cache metrics

Alerting

Set up alerts for critical events:

API server down
Database connection failed
Redis unavailable
High error rate
Slow response times
Certificate expiring soon
Disk space low

Backup Verification

Test Database Backups

# Perform manual backup
BACKUP_FILE=/backup/tmi_test_$(date +%Y%m%d).dump
pg_dump -h postgres-host -U tmi_user -d tmi -Fc -f "$BACKUP_FILE"

# Verify backup file exists and has content
ls -lh "$BACKUP_FILE"

# Test restore to a temporary database owned by tmi_user
# (pg_restore accepts a single archive, so avoid a glob that may match several files)
createdb -h postgres-host -U postgres -O tmi_user tmi_test_restore
pg_restore -h postgres-host -U tmi_user \
  -d tmi_test_restore "$BACKUP_FILE"

# Verify data
psql -h postgres-host -U tmi_user -d tmi_test_restore \
  -c "SELECT count(*) FROM threat_models;"

# Clean up
dropdb -h postgres-host -U postgres tmi_test_restore
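
For scheduled backups, it is also worth checking that the newest dump is both non-empty and recent. A minimal sketch, assuming a `find` that supports `-mmin` (GNU and BSD find both do); `backup_is_fresh` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 is a non-empty regular file that was
# modified within the last $2 hours.
backup_is_fresh() {
    [ -s "$1" ] || return 1
    [ -n "$(find "$1" -type f -mmin "-$(( $2 * 60 ))")" ]
}

# Usage:
#   backup_is_fresh "/backup/tmi_test_$(date +%Y%m%d).dump" 24 \
#       || echo "Backup is missing, empty, or stale"
```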

Test Redis Backups

# Trigger Redis save
redis-cli -h redis-host -a password BGSAVE

# Check RDB file exists
ls -lh /var/lib/redis/dump.rdb

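# Verify AOF if enabled (Redis 7+ keeps AOF files in appendonlydir/; older versions use appendonly.aof)
ls -lh /var/lib/redis/appendonlydir/ 2>/dev/null || ls -lh /var/lib/redis/appendonly.aof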

Documentation

Document your deployment for future reference and team members.

Deployment Documentation

Create /opt/tmi/DEPLOYMENT_INFO.md:

# TMI Deployment Information

**Deployment Date**: 2025-11-12
**Deployed By**: Your Name
**Environment**: Production

## URLs
- Web Application: https://tmi.example.com
- API Server: https://api.tmi.example.com
- Admin Contact: [email protected]

## Infrastructure
- Web Server: nginx on server1.example.com (203.0.113.10)
- API Server: TMI v1.0.0 on server2.example.com (203.0.113.11)
- Database: PostgreSQL 15 on db.example.com (internal)
- Cache: Redis 7 on cache.example.com (internal)

## Configuration
- Config File: config-production.yml (mounted at /etc/tmi/ in Docker)
- Environment Variables: TMI_ prefix (e.g., TMI_DATABASE_URL, TMI_REDIS_URL, TMI_JWT_SECRET)
- Environment: Production
- OAuth Providers: Google, GitHub
- JWT Expiration: Configurable via TMI_JWT_EXPIRATION_SECONDS

## Backups
- Database: Daily at 2 AM to /backup/postgresql/
- Retention: 7 days
- Redis: RDB + AOF persistence enabled

## Monitoring
- Health Checks: Every 5 minutes via cron
- Logs: docker logs tmi-api-prod, /var/log/nginx/
- Alerts: Email to [email protected]

## Maintenance
- SSL Certificates: Auto-renewal via Let's Encrypt
- Updates: Manual, test in staging first
- Contact: [email protected] for issues

Runbook

Create an operational runbook for common tasks:

# TMI Operations Runbook

## Starting Services (Docker Compose)

docker compose -f docker-compose.prod.yml up -d

### Or individually:
docker start tmi-api-prod
docker start tmi-postgres-prod
docker start tmi-redis-prod

## Stopping Services
docker compose -f docker-compose.prod.yml down
# Or individually (reverse order of starting):
docker stop tmi-api-prod
docker stop tmi-redis-prod
docker stop tmi-postgres-prod

## Starting Services (systemd, if applicable)

### Web Application (nginx)
sudo systemctl start nginx

### API Server
sudo systemctl start tmi

### Database
sudo systemctl start postgresql

### Redis
sudo systemctl start redis

## Restarting After Configuration Change
1. Edit config: config-production.yml (or update TMI_ environment variables)
2. Restart: docker restart tmi-api-prod (or sudo systemctl restart tmi)
3. Verify: curl https://api.tmi.example.com/

## Viewing Logs
- API (Docker): docker logs -f tmi-api-prod
- API (systemd): sudo journalctl -u tmi -f
- Web: sudo tail -f /var/log/nginx/access.log
- Database: docker logs -f tmi-postgres-prod
- Redis: docker logs -f tmi-redis-prod

## Common Issues
See "Common Post-Deployment Issues" in the Post-Deployment guide.

User Training

If deploying for a team, consider:

User Documentation: Link to the wiki Home page
Training Session: Walk through creating first threat model
Demo Video: Record tutorial for async learning
Support Channel: Set up Slack/Teams channel for questions
Admin Documentation: Document admin tasks

Next Steps

After successful deployment:

Immediate (Day 1)

Monitor logs for errors
Watch for failed authentication attempts
Verify backups are running
Confirm that health check alerts are working

Short Term (Week 1)

Review security logs
Monitor performance metrics
Gather user feedback
Document any issues encountered
Fine-tune performance if needed

Ongoing

Regular security updates
Monitor resource usage
Review and rotate logs
Test backup restores monthly
Update documentation as needed
Plan for scaling if usage grows

Common Post-Deployment Issues

High Memory Usage

Symptom: Server running out of memory.

Solutions:

Check Redis memory usage: redis-cli info memory
Review PostgreSQL connection count
Check for memory leaks in TMI server logs
Adjust cache TTL settings
Scale up server resources
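
When checking Redis memory, the ratio of `used_memory` to `maxmemory` is usually more telling than the raw figure. The sketch below reads `INFO memory` output and prints that ratio; `redis_mem_pct` is a hypothetical helper, and it prints `no-limit` when `maxmemory` is 0 (no limit configured).

```shell
#!/bin/sh
# Hypothetical helper: read `redis-cli info memory` output from stdin and print
# used_memory as a whole-number percentage of maxmemory.
redis_mem_pct() {
    awk -F: '
        { sub(/\r$/, "") }              # INFO lines end with CRLF
        $1 == "used_memory" { used = $2 }
        $1 == "maxmemory"   { max = $2 }
        END {
            if (max + 0 == 0) print "no-limit"
            else printf "%d\n", used * 100 / max
        }
    '
}

# Usage:
#   redis-cli -h redis-host -a password info memory | redis_mem_pct
```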

Slow Response Times

Symptom: Application feels sluggish.

Solutions:

Check database query performance
Review Redis cache hit rate
Enable query logging to find slow queries
Optimize database indexes
Consider connection pooling
Add more server instances

Authentication Failures

Symptom: Users can't log in.

Solutions:

Verify OAuth provider credentials
Check OAuth callback URLs match
Review TMI server logs for token exchange errors
Verify TMI_JWT_SECRET is set correctly
Check OAuth provider status pages

WebSocket Disconnections

Symptom: Real-time collaboration doesn't work.

Solutions:

Check reverse proxy WebSocket configuration
Verify WebSocket timeout settings
Review CORS configuration
Check for network issues
Verify allowed origins configuration

Rollback Plan

If deployment has critical issues:

Rollback Steps

Stop the current TMI server:

# Docker deployment:
docker stop tmi-api-prod

# Or systemd deployment:
sudo systemctl stop tmi

Restore previous version:

# Docker deployment - roll back to previous image tag:
docker compose -f docker-compose.prod.yml down
# Edit docker-compose.prod.yml to use previous VERSION, then:
docker compose -f docker-compose.prod.yml up -d

# Or systemd deployment:
sudo cp /opt/tmi/tmiserver.backup /opt/tmi/tmiserver
sudo cp /etc/tmi/config-production.yml.backup /etc/tmi/config-production.yml

Restore database (if schema changed):

psql -h postgres-host -U tmi_user -d tmi < /backup/pre_deployment_backup.sql

Start previous version (systemd only; Docker restarts via compose above):

```
sudo systemctl start tmi
```

Verify rollback:

```
curl https://api.tmi.example.com/
```

Notify users of temporary issues and rollback
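
The systemd rollback commands above assume that `.backup` copies of the binary and config file were saved before the upgrade. One way to create them is sketched below; `save_backup` is a hypothetical helper, and the paths in the usage comment are the ones used in the rollback steps.

```shell
#!/bin/sh
# Hypothetical helper: copy each given file to FILE.backup, preserving mode
# and timestamps. Stops at the first failure.
save_backup() {
    for f in "$@"; do
        cp -p "$f" "$f.backup" || return 1
    done
}

# Usage before an upgrade (run as root):
#   save_backup /opt/tmi/tmiserver /etc/tmi/config-production.yml
```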

Success Criteria

Your deployment is successful when:

All services are running and accessible
Users can log in via OAuth
Users can create threat models
Users can create and edit diagrams
Real-time collaboration works
No errors in logs (or only expected warnings)
Health checks pass consistently
Backups are running automatically
Monitoring and alerts are active
SSL certificates are valid
Performance is acceptable
Security scan shows no critical issues

Getting Help

If you encounter issues:

Check documentation: see the related pages listed at the end of this guide
Review Logs:
- Application logs
- Web server logs
- Database logs
- System logs
Community Support:
- GitHub Issues: ericfitz/tmi
- GitHub Discussions: Share deployment experiences
Professional Support:
- Contact maintainers for enterprise support
- Consider hiring a DevOps consultant for complex deployments

Congratulations

You have successfully deployed TMI. Your team can now:

Create threat models collaboratively
Build data flow diagrams in real-time
Document threats systematically
Integrate with issue tracking systems
Export and share threat models

Related Pages

Planning-Your-Deployment -- Pre-deployment planning and decision matrix
Component-Integration -- Connecting TMI components
Security-Best-Practices -- Security guidelines
Monitoring-and-Health -- Ongoing monitoring procedures
Common-Issues -- Frequently encountered problems
Performance-and-Scaling -- Performance tuning and scaling strategies
