DEPLOYMENT GUIDE - nself-org/nchat GitHub Wiki
Version: 1.0.9 Last Updated: 2026-04-18 Status: Production Ready
- Overview
- Deployment Scripts
- Local Development Deployment
- Staging Deployment
- Production Deployment
- Health Checks
- Rollback Procedures
- Troubleshooting
- Best Practices
nself-chat provides deterministic deployment scripts for three environments:
| Environment | Script | Purpose | Safety Level |
|---|---|---|---|
| Local | `deploy-local.sh` | Development environment | Low (fast iteration) |
| Staging | `deploy-staging.sh` | Pre-production testing | Medium (validation + rollback) |
| Production | `deploy-production.sh` | Live production | High (maximum safety) |
┌─────────────────────────────────────────────────────────────┐
│                    nself-chat Deployment                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐     ┌──────────────┐     ┌─────────────┐  │
│  │   Backend    │────▶│   Frontend   │────▶│   Health    │  │
│  │ (nself CLI)  │     │  (Next.js)   │     │   Checks    │  │
│  └──────────────┘     └──────────────┘     └─────────────┘  │
│         │                    │                    │         │
│         ▼                    ▼                    ▼         │
│  ┌───────────────────────────────────────────────────────┐  │
│  │            Docker / Kubernetes Deployment             │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
All deployment scripts are in /scripts/:
scripts/
├── deploy-local.sh # Local development deployment
├── deploy-staging.sh # Staging environment deployment
├── deploy-production.sh # Production deployment
├── health-check.sh # Full health checks
└── rollback.sh # Rollback to previous version

All deployment scripts support these common options:
| Option | Description | Example |
|---|---|---|
| `--dry-run` | Preview without executing | `./deploy-local.sh --dry-run` |
| `--help` | Show usage information | `./deploy-staging.sh --help` |
| `--skip-health-check` | Skip post-deployment checks | Not recommended |
| `--verbose` | Detailed output | `./health-check.sh --verbose` |
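
The guide does not show how the scripts implement these options. As a rough illustration only, a dry-run mode is often built around a small wrapper that prints a command instead of running it. The function names `parse_common_flags` and `run` below are hypothetical and are not taken from the nself-chat scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real scripts' internals are not shown in this guide.

DRY_RUN=false
VERBOSE=false

# parse_common_flags sets DRY_RUN / VERBOSE from the arguments it is given.
parse_common_flags() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --dry-run) DRY_RUN=true ;;
      --verbose) VERBOSE=true ;;
    esac
  done
}

# run prints the command in dry-run mode and executes it otherwise.
run() {
  if [ "$DRY_RUN" = true ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

parse_common_flags "$@"
run echo "starting deployment"
```

Routing every state-changing command through one wrapper keeps the preview output and the real execution path from drifting apart.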
- Docker and Docker Compose installed
- Node.js 20+ and pnpm installed
- nself CLI v1.0.9+ installed
- `.backend/` directory initialized
# Full local deployment (backend + frontend)
./scripts/deploy-local.sh
# Backend only
./scripts/deploy-local.sh --backend-only
# Frontend only
./scripts/deploy-local.sh --frontend-only

1. Validates environment
   - Checks required tools (docker, nself, node, pnpm)
   - Verifies Node.js version (≥20)
   - Checks backend directory exists
2. Deploys backend services
   - Runs `nself build` to generate docker-compose.yml
   - Starts services with `nself start`
   - Waits for services to initialize
3. Deploys frontend
   - Installs dependencies if needed
   - Starts Next.js dev server on port 3000
   - Sets development environment variables
4. Runs health checks
   - Verifies backend services running
   - Checks frontend accessibility
   - Validates critical services (Hasura, Auth, PostgreSQL)
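
The environment validation step could look something like the sketch below. It is an illustration under assumptions, not the contents of `deploy-local.sh`: the helper names `require_tools` and `node_major_ok` are made up for this example, and only the tool list and the Node.js 20 minimum come from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not copied from deploy-local.sh.

# require_tools reports every missing tool and returns non-zero if any is absent.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# node_major_ok accepts a version string such as "v20.11.1"
# and succeeds only when the major version is 20 or higher.
node_major_ok() {
  local version major
  version="${1#v}"        # strip a leading "v"
  major="${version%%.*}"  # keep the text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$major" -ge 20 ]
}

# Example use (commented out so the sketch has no side effects):
# require_tools docker nself node pnpm && node_major_ok "$(node --version)"
```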
After successful deployment:
Backend Services:
GraphQL: http://localhost:8080/v1/graphql
Hasura: http://localhost:8080/console
Auth: http://localhost:4000
Admin: http://localhost:3021
Frontend:
Dev Server: http://localhost:3000
Development Credentials:
Email: [email protected]
Password: password123
# Preview deployment
./scripts/deploy-local.sh --dry-run
# Skip health checks (faster)
./scripts/deploy-local.sh --skip-health-check
# Custom frontend port
./scripts/deploy-local.sh --port 3001
# Backend only (for API development)
./scripts/deploy-local.sh --backend-only

Issue: Backend services not starting
# Check Docker daemon
docker ps
# Check nself status (subshell keeps your current directory unchanged)
(cd .backend && nself status)
# Rebuild backend
(cd .backend && nself build && nself start)

Issue: Port 3000 already in use
# Find process using port
lsof -ti:3000
# Kill process
kill -9 $(lsof -ti:3000)
# Or use custom port
./scripts/deploy-local.sh --port 3001

Issue: Dependencies out of sync
# Clean install
rm -rf node_modules pnpm-lock.yaml
pnpm install

- kubectl configured for staging cluster
- Docker registry authentication
- Build tools (Docker, Node.js, pnpm)
- Git repository in clean state
# Full staging deployment
./scripts/deploy-staging.sh
# Skip tests (faster, not recommended)
./scripts/deploy-staging.sh --skip-tests
# Specific version
./scripts/deploy-staging.sh --tag v1.0.0
# Preview deployment
./scripts/deploy-staging.sh --dry-run

1. Pre-deployment validation
   - Validates environment (kubectl, docker, git)
   - Checks cluster connectivity
   - Verifies namespace exists
   - Checks for uncommitted changes
2. Runs test suite
   - Unit tests with Jest
   - TypeScript type checking
   - ESLint linting
   - Fails deployment if tests fail
3. Builds Docker image
   - Builds production Docker image
   - Tags with git commit SHA
   - Pushes to registry
4. Saves current state
   - Records current revision
   - Enables rollback if deployment fails
5. Deploys application
   - Updates Kubernetes deployment
   - Waits for rollout completion
   - Monitors pod status
6. Health checks
   - Verifies all pods ready
   - Checks health endpoints
   - Monitors error rates
   - Retries up to 5 times
7. Auto-rollback on failure
   - Automatically rolls back if health checks fail
   - Restores previous revision
   - Verifies rollback health
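
The "retries up to 5 times" behaviour in the health check step can be pictured as a bounded retry loop. The `retry` helper below is a generic sketch, not the staging script's actual code, and the delay between attempts is an assumption because this guide does not state one for staging.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry MAX DELAY CMD...
# Runs CMD until it succeeds, giving up after MAX attempts.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max failed" >&2
    # Sleep only when another attempt will follow.
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example use (commented out; the 10 second delay is an assumed value):
# retry 5 10 ./scripts/health-check.sh --env staging --quick
```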
Environment variables:
# Required
export KUBECONFIG=/path/to/staging-kubeconfig
export DOCKER_REGISTRY=ghcr.io
# Optional
export IMAGE_TAG=custom-tag # Override default (git SHA)
export NAMESPACE=custom-namespace # Override default

# Full staging deployment with all checks
./scripts/deploy-staging.sh
# Skip tests for hotfix (use with caution)
./scripts/deploy-staging.sh --skip-tests
# Use existing build
./scripts/deploy-staging.sh --skip-build --tag abc123
# Dry run to preview
./scripts/deploy-staging.sh --dry-run
# Disable auto-rollback (not recommended)
./scripts/deploy-staging.sh --no-rollback

# Watch deployment progress
kubectl rollout status deployment/nself-chat -n nself-chat-staging
# Check pod status
kubectl get pods -n nself-chat-staging -l app.kubernetes.io/name=nself-chat
# View logs
kubectl logs -f deployment/nself-chat -n nself-chat-staging
# Check events
kubectl get events -n nself-chat-staging --sort-by='.lastTimestamp'

Production deployment includes maximum safety checks:
- ✅ Mandatory version tag (no 'latest' allowed)
- ✅ Multiple approval gates (manual confirmation required)
- ✅ Pre-deployment validation (cluster, namespace, replicas)
- ✅ Zero-downtime deployment (rolling update)
- ✅ Extensive health monitoring (10 retries, 15s delay)
- ✅ Automatic rollback (on failure)
- ✅ Full audit logging (every action logged)
- ✅ Smoke tests (critical endpoints)
- ✅ Post-deployment monitoring (2 minutes stability check)
- kubectl configured for production cluster
- Production kubeconfig file
- Docker registry authentication
- Tagged release (semantic versioning: v1.0.0)
- Database backup completed
- Team approval for deployment
# Production deployment (requires approval)
./scripts/deploy-production.sh --tag v1.0.0
# Preview deployment plan
./scripts/deploy-production.sh --tag v1.0.0 --dry-run
# Canary deployment (gradual rollout)
./scripts/deploy-production.sh --tag v1.0.0 --canary

1. Validates version tag
   - Tag is required (no 'latest')
   - Validates semantic versioning format
   - Checks image exists in registry
2. Validates production environment
   - Confirms production cluster connection
   - Verifies namespace and deployment exist
   - Checks minimum replica count (≥2)
   - Validates image in registry
3. Pre-deployment checks
   - Checks cluster resources
   - Verifies pod disruption budget
   - Confirms all pods healthy
   - Warns about active alerts
4. Approval gate ⚠️
   - Displays deployment plan
   - Requires approver name
   - Requires typing 'deploy-production' to confirm
   - Logs approver in audit log
5. Saves current state
   - Records current revision
   - Backs up deployment spec
   - Backs up configmaps/secrets
   - Enables rollback
6. Deploys application
   - Updates deployment image
   - Annotates with metadata (timestamp, approver, tag)
   - Waits for rollout (10 minute timeout)
   - Monitors pod status
7. Health checks
   - Verifies all pods running and ready
   - Checks pod restart counts
   - Tests health endpoints
   - Monitors error rates in logs
   - Retries up to 10 times with 15s delay
8. Smoke tests
   - Tests critical endpoints
   - Verifies database connectivity
   - Checks external integrations
9. Post-deployment monitoring
   - Monitors for 2 minutes
   - Watches for pod count drops
   - Alerts on instability
10. Auto-rollback on failure
    - Rolls back if health checks fail
    - Restores previous revision
    - Verifies rollback health
    - Requires manual intervention if rollback fails
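
The first step, tag validation, is easy to sketch. The function below is an assumption about how such a check might be written, not the production script's code. In particular, it accepts only the strict `vMAJOR.MINOR.PATCH` form, and the real script may allow other semantic versioning variants such as pre-release suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds for tags like v1.0.0,
# fails for 'latest' and for anything that is not vMAJOR.MINOR.PATCH.
is_valid_release_tag() {
  local tag="$1"
  # Reject the floating tag explicitly so the intent is obvious to readers.
  [ "$tag" != "latest" ] || return 1
  printf '%s\n' "$tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example use (commented out):
# is_valid_release_tag "$TAG" || { echo "invalid tag: $TAG" >&2; exit 1; }
```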
When you run a production deployment, you'll see:
╔══════════════════════════════════════════════════════════╗
║              PRODUCTION DEPLOYMENT APPROVAL              ║
╚══════════════════════════════════════════════════════════╝

Environment: production
Namespace: nself-chat-production
Image: ghcr.io/nself/nself-chat:v1.0.0
Strategy: Rolling Update
Auto Rollback: Enabled

Current Image: ghcr.io/nself/nself-chat:v0.9.0
New Image: ghcr.io/nself/nself-chat:v1.0.0

Enter your name to approve deployment: John Doe
Type 'deploy-production' to confirm: deploy-production
✓ Deployment approved by: John Doe
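
For readers curious how a two-step confirmation like this can be built, here is a minimal sketch. It reuses the prompt wording shown above, but the function name `confirm_production_deploy` and its error messages are assumptions, not excerpts from `deploy-production.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical approval gate: reads an approver name and a confirmation
# phrase from stdin, and succeeds only when both are acceptable.
confirm_production_deploy() {
  local approver phrase
  printf 'Enter your name to approve deployment: ' >&2
  IFS= read -r approver || return 1
  if [ -z "$approver" ]; then
    echo "approver name is required" >&2
    return 1
  fi
  printf "Type 'deploy-production' to confirm: " >&2
  IFS= read -r phrase || return 1
  if [ "$phrase" != "deploy-production" ]; then
    echo "confirmation mismatch, aborting" >&2
    return 1
  fi
  echo "Deployment approved by: $approver"
}

# Example use (commented out):
# confirm_production_deploy || exit 1
```

Requiring a typed phrase rather than a single keypress makes an accidental confirmation much less likely.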
# Standard production deployment
./scripts/deploy-production.sh --tag v1.0.0
# Canary deployment (10% traffic)
./scripts/deploy-production.sh --tag v1.0.0 --canary
# Canary with custom percentage
./scripts/deploy-production.sh --tag v1.0.0 --canary --canary-pct 25
# Preview deployment (no changes)
./scripts/deploy-production.sh --tag v1.0.0 --dry-run
# Skip approval (CI/CD only, NOT recommended for manual use)
./scripts/deploy-production.sh --tag v1.0.0 --skip-approval

Every production deployment creates an audit log:
# Audit log location
/tmp/deploy-YYYYMMDD-HHMMSS.log
# View audit log
cat /tmp/deploy-20260209-143022.log

Log contents include:
- Deployment ID and timestamp
- Approver name
- All validation checks
- Image tags (old and new)
- Health check results
- Rollback actions (if any)
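
A logging helper along the following lines would produce a file with the naming pattern shown above. The line format (UTC timestamp, deployment ID, message) and the `audit` function name are assumptions made for illustration. The guide does not specify the actual log layout.

```shell
#!/usr/bin/env bash
# Hypothetical audit logger. Only the /tmp/deploy-YYYYMMDD-HHMMSS.log
# naming pattern comes from this guide; the line format is assumed.
DEPLOY_ID="deploy-$(date +%Y%m%d-%H%M%S)"
AUDIT_LOG="${AUDIT_LOG:-/tmp/${DEPLOY_ID}.log}"

# audit appends one timestamped line per call to the audit log.
audit() {
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$DEPLOY_ID" "$*" >> "$AUDIT_LOG"
}

# Example use (commented out):
# audit "approver=John Doe"
# audit "image old=v0.9.0 new=v1.0.0"
```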
Before deploying to production:
- All tests passing in staging
- Code reviewed and approved
- Database migrations tested
- Database backup completed
- Rollback plan documented
- Monitoring dashboards ready
- On-call engineer notified
- Deployment window scheduled
- Stakeholders informed
# Check local environment
./scripts/health-check.sh
# Check staging
./scripts/health-check.sh --env staging
# Check production
./scripts/health-check.sh --env production
# Quick check (essential services only)
./scripts/health-check.sh --quick
# Verbose output
./scripts/health-check.sh --verbose

Local Environment:
- ✅ Backend services status (nself status)
- ✅ PostgreSQL database running
- ✅ Hasura GraphQL engine running
- ✅ Authentication service running
- ✅ Frontend dev server accessible
- ✅ Dependencies installed (node_modules)
- ✅ GraphQL API responding
- ✅ Database connectivity
- ✅ External dependencies (DNS, internet)

Staging/Production (Kubernetes):
- ✅ Cluster connectivity
- ✅ Namespace exists
- ✅ Deployment exists and healthy
- ✅ All replicas ready and available
- ✅ No crash loops (restart count < 5)
- ✅ Event log clean (< 10 warnings)
- ✅ GraphQL API responding
- ✅ External dependencies
| Code | Meaning | Action |
|---|---|---|
| 0 | All checks passed | ✅ Everything healthy |
| 1 | Warnings present | |
| 2 | Critical failures | ❌ Immediate action required |
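
Because the exit codes are distinct, a calling script can branch on them. The sketch below shows one way to do that. The `describe_health_exit` helper and its labels are hypothetical, and only the meaning of codes 0, 1, and 2 comes from the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a health-check exit code to a short label.
describe_health_exit() {
  case "$1" in
    0) echo "healthy" ;;
    1) echo "warnings" ;;
    2) echo "critical" ;;
    *) echo "unknown" ;;  # any code the table does not document
  esac
}

# Example use (commented out):
# ./scripts/health-check.sh --env staging --quick
# status=$?
# echo "health: $(describe_health_exit "$status")"
# [ "$status" -lt 2 ] || exit "$status"   # stop only on critical failures
```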
Run health checks automatically:
# Cron job for staging (every 5 minutes)
*/5 * * * * /path/to/scripts/health-check.sh --env staging --quick
# Cron job for production (every minute)
* * * * * /path/to/scripts/health-check.sh --env production --quick

Both staging and production scripts include automatic rollback:
- Triggers on health check failures
- Rolls back to previous revision
- Verifies rollback health
- Logs all actions
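
The control flow behind these four points can be sketched as a single function that takes the deploy, health check, and rollback actions as commands. This is an illustration under assumptions: the function name and its return codes (0 healthy, 1 rolled back, 2 rollback failed) are invented here and do not describe the real scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: deploy_with_rollback DEPLOY_CMD CHECK_CMD ROLLBACK_CMD
# Each argument is the name of a command or shell function.
# Returns 0 if the deployment is healthy, 1 if it was rolled back cleanly,
# and 2 if the rollback itself could not be verified.
deploy_with_rollback() {
  local deploy_cmd="$1" check_cmd="$2" rollback_cmd="$3"

  if "$deploy_cmd" && "$check_cmd"; then
    echo "deployment healthy"
    return 0
  fi

  echo "deployment unhealthy, rolling back" >&2
  # Verify health again after rolling back to the previous revision.
  if "$rollback_cmd" && "$check_cmd"; then
    echo "rolled back to previous revision"
    return 1
  fi

  echo "rollback failed, manual intervention required" >&2
  return 2
}

# Example wiring (commented out; the kubectl command appears in this guide):
# do_rollback() { kubectl rollout undo deployment/nself-chat -n nself-chat-staging; }
# do_check()    { ./scripts/health-check.sh --env staging --quick; }
```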
If you need to manually rollback:
# Rollback to previous version
./scripts/rollback.sh
# Rollback to specific revision
./scripts/rollback.sh --revision 3
# Preview rollback
./scripts/rollback.sh --dry-run
# Rollback with Helm
./scripts/rollback.sh --helm

# Namespace-specific rollback
./scripts/rollback.sh --namespace nself-chat-production
# Rollback without waiting
./scripts/rollback.sh --no-wait
# Show deployment history first
kubectl rollout history deployment/nself-chat -n nself-chat-production

In case of critical production issues:
# Immediate rollback (no confirmation)
kubectl rollout undo deployment/nself-chat -n nself-chat-production
# Check status
kubectl rollout status deployment/nself-chat -n nself-chat-production
# Verify health
./scripts/health-check.sh --env production

Issue: Build fails
# Check Docker daemon
docker ps
# Clean build
docker system prune -af
./scripts/docker-build.sh --tag v1.0.0 --no-cache

Issue: Tests fail
# Run tests locally
pnpm test
# Type check
pnpm type-check
# Fix and retry
git commit -am "fix: resolve test failures"
./scripts/deploy-staging.sh

Issue: Image not found in registry
# Verify image exists
docker manifest inspect ghcr.io/nself/nself-chat:v1.0.0
# Build and push
./scripts/docker-build.sh --tag v1.0.0 --push

Issue: Pods not ready
# Check pod status
kubectl get pods -n nself-chat-staging
# Describe problematic pod
kubectl describe pod <pod-name> -n nself-chat-staging
# Check logs
kubectl logs <pod-name> -n nself-chat-staging

Issue: High restart count
# Check pod logs
kubectl logs <pod-name> -n nself-chat-staging --previous
# Check resource limits
kubectl describe pod <pod-name> -n nself-chat-staging | grep -A 10 Limits
# Increase resources if OOMKilled
kubectl set resources deployment/nself-chat --limits=memory=2Gi -n nself-chat-staging

Issue: Rollback fails
# Check deployment history
kubectl rollout history deployment/nself-chat -n nself-chat-production
# Restore from backup
kubectl apply -f /tmp/deployment-backup-deploy-YYYYMMDD-HHMMSS.yaml

| Error | Cause | Solution |
|---|---|---|
| `kubectl: command not found` | kubectl not installed | Install kubectl |
| `Cannot connect to cluster` | KUBECONFIG not set | Set KUBECONFIG path |
| `Namespace not found` | Wrong namespace | Verify namespace name |
| `Image pull error` | Image not in registry | Build and push image |
| `Pods crash looping` | Application error | Check logs, fix code |
| `Health check timeout` | Service not responding | Check network, increase timeout |
- Always test locally first
    ./scripts/deploy-local.sh
    ./scripts/health-check.sh
- Keep backend running
    # Don't stop/start backend frequently
    # Just restart frontend for code changes
    ./scripts/deploy-local.sh --frontend-only
- Use dev authentication
    # In .env.local
    NEXT_PUBLIC_USE_DEV_AUTH=true
- Deploy every PR
  - Test in staging before merging
  - Run full test suite
  - Verify health checks pass
- Use realistic data
  - Seed with production-like data
  - Test migrations on staging first
- Monitor closely
  - Watch logs after deployment
  - Check error rates
  - Verify integrations work
- Always use tagged releases
    # Good
    ./scripts/deploy-production.sh --tag v1.0.0
    # Bad
    ./scripts/deploy-production.sh --tag latest # Will fail
- Deploy during low traffic
  - Schedule deployments during off-peak hours
  - Notify team in advance
  - Have rollback plan ready
- Monitor for 30 minutes
  - Watch error rates
  - Check performance metrics
  - Monitor user reports
- Never skip safety checks
    # Don't do this in production
    ./scripts/deploy-production.sh --skip-validation --skip-health-check
    # These flags are for emergencies only
- Keep audit logs
  - Archive audit logs for compliance
  - Review failed deployments
  - Document lessons learned
- Version everything
  - Use semantic versioning
  - Tag releases in git
  - Document changes in CHANGELOG.md
- Test rollback procedures
  - Practice rollbacks in staging
  - Verify backups work
  - Time how long rollback takes
- Automate where possible
  - Use CI/CD for staging
  - Require manual approval for production
  - Automate health checks
- Document everything
  - Keep deployment logs
  - Document incidents
  - Update runbooks
name: Deploy to Staging
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBECONFIG_STAGING }}" > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV
      - name: Deploy to Staging
        run: ./scripts/deploy-staging.sh
        env:
          DOCKER_REGISTRY: ghcr.io
          IMAGE_TAG: ${{ github.sha }}

name: Deploy to Production
on:
  workflow_dispatch:
    inputs:
      tag:
        description: 'Version tag (e.g., v1.0.0)'
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      # KUBECONFIG must point to a file, so write the secret to disk first
      # (same approach as the staging workflow)
      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBECONFIG_PRODUCTION }}" > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV
      - name: Deploy to Production
        run: ./scripts/deploy-production.sh --tag ${{ github.event.inputs.tag }}
        env:
          DEPLOYMENT_APPROVER: ${{ github.actor }}

- Check this guide - Most common issues are documented
- Check logs - Audit logs contain detailed information
- Run health checks - Identify specific failures
- Check cluster events - Kubernetes events show what happened
- Review monitoring - Dashboards show performance metrics
For production emergencies:
- On-call engineer: Check PagerDuty
- DevOps team: #devops-alerts Slack channel
- Incident commander: Follow incident response plan
Last Updated: 2026-04-18 Version: 1.0.9 Maintained by: nself-chat DevOps Team