nself-chat Deployment Guide

Version: 1.0.9 | Last Updated: 2026-04-18 | Status: Production Ready

Table of Contents

Overview
Deployment Scripts
Local Development Deployment
Staging Deployment
Production Deployment
Health Checks
Rollback Procedures
Troubleshooting
Best Practices
CI/CD Integration
Support

Overview

nself-chat provides deterministic deployment scripts for three environments:

Environment	Script	Purpose	Safety Level
Local	`deploy-local.sh`	Development environment	Low (fast iteration)
Staging	`deploy-staging.sh`	Pre-production testing	Medium (validation + rollback)
Production	`deploy-production.sh`	Live production	High (maximum safety)

Deployment Architecture

┌──────────────────────────────────────────────────────────────┐
│                    nself-chat Deployment                     │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐  │
│  │   Backend    │────▶│   Frontend   │────▶│    Health    │  │
│  │  (nself CLI) │     │  (Next.js)   │     │    Checks    │  │
│  └──────────────┘     └──────────────┘     └──────────────┘  │
│          │                    │                    │         │
│          ▼                    ▼                    ▼         │
│  ┌────────────────────────────────────────────────────────┐  │
│  │             Docker / Kubernetes Deployment             │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Deployment Scripts

Script Locations

All deployment scripts live in the scripts/ directory at the repository root:

scripts/
├── deploy-local.sh          # Local development deployment
├── deploy-staging.sh        # Staging environment deployment
├── deploy-production.sh     # Production deployment
├── health-check.sh          # Full health checks
└── rollback.sh              # Rollback to previous version

Common Options

All deployment scripts support these common options:

Option	Description	Example
`--dry-run`	Preview actions without executing them	`./scripts/deploy-local.sh --dry-run`
`--help`	Show usage information	`./scripts/deploy-staging.sh --help`
`--skip-health-check`	Skip post-deployment checks (not recommended)	`./scripts/deploy-local.sh --skip-health-check`
`--verbose`	Print detailed output	`./scripts/health-check.sh --verbose`
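As an illustration of what --dry-run means in practice, here is a minimal POSIX shell sketch of one common way to implement it: every side-effecting command goes through a wrapper that only prints the command when a dry-run flag is set. The run function and the DRY_RUN variable are hypothetical; the real scripts may structure this differently.

```shell
# Hypothetical sketch: gate side effects behind a dry-run flag.
# DRY_RUN is assumed to be set to 1 by the script's option parser.
DRY_RUN=${DRY_RUN:-0}

# Execute a command, or only print it when DRY_RUN=1.
# Note: "$*" joins arguments with spaces, so the preview drops quoting.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    printf '[dry-run] %s\n' "$*"
  else
    "$@"
  fi
}
```

A script would then call run nself start or run kubectl rollout undo ... instead of invoking the tool directly, so a single flag covers every step.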

Local Development Deployment

Prerequisites

Docker and Docker Compose installed
Node.js 20+ and pnpm installed
nself CLI v1.0.9+ installed
.backend/ directory initialized

Quick Start

# Full local deployment (backend + frontend)
./scripts/deploy-local.sh

# Backend only
./scripts/deploy-local.sh --backend-only

# Frontend only
./scripts/deploy-local.sh --frontend-only

What It Does

1. Validates environment
   - Checks required tools (docker, nself, node, pnpm)
   - Verifies Node.js version (≥20)
   - Checks backend directory exists
2. Deploys backend services
   - Runs nself build to generate docker-compose.yml
   - Starts services with nself start
   - Waits for services to initialize
3. Deploys frontend
   - Installs dependencies if needed
   - Starts Next.js dev server on port 3000
   - Sets development environment variables
4. Runs health checks
   - Verifies backend services running
   - Checks frontend accessibility
   - Validates critical services (Hasura, Auth, PostgreSQL)
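The environment validation step can be sketched in POSIX shell as below. require_tools and node_major_ok are hypothetical names, not functions taken from deploy-local.sh; the version check assumes the vMAJOR.MINOR.PATCH format that node --version prints.

```shell
# Sketch of the "Validates environment" step (hypothetical helper names).

# Return 0 if every named tool is on PATH; name each missing one on stderr.
require_tools() {
  local missing tool
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 if a version such as "v20.11.1" has a major version of at
# least MIN (default 20).
node_major_ok() {
  local version major min
  version=${1#v}
  min=${2:-20}
  major=${version%%.*}
  case $major in
    '' | *[!0-9]*) return 1 ;;  # empty or non-numeric major version
  esac
  [ "$major" -ge "$min" ]
}
```

Combined, the check reads: require_tools docker nself node pnpm && node_major_ok "$(node --version)" && [ -d .backend ]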

Service URLs

After successful deployment:

Backend Services:
  GraphQL:  http://localhost:8080/v1/graphql
  Hasura:   http://localhost:8080/console
  Auth:     http://localhost:4000
  Admin:    http://localhost:3021

Frontend:
  Dev Server:  http://localhost:3000

Development Credentials:
  Email:    [email protected]
  Password: password123

Examples

# Preview deployment
./scripts/deploy-local.sh --dry-run

# Skip health checks (faster)
./scripts/deploy-local.sh --skip-health-check

# Custom frontend port
./scripts/deploy-local.sh --port 3001

# Backend only (for API development)
./scripts/deploy-local.sh --backend-only

Troubleshooting Local Deployment

Issue: Backend services not starting

# Check Docker daemon
docker ps

# Check nself status
cd .backend && nself status

# Rebuild backend
cd .backend && nself build && nself start

Issue: Port 3000 already in use

# Find the process using the port
lsof -ti:3000

# Stop it (escalate to kill -9 only if it does not exit)
kill $(lsof -ti:3000)

# Or use custom port
./scripts/deploy-local.sh --port 3001

Issue: Dependencies out of sync

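# Clean reinstall (keeps pnpm-lock.yaml so dependency versions stay pinned)
# Delete pnpm-lock.yaml too only if you intend to re-resolve every version
rm -rf node_modules
pnpm install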

Staging Deployment

Prerequisites

kubectl configured for staging cluster
Docker registry authentication
Build tools (Docker, Node.js, pnpm)
Git repository in clean state
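A clean-tree check like the one staging performs can be approximated as follows. This sketch assumes "clean" means git status --porcelain prints nothing, so untracked files also block a deploy; the script's exact rule is not documented here, and git_tree_clean is a hypothetical name.

```shell
# Sketch of a clean-working-tree gate (hypothetical helper).
# Returns 0 if clean, 1 if there are changes (including untracked files),
# 2 if the current directory is not inside a git repository.
git_tree_clean() {
  local changes
  changes=$(git status --porcelain 2>/dev/null) || return 2
  [ -z "$changes" ]
}
```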

Quick Start

# Full staging deployment
./scripts/deploy-staging.sh

# Skip tests (faster, not recommended)
./scripts/deploy-staging.sh --skip-tests

# Specific version
./scripts/deploy-staging.sh --tag v1.0.0

# Preview deployment
./scripts/deploy-staging.sh --dry-run

What It Does

1. Pre-deployment validation
   - Validates environment (kubectl, docker, git)
   - Checks cluster connectivity
   - Verifies namespace exists
   - Checks for uncommitted changes
2. Runs test suite
   - Unit tests with Jest
   - TypeScript type checking
   - ESLint linting
   - Fails deployment if tests fail
3. Builds Docker image
   - Builds production Docker image
   - Tags with git commit SHA
   - Pushes to registry
4. Saves current state
   - Records current revision
   - Enables rollback if deployment fails
5. Deploys application
   - Updates Kubernetes deployment
   - Waits for rollout completion
   - Monitors pod status
6. Health checks
   - Verifies all pods ready
   - Checks health endpoints
   - Monitors error rates
   - Retries up to 5 times
7. Auto-rollback on failure
   - Automatically rolls back if health checks fail
   - Restores previous revision
   - Verifies rollback health
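The health-check-then-rollback flow can be sketched like this. retry, verify_or_rollback, and check_health are hypothetical names; the 5 attempts come from the list, while the 10-second delay and the 5-minute rollout timeout are assumptions chosen for illustration.

```shell
# retry ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Returns 0 on success, 1 if all fail.
retry() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical: verify a rollout and undo it if checks never pass.
# check_health is a placeholder for whatever probe the script uses.
verify_or_rollback() {
  local ns
  ns=$1
  if retry 5 10 check_health "$ns"; then
    return 0
  fi
  echo "health checks failed in $ns; rolling back" >&2
  kubectl rollout undo deployment/nself-chat -n "$ns"
  kubectl rollout status deployment/nself-chat -n "$ns" --timeout=5m
  return 1
}
```

The production flow uses the same pattern with 10 attempts and a 15-second delay, which in this sketch would be retry 10 15 check_health "$ns".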

Configuration

Environment variables:

# Required
export KUBECONFIG=/path/to/staging-kubeconfig
export DOCKER_REGISTRY=ghcr.io

# Optional
export IMAGE_TAG=custom-tag          # Override default (git SHA)
export NAMESPACE=custom-namespace    # Override default
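How these variables might combine into the pushed image reference is sketched below. The repository path nself/nself-chat is taken from the production approval output; the full-SHA default tag and the nself-chat-staging default namespace are assumptions inferred from the CI example and the monitoring commands, and both function names are hypothetical.

```shell
# Hypothetical: resolve the image reference a staging deploy would push.
# Fails (and exits a non-interactive shell) if DOCKER_REGISTRY is unset.
image_ref() {
  : "${DOCKER_REGISTRY:?DOCKER_REGISTRY must be set}"
  local tag
  tag=${IMAGE_TAG:-$(git rev-parse HEAD)}  # assumed default: full commit SHA
  printf '%s/nself/nself-chat:%s\n' "$DOCKER_REGISTRY" "$tag"
}

# Hypothetical: namespace override with an assumed default.
staging_namespace() {
  printf '%s\n' "${NAMESPACE:-nself-chat-staging}"
}
```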

Examples

# Full staging deployment with all checks
./scripts/deploy-staging.sh

# Skip tests for hotfix (use with caution)
./scripts/deploy-staging.sh --skip-tests

# Use existing build
./scripts/deploy-staging.sh --skip-build --tag abc123

# Dry run to preview
./scripts/deploy-staging.sh --dry-run

# Disable auto-rollback (not recommended)
./scripts/deploy-staging.sh --no-rollback

Monitoring Staging Deployment

# Watch deployment progress
kubectl rollout status deployment/nself-chat -n nself-chat-staging

# Check pod status
kubectl get pods -n nself-chat-staging -l app.kubernetes.io/name=nself-chat

# View logs
kubectl logs -f deployment/nself-chat -n nself-chat-staging

# Check events
kubectl get events -n nself-chat-staging --sort-by='.lastTimestamp'

Production Deployment

⚠️ Critical Safety Features

Production deployment includes maximum safety checks:

✅ Mandatory version tag (no 'latest' allowed)
✅ Multiple approval gates (manual confirmation required)
✅ Pre-deployment validation (cluster, namespace, replicas)
✅ Zero-downtime deployment (rolling update)
✅ Extensive health monitoring (10 retries, 15s delay)
✅ Automatic rollback (on failure)
✅ Full audit logging (every action logged)
✅ Smoke tests (critical endpoints)
✅ Post-deployment monitoring (2 minutes stability check)

Prerequisites

kubectl configured for production cluster
Production kubeconfig file
Docker registry authentication
Tagged release (semantic versioning: v1.0.0)
Database backup completed
Team approval for deployment
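The tag requirement can be checked before anything touches the cluster. This sketch assumes tags must match vMAJOR.MINOR.PATCH exactly (pre-release suffixes such as v1.0.0-rc.1 are rejected); validate_release_tag and image_exists are hypothetical helper names, and image_exists relies on docker manifest inspect with registry credentials already configured.

```shell
# Hypothetical release-tag gate: require an explicit vMAJOR.MINOR.PATCH tag.
validate_release_tag() {
  local tag
  tag=${1:-}
  if [ -z "$tag" ] || [ "$tag" = latest ]; then
    echo "an explicit release tag is required (got '$tag')" >&2
    return 1
  fi
  case $tag in
    *[!v0-9.]*) ;;  # contains a character that cannot appear in vX.Y.Z
    *)
      if printf '%s\n' "$tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
        return 0
      fi ;;
  esac
  echo "tag '$tag' is not of the form vMAJOR.MINOR.PATCH" >&2
  return 1
}

# Hypothetical: succeed only if the image manifest exists in the registry.
image_exists() {
  docker manifest inspect "$1" >/dev/null 2>&1
}
```

For example: validate_release_tag v1.0.0 && image_exists ghcr.io/nself/nself-chat:v1.0.0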

Quick Start

# Production deployment (requires approval)
./scripts/deploy-production.sh --tag v1.0.0

# Preview deployment plan
./scripts/deploy-production.sh --tag v1.0.0 --dry-run

# Canary deployment (gradual rollout)
./scripts/deploy-production.sh --tag v1.0.0 --canary

What It Does

1. Validates version tag
   - Tag is required (no 'latest')
   - Validates semantic versioning format
   - Checks image exists in registry
2. Validates production environment
   - Confirms production cluster connection
   - Verifies namespace and deployment exist
   - Checks minimum replica count (≥2)
   - Validates image in registry
3. Pre-deployment checks
   - Checks cluster resources
   - Verifies pod disruption budget
   - Confirms all pods healthy
   - Warns about active alerts
4. Approval gate ⚠️
   - Displays deployment plan
   - Requires approver name
   - Requires typing 'deploy-production' to confirm
   - Logs approver in audit log
5. Saves current state
   - Records current revision
   - Backs up deployment spec
   - Backs up configmaps/secrets
   - Enables rollback
6. Deploys application
   - Updates deployment image
   - Annotates with metadata (timestamp, approver, tag)
   - Waits for rollout (10 minute timeout)
   - Monitors pod status
7. Health checks
   - Verifies all pods running and ready
   - Checks pod restart counts
   - Tests health endpoints
   - Monitors error rates in logs
   - Retries up to 10 times with 15s delay
8. Smoke tests
   - Tests critical endpoints
   - Verifies database connectivity
   - Checks external integrations
9. Post-deployment monitoring
   - Monitors for 2 minutes
   - Watches for pod count drops
   - Alerts on instability
10. Auto-rollback on failure
    - Rolls back if health checks fail
    - Restores previous revision
    - Verifies rollback health
    - Requires manual intervention if rollback fails
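Post-deployment monitoring can be approximated by polling the Deployment's ready-replica count. In this sketch, sample_ready and watch_stability are hypothetical names, and the deployment name and default namespace are assumptions taken from the kubectl commands elsewhere in this guide.

```shell
# Hypothetical sampler: store the Deployment's ready-replica count in READY.
# The field is omitted when no replica is ready, which yields an empty value.
sample_ready() {
  READY=$(kubectl get deployment/nself-chat -n "${NAMESPACE:-nself-chat-production}" \
    -o jsonpath='{.status.readyReplicas}')
}

# watch_stability DURATION INTERVAL
# Samples every INTERVAL seconds (must be > 0) for DURATION seconds and
# returns 1 as soon as the count drops below the starting value. An empty
# sample (kubectl failure or zero ready replicas) counts as 0.
watch_stability() {
  local duration interval elapsed baseline
  duration=$1
  interval=$2
  [ "$interval" -gt 0 ] || return 2
  sample_ready
  baseline=${READY:-0}
  elapsed=0
  while [ "$elapsed" -lt "$duration" ]; do
    sleep "$interval"
    elapsed=$((elapsed + interval))
    sample_ready
    if [ "${READY:-0}" -lt "$baseline" ]; then
      echo "ready replicas dropped from $baseline to ${READY:-0}" >&2
      return 1
    fi
  done
  return 0
}
```

watch_stability 120 10 would cover the 2-minute window with a 10-second sampling interval (the interval is an assumption).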

Approval Process

When you run a production deployment, you'll see:

╔══════════════════════════════════════════════════════════╗
║              PRODUCTION DEPLOYMENT APPROVAL              ║
╚══════════════════════════════════════════════════════════╝

Environment:    production
Namespace:      nself-chat-production
Image:          ghcr.io/nself/nself-chat:v1.0.0
Strategy:       Rolling Update
Auto Rollback:  Enabled

Current Image:  ghcr.io/nself/nself-chat:v0.9.0
New Image:      ghcr.io/nself/nself-chat:v1.0.0

Enter your name to approve deployment: John Doe
Type 'deploy-production' to confirm: deploy-production

✓ Deployment approved by: John Doe
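The two-step confirmation can be sketched as a function that reads from standard input. approval_gate is a hypothetical name; prompts go to stderr, and on success it prints the approver's name on stdout so the caller can record it in the audit log.

```shell
# Hypothetical approval gate: require a name and the exact confirmation text.
approval_gate() {
  local approver confirm
  printf 'Enter your name to approve deployment: ' >&2
  IFS= read -r approver || return 1
  if [ -z "$approver" ]; then
    echo "approval aborted: approver name is required" >&2
    return 1
  fi
  printf "Type 'deploy-production' to confirm: " >&2
  IFS= read -r confirm || return 1
  if [ "$confirm" != "deploy-production" ]; then
    echo "approval aborted: confirmation text did not match" >&2
    return 1
  fi
  printf '%s\n' "$approver"
}
```

A caller would use approver=$(approval_gate) || exit 1 and then log "$approver".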

Examples

# Standard production deployment
./scripts/deploy-production.sh --tag v1.0.0

# Canary deployment (10% traffic)
./scripts/deploy-production.sh --tag v1.0.0 --canary

# Canary with custom percentage
./scripts/deploy-production.sh --tag v1.0.0 --canary --canary-pct 25

# Preview deployment (no changes)
./scripts/deploy-production.sh --tag v1.0.0 --dry-run

# Skip approval (CI/CD only, NOT recommended for manual use)
./scripts/deploy-production.sh --tag v1.0.0 --skip-approval
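This guide does not say how --canary splits traffic. If it were implemented by running a small canary Deployment next to the stable one (an assumption), the canary's replica count for a target percentage could be computed as ceil(total × pct / 100), as in this hypothetical helper:

```shell
# Hypothetical: canary replica count for a target percentage, rounded up.
# canary_replicas TOTAL PCT  (TOTAL >= 1, 1 <= PCT <= 100)
canary_replicas() {
  local total pct
  total=$1
  pct=$2
  if [ "$pct" -lt 1 ] || [ "$pct" -gt 100 ] || [ "$total" -lt 1 ]; then
    echo "need total >= 1 and 1 <= pct <= 100" >&2
    return 1
  fi
  # Integer ceiling division; always at least one pod.
  echo $(( (total * pct + 99) / 100 ))
}
```

Replica-based splitting is coarse: with 4 pods, a 10% target still rounds up to one whole pod. Precise percentages need traffic-level splitting at the ingress or service mesh.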

Audit Logging

Every production deployment creates an audit log:

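# Audit log location (one file per deployment; /tmp is often cleared on reboot, so archive copies elsewhere)
# /tmp/deploy-YYYYMMDD-HHMMSS.log

# View the most recent audit log
cat "$(ls -t /tmp/deploy-*.log | head -n 1)"

# View a specific audit log
cat /tmp/deploy-20260209-143022.log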

Log contents include:

Deployment ID and timestamp
Approver name
All validation checks
Image tags (old and new)
Health check results
Rollback actions (if any)
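A log with this naming scheme can be produced by two small helpers. This is a sketch: audit_log_path and audit are hypothetical names, and the UTC ISO-8601 timestamp on each entry is a choice made here, not a documented format.

```shell
# Hypothetical: path of a new audit log, e.g. /tmp/deploy-20260209-143022.log
audit_log_path() {
  printf '/tmp/deploy-%s.log\n' "$(date +%Y%m%d-%H%M%S)"
}

# Hypothetical: append one timestamped entry.
# audit FILE MESSAGE...
audit() {
  local file
  file=$1
  shift
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$file"
}
```

For example: log=$(audit_log_path); audit "$log" "approved by $approver"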

Production Deployment Checklist

Before deploying to production:

Health Checks

Running Health Checks

# Check local environment
./scripts/health-check.sh

# Check staging
./scripts/health-check.sh --env staging

# Check production
./scripts/health-check.sh --env production

# Quick check (essential services only)
./scripts/health-check.sh --quick

# Verbose output
./scripts/health-check.sh --verbose

What It Checks

Local Environment:

✓ Backend services status (nself status)
✓ PostgreSQL database running
✓ Hasura GraphQL engine running
✓ Authentication service running
✓ Frontend dev server accessible
✓ Dependencies installed (node_modules)
✓ GraphQL API responding
✓ Database connectivity
✓ External dependencies (DNS, internet)

Staging/Production (Kubernetes):

✓ Cluster connectivity
✓ Namespace exists
✓ Deployment exists and healthy
✓ All replicas ready and available
✓ No crash loops (restart count < 5)
✓ Event log clean (< 10 warnings)
✓ GraphQL API responding
✓ External dependencies
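The pod-level rules (all replicas ready, restart count below 5) can be checked from plain kubectl get pods output. This awk-based sketch is illustrative, not the logic inside health-check.sh; pod_problems is a hypothetical name, and it reads the NAME, READY, STATUS, and RESTARTS columns that kubectl get pods --no-headers prints.

```shell
# Hypothetical: read `kubectl get pods --no-headers` output on stdin and
# print every pod that is not fully ready, not Running, or has restarted
# MAX times or more (default 5). Returns 1 if any pod was printed.
pod_problems() {
  awk -v max_restarts="${1:-5}" '
    {
      split($2, r, "/")  # READY column, e.g. "1/1"
      if (r[1] != r[2] || $3 != "Running" || $4 + 0 >= max_restarts) {
        print $1 ": ready=" $2 " status=" $3 " restarts=" $4
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

Usage: kubectl get pods -n nself-chat-staging -l app.kubernetes.io/name=nself-chat --no-headers | pod_problems. For the warning rule, kubectl get events -n nself-chat-staging --field-selector type=Warning --no-headers | wc -l gives the count to compare against 10.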

Health Check Exit Codes

Code	Meaning	Action
0	All checks passed	✅ Everything healthy
1	Warnings present	⚠️ Review warnings
2	Critical failures	❌ Immediate action required
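A script can honor this exit-code contract by counting failed checks by severity. hc_check and health_status are hypothetical names in this sketch; each check is any command whose exit status says whether it passed.

```shell
# Hypothetical counters for one health-check run.
HC_WARNINGS=0
HC_FAILURES=0

# hc_check SEVERITY DESCRIPTION COMMAND...
# SEVERITY is "critical" or "warning"; a failing check bumps the matching counter.
hc_check() {
  local severity desc
  severity=$1
  desc=$2
  shift 2
  if "$@" >/dev/null 2>&1; then
    echo "✓ $desc"
  elif [ "$severity" = critical ]; then
    echo "✗ $desc"
    HC_FAILURES=$((HC_FAILURES + 1))
  else
    echo "⚠ $desc"
    HC_WARNINGS=$((HC_WARNINGS + 1))
  fi
}

# Map the counters to exit codes: 0 all passed, 1 warnings, 2 critical.
health_status() {
  if [ "$HC_FAILURES" -gt 0 ]; then
    return 2
  elif [ "$HC_WARNINGS" -gt 0 ]; then
    return 1
  fi
  return 0
}
```

For example, hc_check critical "Hasura responding" curl -fsS http://localhost:8080/healthz, followed by health_status; exit $? at the end of the script.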

Automated Health Checks

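Run health checks automatically with cron. Cron starts jobs with a minimal environment, so set KUBECONFIG explicitly and send output to a log file:

# Cron job for staging (every 5 minutes)
*/5 * * * * KUBECONFIG=/path/to/staging-kubeconfig /path/to/scripts/health-check.sh --env staging --quick >> /path/to/logs/health-check.log 2>&1

# Cron job for production (every minute)
* * * * * KUBECONFIG=/path/to/production-kubeconfig /path/to/scripts/health-check.sh --env production --quick >> /path/to/logs/health-check.log 2>&1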
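Cron discards exit codes, so a scheduled health check never alerts anyone on its own. A wrapper can map the documented 0/1/2 codes to notifications. In this sketch, notify is a placeholder to replace with your Slack, PagerDuty, or mail integration, run_scheduled_check is a hypothetical name, and HEALTH_CHECK is a hypothetical override that defaults to ./scripts/health-check.sh relative to the repository root.

```shell
# Placeholder notifier: replace with a real integration.
# $1 = severity, $2 = environment, $3 = captured output
notify() {
  printf '[%s] %s health check:\n%s\n' "$1" "$2" "$3" >&2
}

# Hypothetical wrapper: run the quick check and escalate based on exit code.
run_scheduled_check() {
  local target output status
  target=$1
  output=$("${HEALTH_CHECK:-./scripts/health-check.sh}" --env "$target" --quick 2>&1)
  status=$?
  case $status in
    0) ;;
    1) notify warning "$target" "$output" ;;
    *) notify critical "$target" "$output" ;;
  esac
  return "$status"
}
```

Cron would then invoke a small script that sources these functions and calls run_scheduled_check staging or run_scheduled_check production.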

Rollback Procedures

Automatic Rollback

Both staging and production scripts include automatic rollback:

Triggers on health check failures
Rolls back to previous revision
Verifies rollback health
Logs all actions

Manual Rollback

To roll back manually:

# Rollback to previous version
./scripts/rollback.sh

# Rollback to specific revision
./scripts/rollback.sh --revision 3

# Preview rollback
./scripts/rollback.sh --dry-run

# Rollback with Helm
./scripts/rollback.sh --helm

Rollback Options

# Namespace-specific rollback
./scripts/rollback.sh --namespace nself-chat-production

# Rollback without waiting
./scripts/rollback.sh --no-wait

# Show deployment history first
kubectl rollout history deployment/nself-chat -n nself-chat-production
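Before passing --revision, it helps to know which number is "the previous one". This sketch reads kubectl rollout history output and prints the second-highest revision; previous_revision is a hypothetical name, and it assumes the ascending revision order that kubectl prints.

```shell
# Hypothetical: print the revision before the current one from
# `kubectl rollout history` output on stdin; return 1 if there is none.
previous_revision() {
  awk '$1 ~ /^[0-9]+$/ { revs[n++] = $1 }
       END { if (n < 2) exit 1; print revs[n - 2] }'
}
```

For example: rev=$(kubectl rollout history deployment/nself-chat -n nself-chat-production | previous_revision) && ./scripts/rollback.sh --namespace nself-chat-production --revision "$rev" --dry-run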

Emergency Rollback

In case of critical production issues:

# Immediate rollback (no confirmation)
kubectl rollout undo deployment/nself-chat -n nself-chat-production

# Check status
kubectl rollout status deployment/nself-chat -n nself-chat-production

# Verify health
./scripts/health-check.sh --env production

Troubleshooting

Deployment Failures

Issue: Build fails

# Check Docker daemon
docker ps

# Clean build (prune -af removes ALL unused images, stopped containers, networks, and build cache on this host)
docker system prune -af
./scripts/docker-build.sh --tag v1.0.0 --no-cache

Issue: Tests fail

# Run tests locally
pnpm test

# Type check
pnpm type-check

# Fix and retry
git commit -am "fix: resolve test failures"
./scripts/deploy-staging.sh

Issue: Image not found in registry

# Verify image exists
docker manifest inspect ghcr.io/nself/nself-chat:v1.0.0

# Build and push
./scripts/docker-build.sh --tag v1.0.0 --push

Health Check Failures

Issue: Pods not ready

# Check pod status
kubectl get pods -n nself-chat-staging

# Describe problematic pod
kubectl describe pod <pod-name> -n nself-chat-staging

# Check logs
kubectl logs <pod-name> -n nself-chat-staging

Issue: High restart count

# Check pod logs
kubectl logs <pod-name> -n nself-chat-staging --previous

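# Check resource limits
kubectl describe pod <pod-name> -n nself-chat-staging | grep -A 10 Limits

# Confirm the container was OOMKilled
kubectl get pod <pod-name> -n nself-chat-staging -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

# Increase resources if OOMKilled
kubectl set resources deployment/nself-chat --limits=memory=2Gi -n nself-chat-staging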

Rollback Issues

Issue: Rollback fails

# Check deployment history
kubectl rollout history deployment/nself-chat -n nself-chat-production

# Restore from backup
kubectl apply -f /tmp/deployment-backup-deploy-YYYYMMDD-HHMMSS.yaml

Common Error Messages

Error	Cause	Solution
`kubectl: command not found`	kubectl not installed	Install kubectl
`Cannot connect to cluster`	KUBECONFIG not set	Set KUBECONFIG path
`Namespace not found`	Wrong namespace	Verify namespace name
`Image pull error`	Image not in registry	Build and push image
`Pods crash looping`	Application error	Check logs, fix code
`Health check timeout`	Service not responding	Check network, increase timeout

Best Practices

Development

Always test locally first

./scripts/deploy-local.sh
./scripts/health-check.sh

Keep backend running

# Don't stop/start backend frequently
# Just restart frontend for code changes
./scripts/deploy-local.sh --frontend-only

Use dev authentication

# In .env.local
NEXT_PUBLIC_USE_DEV_AUTH=true

Staging

Deploy every PR
- Test in staging before merging
- Run full test suite
- Verify health checks pass
Use realistic data
- Seed with production-like data
- Test migrations on staging first
Monitor closely
- Watch logs after deployment
- Check error rates
- Verify integrations work

Production

Always use tagged releases

# Good
./scripts/deploy-production.sh --tag v1.0.0

# Bad
./scripts/deploy-production.sh --tag latest  # Will fail

Deploy during low traffic
- Schedule deployments during off-peak hours
- Notify team in advance
- Have rollback plan ready
Monitor for 30 minutes
- Watch error rates
- Check performance metrics
- Monitor user reports

Never skip safety checks

# Don't do this in production
./scripts/deploy-production.sh --skip-validation --skip-health-check

# These flags are for emergencies only

Keep audit logs
- Archive audit logs for compliance
- Review failed deployments
- Document lessons learned

General

Version everything
- Use semantic versioning
- Tag releases in git
- Document changes in CHANGELOG.md
Test rollback procedures
- Practice rollbacks in staging
- Verify backups work
- Time how long rollback takes
Automate where possible
- Use CI/CD for staging
- Require manual approval for production
- Automate health checks
Document everything
- Keep deployment logs
- Document incidents
- Update runbooks

CI/CD Integration

GitHub Actions Example

name: Deploy to Staging

on:
  push:
    branches: [main]

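jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # lets GITHUB_TOKEN push to ghcr.io packages it has write access to
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Setup kubectl
        uses: azure/setup-kubectl@v3

      - name: Log in to container registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Configure kubectl
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG_STAGING }}
        run: |
          # Write the secret to a private file; KUBECONFIG must be a path
          printf '%s\n' "$KUBECONFIG_DATA" > "$RUNNER_TEMP/kubeconfig"
          chmod 600 "$RUNNER_TEMP/kubeconfig"
          echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"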

      - name: Deploy to Staging
        run: ./scripts/deploy-staging.sh
        env:
          DOCKER_REGISTRY: ghcr.io
          IMAGE_TAG: ${{ github.sha }}

Production Deployment (Manual)

name: Deploy to Production

on:
  workflow_dispatch:
    inputs:
      tag:
        description: 'Version tag (e.g., v1.0.0)'
        required: true

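jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Setup kubectl
        uses: azure/setup-kubectl@v3

      - name: Configure kubectl
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG_PRODUCTION }}
        run: |
          # KUBECONFIG must point to a file, so write the secret to one first
          printf '%s\n' "$KUBECONFIG_DATA" > "$RUNNER_TEMP/kubeconfig"
          chmod 600 "$RUNNER_TEMP/kubeconfig"
          echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"

      - name: Deploy to Production
        # Pass the tag through an env var instead of interpolating it into the shell command
        run: ./scripts/deploy-production.sh --tag "$RELEASE_TAG"
        env:
          RELEASE_TAG: ${{ github.event.inputs.tag }}
          DEPLOYMENT_APPROVER: ${{ github.actor }}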

Support

Getting Help

Check this guide - Most common issues are documented
Check logs - Audit logs contain detailed information
Run health checks - Identify specific failures
Check cluster events - Kubernetes events show what happened
Review monitoring - Dashboards show performance metrics

Emergency Contacts

For production emergencies:

On-call engineer: Check PagerDuty
DevOps team: #devops-alerts Slack channel
Incident commander: Follow incident response plan

Last Updated: 2026-04-18 Version: 1.0.9 Maintained by: nself-chat DevOps Team

DEPLOYMENT GUIDE - nself-org/nchat GitHub Wiki
