🌐 Cluster Management - georgi-dev215/openvpn-web-manager GitHub Wiki

Cluster Management

Advanced multi-server cluster management for high availability and load distribution.

Overview

The Cluster Management system enables you to:

  • Manage multiple OpenVPN servers from a single interface
  • Distribute client load across servers
  • Monitor cluster health and performance
  • Implement automatic failover
  • Scale your VPN infrastructure

Architecture

┌─────────────────────────────────────────────────┐ │ Master Controller │ │ ┌─────────────────────────────────────────────┐ │ │ │ Web Management Interface │ │ │ └─────────────────────────────────────────────┘ │ ├─────────────────────────────────────────────────┤ │ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │ │ │ Server 1 │ │ Server 2 │ │ Server N │ │ │ │ (Primary) │ │ (Secondary) │ │ (Worker) │ │ │ └─────────────┘ └─────────────┘ └───────────┘ │ └─────────────────────────────────────────────────┘

Server Management

Adding Servers to Cluster

Prerequisites

# On each server node
- OpenVPN installed and configured
- SSH access enabled
- Python 3.7+ installed
- Network connectivity between servers

Adding a Server

  1. Navigate to ClusterServers
  2. Click Add New Server
  3. Configure server details:
Server Configuration:
  server_id: server-01
  server_name: Primary EU Server
  hostname: 10.0.1.100
  public_ip: 203.0.113.10
  ssh_port: 22
  ssh_user: admin
  ssh_key_path: /path/to/private/key
  openvpn_port: 1194
  management_port: 7505
  region: eu-west-1
  capacity: 100
  priority: high
  1. Test connection
  2. Initialize server configuration
  3. Add to cluster pool

Server Roles

Primary Server

  • Role: Main configuration source
  • Responsibilities:
    • Certificate Authority (CA)
    • Configuration templates
    • Client database master
  • High Availability: Can be promoted/demoted

Secondary Servers

  • Role: Backup and load distribution
  • Responsibilities:
    • Client connection handling
    • Local configuration sync
    • Health monitoring
  • Failover: Automatic promotion to primary

Worker Servers

  • Role: Pure connection handlers
  • Responsibilities:
    • Client VPN connections only
    • Minimal configuration
    • Scale-out capacity

Server Status Monitoring

Health Checks

Health Check Parameters:
- CPU Usage: < 80%
- Memory Usage: < 85%
- Disk Space: > 15% free
- OpenVPN Service: Active
- Network Connectivity: Online
- Response Time: < 200ms

Status Indicators

  • 🟢 Online: Server operational and accepting connections
  • 🟡 Warning: Performance issues detected
  • 🔴 Critical: Server unavailable or failing
  • Offline: Server not responding
  • 🔧 Maintenance: Manually set maintenance mode

Load Balancing

Distribution Strategies

Round Robin

# Equal distribution across all available servers
client_assignment = {
    'strategy': 'round_robin',
    'servers': ['server-01', 'server-02', 'server-03'],
    'current_index': 0
}

Weighted Distribution

# Based on server capacity and performance
server_weights = {
    'server-01': 40,  # High capacity
    'server-02': 35,  # Medium capacity  
    'server-03': 25   # Lower capacity
}

Geographic Distribution

# Route clients to nearest server
geo_routing = {
    'europe': ['eu-server-01', 'eu-server-02'],
    'america': ['us-server-01', 'us-server-02'],
    'asia': ['asia-server-01']
}

Performance-Based

# Real-time server performance metrics
performance_routing = {
    'metrics': ['cpu_usage', 'memory_usage', 'connection_count'],
    'update_interval': 60,  # seconds
    'rebalance_threshold': 0.8
}

Client Assignment

Automatic Assignment

  1. New Client Registration:

    • Evaluate server capacity
    • Apply routing strategy
    • Generate server-specific configuration
    • Assign client to optimal server
  2. Dynamic Rebalancing:

    • Monitor server loads
    • Identify overloaded servers
    • Migrate clients to less loaded servers
    • Update client configurations

Manual Assignment

  1. Go to ClientsCluster Assignment
  2. Select client(s)
  3. Choose target server
  4. Apply assignment
  5. Generate new configuration

High Availability & Failover

Automatic Failover

Failover Triggers

Failover Conditions:
  - Server unresponsive for > 3 minutes
  - CPU usage > 95% for > 5 minutes
  - Memory usage > 95% for > 5 minutes
  - OpenVPN service stopped
  - Network connectivity lost

Failover Process

  1. Detection: Health monitoring detects failure
  2. Verification: Multiple checks confirm failure
  3. Client Migration: Active connections redirected
  4. Configuration Update: DNS/routing updates
  5. Notification: Alerts sent to administrators

Manual Failover

# Emergency failover command
./cluster-manager.py failover --from server-01 --to server-02

# Maintenance mode
./cluster-manager.py maintenance --server server-01 --enable

# Server promotion
./cluster-manager.py promote --server server-02 --to-primary

Configuration Synchronization

Configuration Management

Centralized Configuration

Synchronized Elements:
  - OpenVPN server configuration
  - Client certificates and keys
  - Network routing rules
  - Security policies
  - Access control lists

Sync Process

  1. Master Configuration: Primary server holds authoritative config
  2. Change Detection: Monitor configuration changes
  3. Distribution: Push changes to all cluster servers
  4. Validation: Verify configuration integrity
  5. Activation: Apply changes with rollback capability

Certificate Synchronization

CA Management

# Certificate Authority sync
- Root CA certificate
- Intermediate certificates  
- Client certificates
- Certificate Revocation List (CRL)
- Private keys (encrypted)

Sync Schedule

  • Real-time: Critical security changes
  • Scheduled: Regular configuration sync (hourly)
  • On-demand: Manual administrator triggers

Monitoring & Analytics

Cluster Metrics

Performance Metrics

cluster_metrics = {
    'total_servers': 5,
    'online_servers': 4,
    'total_clients': 150,
    'active_connections': 89,
    'avg_cpu_usage': 45.2,
    'avg_memory_usage': 62.8,
    'total_bandwidth': '2.5 Gbps',
    'response_time_avg': 125  # ms
}

Server Comparison

Server Status Clients CPU Memory Bandwidth
EU-01 🟢 Online 25/50 45% 60% 512 Mbps
EU-02 🟢 Online 22/50 38% 55% 445 Mbps
US-01 🟡 Warning 30/50 78% 82% 678 Mbps
US-02 🟢 Online 12/50 25% 35% 234 Mbps

Real-time Dashboard

Cluster Overview Widget

  • Total servers and status distribution
  • Active connections across cluster
  • Aggregate performance metrics
  • Recent events and alerts

Server Grid View

  • Individual server status cards
  • Quick actions (maintenance, restart, etc.)
  • Performance sparklines
  • Connection distribution

Troubleshooting

Common Issues

Server Communication Problems

# Test SSH connectivity
ssh -p 22 admin@server-ip "echo 'Connection OK'"

# Check OpenVPN management interface
telnet server-ip 7505

# Verify network routing
traceroute server-ip

Configuration Sync Failures

# Manual configuration sync
./cluster-manager.py sync --server server-01 --force

# Check sync logs
tail -f /var/log/openvpn-cluster/sync.log

# Validate configuration
./cluster-manager.py validate --server server-01

Load Balancing Issues

# Check client distribution
./cluster-manager.py status --show-distribution

# Rebalance clients manually
./cluster-manager.py rebalance --strategy performance

# View assignment history
./cluster-manager.py history --client client-name

Diagnostic Commands

# Cluster health check
./cluster-manager.py health --all

# Performance report
./cluster-manager.py report --performance --last-24h

# Connection analysis
./cluster-manager.py analyze --connections

# Export cluster configuration
./cluster-manager.py export --config --output cluster-backup.json

Best Practices

Deployment Recommendations

Server Placement

  • Geographic Distribution: Servers in different regions
  • Network Diversity: Multiple ISPs/data centers
  • Capacity Planning: 20-30% overhead for failover

Security Considerations

  • SSH Key Management: Unique keys per server
  • Network Segmentation: Isolated management network
  • Certificate Security: Encrypted key storage

Performance Optimization

Server Sizing

Small Deployment (< 50 clients):
  - 2 vCPU, 2GB RAM
  - 20GB SSD storage
  - 100 Mbps network

Medium Deployment (50-200 clients):
  - 4 vCPU, 4GB RAM
  - 50GB SSD storage
  - 500 Mbps network

Large Deployment (200+ clients):
  - 8+ vCPU, 8GB+ RAM
  - 100GB+ SSD storage
  - 1+ Gbps network

Monitoring Intervals

  • Health Checks: Every 60 seconds
  • Performance Metrics: Every 5 minutes
  • Configuration Sync: Every hour
  • Log Rotation: Daily

Maintenance Procedures

Planned Maintenance

  1. Enable maintenance mode
  2. Drain client connections
  3. Perform updates/changes
  4. Validate functionality
  5. Return to service
  6. Monitor for issues

Emergency Procedures

  1. Immediate failover activation
  2. Client notification (if required)
  3. Root cause analysis
  4. Temporary fixes
  5. Permanent resolution
  6. Post-incident review