🌐 Cluster Management - georgi-dev215/openvpn-web-manager GitHub Wiki
Cluster Management
Advanced multi-server cluster management for high availability and load distribution.
Overview
The Cluster Management system enables you to:
- Manage multiple OpenVPN servers from a single interface
- Distribute client load across servers
- Monitor cluster health and performance
- Implement automatic failover
- Scale your VPN infrastructure
Architecture
┌─────────────────────────────────────────────────┐
│                Master Controller                │
│ ┌─────────────────────────────────────────────┐ │
│ │          Web Management Interface           │ │
│ └─────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌───────────┐   │
│ │  Server 1   │ │  Server 2   │ │ Server N  │   │
│ │  (Primary)  │ │ (Secondary) │ │ (Worker)  │   │
│ └─────────────┘ └─────────────┘ └───────────┘   │
└─────────────────────────────────────────────────┘
Server Management
Adding Servers to Cluster
Prerequisites
# On each server node
- OpenVPN installed and configured
- SSH access enabled
- Python 3.7+ installed
- Network connectivity between servers
Adding a Server
- Navigate to Cluster → Servers
- Click Add New Server
- Configure server details:
Server Configuration:
  server_id: server-01
  server_name: Primary EU Server
  hostname: 10.0.1.100
  public_ip: 203.0.113.10
  ssh_port: 22
  ssh_user: admin
  ssh_key_path: /path/to/private/key
  openvpn_port: 1194
  management_port: 7505
  region: eu-west-1
  capacity: 100
  priority: high
- Test connection
- Initialize server configuration
- Add to cluster pool
Server Roles
Primary Server
- Role: Main configuration source
- Responsibilities:
  - Certificate Authority (CA)
  - Configuration templates
  - Client database master
- High Availability: Can be promoted/demoted
Secondary Servers
- Role: Backup and load distribution
- Responsibilities:
  - Client connection handling
  - Local configuration sync
  - Health monitoring
- Failover: Automatic promotion to primary
Worker Servers
- Role: Pure connection handlers
- Responsibilities:
  - Client VPN connections only
  - Minimal configuration
  - Scale-out capacity
Server Status Monitoring
Health Checks
Health Check Parameters:
- CPU Usage: < 80%
- Memory Usage: < 85%
- Disk Space: > 15% free
- OpenVPN Service: Active
- Network Connectivity: Online
- Response Time: < 200ms
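The thresholds above can be checked mechanically. The following is a minimal sketch, not the project's implementation: the `HealthSample` record, its field names, and the `failed_checks` helper are hypothetical and exist only to illustrate the comparison against each limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthSample:
    """One health-check reading for a server (hypothetical field names)."""
    cpu_percent: float
    memory_percent: float
    disk_free_percent: float
    openvpn_active: bool
    network_online: bool
    response_time_ms: float


def failed_checks(sample: HealthSample) -> List[str]:
    """Return the names of the health checks that the sample violates."""
    checks = {
        "cpu_usage": sample.cpu_percent < 80,
        "memory_usage": sample.memory_percent < 85,
        "disk_space": sample.disk_free_percent > 15,
        "openvpn_service": sample.openvpn_active,
        "network_connectivity": sample.network_online,
        "response_time": sample.response_time_ms < 200,
    }
    return [name for name, passed in checks.items() if not passed]
```

An empty list means every parameter is within its limit; how a non-empty list maps to a status indicator is left open here.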
Status Indicators
- 🟢 Online: Server operational and accepting connections
- 🟡 Warning: Performance issues detected
- 🔴 Critical: Server unavailable or failing
- ⚫ Offline: Server not responding
- 🔧 Maintenance: Manually set maintenance mode
Load Balancing
Distribution Strategies
Round Robin
# Equal distribution across all available servers
client_assignment = {
    'strategy': 'round_robin',
    'servers': ['server-01', 'server-02', 'server-03'],
    'current_index': 0
}
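One way the `current_index` field could drive the rotation is sketched below. The `next_server` helper is an assumption made for illustration; the source only shows the data structure.

```python
from typing import Any, Dict, List


def next_server(assignment: Dict[str, Any]) -> str:
    """Return the next server in rotation and advance the stored index."""
    servers: List[str] = assignment["servers"]
    index = assignment["current_index"] % len(servers)
    # Wrap around so the rotation restarts after the last server.
    assignment["current_index"] = (index + 1) % len(servers)
    return servers[index]


client_assignment = {
    "strategy": "round_robin",
    "servers": ["server-01", "server-02", "server-03"],
    "current_index": 0,
}

# Four consecutive registrations cycle through the pool and wrap around.
picks = [next_server(client_assignment) for _ in range(4)]
```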
Weighted Distribution
# Based on server capacity and performance
server_weights = {
    'server-01': 40,  # High capacity
    'server-02': 35,  # Medium capacity
    'server-03': 25   # Lower capacity
}
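A simple way to honor these weights is random selection proportional to weight, sketched here with the standard library's `random.choices`. The `pick_weighted` helper is hypothetical, and the project may well use a deterministic scheme instead.

```python
import random
from typing import Dict, Optional


def pick_weighted(weights: Dict[str, int],
                  rng: Optional[random.Random] = None) -> str:
    """Pick a server at random, with probability proportional to its weight."""
    rng = rng or random.Random()
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


server_weights = {
    "server-01": 40,  # High capacity
    "server-02": 35,  # Medium capacity
    "server-03": 25,  # Lower capacity
}
```

Over many assignments, roughly 40% of clients would land on `server-01`; any single pick is random.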
Geographic Distribution
# Route clients to nearest server
geo_routing = {
    'europe': ['eu-server-01', 'eu-server-02'],
    'america': ['us-server-01', 'us-server-02'],
    'asia': ['asia-server-01']
}
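The routing table can be combined with server status so that a client falls back to another region when its own has no online server. This is a sketch under that assumption; the `candidates_for` helper and the fallback rule are not taken from the source.

```python
from typing import Dict, List, Set

geo_routing = {
    "europe": ["eu-server-01", "eu-server-02"],
    "america": ["us-server-01", "us-server-02"],
    "asia": ["asia-server-01"],
}


def candidates_for(region: str,
                   routing: Dict[str, List[str]],
                   online: Set[str]) -> List[str]:
    """Online servers in the client's region, else online servers elsewhere."""
    local = [s for s in routing.get(region, []) if s in online]
    if local:
        return local
    # Assumed fallback: any online server, in routing-table order.
    return [s for servers in routing.values() for s in servers if s in online]
```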
Performance-Based
# Real-time server performance metrics
performance_routing = {
    'metrics': ['cpu_usage', 'memory_usage', 'connection_count'],
    'update_interval': 60,  # seconds
    'rebalance_threshold': 0.8
}
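The source does not say how the three metrics are combined into a routing decision. The sketch below assumes the simplest rule: a server's load score is its most saturated metric, new clients go to the lowest score, and a score above `rebalance_threshold` marks a server as overloaded. The function names and the sample figures are illustrative.

```python
from typing import Dict, List

Metrics = Dict[str, float]


def load_score(metrics: Metrics, capacity: int) -> float:
    """Score in [0, 1+]: the most saturated of CPU, memory, and connections."""
    return max(
        metrics["cpu_usage"] / 100,
        metrics["memory_usage"] / 100,
        metrics["connection_count"] / capacity,
    )


def pick_least_loaded(cluster: Dict[str, Metrics], capacity: int) -> str:
    """Return the server with the lowest load score."""
    return min(cluster, key=lambda name: load_score(cluster[name], capacity))


def overloaded(cluster: Dict[str, Metrics], capacity: int,
               threshold: float = 0.8) -> List[str]:
    """Return the servers whose load score exceeds the rebalance threshold."""
    return [name for name, m in cluster.items()
            if load_score(m, capacity) > threshold]


# Illustrative readings, with an assumed capacity of 50 clients per server.
cluster = {
    "EU-01": {"cpu_usage": 45, "memory_usage": 60, "connection_count": 25},
    "EU-02": {"cpu_usage": 38, "memory_usage": 55, "connection_count": 22},
    "US-01": {"cpu_usage": 78, "memory_usage": 82, "connection_count": 30},
    "US-02": {"cpu_usage": 25, "memory_usage": 35, "connection_count": 12},
}
```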
Client Assignment
Automatic Assignment
- New Client Registration:
  - Evaluate server capacity
  - Apply routing strategy
  - Generate server-specific configuration
  - Assign client to optimal server
- Dynamic Rebalancing:
  - Monitor server loads
  - Identify overloaded servers
  - Migrate clients to less loaded servers
  - Update client configurations
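The dynamic rebalancing steps can be sketched as a planning function that proposes moves without applying them. This assumes load is measured purely by client count against a uniform per-server capacity; `plan_rebalance` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str, str]  # (client, source server, target server)


def plan_rebalance(assignments: Dict[str, List[str]], capacity: int,
                   threshold: float = 0.8) -> List[Move]:
    """Propose moves from overloaded servers to the least-loaded server."""
    counts = {server: len(clients) for server, clients in assignments.items()}
    moves: List[Move] = []
    for source, clients in assignments.items():
        movable = list(clients)
        while counts[source] / capacity > threshold:
            target = min(counts, key=counts.get)
            # Stop if no other server can take a client without overloading.
            if target == source or (counts[target] + 1) / capacity > threshold:
                break
            moves.append((movable.pop(), source, target))
            counts[source] -= 1
            counts[target] += 1
    return moves
```

Each planned move would still need a regenerated client configuration, as the last step in the list notes.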
Manual Assignment
- Go to Clients → Cluster Assignment
- Select client(s)
- Choose target server
- Apply assignment
- Generate new configuration
High Availability & Failover
Automatic Failover
Failover Triggers
Failover Conditions:
- Server unresponsive for > 3 minutes
- CPU usage > 95% for > 5 minutes
- Memory usage > 95% for > 5 minutes
- OpenVPN service stopped
- Network connectivity lost
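Several of these conditions only count once they have held for a sustained period. A minimal sketch of that logic follows; the `SustainedCondition` class is an illustration, not part of the project, and timestamps are passed in explicitly so the behavior is easy to test.

```python
from typing import Optional


class SustainedCondition:
    """Fires once a condition has held continuously for longer than hold_seconds."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self._since: Optional[float] = None  # when the condition first became true

    def update(self, condition_met: bool, now: float) -> bool:
        """Record one observation; return True if the trigger should fire."""
        if not condition_met:
            self._since = None  # any healthy reading resets the timer
            return False
        if self._since is None:
            self._since = now
        return now - self._since > self.hold_seconds


# Example: "CPU usage > 95% for > 5 minutes" becomes a 300-second hold.
cpu_trigger = SustainedCondition(hold_seconds=300)
```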
Failover Process
- Detection: Health monitoring detects failure
- Verification: Multiple checks confirm failure
- Client Migration: Active connections redirected
- Configuration Update: DNS/routing updates
- Notification: Alerts sent to administrators
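The verification and migration steps can be sketched as two small helpers. Both are assumptions for illustration: the source does not specify how many checks confirm a failure or how the migration target is chosen. Here a failure is confirmed only if every probe fails, and the target is the server with the most free client slots.

```python
from typing import Callable, Dict, Optional


def confirm_failure(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True only if every one of `attempts` probes reports the server down.

    `probe` returns True when the server responds; the first success stops the check.
    """
    return all(not probe() for _ in range(attempts))


def choose_target(failed: str, free_slots: Dict[str, int]) -> Optional[str]:
    """Pick the other server with the most free slots, or None if none has room."""
    candidates = {server: free for server, free in free_slots.items()
                  if server != failed and free > 0}
    return max(candidates, key=candidates.get) if candidates else None
```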
Manual Failover
# Emergency failover command
./cluster-manager.py failover --from server-01 --to server-02
# Maintenance mode
./cluster-manager.py maintenance --server server-01 --enable
# Server promotion
./cluster-manager.py promote --server server-02 --to-primary
Configuration Synchronization
Configuration Management
Centralized Configuration
Synchronized Elements:
- OpenVPN server configuration
- Client certificates and keys
- Network routing rules
- Security policies
- Access control lists
Sync Process
- Master Configuration: Primary server holds authoritative config
- Change Detection: Monitor configuration changes
- Distribution: Push changes to all cluster servers
- Validation: Verify configuration integrity
- Activation: Apply changes with rollback capability
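Change detection in the steps above could be done by comparing content digests, sketched here with SHA-256 from the standard library. The helper names are hypothetical, and the source does not state which mechanism the project uses.

```python
import hashlib
from typing import Dict, List


def config_digest(config_text: str) -> str:
    """Return the SHA-256 hex digest of a configuration file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def servers_needing_sync(master_config: str,
                         deployed_digests: Dict[str, str]) -> List[str]:
    """Return servers whose deployed digest differs from the primary's config."""
    wanted = config_digest(master_config)
    return [server for server, digest in deployed_digests.items()
            if digest != wanted]
```

The same digest can serve the validation step: after a push, the server's reported digest should equal the primary's.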
Certificate Synchronization
CA Management
# Certificate Authority sync
- Root CA certificate
- Intermediate certificates
- Client certificates
- Certificate Revocation List (CRL)
- Private keys (encrypted)
Sync Schedule
- Real-time: Critical security changes
- Scheduled: Regular configuration sync (hourly)
- On-demand: Manual administrator triggers
Monitoring & Analytics
Cluster Metrics
Performance Metrics
cluster_metrics = {
    'total_servers': 5,
    'online_servers': 4,
    'total_clients': 150,
    'active_connections': 89,
    'avg_cpu_usage': 45.2,
    'avg_memory_usage': 62.8,
    'total_bandwidth': '2.5 Gbps',
    'response_time_avg': 125  # ms
}
Server Comparison
| Server | Status | Clients | CPU | Memory | Bandwidth |
|---|---|---|---|---|---|
| EU-01 | 🟢 Online | 25/50 | 45% | 60% | 512 Mbps |
| EU-02 | 🟢 Online | 22/50 | 38% | 55% | 445 Mbps |
| US-01 | 🟡 Warning | 30/50 | 78% | 82% | 678 Mbps |
| US-02 | 🟢 Online | 12/50 | 25% | 35% | 234 Mbps |
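Cluster-level figures can be derived from per-server rows like these. The sketch below aggregates the four servers in the table; the row keys and the `summarize` helper are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]

servers: List[Row] = [
    {"name": "EU-01", "clients": 25, "cpu": 45, "memory": 60, "bandwidth_mbps": 512},
    {"name": "EU-02", "clients": 22, "cpu": 38, "memory": 55, "bandwidth_mbps": 445},
    {"name": "US-01", "clients": 30, "cpu": 78, "memory": 82, "bandwidth_mbps": 678},
    {"name": "US-02", "clients": 12, "cpu": 25, "memory": 35, "bandwidth_mbps": 234},
]


def summarize(rows: List[Row]) -> Dict[str, float]:
    """Aggregate per-server readings into cluster-level metrics."""
    return {
        "active_connections": sum(r["clients"] for r in rows),
        "avg_cpu_usage": round(mean(r["cpu"] for r in rows), 1),
        "avg_memory_usage": round(mean(r["memory"] for r in rows), 1),
        "total_bandwidth_mbps": sum(r["bandwidth_mbps"] for r in rows),
    }


summary = summarize(servers)
```

For these four rows the connection total is 89, which matches `active_connections` in the sample metrics. The averages differ from the sample because it describes a five-server cluster.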
Real-time Dashboard
Cluster Overview Widget
- Total servers and status distribution
- Active connections across cluster
- Aggregate performance metrics
- Recent events and alerts
Server Grid View
- Individual server status cards
- Quick actions (maintenance, restart, etc.)
- Performance sparklines
- Connection distribution
Troubleshooting
Common Issues
Server Communication Problems
# Test SSH connectivity
ssh -p 22 admin@server-ip "echo 'Connection OK'"
# Check OpenVPN management interface
telnet server-ip 7505
# Verify network routing
traceroute server-ip
Configuration Sync Failures
# Manual configuration sync
./cluster-manager.py sync --server server-01 --force
# Check sync logs
tail -f /var/log/openvpn-cluster/sync.log
# Validate configuration
./cluster-manager.py validate --server server-01
Load Balancing Issues
# Check client distribution
./cluster-manager.py status --show-distribution
# Rebalance clients manually
./cluster-manager.py rebalance --strategy performance
# View assignment history
./cluster-manager.py history --client client-name
Diagnostic Commands
# Cluster health check
./cluster-manager.py health --all
# Performance report
./cluster-manager.py report --performance --last-24h
# Connection analysis
./cluster-manager.py analyze --connections
# Export cluster configuration
./cluster-manager.py export --config --output cluster-backup.json
Best Practices
Deployment Recommendations
Server Placement
- Geographic Distribution: Servers in different regions
- Network Diversity: Multiple ISPs/data centers
- Capacity Planning: Keep 20-30% spare capacity for failover
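The capacity guideline translates into simple arithmetic. The sketch below reads the 20-30% figure as the share of each server's capacity that stays unused under peak load, which is one interpretation of the recommendation. `servers_required` is a hypothetical helper.

```python
import math


def servers_required(peak_clients: int, capacity_per_server: int,
                     headroom_percent: int = 25) -> int:
    """Servers needed so peak load uses at most (100 - headroom)% of capacity."""
    # Work in hundredths of a client to keep the arithmetic in integers.
    usable_hundredths = capacity_per_server * (100 - headroom_percent)
    return math.ceil(peak_clients * 100 / usable_hundredths)
```

For example, 150 peak clients on servers that each hold 50 would need 3 servers with no headroom, 4 with 25%, and 5 with 30%.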
Security Considerations
- SSH Key Management: Unique keys per server
- Network Segmentation: Isolated management network
- Certificate Security: Encrypted key storage
Performance Optimization
Server Sizing
Small Deployment (< 50 clients):
- 2 vCPU, 2GB RAM
- 20GB SSD storage
- 100 Mbps network
Medium Deployment (50-200 clients):
- 4 vCPU, 4GB RAM
- 50GB SSD storage
- 500 Mbps network
Large Deployment (200+ clients):
- 8+ vCPU, 8GB+ RAM
- 100GB+ SSD storage
- 1+ Gbps network
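The sizing tiers can be expressed as a lookup. Note that the ranges above overlap at exactly 200 clients; this sketch assumes 200 still counts as a medium deployment. The function name and the dictionary keys are illustrative.

```python
from typing import Dict

SIZING: Dict[str, Dict[str, str]] = {
    "small": {"cpu": "2 vCPU", "ram": "2GB", "storage": "20GB SSD", "network": "100 Mbps"},
    "medium": {"cpu": "4 vCPU", "ram": "4GB", "storage": "50GB SSD", "network": "500 Mbps"},
    "large": {"cpu": "8+ vCPU", "ram": "8GB+", "storage": "100GB+ SSD", "network": "1+ Gbps"},
}


def recommended_tier(clients: int) -> str:
    """Map an expected client count to a sizing tier."""
    if clients < 50:
        return "small"
    if clients <= 200:  # assumption: the boundary value 200 is treated as medium
        return "medium"
    return "large"
```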
Monitoring Intervals
- Health Checks: Every 60 seconds
- Performance Metrics: Every 5 minutes
- Configuration Sync: Every hour
- Log Rotation: Daily
Maintenance Procedures
Planned Maintenance
- Enable maintenance mode
- Drain client connections
- Perform updates/changes
- Validate functionality
- Return to service
- Monitor for issues
Emergency Procedures
- Immediate failover activation
- Client notification (if required)
- Root cause analysis
- Temporary fixes
- Permanent resolution
- Post-incident review