Queue Layer Design - tonglam/letletme_data GitHub Wiki
Overview
The Queue Layer provides job scheduling and processing capabilities for the FPL data system, handling various types of jobs from real-time updates to scheduled tasks.
System Integration Overview
graph TB
subgraph Application Layers
API[API Layer]
Service[Service Layer]
Domain[Domain Layer]
end
subgraph Infrastructure Layer
subgraph Queue Infrastructure
QA[Queue Adapter]
WA[Worker Adapter]
SA[Scheduler Adapter]
end
subgraph Error Handling
EH[Error Handler]
EM[Error Mapper]
EL[Error Logger]
end
subgraph Redis Management
RC[Redis Connection]
KM[Key Manager]
TM[TTL Manager]
end
end
subgraph Queue System
QS[Queue Service]
WS[Worker Service]
BQ[BullMQ]
Redis[(Redis)]
end
API --> Service
Service --> Domain
Service --> QS
QS --> QA
WS --> WA
QA & WA & SA --> BQ
BQ --> Redis
QA & WA & SA --> EH
EH --> EM
EH --> EL
BQ --> RC
RC --> KM
RC --> TM
Core Components
1. Infrastructure Layer
- Queue Adapter: BullMQ integration and queue operations
- Worker Adapter: Job processing and worker management
- Scheduler Adapter: Cron-based job scheduling
- Error Handler: Comprehensive error management
- Redis Manager: Key and TTL management
- Cleanup Manager: Automated cleanup operations and state management
- Recovery Manager: System resilience and recovery procedures
2. Queue Services
- Queue Service: Job lifecycle and scheduling
- Worker Service: Job execution and monitoring
- Scheduler Service: Job timing and recurrence
- Maintenance Service: Cleanup and recovery orchestration
System Resilience
graph TB
subgraph Core Services
QS[Queue Service]
WS[Worker Service]
MS[Maintenance Service]
end
subgraph Resilience Layer
CM[Cleanup Manager]
RM[Recovery Manager]
HM[Health Monitor]
end
subgraph State Management
JC[Job Cleanup]
KM[Key Management]
WC[Worker Cleanup]
end
QS & WS --> HM
HM --> RM
MS --> CM
CM --> JC & KM & WC
RM --> QS & WS
Cleanup Integration
- Automated Cleanup
  - Job state monitoring and cleanup triggers
  - TTL-based key expiration
  - Stale worker detection and cleanup
  - Memory optimization routines
- State Management
  - Job state transitions with cleanup hooks
  - Worker state tracking with auto-recovery
  - Redis key lifecycle management
Recovery Integration
- Health Monitoring
  - Continuous system health checks
  - Worker heartbeat monitoring
  - Redis connection state tracking
  - Job progress verification
- Automatic Recovery
  - Worker failure detection and restart
  - Job state recovery and reprocessing
  - Redis connection management
  - System state reconciliation
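The heartbeat-based failure detection described above can be sketched as a pure function over heartbeat records. This is a minimal illustration only: the `WorkerHeartbeat` shape, the `findStaleWorkers` name, and the 30-second timeout are assumptions, not taken from the project.

```typescript
// Hypothetical heartbeat record; the real project may track this differently.
interface WorkerHeartbeat {
  readonly workerId: string;
  readonly lastSeenMs: number; // epoch milliseconds of the last heartbeat
}

// Returns the ids of workers whose last heartbeat is older than timeoutMs.
const findStaleWorkers = (
  heartbeats: readonly WorkerHeartbeat[],
  nowMs: number,
  timeoutMs: number,
): readonly string[] =>
  heartbeats
    .filter((hb) => nowMs - hb.lastSeenMs > timeoutMs)
    .map((hb) => hb.workerId);

// Usage: with a 30 s timeout (assumed), only the first worker is stale.
const staleWorkers = findStaleWorkers(
  [
    { workerId: 'live-1', lastSeenMs: 1000 },
    { workerId: 'meta-1', lastSeenMs: 58000 },
  ],
  60000,
  30000,
);
```

A Recovery Manager could feed the returned ids into its restart procedure, while the function itself stays free of side effects and is easy to test.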
Job Categories
graph TB
subgraph Job Types
M[Meta Jobs<br>Core Data Sync]
L[Live Jobs<br>Real-time Updates]
PM[Post-Match Jobs<br>Results Processing]
PG[Post-Gameweek Jobs<br>Tournament Updates]
D[Daily Jobs<br>Regular Updates]
end
subgraph Priority
HP[High Priority]
MP[Medium Priority]
LP[Low Priority]
end
M & L --> HP
PM & PG --> MP
D --> LP
Job Scheduling Patterns
- Meta Jobs
  - Daily core data synchronization
  - Early morning execution (6:35 AM UTC)
  - High priority, max retries: 5
- Live Jobs
  - Real-time match updates
  - 1-minute intervals during matches
  - High priority, quick retries
- Post-Match Jobs
  - Result processing after matches
  - Dependent on match completion
  - Medium priority
- Post-Gameweek Jobs
  - Tournament and league updates
  - After gameweek completion
  - Medium priority
- Daily Jobs
  - Regular maintenance tasks
  - Fixed daily schedule
  - Lower priority
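The scheduling patterns above could be captured in a single policy table. The sketch below is one possible encoding: the type names, the `trigger` strings, and the numeric priority values are assumptions. The numeric mapping follows the BullMQ convention that a lower number means a higher priority. The Daily Jobs time is not stated here, so no cron expression is given for it.

```typescript
type JobCategory = 'meta' | 'live' | 'post-match' | 'post-gameweek' | 'daily';
type PriorityLevel = 'high' | 'medium' | 'low';

interface JobPolicy {
  readonly priority: PriorityLevel;
  readonly cron?: string; // cron expression, interpreted as UTC
  readonly maxRetries?: number;
  readonly trigger?: string; // hypothetical event name for dependency-driven jobs
}

const JOB_POLICIES: Readonly<Record<JobCategory, JobPolicy>> = {
  // Daily core data sync at 06:35 UTC, up to 5 retries.
  meta: { priority: 'high', cron: '35 6 * * *', maxRetries: 5 },
  // Fires every minute; restricting it to match windows is left to the scheduler.
  live: { priority: 'high', cron: '* * * * *' },
  'post-match': { priority: 'medium', trigger: 'match-completed' },
  'post-gameweek': { priority: 'medium', trigger: 'gameweek-completed' },
  // Fixed daily schedule; the exact time is not specified in this document.
  daily: { priority: 'low' },
};

// Assumed numeric values; in BullMQ, 1 is the highest priority.
const NUMERIC_PRIORITY: Readonly<Record<PriorityLevel, number>> = {
  high: 1,
  medium: 5,
  low: 10,
};

const toNumericPriority = (level: PriorityLevel): number => NUMERIC_PRIORITY[level];
```

Keeping the policies in one immutable table means a Scheduler Adapter can derive job options from a category without per-category branching.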
Error Handling Strategy
sequenceDiagram
participant Job
participant Handler
participant Logger
participant Retry
Job->>Handler: Execute
alt Success
Handler->>Logger: Log Success
else Error
Handler->>Logger: Log Error
alt Retryable
Handler->>Retry: Apply Strategy
Retry->>Job: Retry
else Non-Retryable
Handler->>Logger: Log Final Failure
end
end
Error Categories
- Transient Errors
  - Network issues
  - Redis timeouts
  - Strategy: Exponential backoff
- Validation Errors
  - Invalid job data
  - Configuration issues
  - Strategy: No retry
- System Errors
  - Resource exhaustion
  - Worker crashes
  - Strategy: Retry with delay
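The three strategies above can be expressed as one decision function. This is a sketch under stated assumptions: the base delay of 1 second, the fixed 30-second delay for system errors, and all names are illustrative rather than the project's actual values.

```typescript
type ErrorCategory = 'transient' | 'validation' | 'system';

type RetryDecision =
  | { readonly retry: false }
  | { readonly retry: true; readonly delayMs: number };

const BASE_DELAY_MS = 1000; // assumed base for exponential backoff
const SYSTEM_DELAY_MS = 30000; // assumed fixed delay for system errors

// retriesMade is the number of retries already performed (0 on the first failure).
const decideRetry = (
  category: ErrorCategory,
  retriesMade: number,
  maxRetries: number,
): RetryDecision => {
  // Validation errors never retry; any category stops once retries are exhausted.
  if (category === 'validation' || retriesMade >= maxRetries) {
    return { retry: false };
  }
  if (category === 'transient') {
    // Exponential backoff: 1 s, 2 s, 4 s, ...
    return { retry: true, delayMs: BASE_DELAY_MS * Math.pow(2, retriesMade) };
  }
  // System errors: retry after a fixed delay.
  return { retry: true, delayMs: SYSTEM_DELAY_MS };
};
```

For example, a transient error on its third failure (`retriesMade = 2`) waits `1000 * 2^2 = 4000` ms before the next attempt.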
Redis Management
Key Structure
- Environment-based prefixing
- Category-based organization
- Job-specific identifiers
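As an illustration of the key structure, a key could be assembled from its three parts. The `:` separator, the `buildKey` name, and the example values are assumptions; the document does not specify the exact format.

```typescript
// Joins environment, category, and job identifier into one Redis key.
// Rejects empty segments so malformed keys fail early.
const buildKey = (env: string, category: string, jobId: string): string => {
  const segments = [env, category, jobId];
  if (segments.some((s) => s.length === 0)) {
    throw new Error('key segments must be non-empty');
  }
  return segments.join(':');
};

// Usage with hypothetical values.
const exampleKey = buildKey('prod', 'live', 'gw12-match-3');
```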
TTL Strategy
- Short-lived Keys
  - Live match data: 1 hour
  - Temporary states: 30 minutes
- Medium-lived Keys
  - Match results: 12 hours
  - Daily updates: 24 hours
- Long-lived Keys
  - Historical data: 7 days
  - Error logs: 30 days
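Redis expiry commands such as `EXPIRE` take a duration in seconds, so the TTL tiers above translate to the constants below. The object and property names are illustrative assumptions; the durations come from the list above.

```typescript
const SECONDS = { minute: 60, hour: 3600, day: 86400 } as const;

const TTL_SECONDS = {
  // Short-lived
  liveMatchData: 1 * SECONDS.hour, // 3600
  temporaryState: 30 * SECONDS.minute, // 1800
  // Medium-lived
  matchResults: 12 * SECONDS.hour, // 43200
  dailyUpdates: 24 * SECONDS.hour, // 86400
  // Long-lived
  historicalData: 7 * SECONDS.day, // 604800
  errorLogs: 30 * SECONDS.day, // 2592000
} as const;
```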
Monitoring and Metrics
Key Metrics
- Performance Metrics
  - Job processing time
  - Queue length
  - Worker utilization
- Error Metrics
  - Failure rates
  - Retry counts
  - Error types distribution
- System Metrics
  - Redis memory usage
  - Connection status
  - Worker health
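Several of these metrics can be derived from a list of finished jobs. The following sketch assumes a hypothetical `JobOutcome` record and computes average processing time, failure rate, and total retry count. It is not the project's actual metrics pipeline.

```typescript
// Hypothetical record of one finished job.
interface JobOutcome {
  readonly durationMs: number;
  readonly succeeded: boolean;
  readonly retries: number;
}

interface MetricsSummary {
  readonly averageDurationMs: number;
  readonly failureRate: number; // fraction in [0, 1]
  readonly totalRetries: number;
}

const summarize = (outcomes: readonly JobOutcome[]): MetricsSummary => {
  // Avoid division by zero when no jobs have finished yet.
  if (outcomes.length === 0) {
    return { averageDurationMs: 0, failureRate: 0, totalRetries: 0 };
  }
  const totalDuration = outcomes.reduce((sum, o) => sum + o.durationMs, 0);
  const failures = outcomes.filter((o) => !o.succeeded).length;
  const totalRetries = outcomes.reduce((sum, o) => sum + o.retries, 0);
  return {
    averageDurationMs: totalDuration / outcomes.length,
    failureRate: failures / outcomes.length,
    totalRetries,
  };
};
```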
Implementation Guidelines
1. Functional Programming
- Use TaskEither for operations
- Maintain immutability
- Pure function composition
2. Error Handling
- Comprehensive error types
- Structured logging
- Retry strategies
3. Performance
- Efficient Redis usage
- Worker pool management
- Resource optimization
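To illustrate the TaskEither guideline without depending on a particular library, the sketch below hand-rolls a minimal `TaskEither` with the same shape used by fp-ts (a lazy function returning a promise of an `Either`). The `QueueError` type and the `parseGameweek` and `describeJob` helpers are hypothetical; a real implementation would more likely import these combinators from a library.

```typescript
type Either<E, A> =
  | { readonly _tag: 'Left'; readonly left: E }
  | { readonly _tag: 'Right'; readonly right: A };

// A lazy async computation that either fails with E or succeeds with A.
type TaskEither<E, A> = () => Promise<Either<E, A>>;

const left = <E, A>(e: E): Either<E, A> => ({ _tag: 'Left', left: e });
const right = <E, A>(a: A): Either<E, A> => ({ _tag: 'Right', right: a });

// Wraps a throwing async function so failures become Left values.
const tryCatch = <E, A>(
  thunk: () => Promise<A>,
  onError: (reason: unknown) => E,
): TaskEither<E, A> =>
  async () => {
    try {
      return right<E, A>(await thunk());
    } catch (reason) {
      return left<E, A>(onError(reason));
    }
  };

// Sequences two steps; the second runs only if the first succeeded.
const chain = <E, A, B>(f: (a: A) => TaskEither<E, B>) =>
  (ma: TaskEither<E, A>): TaskEither<E, B> =>
    async () => {
      const result = await ma();
      return result._tag === 'Left' ? left<E, B>(result.left) : f(result.right)();
    };

// Hypothetical error type and pipeline steps.
interface QueueError {
  readonly type: 'QueueError';
  readonly message: string;
}

const toQueueError = (reason: unknown): QueueError => ({
  type: 'QueueError',
  message: reason instanceof Error ? reason.message : String(reason),
});

const parseGameweek = (raw: string): TaskEither<QueueError, number> =>
  tryCatch(async () => {
    const n = Number(raw);
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`invalid gameweek: ${raw}`);
    }
    return n;
  }, toQueueError);

const describeJob = (gameweek: number): TaskEither<QueueError, string> =>
  async () => right<QueueError, string>(`live-update:gw${gameweek}`);

// Nothing runs until the composed TaskEither is invoked.
const validProgram = chain(describeJob)(parseGameweek('12'));
const invalidProgram = chain(describeJob)(parseGameweek('abc'));
```

Errors travel as values rather than exceptions, so a caller can inspect the `Left` branch and map it to a retry strategy without wrapping every step in `try`/`catch`.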