Queue Layer Design - tonglam/letletme_data GitHub Wiki

Overview

The Queue Layer provides job scheduling and processing capabilities for the FPL data system, handling various types of jobs from real-time updates to scheduled tasks.

System Integration Overview

graph TB
    subgraph Application Layers
        API[API Layer]
        Service[Service Layer]
        Domain[Domain Layer]
    end

    subgraph Infrastructure Layer
        subgraph Queue Infrastructure
            QA[Queue Adapter]
            WA[Worker Adapter]
            SA[Scheduler Adapter]
        end

        subgraph Error Handling
            EH[Error Handler]
            EM[Error Mapper]
            EL[Error Logger]
        end

        subgraph Redis Management
            RC[Redis Connection]
            KM[Key Manager]
            TM[TTL Manager]
        end
    end

    subgraph Queue System
        QS[Queue Service]
        WS[Worker Service]
        BQ[BullMQ]
        Redis[(Redis)]
    end

    API --> Service
    Service --> Domain
    Service --> QS
    QS --> QA
    WS --> WA
    QA & WA & SA --> BQ
    BQ --> Redis

    QA & WA & SA --> EH
    EH --> EM
    EH --> EL

    BQ --> RC
    RC --> KM
    RC --> TM

Core Components

1. Infrastructure Layer

Queue Adapter: BullMQ integration and queue operations
Worker Adapter: Job processing and worker management
Scheduler Adapter: Cron-based job scheduling
Error Handler: Comprehensive error management
Redis Manager: Key and TTL management
Cleanup Manager: Automated cleanup operations and state management
Recovery Manager: System resilience and recovery procedures

2. Queue Services

Queue Service: Job lifecycle and scheduling
Worker Service: Job execution and monitoring
Scheduler Service: Job timing and recurrence
Maintenance Service: Cleanup and recovery orchestration

System Resilience

graph TB
    subgraph Core Services
        QS[Queue Service]
        WS[Worker Service]
        MS[Maintenance Service]
    end

    subgraph Resilience Layer
        CM[Cleanup Manager]
        RM[Recovery Manager]
        HM[Health Monitor]
    end

    subgraph State Management
        JC[Job Cleanup]
        KM[Key Management]
        WC[Worker Cleanup]
    end

    QS & WS --> HM
    HM --> RM
    MS --> CM
    CM --> JC & KM & WC
    RM --> QS & WS

Cleanup Integration

Automated Cleanup
- Job state monitoring and cleanup triggers
- TTL-based key expiration
- Stale worker detection and cleanup
- Memory optimization routines
State Management
- Job state transitions with cleanup hooks
- Worker state tracking with auto-recovery
- Redis key lifecycle management

Recovery Integration

Health Monitoring
- Continuous system health checks
- Worker heartbeat monitoring
- Redis connection state tracking
- Job progress verification
Automatic Recovery
- Worker failure detection and restart
- Job state recovery and reprocessing
- Redis connection management
- System state reconciliation

Job Categories

graph TB
    subgraph Job Types
        M[Meta Jobs<br>Core Data Sync]
        L[Live Jobs<br>Real-time Updates]
        PM[Post-Match Jobs<br>Results Processing]
        PG[Post-Gameweek Jobs<br>Tournament Updates]
        D[Daily Jobs<br>Regular Updates]
    end

    subgraph Priority
        HP[High Priority]
        MP[Medium Priority]
        LP[Low Priority]
    end

    M & L --> HP
    PM & PG --> MP
    D --> LP

Job Scheduling Patterns

Meta Jobs
- Daily core data synchronization
- Early morning execution (6:35 AM UTC)
- High priority, max retries: 5
Live Jobs
- Real-time match updates
- 1-minute intervals during matches
- High priority, quick retries
Post-Match Jobs
- Result processing after matches
- Dependent on match completion
- Medium priority
Post-Gameweek Jobs
- Tournament and league updates
- After gameweek completion
- Medium priority
Daily Jobs
- Regular maintenance tasks
- Fixed daily schedule
- Lower priority

Error Handling Strategy

sequenceDiagram
    participant Job
    participant Handler
    participant Logger
    participant Retry

    Job->>Handler: Execute
    alt Success
        Handler->>Logger: Log Success
    else Error
        Handler->>Logger: Log Error
        alt Retryable
            Handler->>Retry: Apply Strategy
            Retry->>Job: Retry
        else Non-Retryable
            Handler->>Logger: Log Final Failure
        end
    end

Error Categories

Transient Errors
- Network issues
- Redis timeouts
- Strategy: Exponential backoff
Validation Errors
- Invalid job data
- Configuration issues
- Strategy: No retry
System Errors
- Resource exhaustion
- Worker crashes
- Strategy: Retry with delay

Redis Management

Key Structure

Environment-based prefixing
Category-based organization
Job-specific identifiers

TTL Strategy

Short-lived Keys
- Live match data: 1 hour
- Temporary states: 30 minutes
Medium-lived Keys
- Match results: 12 hours
- Daily updates: 24 hours
Long-lived Keys
- Historical data: 7 days
- Error logs: 30 days

Monitoring and Metrics

Key Metrics

Performance Metrics
- Job processing time
- Queue length
- Worker utilization
Error Metrics
- Failure rates
- Retry counts
- Error types distribution
System Metrics
- Redis memory usage
- Connection status
- Worker health

Implementation Guidelines

1. Functional Programming

Use TaskEither for operations
Maintain immutability
Pure function composition

2. Error Handling

Comprehensive error types
Structured logging
Retry strategies

3. Performance

Efficient Redis usage
Worker pool management
Resource optimization