03 01 lap time prediction - VforVitorio/F1_Strat_Manager GitHub Wiki
Lap Time Prediction
Relevant source files
- scripts/ML_tyre_pred/ML_utils/N00_model_lap_prediction.py
- scripts/ML_tyre_pred/N00_model_lap_prediction.ipynb
- scripts/NLP_radio_processing/NLP_utils/N03_bert_sentiment.py
- scripts/NLP_radio_processing/NLP_utils/N04_radio_info.py
- scripts/NLP_radio_processing/NLP_utils/N05_ner_models.py
- scripts/NLP_radio_processing/NLP_utils/N06_model_merging.py
- scripts/lap_prediction.ipynb
Data Pipeline
Data Sources
The prediction system relies on multiple data sources to build a comprehensive view of race conditions:
Data Source | Description | Key Fields |
---|---|---|
Laps | Individual lap telemetry | LapTime, Compound, TyreLife, Speed measurements |
Weather | Environmental conditions | AirTemp, TrackTemp, Humidity, WindSpeed |
Intervals | Gap information between cars | gap_to_leader, interval_in_seconds |
Pitstops | Pit stop information | PitInTime, PitOutTime, Compound change |
The system primarily focuses on lap-specific data including tire compounds, tire age, sector times, and speed measurements at various points on the track.
Data Validation and Processing
Before prediction can occur, the system validates input data through the following steps:
- Input Validation: Checks for required columns and correct data types
- Data Type Conversion: Ensures numerical values are properly formatted
- Missing Value Handling: Adds placeholders or calculates values for missing data points
- Sequential Feature Creation: Generates features that capture time-series relationships
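
The first three validation steps can be sketched with pandas as follows. This is a minimal illustration, not the project's implementation: the function name `validate_laps`, the choice of required columns, the `Driver` and `LapNumber` column names, the assumption that `LapTime` is stored in seconds, and the per-driver median fill are all assumptions made for the example.

```python
import pandas as pd

# Assumed column sets: the fields are named in the data source table,
# but which of them are mandatory is a guess made for this sketch.
REQUIRED_COLUMNS = ["Driver", "LapNumber", "LapTime", "Compound", "TyreLife"]
SPEED_COLUMNS = ["SpeedI1", "SpeedI2", "SpeedFL", "SpeedST"]
NUMERIC_COLUMNS = ["LapNumber", "LapTime", "TyreLife"] + SPEED_COLUMNS


def validate_laps(laps: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of `laps`; raise ValueError if required columns are absent."""
    missing = [col for col in REQUIRED_COLUMNS if col not in laps.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    cleaned = laps.copy()
    for col in NUMERIC_COLUMNS:
        if col not in cleaned.columns:
            # Placeholder for an optional column that was not supplied.
            cleaned[col] = float("nan")
        # Coerce strings such as "91.2" to floats; unparseable values become NaN.
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

    # Fill gaps in speed readings with the same driver's median for that column.
    for col in SPEED_COLUMNS:
        cleaned[col] = cleaned.groupby("Driver")[col].transform(
            lambda s: s.fillna(s.median())
        )
    return cleaned
```

Coercing with `errors="coerce"` keeps malformed rows visible as NaN instead of failing the whole batch, which seems a reasonable fit for live data, though the real pipeline may handle this differently.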
Model Architecture
XGBoost Model
The lap time prediction system utilizes an XGBoost regression model, which was selected for its:
- High accuracy on time-series data
- Robustness to outliers
- Ability to capture non-linear relationships
- Fast prediction speed for real-time strategy decisions

The model is trained on historical race data and achieves a Mean Absolute Error (MAE) of approximately 0.09 seconds, making it reliable for strategic decision-making.
Feature Importance
The model relies on several key feature types:
- Current State Features:
  - Tire compound (SOFT, MEDIUM, HARD, etc.)
  - Tire age (number of laps)
  - Current position in race
  - Speed measurements (SpeedI1, SpeedI2, SpeedFL, SpeedST)
- Sequential Features:
  - Previous lap time
  - Speed deltas between consecutive laps
  - Lap time trends
- External Factors:
  - Track status
  - Team/driver identifier
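
The sequential features listed above could be derived per driver with grouped shifts and differences. The sketch below is one possible construction: the function name `add_sequential_features`, the output column names (`PrevLapTime`, `LapTimeTrend`, `<speed>_Delta`), and the `Driver` and `LapNumber` columns are hypothetical, and the trend is defined here as the change between the two preceding laps so that the current lap time (the prediction target) is never used as an input.

```python
import pandas as pd

SPEED_COLUMNS = ["SpeedI1", "SpeedI2", "SpeedFL", "SpeedST"]


def add_sequential_features(laps: pd.DataFrame) -> pd.DataFrame:
    """Add lag-based features computed separately for each driver."""
    out = laps.sort_values(["Driver", "LapNumber"]).copy()
    lap_time = out.groupby("Driver")["LapTime"]

    # Lap time of the immediately preceding lap (NaN on a driver's first lap).
    out["PrevLapTime"] = lap_time.shift(1)
    # Trend: how the lap time changed between the two preceding laps.
    out["LapTimeTrend"] = lap_time.shift(1) - lap_time.shift(2)

    # Speed delta between this lap and the previous one, for each available trap.
    for col in SPEED_COLUMNS:
        if col in out.columns:
            out[f"{col}_Delta"] = out.groupby("Driver")[col].diff()
    return out
```

Grouping by driver before shifting matters: without it, the first lap of one driver would inherit the last lap of another.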
Prediction Process
Step 1: Model Loading
The prediction process begins by loading the pre-trained XGBoost model.
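
The loading code is not reproduced on this page, so the following is only a sketch of what such a step might look like. It assumes the trained model was serialized with Python's `pickle`; the function name `load_model` and the caching behaviour are hypothetical. If the model was instead saved with XGBoost's native `save_model`, the counterpart would be the model object's own `load_model` method.

```python
import pickle
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def load_model(model_path: str):
    """Load a pickled model from disk, caching it so repeated calls are cheap.

    Assumption: the model file is a pickle. Only unpickle files you trust,
    because unpickling can execute arbitrary code.
    """
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    with path.open("rb") as fh:
        return pickle.load(fh)
```

Caching the loaded object avoids re-reading the file on every prediction call, which may matter when predictions are requested once per lap.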
Step 2: Prediction Pipeline
The complete prediction pipeline consists of the following steps:
- Load model - Retrieves the trained XGBoost model
- Validate data - Ensures input data meets requirements
- Add sequential features - Creates time-series based features
- Prepare features - Aligns input with model expectations
- Make predictions - Generates lap time estimates
- Format results - Structures output for consumption by other systems
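
The "Prepare features" step is the least obvious of these, so here is a hedged sketch of what aligning input with model expectations might involve: one-hot encoding the tire compound and reindexing to the exact column list the model was trained on. The function name `prepare_features`, the `Compound_` prefix, and the idea that the expected feature list is passed in explicitly are assumptions for illustration.

```python
import pandas as pd


def prepare_features(laps: pd.DataFrame, expected_features: list) -> pd.DataFrame:
    """One-hot encode Compound and align columns to the model's training layout."""
    encoded = pd.get_dummies(laps, columns=["Compound"], prefix="Compound", dtype=float)
    # reindex drops columns the model never saw and adds any it expects but
    # that are absent from this batch (e.g. a compound not currently in use).
    return encoded.reindex(columns=expected_features, fill_value=0.0)
```

For example, with `expected_features = ["TyreLife", "Compound_SOFT", "Compound_MEDIUM", "Compound_HARD"]`, a batch containing only SOFT and MEDIUM laps still yields a `Compound_HARD` column filled with zeros, so the column order and count match what the model was fitted on.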
Step 3: Next Lap Prediction
A key feature of the system is its ability to predict the next lap time based on the current state:
- The system identifies the last lap for each driver
- Creates a synthetic next lap entry with incremented tire age
- Predicts the lap time for this future state

This next lap prediction is crucial for real-time strategy decisions during a race.
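
The first two steps of the next-lap procedure can be sketched as follows. The function name `build_next_lap_rows` and the `Driver`, `LapNumber`, and `PrevLapTime` column names are hypothetical; the sketch also assumes the driver stays on the same set of tires, so only the tire age and lap number advance.

```python
import pandas as pd


def build_next_lap_rows(laps: pd.DataFrame) -> pd.DataFrame:
    """Create one synthetic 'next lap' row per driver from their latest lap."""
    # Latest recorded lap for each driver.
    last_laps = laps.sort_values("LapNumber").groupby("Driver").tail(1).copy()

    # The lap just completed becomes the "previous lap" of the synthetic row.
    last_laps["PrevLapTime"] = last_laps["LapTime"]
    last_laps["LapNumber"] += 1
    last_laps["TyreLife"] += 1   # same tires, one lap older
    # The next lap has not been driven yet, so its time is unknown.
    last_laps["LapTime"] = float("nan")
    return last_laps.sort_values("Driver").reset_index(drop=True)
```

The resulting rows can then be passed through the same feature preparation and prediction steps as ordinary laps.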
Prediction Function API
The system exposes a primary function for generating predictions.
This function accepts telemetry data as input and returns structured predictions that can be used by other system components.
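
The function's actual name and signature are not shown on this page. As a rough illustration of the shape such an entry point might take, the sketch below accepts a telemetry DataFrame and any object with a `predict` method, and returns one prediction per input row. The names `predict_lap_times`, `feature_columns`, `PredictedLapTime`, and the `ConstantModel` stand-in are all hypothetical.

```python
import pandas as pd


class ConstantModel:
    """Stand-in for a trained regressor; used only to make the sketch runnable."""

    def __init__(self, value: float):
        self.value = value

    def predict(self, features: pd.DataFrame) -> list:
        return [self.value] * len(features)


def predict_lap_times(laps: pd.DataFrame, model, feature_columns: list) -> pd.DataFrame:
    """Return a table with one predicted lap time per input row."""
    # Align the input to the columns the model expects, filling gaps with 0.
    features = laps.reindex(columns=feature_columns, fill_value=0.0)
    predictions = model.predict(features)
    return pd.DataFrame({
        "Driver": laps["Driver"].to_numpy(),
        "LapNumber": laps["LapNumber"].to_numpy(),
        "PredictedLapTime": list(predictions),
    })
```

A call such as `predict_lap_times(laps, ConstantModel(90.0), ["TyreLife"])` returns a three-column table that downstream components could join back to the telemetry on `Driver` and `LapNumber`.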
Performance and Limitations
Model Accuracy
The XGBoost model achieves strong performance metrics:
- Mean Absolute Error (MAE): 0.09 seconds
- Handles various tire compounds and track conditions reliably
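
For reference, MAE is the average of the absolute differences between predicted and actual lap times. The short sketch below shows the calculation; the helper name and the lap times in the usage note are made up for illustration and are not results from the model.

```python
def mean_absolute_error(actual: list, predicted: list) -> float:
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

With illustrative values `actual = [90.0, 91.0, 92.0, 93.0]` and `predicted = [90.5, 90.5, 92.0, 93.0]`, the absolute errors are 0.5, 0.5, 0 and 0, giving an MAE of 0.25 seconds. An MAE of 0.09 seconds therefore means predictions are off by about nine hundredths of a second on average.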