04 03 named entity recognition - VforVitorio/F1_Strat_Manager GitHub Wiki
Named Entity Recognition
Relevant source files
- scripts/NLP_radio_processing/N04_radio_info.ipynb
- scripts/NLP_radio_processing/N05_ner_models.ipynb
- scripts/NLP_radio_processing/N06_model_merging.ipynb
- scripts/NLP_radio_processing/NLP_utils/N03_bert_sentiment.py
- scripts/NLP_radio_processing/NLP_utils/N04_radio_info.py
- scripts/NLP_radio_processing/NLP_utils/N05_ner_models.py
- scripts/NLP_radio_processing/NLP_utils/N06_model_merging.py
2. Entity Types
The F1 NER system recognizes nine domain-specific entity types, each capturing critical information for race strategy:
Entity Type | Description | Example |
---|---|---|
ACTION | Direct commands or actions | "push now", "follow my instruction" |
SITUATION | Racing context or circumstances | "Hamilton is 2 seconds behind" |
INCIDENT | Accidents or on-track events | "Ferrari in the wall" |
STRATEGY_INSTRUCTION | Strategic directives | "We're looking at Plan B" |
POSITION_CHANGE | References to overtakes or positions | "You're P4", "gaining on Verstappen" |
PIT_CALL | Specific pit stop instructions | "Box this lap" |
TRACK_CONDITION | Mentions of track state | "yellows in turn 7", "track is drying" |
TECHNICAL_ISSUE | Car-related problems | "losing grip on the rear" |
WEATHER | Weather conditions | "rain expected in 5 minutes" |
3. Technical Implementation
The NER component implements a fine-tuned BERT model customized for the F1 domain with a BIO (Beginning-Inside-Outside) tagging scheme.
3.1 BIO Tagging Approach
The NER system uses BIO tagging, a standard approach in sequence labeling:
- B-: Marks the Beginning of an entity
- I-: Marks the Inside (continuation) of an entity
- O: Marks tokens Outside any entity

For example, the message "Ferrari in the wall, no? Yes, that's Charles stopped" would be tagged as:

| Word | Tag |
| ------- | ---------- |
| Ferrari | B-INCIDENT |
| in | I-INCIDENT |
| the | I-INCIDENT |
| wall | I-INCIDENT |
| , | O |
| no | O |
| ? | O |
| Yes | O |
| , | O |
| that's | B-INCIDENT |
| Charles | I-INCIDENT |
| stopped | I-INCIDENT |
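The tagging rule can be sketched in plain Python by converting character-span annotations (the input format named under Data Preparation) into word-level BIO tags. The regex word splitter and the (start, end, label) span format are assumptions made for illustration. The project's own conversion code may differ, and the real model tags WordPiece tokens rather than whole words.

```python
import re

# Assumed word pattern: runs of letters, digits, or apostrophes form one word,
# and every other non-space character ("," or "?") is a token of its own.
WORD_RE = re.compile(r"[\w']+|[^\w\s]")


def spans_to_bio(text, spans):
    """Convert character-span annotations [(start, end, label), ...] into word-level BIO tags."""
    tagged = []
    opened = set()  # indices of spans whose first word has already received a B- tag
    for match in WORD_RE.finditer(text):
        start, end = match.span()
        tag = "O"
        for i, (span_start, span_end, label) in enumerate(spans):
            if span_start <= start and end <= span_end:
                # The first word inside a span begins the entity; later words continue it.
                tag = ("I-" if i in opened else "B-") + label
                opened.add(i)
                break
        tagged.append((match.group(), tag))
    return tagged


message = "Ferrari in the wall, no? Yes, that's Charles stopped"
annotations = [
    (0, message.index(","), "INCIDENT"),                 # "Ferrari in the wall"
    (message.index("that's"), len(message), "INCIDENT"),  # "that's Charles stopped"
]
for word, tag in spans_to_bio(message, annotations):
    print(f"{word:8} {tag}")
```

Running it reproduces the word/tag table above, including the O tags on punctuation between the two entities.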
3.2 Model Architecture
The NER system uses a BERT-based token classification model with the following technical components:
- Base Model: BERT-large-cased-finetuned-conll03-english
- Customization: Fine-tuned on annotated F1 radio communications
- Output Layer: 19 output classes (B- and I- for each of the 9 entity types, plus O)
- Training Approach: Focused fine-tuning with class weights to handle entity imbalance
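The 19-class output layer and the class weighting can be made concrete with a short sketch. The label ordering and the inverse-frequency ("balanced") weighting formula below are assumptions; the page only states that class weights are used to handle entity imbalance, not how they are computed.

```python
ENTITY_TYPES = [
    "ACTION", "SITUATION", "INCIDENT", "STRATEGY_INSTRUCTION", "POSITION_CHANGE",
    "PIT_CALL", "TRACK_CONDITION", "TECHNICAL_ISSUE", "WEATHER",
]

# "O" plus a B- and an I- label per entity type: 1 + 2 * 9 = 19 classes.
# The ordering is an assumption; any fixed ordering works if training and inference agree.
LABELS = ["O"] + [f"{prefix}-{entity}" for entity in ENTITY_TYPES for prefix in ("B", "I")]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


def class_weights(tag_sequences, labels=LABELS):
    """Inverse-frequency weights: total_tags / (num_classes * count_of_class).

    Labels that never occur are counted once so that their weight stays finite.
    """
    counts = {label: 0 for label in labels}
    for tags in tag_sequences:
        for tag in tags:
            counts[tag] += 1
    total = sum(counts.values())
    return {label: total / (len(labels) * max(count, 1)) for label, count in counts.items()}
```

In a PyTorch training loop, these weights (listed in ID2LABEL order) could be passed to torch.nn.CrossEntropyLoss(weight=...), and the label dictionaries to BertForTokenClassification.from_pretrained as num_labels, id2label and label2id. Because the CoNLL-03 checkpoint ships with a 9-label head, loading it with 19 labels would typically also require ignore_mismatched_sizes=True.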
4. Data Processing Pipeline
The NER system transforms raw text into structured entity data in three processing steps:
- Tokenization: The raw text is tokenized using BERT's WordPiece tokenizer
- Prediction: Tokens are processed through the model to predict BIO tags
- Entity Extraction: Consecutive tokens tagged with the same entity type (a B- tag followed by its I- tags) are merged into a single entity
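The entity-extraction step can be sketched as a single pass over the predicted tags. This is a generic BIO decoder, not the code of analyze_f1_radio(). It assumes whole-word tokens; with raw WordPiece output, "##" continuation pieces would first need to be merged back into words. It also treats an I- tag that does not continue the open entity as the start of a new one, a lenient choice that the source does not confirm.

```python
def bio_to_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []

    def close_entity():
        # Emit the entity that is currently open, if any.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag that does not continue the open entity, starts a new entity.
            close_entity()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Same entity type as the open entity: extend it.
            current_tokens.append(token)
        else:
            # An "O" tag closes any open entity.
            close_entity()
            current_type, current_tokens = None, []
    close_entity()
    return entities
```

Applied to the tagged example from section 3.1, it returns two INCIDENT entities: "Ferrari in the wall" and "that's Charles stopped".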
6. Example Output
When a radio message is processed, the full pipeline output combines intent classification, sentiment analysis, and the NER entities. In the documented example, this structured output enables the expert system to understand:
- The message contains a pit stop instruction (ORDER intent)
- The emotional tone is neutral
- It contains an action ("box"), a pit instruction ("this lap for softs"), and a situational update ("Hamilton is catching up"), corresponding to the ACTION, PIT_CALL, and SITUATION entity types
7. From NER to Strategy Decisions
The NER-extracted entities directly inform strategic decision-making in the expert system. For example:
- PIT_CALL entities: Trigger pit stop preparation rules
- TRACK_CONDITION entities: Activate weather strategy adjustment rules
- TECHNICAL_ISSUE entities: Inform tire management or defensive driving recommendations
- POSITION_CHANGE entities: Update race situation awareness for overtake opportunities
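The entity-to-rule routing described above can be sketched as a lookup table. The rule identifiers are hypothetical placeholders that echo the four bullets; the expert system's real rules, and how they consume entities, are not shown on this page.

```python
# Hypothetical rule identifiers mirroring the four bullets above.
ENTITY_TO_RULES = {
    "PIT_CALL": ["pit_stop_preparation"],
    "TRACK_CONDITION": ["weather_strategy_adjustment"],
    "TECHNICAL_ISSUE": ["tire_management", "defensive_driving"],
    "POSITION_CHANGE": ["overtake_opportunity"],
}


def triggered_rules(entities):
    """Group extracted (entity_type, text) pairs by the rules they would trigger."""
    rules = {}
    for entity_type, text in entities:
        # Entity types without a mapping (e.g. ACTION here) trigger nothing in this sketch.
        for rule in ENTITY_TO_RULES.get(entity_type, []):
            rules.setdefault(rule, []).append(text)
    return rules
```

Keeping the extracted text next to each triggered rule lets a rule explain which radio phrase fired it.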
8. Implementation Details
The F1 NER system is implemented using the following key technologies and patterns:
- Data Preparation:
  - Character-span annotation to BIO tag conversion
  - Custom tokenization handling for BERT model
- Model Configuration:
  - BertForTokenClassification with custom classification head
  - Entity-specific class weighting to handle imbalance
- Inference Functions:
  - analyze_f1_radio(): Main function for entity extraction
  - Custom post-processing to merge BIO tags into entities
- Integration Interface:
  - Module organization: NLP_utils/N05_ner_models.py
  - Integration point: analyze_radio_message() in N06_model_merging.py
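"Custom tokenization handling" usually means reconciling word-level BIO tags with WordPiece splits. A common approach with Hugging Face fast tokenizers is to use the encoding's word_ids() (the source word index of each token, or None for special tokens), label only the first subword of each word, and give every other position -100, which is the default ignore_index of PyTorch's CrossEntropyLoss. Whether N05_ner_models.py does exactly this is not confirmed. In the sketch below, hand-written word_ids lists stand in for real tokenizer output.

```python
def align_labels_to_subwords(word_labels, word_ids, label2id, ignore_index=-100):
    """Expand word-level BIO labels to subword tokens, labelling only the first piece of each word."""
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] have no source word; the loss skips them.
            aligned.append(ignore_index)
        elif word_id != previous_word:
            # The first subword of a word carries that word's label.
            aligned.append(label2id[word_labels[word_id]])
        else:
            # Continuation pieces (WordPiece "##..." tokens) are also skipped by the loss.
            aligned.append(ignore_index)
        previous_word = word_id
    return aligned


label2id = {"O": 0, "B-PIT_CALL": 1, "I-PIT_CALL": 2}
# Two words, where the first one is assumed to be split into two WordPiece tokens.
print(align_labels_to_subwords(["O", "B-PIT_CALL"], [None, 0, 0, 1, None], label2id))
```

An alternative is to copy the word's I- label onto continuation pieces instead of ignoring them; ignoring them keeps the loss counting each word exactly once.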
9. Usage in the Strategy System
The structured entity information from the NER component is consumed by the F1 Strategy Engine as RadioFact objects, which trigger specific rules in the expert system.
The NER results inform strategic decisions such as:
- When to pit based on weather changes mentioned in radio
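A weather-driven pit rule over radio facts can be sketched with a plain dataclass. RadioFactSketch and suggests_weather_pit are hypothetical names: the real RadioFact class, its fields, and the engine's rule syntax are not shown on this page, so this only illustrates the shape of the data flow.

```python
from dataclasses import dataclass, field


@dataclass
class RadioFactSketch:
    """Hypothetical stand-in for a RadioFact; the real class and its fields may differ."""
    intent: str                                   # e.g. "ORDER"
    sentiment: str                                # e.g. "neutral"
    entities: dict = field(default_factory=dict)  # entity type -> list of extracted text spans


def suggests_weather_pit(fact):
    """Toy rule: flag a possible pit decision when the radio reports weather or track changes."""
    return bool(fact.entities.get("WEATHER") or fact.entities.get("TRACK_CONDITION"))


pit_order = RadioFactSketch(
    "ORDER", "neutral",
    {"ACTION": ["box"], "PIT_CALL": ["this lap for softs"], "SITUATION": ["Hamilton is catching up"]},
)
drying = RadioFactSketch("ORDER", "neutral", {"TRACK_CONDITION": ["track is drying"]})
print(suggests_weather_pit(pit_order), suggests_weather_pit(drying))  # prints False True
```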