04 03 named entity recognition - VforVitorio/F1_Strat_Manager GitHub Wiki

Named Entity Recognition

2. Entity Types

The F1 NER system recognizes nine domain-specific entity types, each capturing critical information for race strategy:

| Entity Type | Description | Example |
| ----------- | ----------- | ------- |
| ACTION | Direct commands or actions | "push now", "follow my instruction" |
| SITUATION | Racing context or circumstances | "Hamilton is 2 seconds behind" |
| INCIDENT | Accidents or on-track events | "Ferrari in the wall" |
| STRATEGY_INSTRUCTION | Strategic directives | "We're looking at Plan B" |
| POSITION_CHANGE | References to overtakes or positions | "You're P4", "gaining on Verstappen" |
| PIT_CALL | Specific pit stop instructions | "Box this lap" |
| TRACK_CONDITION | Mentions of track state | "yellows in turn 7", "track is drying" |
| TECHNICAL_ISSUE | Car-related problems | "losing grip on the rear" |
| WEATHER | Weather conditions | "rain expected in 5 minutes" |
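These nine types make up the model's label inventory. In the BIO scheme described in section 3, each type produces a B- tag and an I- tag. Adding O gives 19 labels. The sketch below is minimal and makes some assumptions: the label order and the names `ENTITY_TYPES`, `build_bio_labels`, `label2id` and `id2label` are illustrative and are not taken from the project.

```python
# Sketch: expand the nine F1 entity types into a BIO label inventory.
# Label order and variable names are illustrative assumptions.
ENTITY_TYPES = [
    "ACTION",
    "SITUATION",
    "INCIDENT",
    "STRATEGY_INSTRUCTION",
    "POSITION_CHANGE",
    "PIT_CALL",
    "TRACK_CONDITION",
    "TECHNICAL_ISSUE",
    "WEATHER",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for every entity type X."""
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(LABELS))  # 19: two tags per entity type plus "O"
```

Hugging Face `transformers` model configurations accept dictionaries of this shape through their `id2label` and `label2id` settings.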

3. Technical Implementation

The NER component uses a BERT model fine-tuned for the F1 domain, combined with a BIO (Beginning-Inside-Outside) tagging scheme.

3.1 BIO Tagging Approach

The NER system uses BIO tagging, a standard approach in sequence labeling:

  • B-: Marks the Beginning of an entity
  • I-: Marks the Inside (continuation) of an entity
  • O: Marks tokens Outside any entity

For example, the message "Ferrari in the wall, no? Yes, that's Charles stopped" would be tagged as:

| Word | Tag |
| ------- | ---------- |
| Ferrari | B-INCIDENT |
| in | I-INCIDENT |
| the | I-INCIDENT |
| wall | I-INCIDENT |
| , | O |
| no | O |
| ? | O |
| Yes | O |
| , | O |
| that's | B-INCIDENT |
| Charles | I-INCIDENT |
| stopped | I-INCIDENT |
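The scheme imposes one structural rule: an I-X tag may only follow a B-X or I-X tag of the same type X. Below is a minimal checker for that rule, run against the word-level tags from the example table above. The function name `is_valid_bio` is hypothetical and does not come from the project.

```python
def is_valid_bio(tags):
    """Return True if every "I-X" tag directly follows "B-X" or "I-X"."""
    previous = "O"
    for tag in tags:
        if tag.startswith("I-"):
            entity = tag[2:]
            if previous not in (f"B-{entity}", f"I-{entity}"):
                return False  # continuation without a matching start
        elif tag != "O" and not tag.startswith("B-"):
            return False  # unknown prefix
        previous = tag
    return True


# Word-level tags from the example table above.
words = ["Ferrari", "in", "the", "wall", ",", "no", "?",
         "Yes", ",", "that's", "Charles", "stopped"]
tags = ["B-INCIDENT", "I-INCIDENT", "I-INCIDENT", "I-INCIDENT", "O", "O", "O",
        "O", "O", "B-INCIDENT", "I-INCIDENT", "I-INCIDENT"]

print(is_valid_bio(tags))                          # True
print(is_valid_bio(["O", "I-INCIDENT"]))           # False: I- without a B-
print(is_valid_bio(["B-PIT_CALL", "I-WEATHER"]))   # False: type changes mid-entity
```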

3.2 Model Architecture

The NER system uses a BERT-based token classification model with the following technical components:

  1. Base Model: BERT-large-cased-finetuned-conll03-english
  2. Customization: Fine-tuned on annotated F1 radio communications
  3. Output Layer: 19 output classes (B- and I- for each of the 9 entity types, plus O)
  4. Training Approach: Focused fine-tuning with class weights to handle entity imbalance
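The source does not say how the class weights are derived. The sketch below assumes one common choice, inverse-frequency ("balanced") weighting. This is the same formula scikit-learn uses for `class_weight="balanced"`: the total tag count divided by (number of labels times that label's count). The function name and the toy data are hypothetical.

```python
from collections import Counter


def balanced_class_weights(tag_sequences, labels):
    """Inverse-frequency weights: total_tags / (num_labels * count(label)).

    Labels that never occur get 0.0 here (an assumption; clipping to a
    maximum weight is another option).
    """
    counts = Counter(tag for sequence in tag_sequences for tag in sequence)
    total = sum(counts.values())
    return [
        total / (len(labels) * counts[label]) if counts[label] else 0.0
        for label in labels
    ]


# Toy data invented for illustration: "O" dominates, as in most NER corpora.
labels = ["O", "B-PIT_CALL", "I-PIT_CALL"]
sequences = [
    ["B-PIT_CALL", "I-PIT_CALL", "I-PIT_CALL", "O", "O", "O"],
    ["O", "O", "O", "B-PIT_CALL"],
]
weights = balanced_class_weights(sequences, labels)
print([round(w, 3) for w in weights])  # [0.556, 1.667, 1.667]
```

In PyTorch, such a list would typically be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss` inside a custom training loss. This is necessary because the loss that `BertForTokenClassification` computes internally is unweighted.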

4. Data Processing Pipeline

The NER system transforms raw text into structured entity data in three processing steps:

  1. Tokenization: The raw text is tokenized using BERT's WordPiece tokenizer
  2. Prediction: Tokens are processed through the model to predict BIO tags
  3. Entity Extraction: Consecutive tokens with matching entity types (a B- tag followed by I- tags of the same type) are combined into a single entity
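The post-processing side of this pipeline can be sketched in plain Python, operating on tags the model has already predicted: first WordPiece pieces are rejoined into words, then the BIO tags are merged into entities. The sketch makes several assumptions. Special tokens such as [CLS] and [SEP] are taken to be already removed. Each word keeps the tag of its first piece, which is a common convention but is not confirmed for this project. The token split shown is illustrative, not real tokenizer output. The function names are hypothetical.

```python
def merge_wordpieces(tokens, tags):
    """Join "##"-prefixed WordPiece pieces back into whole words."""
    words, word_tags = [], []
    for token, tag in zip(tokens, tags):
        if token.startswith("##") and words:
            words[-1] += token[2:]  # continuation piece: glue onto previous word
        else:
            words.append(token)
            word_tags.append(tag)  # the word keeps its first piece's tag
    return words, word_tags


def extract_entities(words, tags):
    """Merge a B- tag and the I- tags that follow it into (type, text) pairs."""
    entities = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_words.append(word)  # same entity continues
            continue
        if current_type is not None:  # any other tag closes the open entity
            entities.append((current_type, " ".join(current_words)))
        if tag == "O":
            current_type, current_words = None, []
        else:  # "B-X" starts an entity; a stray "I-X" is leniently treated as a start
            current_type, current_words = tag[2:], [word]
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


tokens = ["Box", "this", "lap", ",", "Ham", "##ilton", "is", "catching", "up"]
tags = ["B-PIT_CALL", "I-PIT_CALL", "I-PIT_CALL", "O", "B-SITUATION",
        "I-SITUATION", "I-SITUATION", "I-SITUATION", "I-SITUATION"]

words, word_tags = merge_wordpieces(tokens, tags)
print(extract_entities(words, word_tags))
# [('PIT_CALL', 'Box this lap'), ('SITUATION', 'Hamilton is catching up')]
```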

6. Example Output

Here's an example of a processed radio message showing the full pipeline output:

This structured output enables the expert system to understand:

  • The message contains a pit stop instruction (ORDER intent)
  • The emotional tone is neutral
  • There are specific actions (box), pit instructions (this lap for softs), and a situational update (Hamilton is catching up)
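A combined result of this kind could be held in a small container. The sketch below is hypothetical. The class name `RadioAnalysis` and its fields are assumptions, not the project's schema. Assigning "box", "this lap for softs" and "Hamilton is catching up" to the ACTION, PIT_CALL and SITUATION types is an inference from the description above, not captured model output.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RadioAnalysis:
    """Hypothetical container for one analysed radio message."""
    intent: str
    sentiment: str
    entities: Dict[str, List[str]] = field(default_factory=dict)

    def mentions(self, entity_type: str) -> bool:
        """True if at least one entity of this type was extracted."""
        return bool(self.entities.get(entity_type))


# Values mirror the description above; they are not real model output.
analysis = RadioAnalysis(
    intent="ORDER",
    sentiment="neutral",
    entities={
        "ACTION": ["box"],
        "PIT_CALL": ["this lap for softs"],
        "SITUATION": ["Hamilton is catching up"],
    },
)
print(analysis.mentions("PIT_CALL"))  # True
print(analysis.mentions("WEATHER"))   # False
```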

7. From NER to Strategy Decisions

The NER-extracted entities directly inform strategic decision-making in the expert system. Examples of how they influence strategy:

  • PIT_CALL entities: Trigger pit stop preparation rules
  • TRACK_CONDITION entities: Activate weather strategy adjustment rules
  • TECHNICAL_ISSUE entities: Inform tire management or defensive driving recommendations
  • POSITION_CHANGE entities: Update race situation awareness for overtake opportunities
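The mapping above amounts to a lookup from entity type to rule group. Below is a minimal sketch of that dispatch. The rule names in `RULES_BY_ENTITY` and the function `rules_to_trigger` are hypothetical, because the source does not give the expert system's actual rule identifiers.

```python
# Hypothetical rule identifiers; the expert system's real names are not given.
RULES_BY_ENTITY = {
    "PIT_CALL": "pit_stop_preparation",
    "TRACK_CONDITION": "weather_strategy_adjustment",
    "TECHNICAL_ISSUE": "tire_management_or_defense",
    "POSITION_CHANGE": "overtake_opportunity_update",
}


def rules_to_trigger(entities):
    """Return rule groups for (entity_type, text) pairs, first-seen order, no duplicates."""
    triggered = []
    for entity_type, _text in entities:
        rule = RULES_BY_ENTITY.get(entity_type)  # unmapped types trigger nothing
        if rule is not None and rule not in triggered:
            triggered.append(rule)
    return triggered


entities = [
    ("PIT_CALL", "Box this lap"),
    ("SITUATION", "Hamilton is 2 seconds behind"),
    ("TRACK_CONDITION", "track is drying"),
]
print(rules_to_trigger(entities))
# ['pit_stop_preparation', 'weather_strategy_adjustment']
```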

8. Implementation Details

The F1 NER system is implemented using the following key technologies and patterns:

  1. Data Preparation:
    • Character-span annotation to BIO tag conversion
    • Custom tokenization handling for BERT model
  2. Model Configuration:
    • BertForTokenClassification with custom classification head
    • Entity-specific class weighting to handle imbalance
  3. Inference Functions:
    • analyze_f1_radio(): Main function for entity extraction
    • Custom post-processing to merge BIO tags into entities
  4. Integration Interface:
    • Module organization: NLP_utils/N05_ner_models.py
    • Integration point: analyze_radio_message() in N06_model_merging.py
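The character-span to BIO conversion listed under Data Preparation could look roughly like the minimal sketch below. It aligns spans to words found by a simple regex, not to BERT WordPiece tokens. A fuller version could instead use the `offset_mapping` that a Hugging Face fast tokenizer returns when called with `return_offsets_mapping=True`. The names `WORD_PATTERN` and `char_spans_to_bio` are hypothetical. On the example message from section 3.1, the sketch reproduces that example's tags.

```python
import re

# Words (with an optional apostrophe suffix such as "'s") or single punctuation
# marks; a stand-in for the project's own tokenization.
WORD_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def char_spans_to_bio(text, spans):
    """Convert (start, end, entity_type) character spans into word-level BIO tags.

    `end` is exclusive. Words that only partly overlap a span are tagged "O".
    """
    words, tags = [], []
    previous_span = None
    for match in WORD_PATTERN.finditer(text):
        start, end = match.span()
        tag, current_span = "O", None
        for index, (span_start, span_end, entity_type) in enumerate(spans):
            if span_start <= start and end <= span_end:
                # First word inside this span gets B-, later words get I-.
                prefix = "I-" if previous_span == index else "B-"
                tag, current_span = prefix + entity_type, index
                break
        words.append(match.group())
        tags.append(tag)
        previous_span = current_span
    return words, tags


text = "Ferrari in the wall, no? Yes, that's Charles stopped"
spans = []
for phrase in ("Ferrari in the wall", "that's Charles stopped"):
    start = text.index(phrase)
    spans.append((start, start + len(phrase), "INCIDENT"))

words, tags = char_spans_to_bio(text, spans)
for word, tag in zip(words, tags):
    print(f"{word}\t{tag}")
```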

9. Usage in the Strategy System

The structured entity information from the NER component is consumed by the F1 Strategy Engine as RadioFact objects, which trigger specific rules in the expert system. The NER results inform strategic decisions such as:

  • When to pit based on weather changes mentioned in radio messages