named entity recognition - VforVitorio/F1_Strat_Manager GitHub Wiki

Named Entity Recognition

This document details the Named Entity Recognition (NER) component of the F1 Strategy Manager's Natural Language Processing pipeline. The NER system extracts structured information from team radio communications, identifying key racing concepts such as actions, track conditions, and strategic instructions. This structured data enables the expert system to make informed strategy decisions based on radio communications.

For information about the overall NLP pipeline, see NLP Pipeline, and for details on other components like sentiment analysis and intent classification, see Sentiment and Intent Analysis.

1. Overview and Purpose

The Named Entity Recognition system identifies and extracts domain-specific entities from Formula 1 team radio messages. While traditional NER systems focus on general entities like people and locations, our custom F1 NER system recognizes racing-specific concepts such as track conditions, pit call instructions, and technical issues.

2. Entity Types

The F1 NER system recognizes nine domain-specific entity types, each capturing critical information for race strategy:

| Entity Type | Description | Example |
| --- | --- | --- |
| ACTION | Direct commands or actions | "push now", "follow my instruction" |
| SITUATION | Racing context or circumstances | "Hamilton is 2 seconds behind" |
| INCIDENT | Accidents or on-track events | "Ferrari in the wall" |
| STRATEGY_INSTRUCTION | Strategic directives | "We're looking at Plan B" |
| POSITION_CHANGE | References to overtakes or positions | "You're P4", "gaining on Verstappen" |
| PIT_CALL | Specific pit stop instructions | "Box this lap" |
| TRACK_CONDITION | Mentions of track state | "yellows in turn 7", "track is drying" |
| TECHNICAL_ISSUE | Car-related problems | "losing grip on the rear" |
| WEATHER | Weather conditions | "rain expected in 5 minutes" |

3. Technical Implementation

The NER component is a BERT model fine-tuned for the F1 domain. It labels each token using a BIO (Beginning-Inside-Outside) tagging scheme.

3.1 BIO Tagging Approach

The NER system uses BIO tagging, a standard approach in sequence labeling:

  • B-: Marks the Beginning of an entity
  • I-: Marks the Inside (continuation) of an entity
  • O: Marks tokens Outside any entity

For example, the message "Ferrari in the wall, no? Yes, that's Charles stopped" would be tagged as:

| Word | Tag |
| --- | --- |
| Ferrari | B-INCIDENT |
| in | I-INCIDENT |
| the | I-INCIDENT |
| wall | I-INCIDENT |
| , | O |
| no | O |
| ? | O |
| Yes | O |
| , | O |
| that's | B-INCIDENT |
| Charles | I-INCIDENT |
| stopped | I-INCIDENT |
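To make the tagging scheme concrete, the sketch below merges the word/tag pairs from the example above back into entities. The function name `merge_bio_tags` and the choice to treat a stray `I-` tag as the start of a new entity are assumptions made for illustration, not the project's actual post-processing.

```python
def merge_bio_tags(words, tags):
    """Group consecutive B-/I- tagged words into (entity_type, text) pairs."""
    entities = []
    current_type = None
    current_words = []
    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_type:
            # Continuation of the entity that is currently open.
            current_words.append(word)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        if prefix in ("B", "I"):
            # "B-" starts a new entity; a stray "I-" is treated as a start too.
            current_type, current_words = label, [word]
        else:
            current_type, current_words = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


words = ["Ferrari", "in", "the", "wall", ",", "no", "?",
         "Yes", ",", "that's", "Charles", "stopped"]
tags = ["B-INCIDENT", "I-INCIDENT", "I-INCIDENT", "I-INCIDENT", "O", "O", "O",
        "O", "O", "B-INCIDENT", "I-INCIDENT", "I-INCIDENT"]

print(merge_bio_tags(words, tags))
# [('INCIDENT', 'Ferrari in the wall'), ('INCIDENT', "that's Charles stopped")]
```

The second `B-INCIDENT` tag is what separates the two incidents, even though both share the same entity type.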

3.2 Model Architecture

The NER system uses a BERT-based token classification model. Its technical components are:

  1. Base Model: BERT-large-cased-finetuned-conll03-english
  2. Customization: Fine-tuned on annotated F1 radio communications
  3. Output Layer: 19 output classes (B- and I- for each of the 9 entity types, plus O)
  4. Training Approach: Focused fine-tuning with class weights to handle entity imbalance
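As a rough illustration of the output layer and the class weighting, the sketch below builds the 19-label set and computes inverse-frequency weights. The label ordering, the helper `inverse_frequency_weights`, and the weighting formula are assumptions; the source only states that class weights are used, not how they are derived.

```python
from collections import Counter

ENTITY_TYPES = [
    "ACTION", "SITUATION", "INCIDENT", "STRATEGY_INSTRUCTION",
    "POSITION_CHANGE", "PIT_CALL", "TRACK_CONDITION",
    "TECHNICAL_ISSUE", "WEATHER",
]

# "O" plus a B- and an I- tag per entity type: 1 + 2 * 9 = 19 classes.
# The ordering here is an assumption, not the project's actual mapping.
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}


def inverse_frequency_weights(tag_sequences):
    """Weight each class by total / (num_classes * count); unseen classes get 1.0."""
    counts = Counter(tag for seq in tag_sequences for tag in seq)
    total = sum(counts.values())
    return [
        total / (len(LABELS) * counts[label]) if counts[label] else 1.0
        for label in LABELS
    ]


# Toy training data: "O" dominates, so it receives the smallest weight.
toy_tags = [["B-PIT_CALL", "I-PIT_CALL", "O", "O", "O", "O"]]
weights = inverse_frequency_weights(toy_tags)
print(len(LABELS), weights[LABEL2ID["O"]] < weights[LABEL2ID["B-PIT_CALL"]])
```

Weights of this kind are typically passed to a weighted cross-entropy loss so that rare entity tags contribute more to the gradient than the frequent `O` tag.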

4. Data Processing Pipeline

The NER system transforms raw text into structured entity data in four steps:

  1. Tokenization: The raw text is tokenized using BERT's WordPiece tokenizer
  2. Prediction: Tokens are processed through the model to predict BIO tags
  3. Entity Extraction: Consecutive tokens with matching entity types are combined
  4. Structured Output: Entities are formatted into a JSON structure for the expert system
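Steps 3 and 4 can be sketched without the model itself, assuming the tokenizer and classifier have already produced WordPiece tokens and one predicted tag per token. The token split shown (`dry`, `##ing`), the rule that a word keeps its first piece's tag, and the JSON keys `message`, `entities`, `type`, and `text` are all illustrative assumptions.

```python
import json


def wordpieces_to_words(tokens, tags):
    """Rejoin '##' continuation pieces; each word keeps its first piece's tag."""
    words, word_tags = [], []
    for token, tag in zip(tokens, tags):
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
            word_tags.append(tag)
    return words, word_tags


def extract_entities(words, word_tags):
    """Combine consecutive words whose tags share an entity type."""
    entities = []
    open_label = None  # entity type of the immediately preceding word, if any
    for word, tag in zip(words, word_tags):
        if tag == "O":
            open_label = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "I" and label == open_label:
            entities[-1]["text"] += " " + word
        else:
            entities.append({"type": label, "text": word})
            open_label = label
    return entities


# Hypothetical tokenizer and model output for one radio message.
tokens = ["Box", "this", "lap", ",", "track", "is", "dry", "##ing"]
tags = ["B-PIT_CALL", "I-PIT_CALL", "I-PIT_CALL", "O",
        "B-TRACK_CONDITION", "I-TRACK_CONDITION",
        "I-TRACK_CONDITION", "I-TRACK_CONDITION"]

words, word_tags = wordpieces_to_words(tokens, tags)
result = {
    "message": "Box this lap, track is drying",
    "entities": extract_entities(words, word_tags),
}
print(json.dumps(result, indent=2))
```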

5. Integration with NLP Pipeline

The NER component is integrated with sentiment analysis and intent classification, so each radio message is analyzed for tone, purpose, and content together. The pipeline outputs a standardized JSON structure that includes:

  • Original message text
  • Sentiment classification (positive, negative, neutral)
  • Intent classification (ORDER, INFORMATION, QUESTION, etc.)
  • Extracted entities with their types
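One way to pin down the four fields listed above is a typed structure with a small builder. The key names (`message`, `sentiment`, `intent`, `entities`) and the helper `build_analysis` are hypothetical; the project's actual JSON keys are not shown in this section.

```python
from typing import List, TypedDict


class Entity(TypedDict):
    type: str   # one of the nine entity types, e.g. "PIT_CALL"
    text: str   # the span of the message covered by the entity


class RadioAnalysis(TypedDict):
    message: str
    sentiment: str
    intent: str
    entities: List[Entity]


SENTIMENTS = {"positive", "negative", "neutral"}


def build_analysis(message, sentiment, intent, entities) -> RadioAnalysis:
    """Assemble the combined output, rejecting unknown sentiment labels."""
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unknown sentiment: {sentiment!r}")
    return {
        "message": message,
        "sentiment": sentiment,
        "intent": intent,
        "entities": list(entities),
    }


analysis = build_analysis(
    "Box this lap",
    "neutral",
    "ORDER",
    [{"type": "PIT_CALL", "text": "Box this lap"}],
)
print(analysis)
```

Intent labels are left unvalidated here because the source lists them only partially (ORDER, INFORMATION, QUESTION, etc.).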

6. Example Output

For a processed radio message that combines a pit call with an update on a competitor, the full pipeline output enables the expert system to understand:

  • The message contains a pit stop instruction (ORDER intent)
  • The emotional tone is neutral
  • There are specific actions (box), pit instructions (this lap for softs), and a situational update (Hamilton is catching up)
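An output consistent with the three points above might look like the following. This is a hypothetical reconstruction: the message text, the JSON key names, and the exact entity boundaries are assumptions inferred from the description, not actual pipeline output.

```python
import json
from collections import defaultdict

# Hypothetical pipeline output, reconstructed from the description above.
example_json = """
{
  "message": "Box this lap for softs, Hamilton is catching up",
  "sentiment": "neutral",
  "intent": "ORDER",
  "entities": [
    {"type": "ACTION", "text": "Box"},
    {"type": "PIT_CALL", "text": "this lap for softs"},
    {"type": "SITUATION", "text": "Hamilton is catching up"}
  ]
}
"""

analysis = json.loads(example_json)

# Group entity texts by type, as a consumer of the output might do.
by_type = defaultdict(list)
for entity in analysis["entities"]:
    by_type[entity["type"]].append(entity["text"])

print(analysis["intent"], analysis["sentiment"])
print(dict(by_type))
```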

7. From NER to Strategy Decisions

The entities extracted by the NER component directly inform strategic decision-making in the expert system. For example:

  • PIT_CALL entities: Trigger pit stop preparation rules
  • TRACK_CONDITION entities: Activate weather strategy adjustment rules
  • TECHNICAL_ISSUE entities: Inform tire management or defensive driving recommendations
  • POSITION_CHANGE entities: Update race situation awareness for overtake opportunities
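The mapping above can be pictured as a lookup from entity type to a group of rules. The group names and the helper `rule_groups_for` below are hypothetical labels derived from the bullet text; they are not identifiers from the expert system.

```python
# Hypothetical rule-group names, paraphrasing the list above.
RULE_GROUPS = {
    "PIT_CALL": "pit_stop_preparation",
    "TRACK_CONDITION": "weather_strategy_adjustment",
    "TECHNICAL_ISSUE": "tire_management_or_defensive_driving",
    "POSITION_CHANGE": "overtake_opportunity_awareness",
}


def rule_groups_for(entities):
    """Return the rule groups to activate, in first-seen order, without duplicates."""
    groups = []
    for entity in entities:
        group = RULE_GROUPS.get(entity["type"])
        if group and group not in groups:
            groups.append(group)
    return groups


entities = [
    {"type": "PIT_CALL", "text": "Box this lap"},
    {"type": "TRACK_CONDITION", "text": "track is drying"},
    {"type": "ACTION", "text": "push now"},  # no rule group mapped in this sketch
]
print(rule_groups_for(entities))
# ['pit_stop_preparation', 'weather_strategy_adjustment']
```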

8. Implementation Details

The F1 NER system is implemented using the following key technologies and patterns:

  1. Data Preparation:

    • Character-span annotation to BIO tag conversion
    • Custom tokenization handling for BERT model
  2. Model Configuration:

    • BertForTokenClassification with custom classification head
    • Entity-specific class weighting to handle imbalance
  3. Inference Functions:

    • analyze_f1_radio(): Main function for entity extraction
    • Custom post-processing to merge BIO tags into entities
  4. Integration Interface:

    • Module organization: NLP_utils/N05_ner_models.py
    • Integration point: analyze_radio_message() in N06_model_merging.py
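The data preparation step, converting character-span annotations to BIO tags, can be sketched as follows. The function `spans_to_bio`, the regex-based word splitting, and the `(start, end, label)` span format with an exclusive end offset are assumptions for illustration; the project's annotation format and tokenization are not shown in this section.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans (end exclusive) to word-level BIO tags."""
    tokens, tags = [], []
    started = set()  # indices of spans that have already received their B- tag
    # Words (with an optional apostrophe suffix) or single punctuation marks.
    for match in re.finditer(r"\w+(?:'\w+)?|[^\w\s]", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for i, (start, end, label) in enumerate(spans):
            if start <= tok_start and tok_end <= end:
                tag = ("I-" if i in started else "B-") + label
                started.add(i)
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


text = "Ferrari in the wall, no?"
phrase = "Ferrari in the wall"
start = text.index(phrase)
tokens, tags = spans_to_bio(text, [(start, start + len(phrase), "INCIDENT")])

for token, tag in zip(tokens, tags):
    print(token, tag)
```

These word-level tags would still need to be aligned to WordPiece tokens before training, which is the "custom tokenization handling" mentioned above.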

9. Usage in the Strategy System

The F1 Strategy Engine consumes the structured entity information from the NER component as RadioFact objects, which trigger specific rules in the expert system. The NER results inform strategic decisions such as:

  • When to pit based on weather changes mentioned in radio
  • How to respond to technical issues reported by the driver
  • Adjusting race tactics based on incidents or track conditions
  • Monitoring competitor strategies mentioned in radio communications
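As a rough picture of this hand-off, the sketch below wraps a pipeline result in a fact object and checks one rule condition. The `RadioFact` class here is a plain dataclass stand-in with assumed fields; the project's real `RadioFact` definition, the helper `to_radio_fact`, and the condition `mentions_weather_change` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RadioFact:
    """Plain-Python stand-in; the project's real RadioFact fields are not shown here."""
    sentiment: str
    intent: str
    entities: Dict[str, List[str]] = field(default_factory=dict)


def to_radio_fact(analysis: dict) -> RadioFact:
    """Group entity texts by type and wrap the result in a RadioFact."""
    grouped: Dict[str, List[str]] = {}
    for entity in analysis.get("entities", []):
        grouped.setdefault(entity["type"], []).append(entity["text"])
    return RadioFact(analysis["sentiment"], analysis["intent"], grouped)


def mentions_weather_change(fact: RadioFact) -> bool:
    """Hypothetical rule condition: weather or track-state entities are present."""
    return bool(fact.entities.get("WEATHER") or fact.entities.get("TRACK_CONDITION"))


fact = to_radio_fact({
    "sentiment": "neutral",
    "intent": "INFORMATION",
    "entities": [{"type": "WEATHER", "text": "rain expected in 5 minutes"}],
})
print(mentions_weather_change(fact))  # True
```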
