11.4 Evidence Mapping – From Raw Data to BN Input

This section describes how Kor.ai ingests raw data from multiple systems and maps it into structured evidence for Bayesian Network (BN) inference. Evidence mapping is the foundation that allows BNs to work with noisy, incomplete, or complex inputs across surveillance domains.


📂 Objectives of Evidence Mapping

  • Convert raw trade, HR, comms, and market data into BN-ready inputs
  • Support partial/incomplete evidence without breaking inference
  • Maintain auditable mappings between source fields and node IDs
  • Ensure mappings are modular and extensible for new typologies

📄 Source Domains

| Source | Input Fields | Example Evidence Mapping |
|---|---|---|
| Trade Data | Order Size, Type, Time | Q1 = true (Large Order Submitted) |
| HR System | Role Title, Access, Location | Q2 = true (Access to Confidential Info) |
| News Feed | Sentiment, Headline, Timestamp | Q3 = "Negative" (Linked Adverse News) |
| Market Data | Price, Volume, Volatility | Q4 = "High" (Price Spike Detected) |
| PnL Systems | Daily Return, Positioning | Q5 = "Positive" (Unusual Gain) |
| Comms Metadata | Channel, Volume, Counterparties | Q6 = true (Suspicious Comms Activity) |
| Case History | Past Dispositions | Q7 = true (Prior Flag for Similar Pattern) |
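
For illustration, the PnL Systems row above could reduce to a single threshold check. This is a hedged sketch: the field name and the £50k threshold are assumptions (the figure echoes the feature-logic example in the pipeline below), not the production schema:

```python
# Hypothetical sketch: derive the Q5 node ("Unusual Gain") from a raw
# PnL record. The field name and threshold are assumed.
UNUSUAL_GAIN_THRESHOLD = 50_000  # GBP; assumed value

def map_pnl(pnl_record: dict) -> dict:
    """Return the Q5 evidence derived from daily return."""
    if pnl_record["daily_return"] > UNUSUAL_GAIN_THRESHOLD:
        return {"Q5": "Positive"}
    return {}
```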

🔄 Transformation Pipeline

```
[Raw Source Data]
     ↓
[ETL Layer / API Adapter]
     ↓
[Feature Logic (e.g., if PnL > £50k → flag)]
     ↓
[Evidence Map → { Q1: true, Q4: "High", ... }]
     ↓
[Inference Engine / API Call]
```

Each transformation:

  • Lives in /bayesian-models/transformations/
  • Is written as a JSON or Python config
  • Links source → node ID with logic explanation
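
As an illustration of such a config, the sketch below links a source field to a node ID with an attached logic explanation. The keys, threshold, and overall shape are assumptions about how these files might look, not the repository's actual format:

```python
# Hypothetical transformation entry (shape assumed, not the repo schema).
TRANSFORMATIONS = [
    {
        "source": "trade_data",             # originating system
        "source_field": "order_size",       # raw input field
        "node_id": "Q1",                    # target BN node
        "logic": lambda v: v >= 1_000_000,  # assumed feature threshold
        "explanation": "Large Order Submitted when notional >= 1m",
    },
]

def apply_transformations(raw: dict, transformations: list) -> dict:
    """Build an evidence map {node_id: value} from raw source data."""
    evidence = {}
    for t in transformations:
        if t["source_field"] in raw:
            evidence[t["node_id"]] = t["logic"](raw[t["source_field"]])
    return evidence
```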

📁 File Structure

| Folder/Path | Contents |
|---|---|
| /transformations/ | Mapping rules for each data source |
| /payloads/ | Sample mapped evidence payloads |
| /validation/ | Tests to verify evidence → node conformity |

📥 Sample Input & Output

Raw Input (Trade + HR + News):

```json
{
  "order_size": 4000000,
  "instrument": "Brent Futures",
  "role": "Commodities Desk - Senior Trader",
  "news_sentiment": "negative",
  "news_time": "2025-06-04T12:00:00Z"
}
```

Mapped Evidence Output:

```json
{
  "Q1": true,
  "Q2": true,
  "Q3": "Negative",
  "Q4": "Recent"
}
```
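
A minimal sketch tying the two payloads together might look like the following. The thresholds, the role keyword check, and the news-recency window are all illustrative assumptions; Q4 is treated here as a news-recency value, following the sample output above:

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds; the real feature logic lives in /transformations/.
LARGE_ORDER_THRESHOLD = 1_000_000
RECENT_NEWS_WINDOW = timedelta(hours=24)

def map_evidence(raw: dict, now: datetime) -> dict:
    """Map raw trade/HR/news fields to Qx evidence, mirroring the payloads above."""
    evidence = {}
    if "order_size" in raw:
        evidence["Q1"] = raw["order_size"] >= LARGE_ORDER_THRESHOLD
    if "role" in raw:
        # Assumed rule: senior desk roles imply access to confidential info.
        evidence["Q2"] = "Senior" in raw["role"]
    if "news_sentiment" in raw:
        evidence["Q3"] = raw["news_sentiment"].capitalize()
    if "news_time" in raw:
        news_time = datetime.fromisoformat(raw["news_time"].replace("Z", "+00:00"))
        if now - news_time <= RECENT_NEWS_WINDOW:
            evidence["Q4"] = "Recent"
    return evidence

# Example: map_evidence(raw_input, now=datetime.now(timezone.utc))
```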

❓ Handling Missing or Incomplete Data

  • Nodes not present in the evidence are skipped
  • Inference still runs using priors from CPTs
  • Missing inputs are explicitly logged (see the sketch after this list)
  • No assumptions are made in the mapping phase (no auto-filling)
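
A hedged sketch of the logging step, assuming a flat list of model node IDs (the node list and logger name are illustrative):

```python
import logging

logger = logging.getLogger("evidence_mapping")

MODEL_NODES = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7"]  # assumed node list

def log_missing_evidence(evidence: dict) -> None:
    """Log nodes with no observed value; inference falls back to CPT priors."""
    missing = [node for node in MODEL_NODES if node not in evidence]
    if missing:
        logger.info("No evidence for %s; priors will be used", missing)
```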

🗳️ Validation and Schema Checks

  • Each evidence dataSet is checked against the model definition before it is sent to Agena
  • Type mismatches or illegal values are caught early
  • Schema-driven tests (sketched below) ensure:
    • All required nodes are recognized
    • Value domains match the node type (e.g., "High", true/false)
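
A minimal sketch of such a check, with assumed node domains (in practice these would be driven by the model definition, not hard-coded):

```python
# Assumed value domains per node; illustrative only.
NODE_DOMAINS = {
    "Q1": {True, False},
    "Q3": {"Negative", "Neutral", "Positive"},
    "Q4": {"Low", "Medium", "High"},
}

def validate_evidence(evidence: dict) -> list:
    """Return validation errors; an empty list means the payload can be sent."""
    errors = []
    for node, value in evidence.items():
        if node not in NODE_DOMAINS:
            errors.append(f"Unrecognized node ID: {node}")
        elif value not in NODE_DOMAINS[node]:
            errors.append(f"Illegal value {value!r} for node {node}")
    return errors
```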

📎 Auditability

Every alert retains:

| Field | Description |
|---|---|
| Raw Input | Original source data snapshot |
| Transformed Fields | Derived values pre-mapping |
| Evidence Payload | Final Qx node input to the BN |
| Score + Contributors | Output from inference engine |
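
The four fields above could be captured per alert in a simple record; this dataclass is a sketch of the shape, not the actual storage schema:

```python
from dataclasses import dataclass, field

@dataclass
class AlertAuditRecord:
    """Per-alert audit trail (illustrative field names)."""
    raw_input: dict            # original source data snapshot
    transformed_fields: dict   # derived values pre-mapping
    evidence_payload: dict     # final Qx node input to the BN
    score: float               # risk score from the inference engine
    contributors: list = field(default_factory=list)  # top contributing nodes
```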

🔮 Future Enhancements

  • Auto-generate transformation mappings from source schema
  • GUI-based transformation rule editor
  • Confidence scoring on evidence quality
  • Alert if raw input quality affects output certainty