# 11.4 Evidence Mapping – From Raw Data to BN Input
This section describes how Kor.ai ingests raw data from multiple systems and maps it into structured evidence for Bayesian Network (BN) inference. Evidence mapping is the foundation that allows BNs to work with noisy, incomplete, or complex inputs across surveillance domains.
## 📂 Objectives of Evidence Mapping
- Convert raw trade, HR, comms, and market data into BN-ready inputs
- Support partial/incomplete evidence without breaking inference
- Maintain auditable mappings between source fields and node IDs
- Ensure mappings are modular and extensible for new typologies
## 📄 Source Domains

| Source | Input Fields | Example Evidence Mapping |
|---|---|---|
| Trade Data | Order Size, Type, Time | Q1 = true (Large Order Submitted) |
| HR System | Role Title, Access, Location | Q2 = true (Access to Confidential Info) |
| News Feed | Sentiment, Headline, Timestamp | Q3 = "Negative" (Linked Adverse News) |
| Market Data | Price, Volume, Volatility | Q4 = "High" (Price Spike Detected) |
| PnL Systems | Daily Return, Positioning | Q5 = "Positive" (Unusual Gain) |
| Comms Metadata | Channel, Volume, Counterparties | Q6 = true (Suspicious Comms Activity) |
| Case History | Past Dispositions | Q7 = true (Prior Flag for Similar Pattern) |
## 🥜 Transformation Pipeline

```
[Raw Source Data]
        ↓
[ETL Layer / API Adapter]
        ↓
[Feature Logic (e.g., if PnL > £50k → flag)]
        ↓
[Evidence Map → { Q1: true, Q4: "High", ... }]
        ↓
[Inference Engine / API Call]
```
Each transformation:

- Lives in `/bayesian-models/transformations/`
- Is written as a JSON or Python config (see the sketch below)
- Links source field → node ID, with an explanation of the logic
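To make the shape of these configs concrete, here is a minimal sketch of a Python transformation for the trade-data source. The function name, field names, node IDs, and the threshold are illustrative assumptions, not the repository's actual rules:

```python
# Hypothetical transformation rule for the trade-data source.
# Field names, node IDs, and the threshold are illustrative only.

LARGE_ORDER_THRESHOLD = 1_000_000  # assumed notional cutoff, not a real Kor.ai value

def map_trade_evidence(raw: dict) -> dict:
    """Map raw trade fields to BN node evidence.

    Returns only the nodes this source can speak to; absent
    source fields produce no entry (no auto-filling).
    """
    evidence = {}
    if "order_size" in raw:
        # Q1: Large Order Submitted (boolean node)
        evidence["Q1"] = raw["order_size"] >= LARGE_ORDER_THRESHOLD
    return evidence
```

Keeping one mapper per source keeps the source → node link auditable: each node ID sits next to the logic that sets it.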
## 📁 File Structure

| Folder/Path | Contents |
|---|---|
| /transformations/ | Mapping rules for each data source |
| /payloads/ | Sample mapped evidence payloads |
| /validation/ | Tests to verify evidence → node conformity |
## 📥 Sample Input & Output

Raw Input (Trade + News):

```json
{
  "order_size": 4000000,
  "instrument": "Brent Futures",
  "role": "Commodities Desk - Senior Trader",
  "news_sentiment": "negative",
  "news_time": "2025-06-04T12:00:00Z"
}
```

Mapped Evidence Output:

```json
{
  "Q1": true,
  "Q2": true,
  "Q3": "Negative",
  "Q4": "Recent"
}
```
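As a hedged sketch of how per-source mappers could be composed into a payload like the one above: `map_hr_evidence` and `map_news_evidence` are hypothetical companions to the trade mapper sketched earlier, and the role/sentiment heuristics are invented purely for illustration.

```python
def map_hr_evidence(raw: dict) -> dict:
    evidence = {}
    if "role" in raw:
        # Q2: Access to Confidential Info (toy heuristic on role title)
        evidence["Q2"] = "senior" in raw["role"].lower()
    return evidence

def map_news_evidence(raw: dict) -> dict:
    evidence = {}
    if "news_sentiment" in raw:
        # Q3: Linked Adverse News (categorical node, e.g. "Negative")
        evidence["Q3"] = raw["news_sentiment"].capitalize()
    return evidence

def build_evidence(raw: dict) -> dict:
    """Merge per-source mappings into one BN evidence payload."""
    evidence = {}
    for mapper in (map_hr_evidence, map_news_evidence):  # plus trade, market, ... mappers
        evidence.update(mapper(raw))
    return evidence

print(build_evidence({
    "role": "Commodities Desk - Senior Trader",
    "news_sentiment": "negative",
}))  # -> {'Q2': True, 'Q3': 'Negative'}
```

Because each mapper returns only the nodes it observed, merging the results naturally produces a partial payload when a source is silent.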
## ❓ Handling Missing or Incomplete Data

- Nodes not present in the evidence are skipped
- Inference still runs using priors from CPTs
- Missing inputs are explicitly logged (see the sketch after this list)
- No assumptions are made in the mapping phase (no auto-filling)
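A minimal sketch of the skip-and-log behaviour described above; the node set and logger name are assumptions for illustration:

```python
import logging

logger = logging.getLogger("evidence_mapping")

# Assumed node set; in practice it would come from the model definition.
MODEL_NODES = {"Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7"}

def finalize_evidence(evidence: dict) -> dict:
    """Log model nodes with no observed evidence; they stay out of the
    payload, so inference falls back to the CPT priors."""
    for node in sorted(MODEL_NODES - evidence.keys()):
        logger.info("No evidence for node %s; CPT prior will be used", node)
    return evidence  # deliberately no auto-filling
```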
## 🗳️ Validation and Schema Checks

- Each `dataSet` is checked against the model definition before sending to Agena
- Type mismatches or illegal values are caught early (a sketch follows this list)
- Schema-driven tests ensure:
  - All required nodes are recognized
  - Value domains match the node type (e.g., "High", true/false)
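One possible shape for such a check; the `NODE_DOMAINS` map and its states are invented here and would normally be derived from the model definition:

```python
# Illustrative pre-flight check on an evidence payload before inference.
NODE_DOMAINS = {
    "Q1": {True, False},
    "Q3": {"Positive", "Neutral", "Negative"},
    "Q4": {"Low", "Medium", "High"},
}

def validate_evidence(evidence: dict) -> None:
    """Fail fast on unknown node IDs or out-of-domain values."""
    for node, value in evidence.items():
        if node not in NODE_DOMAINS:
            raise ValueError(f"Unrecognized node ID: {node}")
        if value not in NODE_DOMAINS[node]:
            raise ValueError(f"Illegal value {value!r} for node {node}")
```

Raising at this stage keeps bad payloads from ever reaching the inference engine, which is what makes type mismatches cheap to catch.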
## 📎 Auditability

Every alert retains:

| Field | Description |
|---|---|
| Raw Input | Original source data snapshot |
| Transformed Fields | Derived values pre-mapping |
| Evidence Payload | Final Qx node input to the BN |
| Score + Contributors | Output from inference engine |
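To illustrate, an audit record for a single alert might look like the following structure; every value is fabricated for the example, and only the four field names come from the table above:

```python
# Hypothetical audit record retained with one alert; values are examples.
audit_record = {
    "raw_input": {"order_size": 4000000, "news_sentiment": "negative"},
    "transformed_fields": {"large_order_flag": True, "sentiment_label": "Negative"},
    "evidence_payload": {"Q1": True, "Q3": "Negative"},
    "score_and_contributors": {"risk_score": 0.82, "top_contributors": ["Q1", "Q3"]},
}
```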
## 🔮 Future Enhancements
- Auto-generate transformation mappings from source schema
- GUI-based transformation rule editor
- Confidence scoring on evidence quality
- Alert if raw input quality affects output certainty