# 11.4 Evidence Mapping – From Raw Data to BN Input
This section describes how Kor.ai ingests raw data from multiple systems and maps it into structured evidence for Bayesian Network (BN) inference. Evidence mapping is the foundation that allows BNs to work with noisy, incomplete, or complex inputs across surveillance domains.
## 📂 Objectives of Evidence Mapping
- Convert raw trade, HR, comms, and market data into BN-ready inputs
- Support partial/incomplete evidence without breaking inference
- Maintain auditable mappings between source fields and node IDs
- Ensure mappings are modular and extensible for new typologies
## 📄 Source Domains

| Source | Input Fields | Example Evidence Mapping |
|---|---|---|
| Trade Data | Order Size, Type, Time | Q1 = true (Large Order Submitted) |
| HR System | Role Title, Access, Location | Q2 = true (Access to Confidential Info) |
| News Feed | Sentiment, Headline, Timestamp | Q3 = "Negative" (Linked Adverse News) |
| Market Data | Price, Volume, Volatility | Q4 = "High" (Price Spike Detected) |
| PnL Systems | Daily Return, Positioning | Q5 = "Positive" (Unusual Gain) |
| Comms Metadata | Channel, Volume, Counterparties | Q6 = true (Suspicious Comms Activity) |
| Case History | Past Dispositions | Q7 = true (Prior Flag for Similar Pattern) |
## 🥜 Transformation Pipeline

```
[Raw Source Data]
        ↓
[ETL Layer / API Adapter]
        ↓
[Feature Logic (e.g., if PnL > £50k → flag)]
        ↓
[Evidence Map → { Q1: true, Q4: "High", ... }]
        ↓
[Inference Engine / API Call]
```
Each transformation:

- Lives in `/bayesian-models/transformations/`
- Is written as a JSON or Python config (see the sketch below)
- Links source field → node ID, with an explanation of the logic
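To make the shape of these configs concrete, here is a minimal sketch of a Python transformation for the trade-data source. The function name, field names, node IDs, and the threshold are illustrative assumptions, not the repository's actual rules:

```python
# Hypothetical transformation rule for the trade-data source.
# Field names, node IDs, and the threshold are illustrative only.

LARGE_ORDER_THRESHOLD = 1_000_000  # assumed notional cutoff, not a real Kor.ai value

def map_trade_evidence(raw: dict) -> dict:
    """Map raw trade fields to BN node evidence.

    Returns only the nodes this source can speak to; absent
    source fields produce no entry (no auto-filling).
    """
    evidence = {}
    if "order_size" in raw:
        # Q1: Large Order Submitted (boolean node)
        evidence["Q1"] = raw["order_size"] >= LARGE_ORDER_THRESHOLD
    return evidence
```

Keeping one mapper per source keeps the source → node link auditable: each node ID sits next to the logic that sets it.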
## 📁 File Structure

| Folder/Path | Contents |
|---|---|
| /transformations/ | Mapping rules for each data source |
| /payloads/ | Sample mapped evidence payloads |
| /validation/ | Tests to verify evidence → node conformity |
## 📥 Sample Input & Output

Raw Input (Trade + News):

```json
{
  "order_size": 4000000,
  "instrument": "Brent Futures",
  "role": "Commodities Desk - Senior Trader",
  "news_sentiment": "negative",
  "news_time": "2025-06-04T12:00:00Z"
}
```

Mapped Evidence Output:

```json
{
  "Q1": true,
  "Q2": true,
  "Q3": "Negative",
  "Q4": "Recent"
}
```
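As a hedged sketch of how per-source mappers could be composed into a payload like the one above: `map_hr_evidence` and `map_news_evidence` are hypothetical companions to the trade mapper sketched earlier, and the role/sentiment heuristics are invented purely for illustration.

```python
def map_hr_evidence(raw: dict) -> dict:
    evidence = {}
    if "role" in raw:
        # Q2: Access to Confidential Info (toy heuristic on role title)
        evidence["Q2"] = "senior" in raw["role"].lower()
    return evidence

def map_news_evidence(raw: dict) -> dict:
    evidence = {}
    if "news_sentiment" in raw:
        # Q3: Linked Adverse News (categorical node, e.g. "Negative")
        evidence["Q3"] = raw["news_sentiment"].capitalize()
    return evidence

def build_evidence(raw: dict) -> dict:
    """Merge per-source mappings into one BN evidence payload."""
    evidence = {}
    for mapper in (map_hr_evidence, map_news_evidence):  # plus trade, market, ... mappers
        evidence.update(mapper(raw))
    return evidence

print(build_evidence({
    "role": "Commodities Desk - Senior Trader",
    "news_sentiment": "negative",
}))  # -> {'Q2': True, 'Q3': 'Negative'}
```

Because each mapper returns only the nodes it observed, merging the results naturally produces a partial payload when a source is silent.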
## ❓ Handling Missing or Incomplete Data

- Nodes not present in the evidence are skipped
- Inference still runs using priors from CPTs
- Missing inputs are explicitly logged (see the sketch after this list)
- No assumptions are made in the mapping phase (no auto-filling)
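A minimal sketch of the skip-and-log behaviour described above; the node set and logger name are assumptions for illustration:

```python
import logging

logger = logging.getLogger("evidence_mapping")

# Assumed node set; in practice it would come from the model definition.
MODEL_NODES = {"Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7"}

def finalize_evidence(evidence: dict) -> dict:
    """Log model nodes with no observed evidence; they stay out of the
    payload, so inference falls back to the CPT priors."""
    for node in sorted(MODEL_NODES - evidence.keys()):
        logger.info("No evidence for node %s; CPT prior will be used", node)
    return evidence  # deliberately no auto-filling
```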
## 🗳️ Validation and Schema Checks

- Each `dataSet` is checked against the model definition before sending to Agena
- Type mismatches or illegal values are caught early (a sketch follows this list)
- Schema-driven tests ensure:
  - All required nodes are recognized
  - Value domains match the node type (e.g., "High", true/false)
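One possible shape for such a check; the `NODE_DOMAINS` map and its states are invented here and would normally be derived from the model definition:

```python
# Illustrative pre-flight check on an evidence payload before inference.
NODE_DOMAINS = {
    "Q1": {True, False},
    "Q3": {"Positive", "Neutral", "Negative"},
    "Q4": {"Low", "Medium", "High"},
}

def validate_evidence(evidence: dict) -> None:
    """Fail fast on unknown node IDs or out-of-domain values."""
    for node, value in evidence.items():
        if node not in NODE_DOMAINS:
            raise ValueError(f"Unrecognized node ID: {node}")
        if value not in NODE_DOMAINS[node]:
            raise ValueError(f"Illegal value {value!r} for node {node}")
```

Raising at this stage keeps bad payloads from ever reaching the inference engine, which is what makes type mismatches cheap to catch.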
## 📎 Auditability

Every alert retains:

| Field | Description |
|---|---|
| Raw Input | Original source data snapshot |
| Transformed Fields | Derived values pre-mapping |
| Evidence Payload | Final Qx node input to the BN |
| Score + Contributors | Output from inference engine |
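To illustrate, an audit record for a single alert might look like the following structure; every value is fabricated for the example, and only the four field names come from the table above:

```python
# Hypothetical audit record retained with one alert; values are examples.
audit_record = {
    "raw_input": {"order_size": 4000000, "news_sentiment": "negative"},
    "transformed_fields": {"large_order_flag": True, "sentiment_label": "Negative"},
    "evidence_payload": {"Q1": True, "Q3": "Negative"},
    "score_and_contributors": {"risk_score": 0.82, "top_contributors": ["Q1", "Q3"]},
}
```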
## 🔮 Future Enhancements
- Auto-generate transformation mappings from source schema
- GUI-based transformation rule editor
- Confidence scoring on evidence quality
- Alert if raw input quality affects output certainty