09_Data_Model_Mapping - ravkorsurv/kor-ai-core GitHub Wiki

Data Model Mapping – Kor.ai

This document describes how raw surveillance inputs are transformed into obfuscated Bayesian node values (e.g., Q1, Q2). For each node it lists the raw fields consumed, the mapping type (direct, derived, scored, or inferred), and the upstream systems that supply the data.


📥 Data Sources

| Source | Description |
|---|---|
| OMS/EMS | Order and trade execution data |
| Market Data Feed | Price, volume, volatility |
| Comms Archive | Voice, email, chat records |
| HR Systems | Employee role, history, red flags |
| KYC Systems | Risk profiles, access levels |
| Case History | Previous alerts + resolution paths |
| News/Events Feed | Scheduled announcements, news |

🧠 Bayesian Node Mapping

| Node ID | Meaning (Internal) | Raw Field(s) | Mapping Type | Source System |
|---|---|---|---|---|
| Q1 | Trade size vs avg | `trade_notional`, `avg_notional` | Derived | OMS/EMS |
| Q2 | Price move before news | `price_delta`, `news_timestamp` | Derived | Market, News |
| Q3 | Insider comms before trade | `email_text`, `call_transcript` | NLP Scored | Comms Archive |
| Q4 | Historical suspicious activity | `case_flag` | Direct | Case DB |
| Q5 | Clustering of trades before event | `trade_timestamps` | Pattern Scored | OMS |
| Q6 | Repeated profit-taking patterns | `PnL`, `symbol`, `direction` | Derived | HR / OMS |
| Q7 | KYC red flag | `kyc_score`, `risk_level` | Direct | KYC System |
| Q8 | Access to sensitive info | `employee_role`, `data_access` | Direct | HR / Entitlements |
| Q9 | Unusual timing vs market event | `exec_time`, `event_calendar` | Derived | OMS + News |
| Q10 | Insider Dealing (final outcome) | [Inferred from above] | Inferred | Model Output |
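
The mapping table above could be held in code as a small registry, so that transformation logic and validation read from one place. The following is a minimal sketch, not the project's actual implementation: the names `MappingType`, `NodeMapping`, `NODE_MAPPINGS`, `nodes_of_type`, and `missing_fields` are hypothetical, and only the node IDs, raw fields, mapping types, and source systems come from the table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class MappingType(Enum):
    """Mapping types as they appear in the node mapping table."""
    DIRECT = "Direct"
    DERIVED = "Derived"
    NLP_SCORED = "NLP Scored"
    PATTERN_SCORED = "Pattern Scored"
    INFERRED = "Inferred"


@dataclass(frozen=True)
class NodeMapping:
    """One row of the node mapping table (hypothetical structure)."""
    node_id: str
    raw_fields: Tuple[str, ...]
    mapping_type: MappingType
    source: str


# Rows copied from the table; Q10 has no raw fields because it is inferred.
NODE_MAPPINGS: Dict[str, NodeMapping] = {
    m.node_id: m
    for m in (
        NodeMapping("Q1", ("trade_notional", "avg_notional"), MappingType.DERIVED, "OMS/EMS"),
        NodeMapping("Q2", ("price_delta", "news_timestamp"), MappingType.DERIVED, "Market, News"),
        NodeMapping("Q3", ("email_text", "call_transcript"), MappingType.NLP_SCORED, "Comms Archive"),
        NodeMapping("Q4", ("case_flag",), MappingType.DIRECT, "Case DB"),
        NodeMapping("Q5", ("trade_timestamps",), MappingType.PATTERN_SCORED, "OMS"),
        NodeMapping("Q6", ("PnL", "symbol", "direction"), MappingType.DERIVED, "HR / OMS"),
        NodeMapping("Q7", ("kyc_score", "risk_level"), MappingType.DIRECT, "KYC System"),
        NodeMapping("Q8", ("employee_role", "data_access"), MappingType.DIRECT, "HR / Entitlements"),
        NodeMapping("Q9", ("exec_time", "event_calendar"), MappingType.DERIVED, "OMS + News"),
        NodeMapping("Q10", (), MappingType.INFERRED, "Model Output"),
    )
}


def nodes_of_type(mapping_type: MappingType) -> List[str]:
    """Return node IDs with the given mapping type, in table order."""
    return [m.node_id for m in NODE_MAPPINGS.values() if m.mapping_type is mapping_type]


def missing_fields(node_id: str, record: dict) -> List[str]:
    """Return the raw fields a node needs that are absent from a record."""
    return [f for f in NODE_MAPPINGS[node_id].raw_fields if f not in record]
```

A registry like this makes it straightforward to check, before scoring, which nodes a given raw record can actually populate.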

🧪 Example: Transformation Flow

// Raw input
{
  "employee_role": "Head of Trading",
  "kyc_score": 92,
  "trade_notional": 3000000,
  "avg_notional": 550000,
  "call_transcript": "Yeah, after the CEO call, let's go heavy Brent",
  "news_timestamp": "2025-06-02T09:30Z",
  "exec_time": "2025-06-02T08:22Z"
}

// Obfuscated node values
{
  "Q1": "High",
  "Q2": "Yes",
  "Q3": "Yes",
  "Q7": "Yes",
  "Q8": "Yes"
}
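
As a rough illustration of the flow above, the sketch below maps three of the example's raw fields to node values. It is a sketch under stated assumptions, not the project's actual logic: the thresholds `Q1_HIGH_RATIO` and `Q7_RED_FLAG_SCORE`, the `SENSITIVE_ROLES` set, the `"Normal"` and `"No"` states, and the assumption that a higher `kyc_score` means higher risk are all hypothetical placeholders. Q2 and Q3 are omitted because they depend on market data and NLP scoring that the example does not specify.

```python
# All thresholds and the role list are hypothetical placeholders,
# chosen only so the sketch can be run against the example input.
Q1_HIGH_RATIO = 3.0        # assumed: notional >= 3x average counts as "High"
Q7_RED_FLAG_SCORE = 80     # assumed: higher kyc_score means higher risk
SENSITIVE_ROLES = {"Head of Trading"}  # assumed list of roles with sensitive access


def map_q1(record: dict) -> str:
    """Derived: trade size relative to the average notional."""
    ratio = record["trade_notional"] / record["avg_notional"]
    return "High" if ratio >= Q1_HIGH_RATIO else "Normal"


def map_q7(record: dict) -> str:
    """Direct: KYC red flag taken from the KYC score."""
    return "Yes" if record["kyc_score"] >= Q7_RED_FLAG_SCORE else "No"


def map_q8(record: dict) -> str:
    """Direct: access to sensitive information, based on role."""
    return "Yes" if record["employee_role"] in SENSITIVE_ROLES else "No"


def to_node_values(record: dict) -> dict:
    """Map a raw record to the subset of node values covered by this sketch."""
    return {"Q1": map_q1(record), "Q7": map_q7(record), "Q8": map_q8(record)}


raw = {
    "employee_role": "Head of Trading",
    "kyc_score": 92,
    "trade_notional": 3000000,
    "avg_notional": 550000,
}
node_values = to_node_values(raw)
print(node_values)
```

With the example input, the trade is roughly 5.5 times the average notional, so under these assumed thresholds the sketch yields the same Q1, Q7, and Q8 values as the node output shown above.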
