09_Data_Model_Mapping - ravkorsurv/kor-ai-core GitHub Wiki

Data Model Mapping – Kor.ai

This document outlines how raw surveillance inputs are transformed into obfuscated Bayesian node values (e.g., Q1, Q2). For each node it lists the raw fields consumed, the mapping type (direct, derived, NLP scored, pattern scored, or inferred), and the upstream source systems involved.

📥 Data Sources

| Source | Description |
|---|---|
| OMS/EMS | Order and trade execution data |
| Market Data Feed | Price, volume, volatility |
| Comms Archive | Voice, email, chat records |
| HR Systems | Employee role, history, red flags |
| KYC Systems | Risk profiles, access levels |
| Case History | Previous alerts + resolution paths |
| News/Events Feed | Scheduled announcements, news |

🧠 Bayesian Node Mapping

| Node ID | Meaning (Internal) | Raw Field(s) | Mapping Type | Source System |
|---|---|---|---|---|
| Q1 | Trade size vs avg | `trade_notional`, `avg_notional` | Derived | OMS/EMS |
| Q2 | Price move before news | `price_delta`, `news_timestamp` | Derived | Market Data Feed, News/Events Feed |
| Q3 | Insider comms before trade | `email_text`, `call_transcript` | NLP Scored | Comms Archive |
| Q4 | Historical suspicious activity | `case_flag` | Direct | Case History |
| Q5 | Clustering of trades before event | `trade_timestamps` | Pattern Scored | OMS/EMS |
| Q6 | Repeated profit-taking patterns | `PnL`, `symbol`, `direction` | Derived | HR Systems, OMS/EMS |
| Q7 | KYC red flag | `kyc_score`, `risk_level` | Direct | KYC Systems |
| Q8 | Access to sensitive info | `employee_role`, `data_access` | Direct | HR Systems, Entitlements |
| Q9 | Unusual timing vs market event | `exec_time`, `event_calendar` | Derived | OMS/EMS, News/Events Feed |
| Q10 | Insider Dealing (final outcome) | [Inferred from above] | Inferred | Model Output |
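
The node table can also be held as a small in-code registry, which makes it easy to check whether an input record carries the fields a node needs. The sketch below is illustrative only: `MappingType`, `NodeMapping`, `NODE_MAPPINGS`, `nodes_by_type`, and `missing_fields` are hypothetical names, not part of the Kor.ai codebase. The node IDs, raw fields, and mapping types are copied from the table.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class MappingType(Enum):
    """Mapping types as they appear in the node table."""
    DIRECT = "Direct"
    DERIVED = "Derived"
    NLP_SCORED = "NLP Scored"
    PATTERN_SCORED = "Pattern Scored"
    INFERRED = "Inferred"


@dataclass(frozen=True)
class NodeMapping:
    node_id: str
    raw_fields: tuple[str, ...]
    mapping_type: MappingType


# One entry per table row; Q10 has no raw fields because it is inferred.
NODE_MAPPINGS = {
    m.node_id: m
    for m in (
        NodeMapping("Q1", ("trade_notional", "avg_notional"), MappingType.DERIVED),
        NodeMapping("Q2", ("price_delta", "news_timestamp"), MappingType.DERIVED),
        NodeMapping("Q3", ("email_text", "call_transcript"), MappingType.NLP_SCORED),
        NodeMapping("Q4", ("case_flag",), MappingType.DIRECT),
        NodeMapping("Q5", ("trade_timestamps",), MappingType.PATTERN_SCORED),
        NodeMapping("Q6", ("PnL", "symbol", "direction"), MappingType.DERIVED),
        NodeMapping("Q7", ("kyc_score", "risk_level"), MappingType.DIRECT),
        NodeMapping("Q8", ("employee_role", "data_access"), MappingType.DIRECT),
        NodeMapping("Q9", ("exec_time", "event_calendar"), MappingType.DERIVED),
        NodeMapping("Q10", (), MappingType.INFERRED),
    )
}


def nodes_by_type(mapping_type: MappingType) -> list[str]:
    """Return node IDs with the given mapping type, in table order."""
    return [m.node_id for m in NODE_MAPPINGS.values() if m.mapping_type is mapping_type]


def missing_fields(node_id: str, record: dict) -> list[str]:
    """Return the raw fields the node needs that are absent from the record."""
    return [f for f in NODE_MAPPINGS[node_id].raw_fields if f not in record]


if __name__ == "__main__":
    print(nodes_by_type(MappingType.DIRECT))
    print(missing_fields("Q1", {"trade_notional": 3000000}))
```

Keeping the table and the registry in one place is a design choice rather than a requirement; the point is only that a missing raw field can be detected before any node value is computed.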

🧪 Example: Transformation Flow

// Raw input
{
  "employee_role": "Head of Trading",
  "kyc_score": 92,
  "trade_notional": 3000000,
  "avg_notional": 550000,
  "call_transcript": "Yeah, after the CEO call, let's go heavy Brent",
  "news_timestamp": "2025-06-02T09:30Z",
  "exec_time": "2025-06-02T08:22Z"
}

// Mapped node values
{
  "Q1": "High",
  "Q2": "Yes",
  "Q3": "Yes",
  "Q7": "Yes",
  "Q8": "Yes"
}
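
A minimal sketch of how three of the values above (Q1, Q7, Q8) could be produced from the raw record. Everything beyond the field names is an assumption made for illustration: the function `derive_nodes`, the ratio threshold of 3.0, the KYC cutoff of 80 (and the reading that a higher `kyc_score` means higher risk), the set of sensitive roles, and the `"Normal"` and `"No"` state labels do not come from the source. Q2 and Q3 are omitted because they depend on price data and NLP scoring that this record alone does not provide.

```python
def derive_nodes(
    raw: dict,
    *,
    size_ratio_high: float = 3.0,       # assumed threshold, not from the source
    kyc_flag_at: int = 80,              # assumed cutoff, not from the source
    sensitive_roles: frozenset = frozenset({"Head of Trading"}),  # assumed list
) -> dict:
    """Map a raw surveillance record to a subset of obfuscated node values."""
    nodes = {}

    # Q1 (derived): trade size relative to the average notional.
    if "trade_notional" in raw and raw.get("avg_notional"):
        ratio = raw["trade_notional"] / raw["avg_notional"]
        nodes["Q1"] = "High" if ratio >= size_ratio_high else "Normal"

    # Q7 (direct): KYC red flag, assuming a higher score means higher risk.
    if "kyc_score" in raw:
        nodes["Q7"] = "Yes" if raw["kyc_score"] >= kyc_flag_at else "No"

    # Q8 (direct): access to sensitive information, keyed on role only.
    if "employee_role" in raw:
        nodes["Q8"] = "Yes" if raw["employee_role"] in sensitive_roles else "No"

    return nodes


raw_input = {
    "employee_role": "Head of Trading",
    "kyc_score": 92,
    "trade_notional": 3000000,
    "avg_notional": 550000,
}

if __name__ == "__main__":
    print(derive_nodes(raw_input))
```

With the example record, the trade is roughly 5.5 times the average notional, so under these assumed thresholds the sketch yields `Q1 = "High"`, `Q7 = "Yes"`, and `Q8 = "Yes"`, matching the corresponding entries in the mapped output.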
