Bayesian Model Principles – Kor.ai Surveillance

This page documents the core Bayesian Network concepts that guide the Kor.ai risk scoring models. It draws directly from academic theory (esp. Darwiche, Chs. 7–10) and compares each concept with its application in Kor.ai’s insider dealing and spoofing use cases.

📚 Key Concepts from Darwiche (Chapters 7–10)

Principle	Description
Explaining Away	When two causes compete to explain an observed effect, confirming one reduces belief in the other
Conditional Independence	Two nodes are independent given the evidence when every path between them is blocked (d-separation)
Bayesian Inference	Posterior belief updated using new evidence
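Noisy-OR / Noisy-MAX	Compact CPT parameterisation for multiple causal parents, assuming each parent can trigger the effect independently
CPT Compression	Large CPTs can be stored compactly by exploiting regular structure (e.g., deterministic or logical patterns)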
Abduction (Backwards Reasoning)	Reasoning from effect to most likely cause
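Inference by Enumeration	Exhaustive marginalisation over the full joint distribution; exponential in the number of variables (pgmpy's exact inference uses variable elimination rather than plain enumeration)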
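
To make explaining away and inference by enumeration concrete, here is a minimal sketch on a hypothetical three-node network `A -> E <- B`. The node names and every probability below are illustrative assumptions, not values from the Kor.ai models, and the code uses plain Python rather than pgmpy.

```python
from itertools import product

# Hypothetical network: A -> E <- B, all nodes binary.
# All numbers are illustrative assumptions.
P_A = {True: 0.1, False: 0.9}
P_B = {True: 0.1, False: 0.9}
P_E_GIVEN = {  # P(E=True | A, B)
    (False, False): 0.01,
    (True, False): 0.80,
    (False, True): 0.80,
    (True, True): 0.95,
}

def joint(a, b, e):
    """Joint probability P(A=a, B=b, E=e) from the chain rule."""
    p_e = P_E_GIVEN[(a, b)]
    return P_A[a] * P_B[b] * (p_e if e else 1.0 - p_e)

def posterior_a(evidence):
    """P(A=True | evidence), summing the joint over every consistent world."""
    num = den = 0.0
    for a, b, e in product([True, False], repeat=3):
        world = {"A": a, "B": b, "E": e}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # skip worlds that contradict the evidence
        p = joint(a, b, e)
        den += p
        if a:
            num += p
    return num / den

p_prior = posterior_a({})                          # no evidence
p_effect = posterior_a({"E": True})                # effect observed
p_effect_and_b = posterior_a({"E": True, "B": True})  # rival cause confirmed

print(f"{p_prior:.3f} {p_effect:.3f} {p_effect_and_b:.3f}")
```

Under these assumed numbers, belief in `A` rises once the effect `E` is observed and falls back most of the way once the rival cause `B` is confirmed. That drop is the explaining-away pattern. Reading off the most probable cause after observing `E` is also a small instance of abduction.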

🧠 Kor.ai Model Design – Concept Mapping

Concept	Kor.ai Implementation	Notes
Explaining Away	✅ Q1 (size), Q2 (price), Q3 (comms) reduce belief in alternate causes	Active in insider model
Conditional Independence	✅ Nodes stay independent unless an explicit probabilistic link is defined	Dependencies are controlled manually through the CPTs
Prior Probabilities	✅ All nodes have seeded priors	Tuned manually
Noisy-OR	❌ Not yet applied	Optional v2 enhancement
Abduction	✅ Analysts interpret from high posterior cause nodes	Supports explainability
Dynamic Bayesian Networks	❌ Not supported (static only)	MVP constraint
CPT Compression	❌ Full CPTs used, compression not implemented	May reduce inference time in future
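
Since Noisy-OR and CPT compression are both listed as not yet implemented, the following sketch shows how a Noisy-OR parameterisation could generate a full CPT from a handful of numbers. The function name `noisy_or_cpt`, the leak term, and the link probabilities are hypothetical and not taken from the Kor.ai code base.

```python
from itertools import product

def noisy_or_cpt(link_probs, leak=0.0):
    """Build {parent_states: P(effect=True | parent_states)} for binary parents.

    link_probs[i] is the probability that parent i alone triggers the effect.
    leak is the probability the effect occurs with no parent active.
    """
    cpt = {}
    for states in product([False, True], repeat=len(link_probs)):
        # The effect is absent only if the leak and every active parent fail.
        p_not = 1.0 - leak
        for active, p in zip(states, link_probs):
            if active:
                p_not *= 1.0 - p
        cpt[states] = 1.0 - p_not
    return cpt

# Three hypothetical parents: 3 link probabilities + 1 leak yield 2**3 = 8 rows.
cpt = noisy_or_cpt([0.6, 0.5, 0.7], leak=0.02)
for states, p in cpt.items():
    print(states, round(p, 4))
```

The design trade-off is that a node with n binary parents needs only n + 1 parameters instead of 2^n manually tuned rows, at the cost of assuming the parents act independently on the effect.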

🧪 Insider Dealing – Application of “Explaining Away”

Example:

If we observe:
- Q1 = High (large trade)
- Q2 = Yes (price spike before news)
- Q3 = Yes (chat with insider)
Then:

"These jointly explain Q10 = Insider Dealing" and reduce the need for additional evidence from Q6 or Q7.
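
The way accumulating evidence reduces the need for further indicators can be sketched with a simple posterior update. This assumes a hypothetical structure in which Q1, Q2 and Q3 are conditionally independent given Q10. The real topology and CPTs live in InsiderModel.json and may differ, and every number here is invented for illustration.

```python
# Hypothetical values; not taken from InsiderModel.json.
PRIOR_Q10 = 0.01  # assumed P(Q10 = insider dealing) before any evidence

# Assumed (P(indicator seen | insider dealing), P(indicator seen | no insider dealing))
LIKELIHOODS = {
    "Q1": (0.70, 0.10),  # large trade
    "Q2": (0.60, 0.05),  # price spike before news
    "Q3": (0.50, 0.02),  # chat with insider
}

def posterior_q10(observed):
    """P(Q10 | observed indicators), assuming independence given Q10."""
    p_yes, p_no = PRIOR_Q10, 1.0 - PRIOR_Q10
    for name in observed:
        like_yes, like_no = LIKELIHOODS[name]
        p_yes *= like_yes
        p_no *= like_no
    return p_yes / (p_yes + p_no)

# Posterior after observing 0, 1, 2, then all 3 indicators.
indicators = ["Q1", "Q2", "Q3"]
steps = [posterior_q10(indicators[:k]) for k in range(4)]
for k, p in enumerate(steps):
    print(k, round(p, 3))
```

With these assumed likelihoods the posterior climbs steeply with each indicator, so by the time all three are observed, extra evidence such as Q6 or Q7 would change the conclusion very little.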

🔄 Implications for Surveillance Library

✅ Node behaviors match theoretical causality
🔧 Future nodes (e.g., Q11 = profit share, Q12 = access logs) should maintain independence unless justified
❗ Watch for false causality links during new model builds

🧭 Next Enhancements

- Add Noisy-OR logic to reduce CPT burden
- Move to pgmpy with full control over graph topology + inference engine
- Consider dynamic or sequential models for time-window-based behavior (e.g., spoofing patterns)

Maintainer: @ravkorsurv
Source: Modeling and Reasoning with Bayesian Networks, Darwiche, Ch. 7–10
Kor.ai models: InsiderModel.json, SpoofingModel.json