11.2 Node Library – Kor.ai Bayesian Model

This page outlines the standard node types used across all Bayesian Network (BN) models in Kor.ai, including reusable definitions, categories, data sources, and design conventions.


🔑 Purpose of Node Library

  • Promote consistency across models (Insider Dealing, Spoofing, etc.)
  • Enable modular reuse of core node types
  • Simplify transformations from raw data to BN input
  • Support explainability and UI-friendly mapping
  • Ensure API compatibility with Agena or pgmpy

📂 Node Metadata Schema

Each node is defined in JSON using:

| Field | Description |
|---|---|
| id | Obfuscated node ID (e.g. Q1, RiskNode) |
| label | Human-readable name (for UI explanation) |
| type | Binary, ranked, categorical, hidden |
| source | Data pipeline input (e.g., HR, comms, trade) |
| used_in | Models using the node (Insider, Spoofing, etc.) |
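As an illustration of the schema above, here is a minimal sketch of one node definition and a small validator. The JSON layout, the lowercase type strings, and the names `NodeDefinition` and `node_from_dict` are assumptions made for this example; the actual files may be structured differently.

```python
import json
from dataclasses import dataclass
from typing import Tuple

# Allowed values, taken from the "type" row of the schema table.
NODE_TYPES = {"binary", "ranked", "categorical", "hidden"}


@dataclass(frozen=True)
class NodeDefinition:
    """Hypothetical in-memory form of one node's metadata."""
    id: str
    label: str
    type: str
    source: str
    used_in: Tuple[str, ...]

    def __post_init__(self):
        if self.type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.type!r}")


def node_from_dict(raw: dict) -> NodeDefinition:
    """Build a NodeDefinition from a parsed JSON object."""
    return NodeDefinition(
        id=raw["id"],
        label=raw["label"],
        type=raw["type"].lower(),
        source=raw["source"],
        used_in=tuple(raw["used_in"]),
    )


# Example values come from the tables on this page; the layout is assumed.
EXAMPLE_JSON = """
{
  "id": "Q1",
  "label": "Access to Confidential Info",
  "type": "binary",
  "source": "HR",
  "used_in": ["Insider", "Front-run"]
}
"""

node = node_from_dict(json.loads(EXAMPLE_JSON))
```

Validating the type at load time means a misspelled node type fails early rather than surfacing later during model construction.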

🧱 Node Types

1. Binary Nodes

Used for Yes/No flags or presence/absence detection.

| Example Label | Input Logic |
|---|---|
| Access to Confidential Info | true if trader’s HR role has access |
| Quote Cancellations Detected | true if ≥3 cancels within 10 mins |
| Linked Negative News | true if adverse news within 1 day |
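The "Quote Cancellations Detected" rule can be sketched as a sliding-window check over cancellation timestamps. The function name, its parameters, and the choice to treat the 10-minute boundary as inclusive are assumptions for illustration, not the project's actual transformation.

```python
from datetime import datetime, timedelta
from typing import Iterable


def quote_cancellations_detected(
    cancel_times: Iterable[datetime],
    min_cancels: int = 3,
    window: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True if at least `min_cancels` cancels fall within any `window`."""
    times = sorted(cancel_times)
    start = 0
    for end, current in enumerate(times):
        # Advance the left edge until the span is no longer than `window`.
        while current - times[start] > window:
            start += 1
        if end - start + 1 >= min_cancels:
            return True
    return False


# Three cancels inside nine minutes set the flag.
burst = [datetime(2024, 1, 2, 9, m) for m in (0, 4, 9)]
# Three cancels spread over twelve minutes do not.
spread = [datetime(2024, 1, 2, 9, m) for m in (0, 6, 12)]
```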

2. Ranked Nodes

Used for ordinal tiers such as Low/Medium/High.

| Example Label | Possible Values | Source |
|---|---|---|
| Price Movement Magnitude | None, Low, High | Market Data |
| PnL Contribution | Negative, Flat, Positive | Trade PnL feed |
| Information Sensitivity | Internal, Restricted | HR Classifier |
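A ranked node needs a rule that maps a continuous input to one of its ordered states. The sketch below bins a fractional price change into the None/Low/High states of "Price Movement Magnitude". The cut-off values (0.5% and 2%) and the function name are hypothetical; this page does not specify thresholds.

```python
from bisect import bisect_right

# Ordered states from the table above.
PRICE_MOVE_STATES = ("None", "Low", "High")
# Hypothetical cut-offs on the absolute fractional price change.
PRICE_MOVE_CUTS = (0.005, 0.02)


def price_movement_magnitude(pct_change: float) -> str:
    """Map a signed fractional price change to a ranked state."""
    # bisect_right returns how many cut-offs are <= the value,
    # which is also the index of the matching state.
    return PRICE_MOVE_STATES[bisect_right(PRICE_MOVE_CUTS, abs(pct_change))]
```

Keeping the states and cut-offs in parallel tuples makes it straightforward to add a tier later without touching the mapping function.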

3. Categorical Nodes

For mutually exclusive categories (e.g. region, desk).

| Label | Categories |
|---|---|
| Desk Type | FX, Energy, Rates |
| Employment Status | Perm, Contractor |
| Trade Strategy | Prop, Client Facilitation |

4. Latent or Hidden Nodes

Not directly observed; inferred from multiple inputs.

| Label | Description |
|---|---|
| Intent to Manipulate | Synthesised from HR access + news + trade |
| Pressure to Perform | Derived from role + revenue targets + losses |
| Opportunity Window | Modelled based on news + trade timing |

🧭 Node Design Principles

  • Nodes should represent causal or explainable factors.
  • Avoid overly raw inputs; instead, engineer interpretable flags.
  • Obfuscate IDs (Q1, Q2) in API/model; map to readable labels in UI.
  • Conditional probability tables (CPTs) default to flat (uniform) priors unless subject-matter expert (SME) knowledge is embedded.
  • Where parent causes act independently, nodes are combined using Noisy-OR.
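The last two principles can be made concrete with a short sketch. Under Noisy-OR, the probability that the effect is true equals 1 − (1 − leak) × ∏(1 − pᵢ) taken over the active causes, where pᵢ is the probability that cause i alone triggers the effect. The function names, the optional `leak` term, and the link probabilities used below are illustrative assumptions, not values from the Kor.ai models.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def flat_cpt(n_states: int) -> List[float]:
    """Uniform distribution over a node's states (the default prior)."""
    return [1.0 / n_states] * n_states


def noisy_or(link_probs: Sequence[float], active: Sequence[bool],
             leak: float = 0.0) -> float:
    """P(effect = true | active causes) under the Noisy-OR assumption."""
    if len(link_probs) != len(active):
        raise ValueError("link_probs and active must have the same length")
    p_effect_absent = 1.0 - leak
    for p, is_on in zip(link_probs, active):
        if is_on:
            # Each active cause independently fails to trigger with 1 - p.
            p_effect_absent *= 1.0 - p
    return 1.0 - p_effect_absent


def noisy_or_cpt(link_probs: Sequence[float],
                 leak: float = 0.0) -> Dict[Tuple[bool, ...], float]:
    """Expand Noisy-OR into a full table keyed by parent-state combination."""
    return {
        combo: noisy_or(link_probs, combo, leak)
        for combo in product((False, True), repeat=len(link_probs))
    }


# Two hypothetical causes with link probabilities 0.6 and 0.5.
cpt = noisy_or_cpt([0.6, 0.5])
```

Noisy-OR needs only one parameter per parent instead of one per combination of parent states, which keeps SME elicitation manageable as the number of parents grows.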

🔁 Node Reuse Across Models

| Node ID | Label | Used In |
|---|---|---|
| Q1 | Access to Confidential Info | Insider, Front-run |
| Q2 | Price Spike Detected | Insider, Spoofing |
| Q3 | Negative News Coverage | Insider |
| Q4 | High Quote Cancel Rate | Spoofing |
| Q5 | Suspicious Order Behavior | Spoofing |
| Q6 | Intent to Manipulate (latent) | All models |
| RiskNode | Overall Abuse Risk Score | All models |
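The reuse table lends itself to a simple lookup: given a model name, list the node IDs it uses, and given an obfuscated ID, return the UI label. The sketch below hard-codes the rows from the table. The `ALL_MODELS` sentinel and the function names are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical sentinel standing in for the "All models" entries.
ALL_MODELS = "*"

# Node ID -> (label, models using it), copied from the reuse table.
NODE_REGISTRY: Dict[str, Tuple[str, FrozenSet[str]]] = {
    "Q1": ("Access to Confidential Info", frozenset({"Insider", "Front-run"})),
    "Q2": ("Price Spike Detected", frozenset({"Insider", "Spoofing"})),
    "Q3": ("Negative News Coverage", frozenset({"Insider"})),
    "Q4": ("High Quote Cancel Rate", frozenset({"Spoofing"})),
    "Q5": ("Suspicious Order Behavior", frozenset({"Spoofing"})),
    "Q6": ("Intent to Manipulate (latent)", frozenset({ALL_MODELS})),
    "RiskNode": ("Overall Abuse Risk Score", frozenset({ALL_MODELS})),
}


def nodes_for_model(model: str) -> List[str]:
    """Return node IDs used by `model`, in registry order."""
    return [
        node_id
        for node_id, (_, models) in NODE_REGISTRY.items()
        if model in models or ALL_MODELS in models
    ]


def label_for(node_id: str) -> str:
    """Translate an obfuscated node ID into its human-readable label."""
    return NODE_REGISTRY[node_id][0]
```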

📦 File Locations

All node JSON definitions live in: /bayesian-models/components/

Each file defines:

  • Node ID
  • Type
  • CPT logic (manual or template)
  • Input data expectations
  • Label mappings
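Loading the definitions could look like the sketch below. It assumes one node per `.json` file and a top-level `id` key in each file; neither is confirmed on this page, and `load_node_definitions` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Dict, Union


def load_node_definitions(components_dir: Union[str, Path]) -> Dict[str, dict]:
    """Read every *.json file in `components_dir` into a dict keyed by node ID."""
    nodes: Dict[str, dict] = {}
    for path in sorted(Path(components_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            node = json.load(handle)
        node_id = node["id"]  # assumed key name
        if node_id in nodes:
            # Shared nodes should have exactly one definition.
            raise ValueError(f"duplicate node ID {node_id!r} in {path.name}")
        nodes[node_id] = node
    return nodes
```

Rejecting duplicate IDs at load time protects the single-definition guarantee that shared nodes rely on.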

🧠 Rationale for Shared Nodes

  • Simplifies model maintenance
  • Enables consistent alert logic across abuse typologies
  • Facilitates UI and audit reporting with standard mappings
  • Supports scalable CPT tuning across typologies (e.g., all models using Access to Info update together)

🚧 Notes for Expansion

  • This library will grow as typologies expand (e.g., wash trading, collusion)
  • Future: CPTs may be auto-generated from case review outcomes
  • Mapping logic (raw input → node evidence) is defined per node in transformations/