11.2_Node_Library - ravkorsurv/kor-ai-core GitHub Wiki
11.2 Node Library – Kor.ai Bayesian Model
This page outlines the standard node types used across all Bayesian Network (BN) models in Kor.ai, including reusable definitions, categories, data sources, and design conventions.
Purpose of Node Library
- Promote consistency across models (Insider Dealing, Spoofing, etc.)
- Enable modular reuse of core node types
- Simplify transformations from raw data to BN input
- Support explainability and UI-friendly mapping
- Ensure API compatibility with Agena or pgmpy
Node Metadata Schema
Each node is defined in JSON with the following fields:
Field | Description |
---|---|
`id` | Obfuscated node ID (e.g. `Q1`, `RiskNode`) |
`label` | Human-readable name (for UI explanation) |
`type` | Binary, ranked, categorical, hidden |
`source` | Data pipeline input (e.g., HR, comms, trade) |
`used_in` | Models using the node (Insider, Spoofing, etc.) |
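As a rough illustration of the schema above, the sketch below parses one node definition and checks that the five fields are present. The field names come from the table; the lowercase type strings, the list shape of `used_in`, and the `validate_node` helper are assumptions made for this example, not the repository's actual format.

```python
import json

# Field names are taken from the schema table. The allowed type strings and
# the validation rules are assumptions for illustration only.
REQUIRED_FIELDS = ("id", "label", "type", "source", "used_in")
ALLOWED_TYPES = {"binary", "ranked", "categorical", "hidden"}


def validate_node(node: dict) -> list:
    """Return a list of problems; an empty list means the node passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in node]
    if "type" in node and node["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {node['type']}")
    if "used_in" in node and not isinstance(node["used_in"], list):
        problems.append("used_in must be a list of model names")
    return problems


# Hypothetical definition for Q1, using values that appear elsewhere on this page.
example_json = """
{
  "id": "Q1",
  "label": "Access to Confidential Info",
  "type": "binary",
  "source": "HR",
  "used_in": ["Insider", "Front-run"]
}
"""

node = json.loads(example_json)
print(validate_node(node))
```

A check like this could run when the definitions are loaded, so that a node with a missing field fails early rather than at inference time.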
Node Types
1. Binary Nodes
Used for Yes/No flags or presence/absence detection.
Example Label | Input Logic |
---|---|
Access to Confidential Info | true if trader's HR role has access |
Quote Cancellations Detected | true if ≥3 cancels within 10 mins |
Linked Negative News | true if adverse news within 1 day |
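The input logic for a binary node can be a small pure function. The sketch below implements the "≥3 cancels within 10 mins" rule from the table as a sliding window over cancel timestamps. The function name, its parameters, and the choice to treat a span of exactly 10 minutes as inside the window are assumptions.

```python
from datetime import datetime, timedelta


def quote_cancellations_detected(cancel_times, min_cancels=3, window=timedelta(minutes=10)):
    """True if at least `min_cancels` cancels fall within any span of length `window`."""
    times = sorted(cancel_times)
    start = 0
    for end, current in enumerate(times):
        # Advance the left edge until the span fits inside the window
        # (a span of exactly `window` is treated as inside; this is an assumption).
        while current - times[start] > window:
            start += 1
        if end - start + 1 >= min_cancels:
            return True
    return False


base = datetime(2024, 1, 1, 9, 0)
burst = [base, base + timedelta(minutes=4), base + timedelta(minutes=9)]
spread = [base, base + timedelta(minutes=6), base + timedelta(minutes=12)]
print(quote_cancellations_detected(burst), quote_cancellations_detected(spread))
```

Keeping the threshold and window as parameters means the same function can serve models that tune these values differently.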
2. Ranked Nodes
Support Low/Medium/High or other ordinal tiers.
Example Label | Possible Values | Source |
---|---|---|
Price Movement Magnitude | None, Low, High | Market Data |
PnL Contribution | Negative, Flat, Positive | Trade PnL feed |
Information Sensitivity | Internal, Restricted | HR Classifier |
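A ranked node typically comes from bucketing a continuous input into ordered states. The sketch below maps a percentage price change to the None/Low/High states listed for Price Movement Magnitude. The threshold values (0.5 and 2.0) and the function name are hypothetical placeholders; real cut-offs would come from calibration or SME input.

```python
# Ordinal order, lowest first, as listed in the table above.
PRICE_MOVEMENT_STATES = ["None", "Low", "High"]


def price_movement_magnitude(pct_change, low_threshold=0.5, high_threshold=2.0):
    """Map a percentage price change to a ranked state.

    The thresholds are illustrative assumptions, not calibrated values.
    """
    magnitude = abs(pct_change)  # direction is ignored; only the size matters here
    if magnitude < low_threshold:
        return "None"
    if magnitude < high_threshold:
        return "Low"
    return "High"


for change in (0.1, -1.0, 3.5):
    print(change, price_movement_magnitude(change))
```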
3. Categorical Nodes
For mutually exclusive categories (e.g. region, desk).
Label | Categories |
---|---|
Desk Type | FX, Energy, Rates |
Employment Status | Perm, Contractor |
Trade Strategy | Prop, Client Facilitation |
4. Latent or Hidden Nodes
Not directly observed; inferred from multiple inputs.
Label | Description |
---|---|
Intent to Manipulate | Synthesised from HR access + news + trade |
Pressure to Perform | Derived from role + revenue targets + losses |
Opportunity Window | Modelled based on news + trade timing |
Node Design Principles
- Nodes should represent causal or explainable factors.
- Avoid overly raw inputs; instead, engineer interpretable flags.
- Obfuscate IDs (Q1, Q2) in API/model; map to readable labels in UI.
- CPTs default to flat (uniform) priors unless subject-matter-expert (SME) knowledge is embedded.
- Node combinations use Noisy-OR where causes are independent.
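To make the last two principles concrete, the sketch below builds a Noisy-OR conditional probability table for a binary child with binary parents, plus a flat prior. Under Noisy-OR, P(child = true | parents) = 1 − (1 − leak) × ∏(1 − p_i) over the parents that are true. The link probabilities used here are made-up numbers, and both function names are assumptions; this is plain Python, not the Agena or pgmpy API.

```python
from itertools import product


def noisy_or_cpt(link_probs, leak=0.0):
    """Return {parent_states: P(child=True | parent_states)} under Noisy-OR.

    link_probs[i] is the probability that parent i, when true, switches the
    child on by itself; `leak` accounts for causes not modelled as parents.
    """
    cpt = {}
    for states in product([False, True], repeat=len(link_probs)):
        p_child_off = 1.0 - leak
        for active, p in zip(states, link_probs):
            if active:
                p_child_off *= 1.0 - p
        cpt[states] = 1.0 - p_child_off
    return cpt


def flat_prior(n_states):
    """Uniform distribution over n_states, used when no SME knowledge exists."""
    return [1.0 / n_states] * n_states


# Illustrative link strengths for two independent causes (made-up values).
cpt = noisy_or_cpt([0.6, 0.3])
for parent_states, p_true in cpt.items():
    print(parent_states, round(p_true, 3))
print(flat_prior(3))
```

Noisy-OR needs one parameter per parent rather than one per parent-state combination, which keeps CPTs manageable as shared nodes accumulate parents.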
Node Reuse Across Models
Node ID | Label | Used In |
---|---|---|
Q1 | Access to Confidential Info | Insider, Front-run |
Q2 | Price Spike Detected | Insider, Spoofing |
Q3 | Negative News Coverage | Insider |
Q4 | High Quote Cancel Rate | Spoofing |
Q5 | Suspicious Order Behavior | Spoofing |
Q6 | Intent to Manipulate (latent) | All models |
RiskNode | Overall Abuse Risk Score | All models |
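The reuse table above can be treated as a small registry that answers "which nodes does this model need?". The sketch below encodes the rows as a dictionary and filters by model name. The `"*"` sentinel for "All models" and the `nodes_for_model` helper are assumptions made for this example.

```python
ALL_MODELS = "*"  # assumed sentinel standing in for "All models"

# Rows copied from the reuse table above.
NODE_REUSE = {
    "Q1": {"label": "Access to Confidential Info", "used_in": ["Insider", "Front-run"]},
    "Q2": {"label": "Price Spike Detected", "used_in": ["Insider", "Spoofing"]},
    "Q3": {"label": "Negative News Coverage", "used_in": ["Insider"]},
    "Q4": {"label": "High Quote Cancel Rate", "used_in": ["Spoofing"]},
    "Q5": {"label": "Suspicious Order Behavior", "used_in": ["Spoofing"]},
    "Q6": {"label": "Intent to Manipulate (latent)", "used_in": [ALL_MODELS]},
    "RiskNode": {"label": "Overall Abuse Risk Score", "used_in": [ALL_MODELS]},
}


def nodes_for_model(model):
    """Return the sorted node IDs a given model uses, including shared nodes."""
    return sorted(
        node_id
        for node_id, meta in NODE_REUSE.items()
        if model in meta["used_in"] or ALL_MODELS in meta["used_in"]
    )


print(nodes_for_model("Spoofing"))
print(nodes_for_model("Insider"))
```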
File Locations
All node JSON definitions live in: /bayesian-models/components/
Each file defines:
- Node ID
- Type
- CPT logic (manual or template)
- Input data expectations
- Label mappings
Rationale for Shared Nodes
- Simplifies model maintenance
- Enables consistent alert logic across abuse typologies
- Facilitates UI and audit reporting with standard mappings
- Supports scalable CPT tuning across typologies (e.g., all models using `Access to Info` update together)
Notes for Expansion
- This library will grow as typologies expand (e.g., wash trading, collusion)
- Future: CPTs may be auto-generated from case review outcomes
- Mapping logic (raw input → node evidence) is defined per node in `transformations/`