
12.3 BN Scalability Lessons – Wu et al. 2018

This page summarizes key insights from Wu et al. (2018) on Dynamic Bayesian Networks (DBNs) for spoofing detection, and outlines how Kor.ai can apply these principles when designing Bayesian models for market abuse risk.

⚠️ Scalability Concerns in Wu et al. (2018)

| Concern | Description |
| --- | --- |
| High Dimensionality | Modeling raw trade/order data at high frequency leads to explosive node counts across time slices. |
| Large CPTs | Nodes with many parents produce large Conditional Probability Tables (CPTs), which are computationally expensive to store and evaluate. |
| Real-Time Constraints | Exact inference in large DBNs is too slow for sub-second alerting. |
| Overfitting with Sparse Data | Few labeled abuse cases make parameter learning unstable. |
| Temporal Slice Depth | Longer temporal horizons multiply node counts; time-slice history must be limited. |
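
To make the "Large CPTs" concern concrete, the arithmetic below is a minimal illustration (plain Python, not part of the Kor.ai codebase or Wu et al.'s implementation) of how table size explodes with parent count and cardinality:

```python
# Illustration only: CPT size is n_states multiplied by the number of
# parent configurations, which grows exponentially in the number of parents.
import math

def cpt_size(n_states: int, parent_cardinalities: list[int]) -> int:
    """Entries needed for one node's Conditional Probability Table."""
    return n_states * math.prod(parent_cardinalities)

print(cpt_size(2, [2, 2, 2, 2]))  # binary node, 4 binary parents -> 32 entries
print(cpt_size(2, [5] * 10))      # 10 five-state parents -> 19,531,250 entries
```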

✅ Kor.ai Design Principles Based on Wu et al.

1. Event Abstraction Instead of Raw Tick Data

  • Avoid modeling every millisecond/tick event.

  • Create behavioral abstraction nodes (see the sketch below), e.g.:

    • AggressiveOrder
    • CancelSurge
    • PriceImpactCluster

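A minimal sketch of this abstraction step, assuming order events arrive as a pandas DataFrame with hypothetical columns (`event_type`, `crossed_spread`) and an illustrative threshold:

```python
# Sketch: collapse one window of raw order events into discrete BN inputs.
# Column names and the 0.8 threshold are illustrative assumptions, not the
# actual Kor.ai schema or calibrated values.
import pandas as pd

def abstract_window(events: pd.DataFrame, cancel_ratio_threshold: float = 0.8) -> dict:
    """Derive behavioral abstraction states from one time window of events."""
    n_orders = int((events["event_type"] == "NEW_ORDER").sum())
    n_cancels = int((events["event_type"] == "CANCEL").sum())
    cancel_ratio = n_cancels / max(n_orders, 1)

    return {
        # Discrete states (not raw counts) keep downstream CPTs small.
        "CancelSurge": "high" if cancel_ratio >= cancel_ratio_threshold else "low",
        "AggressiveOrder": "yes" if events["crossed_spread"].any() else "no",
    }
```
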
2. Limit Node Fan-In Using Latent Intermediates

  • Use intermediate nodes like:

    • OrderAggressiveness
    • AbuseLikelihood
    • MarketVolatilityLevel
  • Cap the number of parents per node at 3–4 to avoid exponential CPT growth.

  • Use noisy-OR or other canonical forms to simplify conditional probability definitions (see the sketch below).

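A minimal sketch of the noisy-OR construction (plain Python; the causal-strength values are illustrative, not calibrated parameters) shows why it scales: k binary parents need only k strength parameters plus a leak term, instead of 2^k hand-elicited rows:

```python
# Sketch: build a noisy-OR CPT from per-parent causal strengths.
# Strength and leak values are illustrative only.
from itertools import product

def noisy_or_cpt(parent_strengths: list[float], leak: float = 0.01) -> dict:
    """Map every binary parent configuration to P(child = True)."""
    cpt = {}
    for config in product([0, 1], repeat=len(parent_strengths)):
        p_child_false = 1.0 - leak
        for active, strength in zip(config, parent_strengths):
            if active:
                p_child_false *= 1.0 - strength
        cpt[config] = 1.0 - p_child_false
    return cpt

# Three parents (e.g., CancelSurge, AggressiveOrder, PriceImpactCluster)
# need three parameters rather than 2**3 hand-specified rows.
cpt = noisy_or_cpt([0.7, 0.5, 0.6])
print(cpt[(1, 1, 0)])  # P(child=True | first two causes active) ≈ 0.8515
```
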
3. Tiered Inference Strategy

  • Use Tier 1 rules/stats for pre-filtering high-risk events.
  • Run Tier 2 BN inference only for flagged candidates (sketched below).
  • Precompute partial inference graphs where possible.

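A minimal sketch of the tiered gating, where `tier1_score` and `run_bn_inference` are hypothetical stand-ins for the actual Kor.ai components and the thresholds are illustrative:

```python
# Sketch: cheap Tier 1 scoring gates which candidates reach Tier 2 BN inference.
# tier1_score / run_bn_inference are hypothetical callables; thresholds are
# illustrative, not production cut-offs.
TIER1_THRESHOLD = 0.6
ALERT_THRESHOLD = 0.9

def score_candidates(candidates, tier1_score, run_bn_inference):
    alerts = []
    for candidate in candidates:
        # Tier 1: fast rules / z-score statistics over abstracted features.
        if tier1_score(candidate) < TIER1_THRESHOLD:
            continue  # rejected cheaply, no BN inference cost incurred
        # Tier 2: full Bayesian network inference on the surviving few.
        posterior = run_bn_inference(candidate)  # e.g., {"AbuseLikelihood": 0.93}
        if posterior["AbuseLikelihood"] >= ALERT_THRESHOLD:
            alerts.append((candidate, posterior))
    return alerts
```
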
4. Temporal Granularity Optimization

  • Use 3–5 behavioral steps for spoofing (e.g., Place → Cancel → Trade); see the sketch below.
  • Use 1–3 day slices for insider trading.
  • Represent transitions between behavioral states, not absolute time intervals.

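As a minimal sketch (state names and the five-step cap are illustrative assumptions), time slices can be keyed to behavioral state transitions rather than wall-clock intervals:

```python
# Sketch: key DBN slices to behavioral state transitions, not clock intervals.
# State names and MAX_SLICES are illustrative assumptions.
from enum import Enum

class SpoofState(Enum):
    PLACE = "place"
    CANCEL = "cancel"
    TRADE = "trade"

MAX_SLICES = 5  # cap on behavioral history carried into the DBN

def to_slices(event_types: list[str]) -> list[SpoofState]:
    """Keep only the most recent behavioral steps as DBN time slices."""
    by_value = {state.value: state for state in SpoofState}
    states = [by_value[e] for e in event_types if e in by_value]
    return states[-MAX_SLICES:]

print(to_slices(["place", "place", "cancel", "cancel", "trade", "place"]))
```
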
5. Robust Parameterization Under Sparse Labels

  • Use:

    • Expert priors for key nodes (IntentToManipulate, AccessToMNPI).
    • Synthetic labeled abuse cases for model testing.
    • Unsupervised signals (e.g., Z-scores) as BN input nodes.
  • Apply Dirichlet priors or pseudo-counts to stabilize CPT estimates (see the sketch below).

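A minimal sketch of pseudo-count smoothing for one CPT column (the prior strength `alpha` is an illustrative choice, not a calibrated Kor.ai value):

```python
# Sketch: Dirichlet / pseudo-count smoothing keeps CPT estimates away from
# hard 0s and 1s when only a handful of labeled cases exist.
import numpy as np

def smoothed_cpt_column(counts: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Posterior-mean estimate of P(state | parent config) under a
    symmetric Dirichlet(alpha) prior."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

# Only three labeled abuse cases observed for this parent configuration:
print(smoothed_cpt_column(np.array([3, 0])))  # [0.8, 0.2] rather than [1.0, 0.0]
```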

🧠 Summary Table

| Design Element | Recommendation |
| --- | --- |
| Node Design | Use behavioral abstractions, not raw features. |
| CPT Construction | Apply latent layers + canonical forms. |
| Time Slice Depth | Max 3–5 behavior events (spoofing) or 1–3 days (insider). |
| Inference Strategy | Tiered (pre-filter → BN) or batch mode. |
| Data Strategy | Combine SME priors + synthetic examples + weak labels. |

📚 Reference

Wu, Y., Wang, H., Zhang, J., & Yu, P. S. (2018). “Detecting Spoofing Trades Using Dynamic Bayesian Networks.” IEEE International Conference on Machine Learning and Applications (ICMLA).