12.3 BN Scalability Lessons – Wu et al. 2018
This page summarizes key insights from Wu et al. (2018) on Dynamic Bayesian Networks (DBNs) for spoofing detection, and outlines how Kor.ai can apply these principles when designing Bayesian models for market abuse risk.
⚠️ Scalability Concerns in Wu et al. (2018)
Concern | Description |
---|---|
High Dimensionality | Modeling raw trade/order data at high frequency leads to explosive node counts across time slices. |
Large CPTs | A node's Conditional Probability Table (CPT) has one row per combination of parent states, so its size grows exponentially with the number of parents and quickly becomes computationally expensive. |
Real-Time Constraints | Exact inference in large DBNs is too slow for sub-second alerting. |
Overfitting with Sparse Data | Few labeled abuse cases make parameter learning unstable. |
Temporal Slice Depth | Every extra time slice replicates the per-slice nodes, so longer horizons inflate the unrolled network; time-slice history must be limited. |
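
The arithmetic behind the "Large CPTs" and "Temporal Slice Depth" rows can be sketched as follows, assuming discrete nodes. The function names and the node and slice counts are illustrative assumptions, not figures from Wu et al.

```python
from math import prod


def cpt_rows(parent_cardinalities):
    """One CPT row per joint configuration of the parents."""
    return prod(parent_cardinalities)


def cpt_parameters(child_states, parent_cardinalities):
    """Free parameters: each row sums to 1, leaving child_states - 1 per row."""
    return (child_states - 1) * cpt_rows(parent_cardinalities)


def unrolled_nodes(nodes_per_slice, num_slices):
    """Node count of a DBN unrolled over a fixed number of time slices."""
    return nodes_per_slice * num_slices


# Binary child with 10 binary parents vs. the same child capped at 4 parents.
print(cpt_parameters(2, [2] * 10))  # 1024
print(cpt_parameters(2, [2] * 4))   # 16
# Hypothetical 20 nodes per slice: 5 behavior steps vs. 100 raw ticks.
print(unrolled_nodes(20, 5), unrolled_nodes(20, 100))  # 100 2000
```

Each added binary parent doubles the row count, which is why capping fan-in matters; the number of slices, by contrast, grows the unrolled network linearly.
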
✅ Kor.ai Design Principles Based on Wu et al.
1. Event Abstraction Instead of Raw Tick Data
- Avoid modeling every millisecond/tick event.
- Create behavioral abstraction nodes, e.g.:
  - `AggressiveOrder`
  - `CancelSurge`
  - `PriceImpactCluster`
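
A minimal sketch of event abstraction, assuming raw events arrive as (timestamp, kind, aggressive) records already grouped into one analysis window. The `OrderEvent` type, the definition of "aggressive", and both thresholds are hypothetical; `PriceImpactCluster` is omitted because it would need price data.

```python
from dataclasses import dataclass


@dataclass
class OrderEvent:
    ts_ms: int        # event timestamp in milliseconds
    kind: str         # "place", "cancel" or "trade"
    aggressive: bool  # hypothetical flag: priced at or through the opposite best quote


def abstract_window(events, cancel_ratio_threshold=0.8, min_places=5):
    """Collapse one window of raw events into binary behavioral node states.

    Both thresholds are hypothetical placeholders, not calibrated values.
    """
    places = [e for e in events if e.kind == "place"]
    cancels = [e for e in events if e.kind == "cancel"]
    cancel_ratio = len(cancels) / len(places) if places else 0.0
    return {
        "AggressiveOrder": any(e.aggressive for e in places),
        "CancelSurge": len(places) >= min_places and cancel_ratio >= cancel_ratio_threshold,
    }


window = [OrderEvent(i, "place", False) for i in range(6)]
window += [OrderEvent(10 + i, "cancel", False) for i in range(5)]
print(abstract_window(window))  # {'AggressiveOrder': False, 'CancelSurge': True}
```

However many raw events a window holds, the BN only sees a fixed, small set of evidence nodes per window.
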
2. Limit Node Fan-In Using Latent Intermediates
- Use intermediate nodes such as:
  - `OrderAggressiveness`
  - `AbuseLikelihood`
  - `MarketVolatilityLevel`
- Cap each node at 3–4 parents to avoid exponentially large CPTs.
- Use noisy-OR or other canonical forms to simplify conditional probability definitions.
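
Noisy-OR replaces a full CPT with one link probability per parent plus an optional leak term: P(child = True | parents) = 1 - (1 - leak) * product over active parents of (1 - p_i). The sketch below expands those few numbers into the full table; the choice of parents for `AbuseLikelihood` and all probabilities are hypothetical.

```python
from itertools import product


def noisy_or_cpt(link_probs, leak=0.0):
    """Expand a leaky noisy-OR into P(child=True | parents) for all binary parent configs.

    link_probs[i]: probability that parent i alone, when True, switches the child on.
    leak: probability the child is on when no modeled parent is active.
    Only len(link_probs) + 1 numbers are specified instead of 2 ** len(link_probs).
    """
    cpt = {}
    for config in product([False, True], repeat=len(link_probs)):
        p_off = 1.0 - leak
        for active, p in zip(config, link_probs):
            if active:
                p_off *= 1.0 - p  # each active parent independently fails to fire
        cpt[config] = 1.0 - p_off
    return cpt


# Hypothetical parents of AbuseLikelihood: (OrderAggressiveness, CancelSurge, PriceImpactCluster)
table = noisy_or_cpt([0.4, 0.6, 0.3], leak=0.01)
print(round(table[(True, True, False)], 4))    # 0.7624
print(round(table[(False, False, False)], 4))  # 0.01
```
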
3. Tiered Inference Strategy
- Use Tier 1 rules/stats for pre-filtering high-risk events.
- Run Tier 2 BN inference only for flagged candidates.
- Precompute partial inference graphs where possible.
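
The first two bullets can be sketched as a two-stage scorer. Everything here is a hypothetical illustration: Tier 1 is a z-score test of one feature against the trader's own history, and Tier 2 is a toy BN (`Intent` → `CancelSurge`, `AggressiveOrder`) with made-up probabilities, solved exactly by enumeration.

```python
from statistics import mean, stdev

Z_THRESHOLD = 3.0     # hypothetical Tier 1 cut-off
PRIOR_INTENT = 0.01   # hypothetical P(Intent=True)
LIKELIHOOD = {        # hypothetical P(signal=True | Intent)
    "CancelSurge": {True: 0.9, False: 0.1},
    "AggressiveOrder": {True: 0.7, False: 0.2},
}


def tier1_flag(value, history, z_threshold=Z_THRESHOLD):
    """Tier 1: cheap outlier test of one feature against the trader's own history."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        return value > mu
    return (value - mu) / sd >= z_threshold


def tier2_posterior(evidence):
    """Tier 2: exact P(Intent=True | evidence) by enumerating the hidden node."""
    joint = {}
    for intent in (True, False):
        p = PRIOR_INTENT if intent else 1.0 - PRIOR_INTENT
        for node, observed in evidence.items():
            p_true = LIKELIHOOD[node][intent]
            p *= p_true if observed else 1.0 - p_true
        joint[intent] = p
    return joint[True] / (joint[True] + joint[False])


def score(candidates):
    """Run the BN only for candidates that Tier 1 flags."""
    return {
        cid: tier2_posterior(evidence)
        for cid, (value, history, evidence) in candidates.items()
        if tier1_flag(value, history)
    }


history = [0.10, 0.12, 0.11, 0.09, 0.10]  # e.g. past cancel ratios
signals = {"CancelSurge": True, "AggressiveOrder": True}
print(score({"A": (0.50, history, signals), "B": (0.11, history, signals)}))  # only "A" reaches Tier 2
```

Because Tier 1 is a few arithmetic operations per candidate, the BN's cost is paid only for the small flagged fraction.
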
4. Temporal Granularity Optimization
- Use 3–5 behavior steps for spoofing (e.g., `Place > Cancel > Trade`).
- Use 1–3 day slices for insider trading.
- Represent transitions between behavioral states, not absolute time intervals.
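
A sketch of these guidelines, assuming a two-state hidden intent (`benign`/`spoof`) filtered over a short window of behavioral steps. This is an HMM, the simplest DBN; the state names and all probabilities are hypothetical. Timestamps are used only for ordering, and immediate repeats are collapsed so the model sees transitions rather than intervals.

```python
MAX_STEPS = 5  # keep at most 5 behavior steps (upper end of the 3–5 guidance)

STATES = ("benign", "spoof")
INITIAL = {"benign": 0.95, "spoof": 0.05}
TRANSITION = {  # hypothetical P(state_t | state_t-1)
    "benign": {"benign": 0.9, "spoof": 0.1},
    "spoof": {"benign": 0.2, "spoof": 0.8},
}
EMISSION = {  # hypothetical P(step | state)
    "benign": {"Place": 0.5, "Cancel": 0.2, "Trade": 0.3},
    "spoof": {"Place": 0.3, "Cancel": 0.6, "Trade": 0.1},
}


def to_steps(events):
    """Order (timestamp, kind) events, drop the timestamps, collapse immediate
    repeats, and keep only the most recent MAX_STEPS steps."""
    steps = []
    for _, kind in sorted(events):
        if not steps or steps[-1] != kind:
            steps.append(kind)
    return steps[-MAX_STEPS:]


def filter_intent(steps):
    """Forward filtering: P(state | steps so far), normalized at every step."""
    belief = dict(INITIAL)
    for i, step in enumerate(steps):
        if i > 0:  # predict: push the belief through the transition model
            belief = {s: sum(belief[r] * TRANSITION[r][s] for r in STATES) for s in STATES}
        belief = {s: belief[s] * EMISSION[s][step] for s in STATES}  # update on the step
        total = sum(belief.values())
        belief = {s: p / total for s, p in belief.items()}
    return belief


spoof_like = to_steps([(0, "Place"), (3, "Place"), (40, "Cancel"), (41, "Trade")])
benign_like = to_steps([(0, "Place"), (900, "Trade")])
print(spoof_like)  # ['Place', 'Cancel', 'Trade']
print(filter_intent(spoof_like)["spoof"] > filter_intent(benign_like)["spoof"])  # True
```
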
5. Robust Parameterization Under Sparse Labels
- Use:
  - Expert priors for key nodes (`IntentToManipulate`, `AccessToMNPI`).
  - Synthetic labeled abuse cases for model testing.
  - Unsupervised signals (e.g., Z-scores) as BN input nodes.
- Apply Dirichlet priors or pseudo-counts for CPT stability.
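
Dirichlet pseudo-counts can be sketched as a posterior-mean estimate of one CPT row. Here the expert prior is expressed as probabilities plus a strength (an equivalent sample size); the function names, the binary `IntentToManipulate` row, and the counts are hypothetical.

```python
def expert_prior_to_alpha(prior_probs, strength):
    """Turn an SME probability vector into Dirichlet pseudo-counts."""
    return [p * strength for p in prior_probs]


def dirichlet_cpt_row(counts, alpha):
    """Posterior mean of one CPT row: (n_k + alpha_k) / (N + sum(alpha))."""
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]


def mle_cpt_row(counts):
    """Maximum-likelihood row; unstable when counts are tiny."""
    total = sum(counts)
    return [n / total for n in counts]


# States ordered (True, False). Only two labelled cases, both positive.
counts = [2, 0]
alpha = expert_prior_to_alpha([0.1, 0.9], strength=10)  # SME belief: P(True) is about 0.1
print(mle_cpt_row(counts))               # [1.0, 0.0]
print(dirichlet_cpt_row(counts, alpha))  # [0.25, 0.75]
```

The `strength` value sets how many labelled cases it takes before the data outweighs the expert prior.
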
🧠 Summary Table
Design Element | Recommendation |
---|---|
Node Design | Use behavioral abstractions, not raw features. |
CPT Construction | Apply latent layers + canonical forms. |
Time Slice Depth | Max 3–5 behavior events (spoofing) or 1–3 days (insider). |
Inference Strategy | Tiered (pre-filter → BN) or batch mode. |
Data Strategy | Combine SME priors + synthetic examples + weak labels. |
📚 Reference
Wu, Y., Wang, H., Zhang, J., & Yu, P. S. (2018). “Detecting Spoofing Trades Using Dynamic Bayesian Networks.” IEEE International Conference on Machine Learning and Applications (ICMLA).