Migration Plan: From Agena API to pgmpy – Kor.ai

This page outlines the strategy for moving from the Agena Cloud API (commercial Bayesian scoring engine) to an in-house solution built using the open-source pgmpy library.

🎯 Why Migrate?

Reason	Benefit
Licensing	Remove commercial dependency
IP Control	Full transparency and internal ownership
Customisation	Fine-grained control over inference logic
Offline Capability	Local scoring, no cloud calls
Integration Flexibility	Easier to embed into microservices

🧠 Target Stack

Component	Tool / Format
Model Definition	JSON → pgmpy format
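CPT Handling	pgmpy.factors.discrete.TabularCPD
Graph Structure	pgmpy.models.BayesianModel (named BayesianNetwork or DiscreteBayesianNetwork in newer pgmpy releases)
Inference Engine	pgmpy.inference.VariableElimination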
Deployment	Python microservice (Flask/FastAPI)
CI/CD	GitHub Actions + pytest coverage

🧱 Migration Phases

🔹 Phase 1 – Extract Existing Logic

Export all .json models from Agena
Convert structure into pgmpy graph format
Preserve obfuscated node naming (Q1, Q2, etc.)
Document CPTs per node
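
The conversion step in Phase 1 could look roughly like the sketch below. The Agena export schema is not shown on this page, so the `nodes` / `links` layout, the field names and the `load_structure` helper are all hypothetical placeholders; only the obfuscated node names (Q1, Q3, Q10) come from this plan.

```python
import json

# Hypothetical export shape: the real Agena JSON schema is not documented here.
EXPORT = """
{
  "nodes": [
    {"id": "Q1", "states": ["High", "Low"]},
    {"id": "Q3", "states": ["Yes", "No"]},
    {"id": "Q10", "states": ["Yes", "No"]}
  ],
  "links": [
    {"parent": "Q1", "child": "Q10"},
    {"parent": "Q3", "child": "Q10"}
  ]
}
"""


def load_structure(raw):
    """Return (states, edges) in a form that can be handed to pgmpy later."""
    doc = json.loads(raw)
    # Node names are kept exactly as exported (Q1, Q2, ...).
    states = {node["id"]: list(node["states"]) for node in doc["nodes"]}
    edges = [(link["parent"], link["child"]) for link in doc["links"]]
    # Fail early if an edge refers to a node that was never declared.
    unknown = {name for edge in edges for name in edge} - states.keys()
    if unknown:
        raise ValueError(f"edges reference undeclared nodes: {sorted(unknown)}")
    return states, edges


states, edges = load_structure(EXPORT)
print(edges)
```

The resulting edge list is already in the shape accepted by `add_edges_from(...)`, and the `states` mapping can later supply the state names for each CPD.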

🔹 Phase 2 – Rebuild Scoring Logic

Define all CPDs using TabularCPD
Create graph using BayesianModel.add_edges_from(...)
Build internal test suite for scoring scenarios

🔹 Phase 3 – Build Model Runner API

Wrap model inside a POST-based microservice
Inputs: Node dictionary as JSON (e.g. {"Q1": "High", "Q3": "No"})
Output: Final risk probabilities ({"Q10": {"Yes": 0.76}})
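
The request handling behind the POST endpoint might be sketched as below, kept independent of Flask or FastAPI so it can be unit-tested on its own. The `score_request` function, the `NODE_STATES` table and the `infer` callable are hypothetical names introduced for illustration; in the real service `infer` would wrap the pgmpy query, while here it is a stub returning fixed numbers.

```python
# Hypothetical state table; in practice it would be derived from the model files.
NODE_STATES = {
    "Q1": ["High", "Low"],
    "Q3": ["Yes", "No"],
    "Q10": ["Yes", "No"],
}


def score_request(payload, infer, node_states=NODE_STATES, target="Q10"):
    """Validate the evidence dictionary, run inference, and shape the response."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object of node -> state")
    for node, state in payload.items():
        if node not in node_states:
            raise ValueError(f"unknown node: {node}")
        if state not in node_states[node]:
            raise ValueError(f"invalid state {state!r} for node {node}")
    if target in payload:
        raise ValueError(f"{target} is the scored node and cannot be evidence")
    probabilities = infer(target, payload)  # expected to return {state: probability}
    return {target: {s: round(float(p), 4) for s, p in probabilities.items()}}


def stub_infer(target, evidence):
    """Stand-in for the pgmpy-backed inference call; returns fixed values."""
    return {"Yes": 0.76, "No": 0.24}


response = score_request({"Q1": "High", "Q3": "No"}, stub_infer)
print(response)
```

Rejecting unknown nodes and states before inference keeps malformed requests from surfacing as opaque library errors, and a web framework route only needs to map `ValueError` to a 4xx response.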

🔹 Phase 4 – Benchmark & Validate

Compare Agena vs. pgmpy scoring on test cases
Adjust CPTs or structure where needed
Run performance benchmarks on live data

🔹 Phase 5 – Cutover + Deprecate Agena

Replace Agena calls in core inference service
Update alert pipeline logic to use new microservice
Finalise internal documentation + alert audit links

📂 File Structure

Suggested layout under kor-ai-core/bayesian-engine/:

🧪 Validation Plan

Use all test cases from 06_Test_Case_Scenarios.md
Confirm the absolute difference between Agena and pgmpy output scores is ≤ 0.03 for every test case
Use mock inputs to cover edge cases and missing values
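
The tolerance check could be expressed as a small helper like the one below. The case IDs and scores are invented for illustration; only the 0.03 threshold comes from this plan, and `compare_scores` is a hypothetical name.

```python
TOLERANCE = 0.03  # maximum allowed absolute difference per test case


def compare_scores(agena_scores, pgmpy_scores, tolerance=TOLERANCE):
    """Return a list of (case_id, agena, pgmpy) tuples that fail the check."""
    failures = []
    for case_id, expected in agena_scores.items():
        actual = pgmpy_scores.get(case_id)
        # A case with no pgmpy result counts as a failure rather than a skip.
        if actual is None or abs(expected - actual) > tolerance:
            failures.append((case_id, expected, actual))
    return failures


# Made-up scores, purely to exercise the helper.
agena = {"case_01": 0.76, "case_02": 0.50, "case_03": 0.12}
pgmpy_out = {"case_01": 0.74, "case_02": 0.55}

failures = compare_scores(agena, pgmpy_out)
print(failures)
```

In a pytest suite the same helper can back a single assertion that `failures` is empty, so a report lists every divergent case at once instead of stopping at the first.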

🔁 Long-Term Vision

Enable retraining of CPTs using historical case data
Support hybrid graph + rule scoring
Extend engine to support multi-risk models in parallel

Maintainer: @ravkorsurv
Target Cutover: Q4 2025