Epic J: Add reliability diagrams
This epic introduces reliability diagrams for calibration evaluation. These diagrams visualize the relationship between predicted confidence and empirical accuracy, providing an intuitive way to assess model calibration quality.
Motivation and Context
While quantitative calibration metrics such as Expected Calibration Error (ECE) and the Brier score provide useful summaries, they do not always convey how and where a model's confidence deviates from empirical accuracy.
Reliability diagrams complement these metrics by showing how well predicted probabilities align with true outcome frequencies across confidence bins.
This visualization helps diagnose underconfidence or overconfidence tendencies, identify problematic probability ranges, and evaluate the effectiveness of calibration techniques like temperature scaling.
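To make the binning concrete, here is a minimal, self-contained sketch of how the per-bin statistics behind a reliability diagram can be computed. The helper name `reliability_bins` and its signature are illustrative only and not part of probly's existing API; the count-weighted gap between the two returned quantities is what ECE condenses into a single number.

```python
import numpy as np

def reliability_bins(confidences, correct, n_bins=10):
    """Per equal-width confidence bin: mean confidence, empirical accuracy, count.

    Illustrative helper only, not part of probly's public API.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1])  # indices in 0 .. n_bins - 1

    mean_conf, accuracy, counts = [], [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():  # skip empty bins
            mean_conf.append(confidences[mask].mean())
            accuracy.append(correct[mask].mean())
            counts.append(mask.sum())
    return np.asarray(mean_conf), np.asarray(accuracy), np.asarray(counts)


# ECE is the count-weighted average gap between confidence and accuracy:
# ece = np.sum(counts / counts.sum() * np.abs(accuracy - mean_conf))
```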
Goals
- Implement a reliability diagram plotting function in the plot module.
- Support key features such as:
  - Varying the number of bins.
  - Overlaying a perfect calibration reference line.
  - Optionally displaying ECE values or bin counts on the plot.
- The function should take predicted probabilities and targets as arguments (a minimal sketch follows this list).
- Make use of the existing functionality to compute ECE metrics. If this is not adequate, extend it.
- Provide example usage.
- Provide tests.
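The sketch below shows one way such a plotting function could look, assuming Matplotlib and the illustrative `reliability_bins` logic from the motivation section (inlined here so the snippet is self-contained). The name `plot_reliability_diagram`, its signature, and its keyword arguments are assumptions rather than the final API, and the real implementation should delegate ECE computation to probly's existing metric code where possible.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_reliability_diagram(probs, targets, n_bins=10, show_ece=True, ax=None):
    """Plot empirical accuracy against mean predicted confidence per bin.

    Illustrative sketch: `probs` is an (n_samples, n_classes) array of predicted
    probabilities, `targets` an (n_samples,) array of integer class labels.
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets)
    confidences = probs.max(axis=1)             # confidence of the top prediction
    correct = probs.argmax(axis=1) == targets   # whether the top prediction is right

    # Equal-width confidence bins; empty bins are skipped.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1])
    mean_conf, accuracy, counts = [], [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            mean_conf.append(confidences[mask].mean())
            accuracy.append(correct[mask].mean())
            counts.append(mask.sum())
    mean_conf, accuracy, counts = map(np.asarray, (mean_conf, accuracy, counts))

    if ax is None:
        _, ax = plt.subplots()
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="perfect calibration")
    ax.plot(mean_conf, accuracy, marker="o", label="empirical accuracy")

    if show_ece:
        # Count-weighted gap between accuracy and confidence (standard ECE).
        ece = np.sum(counts / counts.sum() * np.abs(accuracy - mean_conf))
        ax.set_title(f"Reliability diagram (ECE = {ece:.3f})")

    ax.set_xlabel("predicted confidence")
    ax.set_ylabel("empirical accuracy")
    ax.set_xlim(0.0, 1.0)
    ax.set_ylim(0.0, 1.0)
    ax.legend()
    return ax
```

Calling something like `plot_reliability_diagram(softmax_outputs, y_test, n_bins=15)` followed by `plt.show()` would render the diagram; returning the `Axes` keeps the function composable with other plots in the package.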
Constraints
- Implementation should use standard plotting libraries (e.g., Matplotlib), with optional support for interactive backends.
- Maintain consistent styling with other plots in the package.
- Adhere to the project’s code quality, testing, and documentation standards.
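As an illustration of the kind of tests the epic calls for, the pytest-style sketch below exercises the illustrative helpers from the sketches above; the import path in the comment and all function names are assumptions, not existing probly code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the test can run in CI
import numpy as np
from matplotlib.axes import Axes

# Hypothetical import path; adjust to wherever the function is implemented.
# from probly.plot import plot_reliability_diagram


def test_reliability_bins_match_calibrated_data():
    """Predictions that are 90% confident and correct 90% of the time
    should produce a single bin whose accuracy equals its confidence."""
    confidences = np.full(1000, 0.9)
    correct = np.zeros(1000, dtype=bool)
    correct[:900] = True

    mean_conf, accuracy, counts = reliability_bins(confidences, correct, n_bins=10)

    assert counts.sum() == 1000
    np.testing.assert_allclose(mean_conf, [0.9])
    np.testing.assert_allclose(accuracy, [0.9])


def test_plot_returns_axes():
    """Smoke test: the plotting function should return a Matplotlib Axes."""
    rng = np.random.default_rng(0)
    targets = rng.integers(0, 3, size=200)
    probs = np.full((200, 3), 0.05)
    probs[np.arange(200), targets] = 0.9  # top prediction always matches the target

    ax = plot_reliability_diagram(probs, targets, n_bins=10)
    assert isinstance(ax, Axes)
```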