Variable Binning in Shifu - ShifuML/shifu GitHub Wiki
What is Variable Binning?
- EqualInterval
- EqualTotal
- EqualPositive
- EqualNegative
- DynamicBinning
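The section only names the methods. The sketch below makes some assumptions, and Shifu's actual boundary rules may differ. It treats EqualInterval as equal-width bins. It treats EqualTotal, EqualPositive and EqualNegative as equal-frequency bins, counted over all records, positive records, or negative records respectively. With those assumptions, the cut points can be computed as follows. The function names and the label convention (1 = positive, 0 = negative) are hypothetical. DynamicBinning is not covered here.

```python
from typing import List, Sequence


def equal_interval_cuts(values: Sequence[float], num_bins: int) -> List[float]:
    """EqualInterval (assumed): split [min, max] into num_bins bins of equal width.

    Returns the num_bins - 1 interior cut points; bin i covers [cut[i-1], cut[i]).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + width * i for i in range(1, num_bins)]


def equal_count_cuts(values: Sequence[float], labels: Sequence[int],
                     num_bins: int, mode: str = "total") -> List[float]:
    """EqualTotal / EqualPositive / EqualNegative (assumed): pick cut points so each
    bin holds roughly the same number of all / positive (label 1) / negative
    (label 0) records. Bin i covers [cut[i-1], cut[i])."""
    if mode == "total":
        pool = list(values)
    elif mode == "positive":
        pool = [v for v, y in zip(values, labels) if y == 1]
    elif mode == "negative":
        pool = [v for v, y in zip(values, labels) if y == 0]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not pool:
        return []
    pool.sort()
    n = len(pool)
    cuts: List[float] = []
    for i in range(1, num_bins):
        cut = pool[i * n // num_bins]
        if not cuts or cut > cuts[-1]:  # skip duplicate cuts caused by repeated values
            cuts.append(cut)
    return cuts
```

EqualPositive and EqualNegative differ from EqualTotal only in which records are counted. Their boundaries are then applied to every record.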
Default Binning Algorithm
The default algorithm leverages the sort phase of MapReduce (MR) to compute bin boundaries, but it still has performance issues on big data.
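This example shows why a sort helps and why it is still costly. Once a column's values arrive sorted, equal-total boundaries fall out of a single scan. The sketch below is a hypothetical reducer-side helper, not Shifu's code. It assumes the column's record count is already known, for example from an earlier counting pass.

```python
from typing import Iterable, Iterator


def boundaries_from_sorted_stream(sorted_values: Iterable[float], total: int,
                                  num_bins: int) -> Iterator[float]:
    """Consume one column's values in ascending order (as the MR shuffle/sort
    would deliver them) and emit an equal-total cut point at the record with
    0-based index k * total // num_bins, for k = 1 .. num_bins - 1.

    Only O(1) state is kept here; the expensive part is the global sort that
    produced the stream in the first place."""
    k = 1
    last_cut = None
    for idx, v in enumerate(sorted_values):
        # Several k can land on the same index when total < num_bins.
        while k < num_bins and idx == k * total // num_bins:
            if last_cut is None or v > last_cut:  # skip duplicate boundaries
                last_cut = v
                yield v
            k += 1
```

Usage: `list(boundaries_from_sorted_stream(sorted(column), len(column), 10))`. Here `sorted(...)` stands in for the MR shuffle. On a real cluster, every value of every column must pass through that shuffle, and this is what makes the approach slow on big data.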
Histogram Binning Algorithm
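The section gives no details here. One common way to avoid a full sort is a bounded-size streaming histogram. Each mapper builds one in a single pass, and reducers merge them. The sketch below assumes that approach, specifically the Ben-Haim and Tom-Tov style of merging the two closest centroids. It is not taken from Shifu's source, and the class and method names are hypothetical.

```python
import bisect
from typing import List


class StreamingHistogram:
    """Bounded-size histogram of (centroid, count) pairs: insert each value as its
    own centroid, then merge the two closest centroids whenever the size limit is
    exceeded. max_bins must be at least 1."""

    def __init__(self, max_bins: int) -> None:
        self.max_bins = max_bins
        self.centroids: List[float] = []
        self.counts: List[int] = []

    def add(self, value: float, count: int = 1) -> None:
        i = bisect.bisect_left(self.centroids, value)
        if i < len(self.centroids) and self.centroids[i] == value:
            self.counts[i] += count
            return
        self.centroids.insert(i, value)
        self.counts.insert(i, count)
        if len(self.centroids) > self.max_bins:
            self._merge_closest()

    def merge(self, other: "StreamingHistogram") -> None:
        """Fold another histogram (e.g. one built by another mapper) into this one."""
        for c, n in zip(other.centroids, other.counts):
            self.add(c, n)

    def _merge_closest(self) -> None:
        # Replace the closest adjacent pair by its count-weighted mean.
        gaps = [b - a for a, b in zip(self.centroids, self.centroids[1:])]
        j = min(range(len(gaps)), key=gaps.__getitem__)
        n = self.counts[j] + self.counts[j + 1]
        c = (self.centroids[j] * self.counts[j]
             + self.centroids[j + 1] * self.counts[j + 1]) / n
        self.centroids[j:j + 2] = [c]
        self.counts[j:j + 2] = [n]

    def cut_points(self, num_bins: int) -> List[float]:
        """Rough equal-total cut points: the centroid at which the cumulative count
        first reaches each k/num_bins share of all records (no interpolation)."""
        total = sum(self.counts)
        cuts: List[float] = []
        cum, k = 0, 1
        for c, n in zip(self.centroids, self.counts):
            cum += n
            while k < num_bins and cum >= k * total / num_bins:
                if not cuts or c > cuts[-1]:
                    cuts.append(c)
                k += 1
        return cuts
```

The trade-off is accuracy for scalability. Memory per column is bounded by `max_bins`, and no shuffle sort of raw values is needed. The resulting boundaries are approximate.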
Dynamic Binning Algorithm
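The section does not say how Shifu's DynamicBinning decides which bins to merge. As a stand-in, the sketch below uses ChiMerge, a well-known bottom-up technique. It starts from many fine bins and merges adjacent bins whose positive/negative mix is most similar. Classic ChiMerge stops at a chi-square threshold. Here it stops at a target bin count instead. That rule and all names are assumptions.

```python
from typing import List, Tuple

Bin = Tuple[int, int]  # (positive count, negative count)


def chi2(a: Bin, b: Bin) -> float:
    """Pearson chi-square statistic of the 2x2 table formed by two adjacent bins."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    stat = 0.0
    for row in (a, b):
        row_total = sum(row)
        for j in range(2):
            expected = row_total * (a[j] + b[j]) / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat


def chi_merge(bins: List[Bin], cuts: List[float],
              target: int) -> Tuple[List[Bin], List[float]]:
    """Repeatedly merge the adjacent pair with the smallest chi-square (the most
    similar class mix) until `target` bins remain.
    cuts[i] is the boundary between bins[i] and bins[i + 1]."""
    assert len(cuts) == len(bins) - 1
    bins, cuts = list(bins), list(cuts)
    while len(bins) > max(target, 1):
        scores = [chi2(bins[i], bins[i + 1]) for i in range(len(bins) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        bins[i:i + 2] = [(bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])]
        del cuts[i]  # the boundary between the two merged bins disappears
    return bins, cuts
```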
How Is Binning Used in Shifu?
- KS Value
- Information Value (IV)
- Weight of Evidence (WoE) Transform
- Tree Model Training
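All four uses work from the bin boundaries and the per-bin positive and negative counts. Below is a minimal sketch of the standard formulas, assuming label 1 is positive. WoE sign conventions differ between tools (some use ln(neg/pos)). The smoothing term `eps` is a hypothetical choice, not necessarily what Shifu does.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def bin_index(value: float, cuts: Sequence[float]) -> int:
    """Map a raw value to its bin, with bin i covering [cuts[i-1], cuts[i])."""
    return bisect.bisect_right(cuts, value)


def bin_metrics(pos: Sequence[int], neg: Sequence[int],
                eps: float = 0.5) -> Tuple[List[float], float, float]:
    """Per-bin WoE plus overall IV and KS from per-bin positive/negative counts.
    eps is additive smoothing so that empty bins do not produce log(0)."""
    k, P, N = len(pos), sum(pos), sum(neg)
    woe: List[float] = []
    iv = 0.0
    for p, n in zip(pos, neg):
        p_rate = (p + eps) / (P + eps * k)  # share of all positives in this bin
        n_rate = (n + eps) / (N + eps * k)  # share of all negatives in this bin
        w = math.log(p_rate / n_rate)
        woe.append(w)
        iv += (p_rate - n_rate) * w
    # KS: largest gap between the cumulative positive and negative distributions.
    ks = cum_p = cum_n = 0.0
    for p, n in zip(pos, neg):
        cum_p += p / P
        cum_n += n / N
        ks = max(ks, abs(cum_p - cum_n))
    return woe, iv, ks
```

Under these assumptions, the WoE transform replaces a raw value `x` with `woe[bin_index(x, cuts)]`. A tree model can use `bin_index(x, cuts)` directly as a discretized split candidate. Some tools report KS scaled by 100.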
Rebin Support in Shifu
If the binning produced in the stats step is not good, you can run a rebin to merge small bins together and get a better distribution:
shifu stats -rebin -n 30 -ivr 0.98 -bic 2000
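The flag semantics below are an assumption read from the flag names, not confirmed by this page:

- `-n`: the maximum bin count.
- `-ivr`: the share of the original IV to keep.
- `-bic`: the minimum instance count per bin.

Under that reading, a hypothetical greedy rebin sketch (not Shifu's implementation) merges adjacent bins while preserving as much IV as possible. Merges are forced while any bin is under the minimum count or there are too many bins. An optional merge is made only if IV stays above the ratio.

```python
import math
from typing import List, Tuple

Bin = Tuple[int, int]  # (positive count, negative count)


def information_value(bins: List[Bin]) -> float:
    P = sum(p for p, _ in bins)
    N = sum(n for _, n in bins)
    iv = 0.0
    for p, n in bins:
        if p == 0 or n == 0:
            continue  # WoE undefined for this bin; skipped (simplifying assumption)
        pr, nr = p / P, n / N
        iv += (pr - nr) * math.log(pr / nr)
    return iv


def merge_pair(bins: List[Bin], i: int) -> List[Bin]:
    """Return a new list with bins i and i + 1 merged."""
    merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
    return bins[:i] + [merged] + bins[i + 2:]


def rebin(bins: List[Bin], max_bins: int, iv_ratio: float, min_count: int) -> List[Bin]:
    bins = list(bins)
    base_iv = information_value(bins)
    while len(bins) > 1:
        small = [i for i, (p, n) in enumerate(bins) if p + n < min_count]
        if small:
            # Forced: merge the first undersized bin with a neighbour.
            i = small[0]
            pairs = [j for j in (i - 1, i) if 0 <= j < len(bins) - 1]
        else:
            pairs = list(range(len(bins) - 1))
        # Among the allowed merges, keep the one that preserves the most IV.
        best = max((merge_pair(bins, j) for j in pairs), key=information_value)
        forced = bool(small) or len(bins) > max_bins
        if forced or information_value(best) >= iv_ratio * base_iv:
            bins = best
        else:
            break
    return bins
```

Usage (hypothetical mapping of the command above): `rebin(bins, max_bins=30, iv_ratio=0.98, min_count=2000)`.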