Variable Binning in Shifu - ShifuML/shifu GitHub Wiki

What is Variable Binning?

Binning

  • EqualInterval
  • EqualTotal
  • EqualPositive
  • EqualNegative
  • DynamicBinning
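To make the difference between the first few methods concrete, here is a minimal Python sketch. It is not Shifu's implementation: the function names are hypothetical, and the reading of each method is an assumption based on its name (EqualInterval as equal-width ranges, EqualTotal as roughly equal record counts per bin, EqualPositive as roughly equal positive-record counts per bin).

```python
from typing import List


def equal_interval_bounds(values: List[float], num_bins: int) -> List[float]:
    """Split [min, max] into num_bins ranges of equal width (assumed EqualInterval)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(1, num_bins)]


def equal_total_bounds(values: List[float], num_bins: int) -> List[float]:
    """Pick cut points so each bin holds roughly the same number of records
    (assumed EqualTotal). Requires a full sort of the values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_bins] for i in range(1, num_bins)]


def equal_positive_bounds(values: List[float], labels: List[int], num_bins: int) -> List[float]:
    """Same as equal_total_bounds, but only positive records (label == 1)
    are counted when placing cut points (assumed EqualPositive)."""
    positives = [v for v, y in zip(values, labels) if y == 1]
    return equal_total_bounds(positives, num_bins)


values = [1, 2, 2, 3, 3, 3, 4, 50, 60, 100]
print(equal_interval_bounds(values, 4))  # [25.75, 50.5, 75.25]
print(equal_total_bounds(values, 4))     # [2, 3, 50]
```

On skewed data like this, the equal-width cut points leave most records in the first bin, while the count-based cut points follow the data. EqualNegative would, under the same assumption, mirror `equal_positive_bounds` with negative records.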

Default Binning Algorithm

The default algorithm leverages the sort phase of MapReduce (MR), but it still has performance issues on big data.

Histogram Binning Algorithm

Dynamic Binning Algorithm

How Binning is Used in Shifu?

Rebin Support in Shifu

If the binning produced in the stats step is not good, we can rebin to merge small bins together and improve the distribution:

shifu stats -rebin -n 30 -ivr 0.98 -bic 2000
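The idea of merging small bins can be sketched in a few lines of Python. This is an illustration only, not Shifu's rebin algorithm: `merge_small_bins` and `min_count` are hypothetical names, and the threshold of 2000 mirrors the `-bic 2000` value above only on the assumption that `-bic` sets a minimum record count per bin.

```python
from typing import List


def merge_small_bins(counts: List[int], min_count: int) -> List[List[int]]:
    """Group adjacent bin indices so that every merged bin holds
    at least min_count records (hypothetical helper, not Shifu code)."""
    merged: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for idx, count in enumerate(counts):
        current.append(idx)
        current_total += count
        if current_total >= min_count:
            # The running group is large enough: close it and start a new one.
            merged.append(current)
            current, current_total = [], 0
    if current:
        # A leftover tail is too small on its own: fold it into the last group.
        if merged:
            merged[-1].extend(current)
        else:
            merged.append(current)
    return merged


# Record counts of six original bins; bins 1, 2, 3 and 5 are below the threshold.
print(merge_small_bins([5000, 300, 200, 1800, 2500, 100], min_count=2000))
# [[0], [1, 2, 3], [4, 5]]
```

Only adjacent bins are merged, so the ordering of the bin boundaries is preserved.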