Shifu 0.2.4 SPDT Binning Algorithm - ShifuML/shifu GitHub Wiki
What is SPDT?
SPDT is a new method in 'stats' step to improve scalability.
How to Enable?
Set binningMethod in ModelConfig.json to 'SPDT'.
How does SPDT do 'stats'?
In each mapper, all columns are split into thousands of histograms. Then get local bins in each mapper and local bins will be merged into global bins to get global bin boundaries. Using new bin boundaries to join with column data to compute statistics.
References
TODO paper url