Top-Down Splitting - shuiwanghuohuo/scorecard_wiki GitHub Wiki

from bin_method import best_bin as bb
bb.Best_Bin(flag_name, factor_name, is_round, data=pd.DataFrame(), bad_name='bad',
         good_name='good', piece=5, rate=0.05, min_bin_size=50,
         not_in_list=["None", "NaN", "NA", "nan", None, "-999",
                      "-999.0", -999, "-1111", "-1111.0", -1111],
         cut_method='cut_ks', combine_method='combine_iv')

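Computes the binning result with a down_top-style greedy algorithm: the feature is first split downward into candidate bins, and the bins are then merged according to the objective, keeping the set of cut points that gives the largest objective value. Available splitting methods are KS, Gini, and IG (information gain); available merging methods are IV, Gini, and IG.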
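
To make the splitting step concrete, here is a minimal sketch of choosing a single KS cut point. It is not the code of `bin_method`: the helper name `best_ks_cut`, the 1 = bad / 0 = good encoding, and the tie-breaking rule are assumptions made for illustration. The cut is placed at the value where the gap between the cumulative bad and cumulative good distributions is largest.

```python
from collections import Counter

def best_ks_cut(values, flags):
    """Return (cut_value, ks) for the split `x <= cut_value` with the largest KS.

    Hypothetical helper: flags are assumed to be 1 for bad and 0 for good.
    """
    total_bad = sum(flags)
    total_good = len(flags) - total_bad
    if total_bad == 0 or total_good == 0:
        return None, 0.0  # KS is undefined when only one class is present

    bad_at = Counter(v for v, f in zip(values, flags) if f == 1)
    good_at = Counter(v for v, f in zip(values, flags) if f == 0)

    best_cut, best_ks = None, 0.0
    cum_bad = cum_good = 0
    distinct = sorted(set(values))
    # The largest value is skipped: cutting there would leave the right side empty.
    for v in distinct[:-1]:
        cum_bad += bad_at[v]
        cum_good += good_at[v]
        ks = abs(cum_bad / total_bad - cum_good / total_good)
        if ks > best_ks:  # strict comparison keeps the smallest cut on ties
            best_cut, best_ks = v, ks
    return best_cut, best_ks

# Made-up data, not taken from the wiki example.
x = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cut, ks = best_ks_cut(x, y)
```

A top-down splitter would apply such a function recursively to the left and right parts until the bin limits are reached; the merging step then works in the opposite direction.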

Parameter Description
---------------------
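flag_name: string
    Name of the label column
factor_name: string
    Name of the feature column
is_round: boolean
    Whether the bins are required to have business meaning
data: dataframe, (default=pd.DataFrame())
    Sample set
bad_name: string, (default='bad')
    Name of the column holding the number of bad samples
good_name: string, (default='good')
    Name of the column holding the number of good samples
piece: int, (default=5)
    Maximum number of bins
rate: float, (default=0.05)
    Minimum share of samples in each bin
min_bin_size: int, (default=50)
    Minimum number of samples in each bin
not_in_list: list, (default=["None", "NaN", "NA", "nan", None, "-999", "-999.0", -999, "-1111", "-1111.0", -1111])
    List of values treated as missing
cut_method: string, (default='cut_ks')
    Splitting method; the alternatives are cut_gini and cut_ig
combine_method: string, (default='combine_iv')
    Merging method; the alternatives are combine_gini and combine_ig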
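
The source does not say how `rate` and `min_bin_size` interact. One plausible reading, stated here purely as an assumption, is that a bin is accepted only if it satisfies both limits. The helper names below are hypothetical and are not part of `bin_method`.

```python
import math

def min_required_size(n_samples, rate=0.05, min_bin_size=50):
    """Smallest bin size meeting both limits (assumed to apply jointly)."""
    return max(min_bin_size, math.ceil(rate * n_samples))

def split_is_allowed(left_size, right_size, n_samples, rate=0.05, min_bin_size=50):
    """A candidate split is kept only if both resulting bins are large enough."""
    required = min_required_size(n_samples, rate, min_bin_size)
    return left_size >= required and right_size >= required

# With 1201 samples, the 5% share (61 samples) is stricter than min_bin_size=50.
required = min_required_size(1201)
ok = split_is_allowed(279, 922, 1201)
too_small = split_is_allowed(40, 1161, 1201)
```

Under this reading the share-based limit dominates on large samples and the absolute limit dominates on small ones.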

Return
------
result : a pandas DataFrame containing the binning result

Examples
>>> data
       y    x
   0   0    1
   1   1    1
   2   0    2
   3   1    3
   4   0    3
   5   0    3
   6   0    3
   7   0    3
   8   0    3
   ...
 1200  1   12
>>> flag_name = 'y'
>>> factor_name = 'x'
>>> bad_name = 'bad'
>>> good_name = 'good'
>>> piece = 5
>>> rate = 0.05
>>> min_bin_size = 50
>>> not_in_list = []
>>> is_round = False
>>> bb.Best_Bin(flag_name, factor_name, is_round, data, bad_name, good_name, piece, rate, min_bin_size, not_in_list)
        Bin    KS      WOE      IV  total_count  bad_rate
0  (-inf,3]  0.20   0.1256  0.0012          279    0.2222
1     (3,6]  0.25  -0.1323  0.1000          623    0.3333
2   (6,inf)  0.18  -0.2980  0.0250          299    0.6666
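
The WOE and IV columns can be illustrated with a short sketch. It assumes the convention WOE = ln(good share / bad share), which agrees with the sign pattern in the table above (higher bad rate, lower WOE), but the source does not confirm which convention the library uses. The function name `woe_iv` and the counts are made up for illustration.

```python
import math

def woe_iv(bins):
    """bins: list of (good_count, bad_count) per bin, all counts > 0.

    Returns one (woe, iv) pair per bin, with WOE = ln(good share / bad share)
    and IV contribution = (good share - bad share) * WOE.
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    result = []
    for good, bad in bins:
        good_share = good / total_good  # share of all good samples in this bin
        bad_share = bad / total_bad     # share of all bad samples in this bin
        woe = math.log(good_share / bad_share)
        iv = (good_share - bad_share) * woe
        result.append((woe, iv))
    return result

counts = [(80, 20), (60, 40), (20, 60)]  # hypothetical (good, bad) counts
per_bin = woe_iv(counts)
total_iv = sum(iv for _, iv in per_bin)
```

Each bin's IV contribution is non-negative because the two factors always share a sign; a merging step driven by IV would compare `total_iv` before and after combining adjacent bins.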