Top-down splitting - shuiwanghuohuo/scorecard_wiki GitHub Wiki
from bin_method import best_bin as bb
bb.Best_Bin(flag_name, factor_name, is_round, data=pd.DataFrame(), bad_name='bad',
            good_name='good', piece=5, rate=0.05, min_bin_size=50,
            not_in_list=["None", "NaN", "NA", "nan", None, "-999",
                         "-999.0", -999, "-1111", "-1111.0", -1111],
            cut_method='cut_ks', combine_method='combine_iv')
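Computes the binning result with a down_top style greedy algorithm: the data is first split top-down, and the resulting bins are then merged according to the objective, keeping the set of cut points that maximizes it. Available splitting methods are ks, gini and ig; available merging methods are iv, gini and ig.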
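To make the splitting step concrete, here is a minimal sketch of how a single KS-based cut point could be chosen. It is not the library's implementation: the function name `best_ks_cut` is hypothetical, it uses only the standard library, and it covers one split only (the recursive splitting and the later merging step are left out).

```python
def best_ks_cut(values, flags):
    """Return (cut, ks) for the single cut point that maximizes the KS statistic.

    values: numeric feature values; flags: 1 = bad, 0 = good.
    A cut c sends samples with value <= c to the left bin.
    (Hypothetical helper, not part of bin_method.)
    """
    total_bad = sum(flags)
    total_good = len(flags) - total_bad
    if total_bad == 0 or total_good == 0:
        return None, 0.0  # KS is undefined without both classes

    # Count bad/good samples for each distinct value.
    counts = {}
    for v, f in zip(values, flags):
        bad, good = counts.get(v, (0, 0))
        counts[v] = (bad + f, good + (1 - f))

    best_cut, best_ks = None, 0.0
    cum_bad = cum_good = 0
    distinct = sorted(counts)
    # Cutting at the largest value would leave the right bin empty, so skip it.
    for v in distinct[:-1]:
        bad, good = counts[v]
        cum_bad += bad
        cum_good += good
        # KS at this cut: gap between the cumulative bad and good shares.
        ks = abs(cum_bad / total_bad - cum_good / total_good)
        if ks > best_ks:
            best_cut, best_ks = v, ks
    return best_cut, best_ks


values = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]
flags = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
cut, ks = best_ks_cut(values, flags)
print(cut, round(ks, 3))  # 3 0.8
```

A top-down procedure would presumably apply the same search again inside each resulting bin until the bin-count or bin-size limits are reached.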
Parameter Description
---------------------
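flag_name: string
    Name of the label column.
factor_name: string
    Name of the feature column.
is_round: boolean
    Whether to produce bins that are meaningful from a business perspective.
data: dataframe, (default=pd.DataFrame())
    The sample set.
bad_name: string, (default='bad')
    Name of the column holding the bad-sample count.
good_name: string, (default='good')
    Name of the column holding the good-sample count.
piece: int, (default=5)
    Maximum number of bins.
rate: float, (default=0.05)
    Minimum share of samples in each bin.
min_bin_size: int, (default=50)
    Minimum number of samples in each bin.
not_in_list: list, (default=["None", "NaN", "NA", "nan", None, "-999", "-999.0", -999, "-1111", "-1111.0", -1111])
    List of values treated as missing.
cut_method: string, (default='cut_ks')
    Splitting method; the alternatives are 'cut_gini' and 'cut_ig'.
combine_method: string, (default='combine_iv')
    Merging method; the alternatives are 'combine_gini' and 'combine_ig'.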
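The roles of `not_in_list`, `rate` and `min_bin_size` can be illustrated with a short sketch. Both helpers below are hypothetical and are not taken from `bin_method`. Two assumptions are made: float NaN is treated as missing in addition to the listed values, and a bin must satisfy both the `rate` and the `min_bin_size` constraints at the same time. The library may combine them differently.

```python
NOT_IN_LIST = ["None", "NaN", "NA", "nan", None, "-999", "-999.0", -999,
               "-1111", "-1111.0", -1111]


def split_missing(values, not_in_list=NOT_IN_LIST):
    """Separate values listed in not_in_list (or float NaN) from valid values."""
    valid, missing = [], []
    for v in values:
        is_nan = isinstance(v, float) and v != v  # NaN never equals itself
        if is_nan or v in not_in_list:
            missing.append(v)
        else:
            valid.append(v)
    return valid, missing


def bin_is_large_enough(bin_count, total_count, rate=0.05, min_bin_size=50):
    """Assumed rule: the bin meets both the absolute and the relative minimum."""
    return bin_count >= min_bin_size and bin_count >= rate * total_count


valid, missing = split_missing([1, 2, -999, "NA", None, 3.5, float("nan"), -1111.0])
print(valid)                           # [1, 2, 3.5]
print(len(missing))                    # 5
print(bin_is_large_enough(60, 1201))   # False: 60 < 0.05 * 1201 = 60.05
print(bin_is_large_enough(61, 1201))   # True
```

Note that `-1111.0` is caught because Python compares it as equal to the integer `-1111` in the list.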
Return
------
result : a pandas DataFrame containing the binning result
Examples
>>> data
      y   x
0     0   1
1     1   1
2     0   2
3     1   3
4     0   3
5     0   3
6     0   3
7     0   3
8     0   3
...
1200  1  12
>>> flag_name = 'y'
>>> factor_name = 'x'
>>> bad_name = 'bad'
>>> good_name = 'good'
>>> piece = 5
>>> rate = 0.05
>>> min_bin_size = 50
>>> not_in_list = []
>>> is_round = False
>>> bb.Best_Bin(flag_name, factor_name, is_round, data, bad_name, good_name, piece, rate, min_bin_size, not_in_list)
        Bin    KS      WOE      IV  total_count  bad_rate
0  (-inf,3]   0.2   0.1256  0.0012          279    0.2222
1     (3,6]  0.25  -0.1323    0.10          623    0.3333
2   (6,inf)  0.18   -0.298   0.025          299    0.6666
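For readers unfamiliar with the WOE and IV columns, the following sketch shows one common way to derive them from per-bin bad and good counts. The function `woe_iv` is hypothetical. The sign convention, WOE = ln(good share / bad share), is an assumption chosen because it matches the sign pattern of the sample output (bins with a higher bad rate get a lower WOE). The counts are made up for illustration and are not the data behind the table.

```python
import math


def woe_iv(bins):
    """bins: list of (bad_count, good_count), one pair per bin.

    Returns (rows, total_iv). Every count must be non-zero, otherwise the
    logarithm is undefined. (Hypothetical helper, not part of bin_method.)
    """
    total_bad = sum(bad for bad, _ in bins)
    total_good = sum(good for _, good in bins)
    rows, total_iv = [], 0.0
    for bad, good in bins:
        bad_share = bad / total_bad      # share of all bad samples in this bin
        good_share = good / total_good   # share of all good samples in this bin
        woe = math.log(good_share / bad_share)  # assumed sign convention
        iv = (good_share - bad_share) * woe     # this bin's IV contribution
        rows.append({
            "WOE": woe,
            "IV": iv,
            "total_count": bad + good,
            "bad_rate": bad / (bad + good),
        })
        total_iv += iv
    return rows, total_iv


# Illustrative counts only: (bad, good) for three bins.
rows, total_iv = woe_iv([(20, 80), (30, 70), (50, 50)])
for row in rows:
    print({k: round(v, 4) for k, v in row.items()})
print(round(total_iv, 4))
```

Under an IV-based merging objective, a candidate merge could be scored by recomputing `total_iv` on the merged counts and comparing it with the alternatives.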