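Part one of this SAS scorecard series, "Building a scorecard model in SAS, step by step (1): feature processing and variable binning", already added a step to variable binning that keeps the WoE monotonic.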
Ideally, a good binning has WoE that changes monotonically from one bin to the next.
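However, the monotonicity constraint can cost a noticeable amount of IV compared with unconstrained binning, and for some variables a jump point still makes business sense. In such cases we can accept a binning in which the risk of the middle bins is lower or higher than that of the bins on both sides.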
For example, the WoE may rise and then fall (or fall and then rise), with a single turning point.
What we usually do not want is more than one jump point, such as WoE that rises, falls, and then rises again.
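A jump point here means a bin where the WoE trend reverses, so the three cases can be told apart by counting direction changes in the WoE sequence ordered by bin. The sketch below is in Python rather than SAS; the function names and WoE values are hypothetical, and treating equal neighbouring WoE values as "not decreasing" is an assumption chosen to mirror the `else fg=1` branch of the SAS macro.

```python
def step_directions(woe):
    """+1 when WoE does not decrease into the next bin, -1 when it decreases."""
    return [-1 if prev > cur else 1 for prev, cur in zip(woe, woe[1:])]


def count_jumps(woe):
    """Number of trend reversals (jump points) in a WoE sequence ordered by bin."""
    dirs = step_directions(woe)
    return sum(1 for a, b in zip(dirs, dirs[1:]) if a != b)


def classify(woe):
    """Label a WoE sequence as monotonic, one jump, or multiple jumps."""
    n = count_jumps(woe)
    if n == 0:
        return "monotonic"
    if n == 1:
        return "one jump"
    return "multiple jumps"


if __name__ == "__main__":
    print(classify([-0.8, -0.2, 0.3, 0.9]))  # monotonic
    print(classify([-0.5, 0.4, 0.1, -0.6]))  # one jump
    print(classify([-0.5, 0.4, -0.1, 0.6]))  # multiple jumps
```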
We can therefore keep only binnings with at most one jump point:
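/* Find WoE jump points for each candidate binning table dt.New_lx_mapiv_0_3 ... _6 */
%macro cal_woe_jump;
%local ci;
%do ci=3 %to 6;
/* Exclude group 0 (GRP_VAR=0) before checking the WoE trend */
data mp;
set dt.New_lx_mapiv_0_&ci.;
where GRP_VAR^=0;
run;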
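proc sort data=mp;by varname GRP_VAR;run;
/* fg=1 if WoE does not decrease from the previous bin, fg=-1 if it decreases;
   fg is missing on the first bin of each variable */
data mp1;
set mp;
by varname GRP_VAR;
retain woe_fg 0;   /* WoE of the previous bin */
if first.varname then woe_fg=woe;
else do;
if woe_fg>woe then fg=-1;
else fg=1;
woe_fg=woe;
end;
run;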
/* Per variable: highest/lowest bin reached by a rising step (a_rank/a_rank_min)
   and by a falling step (d_rank/d_rank_min), plus the number of each (a_sum/d_sum).
   0 and 999 are sentinels for "no such step" */
proc sql;
create table mp2 as
select varname,max(iv) as iv,
max(case when fg=1 then GRP_VAR else 0 end) as a_rank,
min(case when fg=1 then GRP_VAR else 999 end) as a_rank_min,
sum(case when fg=1 then fg else 0 end) as a_sum,
max(case when fg=-1 then GRP_VAR else 0 end) as d_rank,
min(case when fg=-1 then GRP_VAR else 999 end) as d_rank_min,
sum(case when fg=-1 then -fg else 0 end) as d_sum,
max(GRP_VAR) as GRP_VAR
from mp1
group by 1;
quit;
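/* Keep variables with IV>=0.02 whose WoE is monotonic (a_sum=0 or d_sum=0) or has exactly
   one jump: all rising steps come before all falling steps (d_rank_min-a_rank=1),
   or all falling steps come before all rising steps (a_rank_min-d_rank=1) */
data mp3;
set mp2;
where iv>=0.02;
if (d_rank_min-a_rank=1 or a_rank_min-d_rank=1) or a_sum=0 or d_sum=0;
keep varname iv GRP_VAR;
rename iv=iv_&ci. GRP_VAR=GRP_VAR_&ci.;
run;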
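/* Collect results: one iv_&ci./GRP_VAR_&ci. column pair per candidate binning.
   The left join keeps only the variables that passed the filter for ci=3 */
%if &ci=3 %then %do;
data out_woe;
set mp3;
run;
%end;
%else %do;
/* Join into a temporary table first: reading and writing OUT_WOE in the same
   CREATE TABLE triggers a recursive-reference warning */
proc sql;
create table out_woe_tmp as
select a.*,b.iv_&ci.,b.GRP_VAR_&ci.
from out_woe a
left join mp3 b on a.varname=b.varname;
quit;
data out_woe;
set out_woe_tmp;
run;
%end;
%end;
%mend;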
%cal_woe_jump;
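To make the filter condition in the mp3 step easier to check, here is a Python sketch (not SAS) that mirrors the mp2 summary and the mp3 condition for a single variable. It assumes the WoE values are already sorted by GRP_VAR with group 0 removed, so `woe[0]` is bin 1; the function names and example values are hypothetical.

```python
def rank_summary(woe):
    """Mimic the proc sql step (mp2) for one variable.

    The step into bin k (k >= 2) rises if WoE did not decrease, falls otherwise.
    0 and 999 are the same "no such step" sentinels as the SQL CASE defaults.
    """
    steps = range(2, len(woe) + 1)
    inc = [k for k in steps if woe[k - 1] >= woe[k - 2]]
    dec = [k for k in steps if woe[k - 1] < woe[k - 2]]
    return {
        "a_rank": max(inc, default=0),
        "a_rank_min": min(inc, default=999),
        "a_sum": len(inc),
        "d_rank": max(dec, default=0),
        "d_rank_min": min(dec, default=999),
        "d_sum": len(dec),
    }


def passes_filter(woe, iv, iv_min=0.02):
    """Mimic the mp3 data step: IV threshold plus at most one jump point."""
    s = rank_summary(woe)
    at_most_one_jump = (
        s["d_rank_min"] - s["a_rank"] == 1     # all rises come before all falls
        or s["a_rank_min"] - s["d_rank"] == 1  # all falls come before all rises
        or s["a_sum"] == 0                     # never rises: decreasing
        or s["d_sum"] == 0                     # never falls: non-decreasing
    )
    return iv >= iv_min and at_most_one_jump
```

For example, `passes_filter([-0.5, 0.4, 0.1, -0.6], 0.15)` is true (one rise, then falls), while `passes_filter([-0.5, 0.4, -0.1, 0.6], 0.15)` is false because the rising steps (bins 2 and 4) sit on both sides of the falling step (bin 3).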
Next, compare the results with the binning without the monotonicity constraint (table iv3):
/* iv0, GRP_VAR0: IV and GRP_VAR of the binning without the monotonicity constraint (iv3) */
proc sql;
create table out_woe1 as
select a.*,b.iv as iv0,b.GRP_VAR as GRP_VAR0
from out_woe a
left join iv3 b on a.varname=b.varname;
quit;
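/* For each variable, find the candidate binning (iv_3 ... iv_6) with the largest IV;
   s_name holds that column's name. Missing IVs (binning failed the filter) never win */
data out_woe2;
set out_woe1;
length s_name $32;
array ar{*} iv_:;
temp=.;
do i=1 to dim(ar);
if ar(i)>temp then do;temp=ar(i);s_name=vname(ar(i));end;
end;
keep varname iv: s_name;
run;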
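The selection in out_woe2 boils down to an argmax over the candidate IVs. A minimal Python sketch of that step, assuming one variable's results are given as a dict from the table suffix ci to its IV (None where the variable failed the filter). `pick_best_binning` and `iv_gap` are hypothetical helpers, and `iv_gap` (unconstrained IV minus best constrained IV) is only one illustrative way to quantify the IV cost of the constraint; the SAS code itself does not compute it.

```python
def pick_best_binning(ivs):
    """Return (column_name, iv) for the largest IV among candidate binnings.

    ivs maps the table suffix ci to the IV of that binning, or None when the
    binning failed the filter (a missing value in out_woe).
    Strict '>' keeps the first candidate on ties, like the SAS loop.
    """
    best_name, best_iv = None, None
    for ci in sorted(ivs):
        iv = ivs[ci]
        if iv is not None and (best_iv is None or iv > best_iv):
            best_name, best_iv = f"iv_{ci}", iv
    return best_name, best_iv


def iv_gap(best_iv, iv0):
    """IV given up relative to the unconstrained binning (hypothetical metric)."""
    if best_iv is None or iv0 is None:
        return None
    return iv0 - best_iv


if __name__ == "__main__":
    name, best = pick_best_binning({3: 0.21, 4: 0.27, 5: None, 6: 0.27})
    print(name, best)  # iv_4 0.27
    print(iv_gap(best, 0.30))
```

A large gap for a variable is the signal described earlier for relaxing strict monotonicity; a small one suggests the constrained binning loses little.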
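By weighing the best IV that satisfies the at-most-one-jump rule (s_name) against the unconstrained IV (iv0), we can pick out good binnings.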