Python 评分卡学会这篇文章就够了

关注微信公共号:小程在线

关注CSDN博客:程志伟的博客

基于这样的思想,我们总结出我们对一个特征进行分箱的步骤:
1)我们首先把连续型变量分成一组数量较多的分类型变量,比如,将几万个样本分成100组,或50组
2)确保每一组中都要包含两种类别的样本,否则IV值会无法计算
3)我们对相邻的组进行卡方检验,卡方检验的P值很大的组进行合并,直到数据中的组数小于设定的N箱为止
4)我们让一个特征分别分成[2,3,4.....20]箱,观察每个分箱个数下的IV值如何变化,找出最适合的分箱个数
5)分箱完毕后,我们计算每个箱的WOE值, ,观察分箱效果
这些步骤都完成后,我们可以对各个特征都进行分箱,然后观察每个特征的IV值,以此来挑选特征
 

#按照等频对需要分箱的列进行分箱

model_data["qcut"], updown = pd.qcut(model_data["age"], retbins=True, q=20)

model_data["qcut"]
Out[46]: 
0             (52.0, 54.0]
1             (61.0, 64.0]
2             (39.0, 41.0]
3             (68.0, 74.0]
4             (52.0, 54.0]
      
195003      (31.018, 34.0]
195004    (48.498, 50.006]
195005      (45.0, 46.982]
195006        (61.0, 64.0]
195007        (52.0, 54.0]
Name: qcut, Length: 195008, dtype: category
Categories (20, interval[float64]): [(20.999, 28.0] < (28.0, 31.018] < (31.018, 34.0] < (34.0, 36.646] ... (61.0, 64.0] < (64.0, 68.0] < (68.0, 74.0] < (74.0, 107.0]]

 

#每个段的数量

model_data["qcut"].value_counts()
Out[47]: 
(36.646, 39.0]      11019
(61.0, 64.0]        10740
(31.018, 34.0]      10704
(50.006, 52.0]      10680
(58.689, 61.0]      10572
(20.999, 28.0]      10313
(43.0, 45.0]        10116
(52.0, 54.0]        10115
(41.0, 43.0]         9778
(46.982, 48.498]     9750
(48.498, 50.006]     9750
(39.0, 41.0]         9476
(74.0, 107.0]        9296
(64.0, 68.0]         9285
(28.0, 31.018]       9188
(54.0, 56.0]         9119
(56.0, 58.689]       9088
(68.0, 74.0]         8859
(34.0, 36.646]       8797
(45.0, 46.982]       8363
Name: qcut, dtype: int64

 

#所有箱子的上限和下限
updown
Out[48]: 
array([ 21.        ,  28.        ,  31.01848249,  34.        ,
        36.64611309,  39.        ,  41.        ,  43.        ,
        45.        ,  46.98189964,  48.4983757 ,  50.00572276,
        52.        ,  54.        ,  56.        ,  58.68881023,
        61.        ,  64.        ,  68.        ,  74.        ,
       107.        ])

 

# 统计每个分箱中0和1的数量
# 这里使用了数据透视表的功能groupby

coount_y0 = model_data[model_data["SeriousDlqin2yrs"] == 0].groupby(by="qcut").count()["SeriousDlqin2yrs"]
coount_y1 = model_data[model_data["SeriousDlqin2yrs"] == 1].groupby(by="qcut").count()["SeriousDlqin2yrs"]


#num_bins值分别为每个区间的上界,下界,0出现的次数,1出现的次数
num_bins = [*zip(updown,updown[1:],coount_y0,coount_y1)]


#注意zip会按照最短列来进行结合
num_bins
Out[50]: 
[(21.0, 28.0, 4243, 6070),
 (28.0, 31.018482492669673, 3571, 5617),
 (31.018482492669673, 34.0, 4075, 6629),
 (34.0, 36.646113086053596, 2908, 5889),
 (36.646113086053596, 39.0, 5182, 5837),
 (39.0, 41.0, 3956, 5520),
 (41.0, 43.0, 4002, 5776),
 (43.0, 45.0, 4389, 5727),
 (45.0, 46.9818996394487, 2419, 5944),
 (46.9818996394487, 48.498375695997765, 4813, 4937),
 (48.498375695997765, 50.00572276244543, 4900, 4850),
 (50.00572276244543, 52.0, 4728, 5952),
 (52.0, 54.0, 4681, 5434),
 (54.0, 56.0, 4677, 4442),
 (56.0, 58.6888102319735, 4483, 4605),
 (58.6888102319735, 61.0, 6583, 3989),
 (61.0, 64.0, 6968, 3772),
 (64.0, 68.0, 6623, 2662),
 (68.0, 74.0, 6753, 2106),
 (74.0, 107.0, 7737, 1559)]

 

3.3 定义WOE和IV函数
#计算WOE和BAD RATE
#BAD RATE与bad%不是一个东西
#BAD RATE是一个箱中,坏的样本所占的比例 (bad/total)
#而bad%是一个箱中的坏样本占整个特征中的坏样本的比例

 

columns = ["min","max","count_0","count_1"]
df = pd.DataFrame(num_bins,columns=columns)
df
Out[53]: 
          min         max  count_0  count_1
0   21.000000   28.000000     4243     6070
1   28.000000   31.018482     3571     5617
2   31.018482   34.000000     4075     6629
3   34.000000   36.646113     2908     5889
4   36.646113   39.000000     5182     5837
5   39.000000   41.000000     3956     5520
6   41.000000   43.000000     4002     5776
7   43.000000   45.000000     4389     5727
8   45.000000   46.981900     2419     5944
9   46.981900   48.498376     4813     4937
10  48.498376   50.005723     4900     4850
11  50.005723   52.000000     4728     5952
12  52.000000   54.000000     4681     5434
13  54.000000   56.000000     4677     4442
14  56.000000   58.688810     4483     4605
15  58.688810   61.000000     6583     3989
16  61.000000   64.000000     6968     3772
17  64.000000   68.000000     6623     2662
18  68.000000   74.000000     6753     2106
19  74.000000  107.000000     7737     1559

df["total"] = df.count_0 + df.count_1

#一个箱子中样本数占总数的比例
df["percentage"] = df.total / df.total.sum()

#BAD RATE是一个箱中,坏的样本所占的比例
df["bad_rate"] = df.count_1 / df.total


df["good%"] = df.count_0/df.count_0.sum()

#而bad%是一个箱中的坏样本占整个特征中的坏
df["bad%"] = df.count_1/df.count_1.sum()

df["woe"] = np.log(df["good%"] / df["bad%"])
df
Out[54]: 
          min         max  count_0  ...     good%      bad%       woe
0   21.000000   28.000000     4243  ...  0.043433  0.062373 -0.361924
1   28.000000   31.018482     3571  ...  0.036554  0.057719 -0.456788
2   31.018482   34.000000     4075  ...  0.041713  0.068118 -0.490419
3   34.000000   36.646113     2908  ...  0.029767  0.060514 -0.709456
4   36.646113   39.000000     5182  ...  0.053045  0.059979 -0.122862
5   39.000000   41.000000     3956  ...  0.040495  0.056722 -0.336980
6   41.000000   43.000000     4002  ...  0.040966  0.059352 -0.370753
7   43.000000   45.000000     4389  ...  0.044927  0.058849 -0.269926
8   45.000000   46.981900     2419  ...  0.024762  0.061079 -0.902864
9   46.981900   48.498376     4813  ...  0.049268  0.050731 -0.029273
10  48.498376   50.005723     4900  ...  0.050158  0.049837  0.006421
11  50.005723   52.000000     4728  ...  0.048397  0.061161 -0.234061
12  52.000000   54.000000     4681  ...  0.047916  0.055838 -0.152999
13  54.000000   56.000000     4677  ...  0.047875  0.045645  0.047716
14  56.000000   58.688810     4483  ...  0.045890  0.047320 -0.030686
15  58.688810   61.000000     6583  ...  0.067386  0.040990  0.497114
16  61.000000   64.000000     6968  ...  0.071327  0.038760  0.609887
17  64.000000   68.000000     6623  ...  0.067795  0.027354  0.907635
18  68.000000   74.000000     6753  ...  0.069126  0.021641  1.161361
19  74.000000  107.000000     7737  ...  0.079199  0.016020  1.598134

[20 rows x 10 columns]

 


def get_woe(num_bins):

# 通过 num_bins 数据计算 woe
    columns = ["min","max","count_0","count_1"]
    df = pd.DataFrame(num_bins,columns=columns)
    df["total"] = df.count_0 + df.count_1
    df["percentage"] = df.total / df.total.sum()
    df["bad_rate"] = df.count_1 / df.total
    df["good%"] = df.count_0/df.count_0.sum()
    df["bad%"] = df.count_1/df.count_1.sum()
    df["woe"] = np.log(df["good%"] / df["bad%"])
    return df

 

#计算IV值
def get_iv(df):
    rate = df["good%"] - df["bad%"]
    iv = np.sum(rate * df.woe)
    return iv

 

3.4 卡方检验,合并箱体,画出IV曲线

num_bins_ = num_bins.copy()
import matplotlib.pyplot as plt
import scipy

x1 = num_bins_[0][2:]
x1
Out[57]: (4243, 6070)

x2 = num_bins_[1][2:]
x2
Out[58]: (3571, 5617)

 

#卡方检验

scipy.stats.chi2_contingency([x1,x2])
Out[59]: 
(10.3901943590052,
 0.0012668628767570208,
 1,
 array([[4132.39228757, 6180.60771243],
        [3681.60771243, 5506.39228757]]))

scipy.stats.chi2_contingency([x1,x2])[1]
Out[60]: 0.0012668628767570208

IV = []
axisx = []
while len(num_bins_) > 2:
    pvs = []

# 获取 num_bins_两两之间的卡方检验的置信度(或卡方值)
    for i in range(len(num_bins_)-1):
        x1 = num_bins_[i][2:]
        x2 = num_bins_[i+1][2:]
# 0 返回 chi2 值,1 返回 p 值。
    pv = scipy.stats.chi2_contingency([x1,x2])[1]
# chi2 = scipy.stats.chi2_contingency([x1,x2])[0]
    pvs.append(pv)
# 通过 p 值进行处理。合并 p 值最大的两组
    i = pvs.index(max(pvs))
    num_bins_[i:i+2] = [(
            num_bins_[i][0],
            num_bins_[i+1][1],
            num_bins_[i][2]+num_bins_[i+1][2],
            num_bins_[i][3]+num_bins_[i+1][3])]
    bins_df = get_woe(num_bins_)
    axisx.append(len(num_bins_))
    IV.append(get_iv(bins_df))
plt.figure()
plt.plot(axisx,IV)
plt.xticks(axisx)
plt.xlabel("number of box")
plt.ylabel("IV")
plt.show()

 

3.5 用最佳分箱个数分箱,并验证分箱结果
def get_bin(num_bins_,n):
    while len(num_bins_) > n:
        pvs = []
        for i in range(len(num_bins_)-1):
            x1 = num_bins_[i][2:]
            x2 = num_bins_[i+1][2:]
            pv = scipy.stats.chi2_contingency([x1,x2])[1]
            # chi2 = scipy.stats.chi2_contingency([x1,x2])[0]
            pvs.append(pv)
    i = pvs.index(max(pvs))
    num_bins_[i:i+2] = [(
            num_bins_[i][0],
            num_bins_[i+1][1],
            num_bins_[i][2]+num_bins_[i+1][2],
            num_bins_[i][3]+num_bins_[i+1][3])]
    return num_bins_

afterbins = get_bin(num_bins,6)
afterbins


afterbins = get_bin(num_bins,6)
afterbins

 

3.6 将选取最佳分箱个数的过程包装为函数

def graphforbestbin(DF, X, Y, n=5,q=20,graph=True):
    """
    自动最优分箱函数,基于卡方检验的分箱
    参数:
    DF: 需要输入的数据
    X: 需要分箱的列名
    Y: 分箱数据对应的标签 Y 列名
    n: 保留分箱个数
    q: 初始分箱的个数
    graph: 是否要画出IV图像
    区间为前开后闭 (]
        """
    DF = DF[[X,Y]].copy()
    DF["qcut"],bins = pd.qcut(DF[X], retbins=True, q=q,duplicates="drop")
    coount_y0 = DF.loc[DF[Y]==0].groupby(by="qcut").count()[Y]
    coount_y1 = DF.loc[DF[Y]==1].groupby(by="qcut").count()[Y]
    num_bins = [*zip(bins,bins[1:],coount_y0,coount_y1)]
    
    for i in range(q):
        if 0 in num_bins[0][2:]:
            num_bins[0:2] = [(
                    num_bins[0][0],
                    num_bins[1][1],
                    num_bins[0][2]+num_bins[1][2],
                    num_bins[0][3]+num_bins[1][3])]
        continue
        
        for i in range(len(num_bins)):
            if 0 in num_bins[i][2:]:
                num_bins[i-1:i+1] = [(
                        num_bins[i-1][0],
                        num_bins[i][1],
                        num_bins[i-1][2]+num_bins[i][2],
                        num_bins[i-1][3]+num_bins[i][3])]
                break
        else:
            break
    
    def get_woe(num_bins):
        columns = ["min","max","count_0","count_1"]
        df = pd.DataFrame(num_bins,columns=columns)
        df["total"] = df.count_0 + df.count_1
        df["percentage"] = df.total / df.total.sum()
        df["bad_rate"] = df.count_1 / df.total
        df["good%"] = df.count_0/df.count_0.sum()
        df["bad%"] = df.count_1/df.count_1.sum()
        df["woe"] = np.log(df["good%"] / df["bad%"])
        return df
    
    def get_iv(df):
        rate = df["good%"] - df["bad%"]
        iv = np.sum(rate * df.woe)
        return iv
    
    IV = []
    axisx = []
    while len(num_bins) > n:
        pvs = []
        for i in range(len(num_bins)-1):
            x1 = num_bins[i][2:]
            x2 = num_bins[i+1][2:]
            pv = scipy.stats.chi2_contingency([x1,x2])[1]
            pvs.append(pv)
        
        i = pvs.index(max(pvs))
        num_bins[i:i+2] = [(
                num_bins[i][0],
                num_bins[i+1][1],
                num_bins[i][2]+num_bins[i+1][2],
                num_bins[i][3]+num_bins[i+1][3])]
        bins_df = pd.DataFrame(get_woe(num_bins))
        axisx.append(len(num_bins))
        IV.append(get_iv(bins_df))
    
    if graph:
        plt.figure()
        plt.plot(axisx,IV)
        plt.xticks(axisx)
        plt.show()
    return bins_df

 

3.7 对所有特征进行分箱选择
model_data.columns
Out[66]: 
Index(['SeriousDlqin2yrs', 'RevolvingUtilizationOfUnsecuredLines', 'age',
       'NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio', 'MonthlyIncome',
       'NumberOfOpenCreditLinesAndLoans', 'NumberOfTimes90DaysLate',
       'NumberRealEstateLoansOrLines', 'NumberOfTime60-89DaysPastDueNotWorse',
       'NumberOfDependents', 'qcut'],
      dtype='object')

for i in model_data.columns[1:-1]:
    print(i)
    graphforbestbin(model_data,i,"SeriousDlqin2yrs",n=2,q=20)
RevolvingUtilizationOfUnsecuredLines


age

不是所有的特征都可以使用这个分箱函数

auto_col_bins = {"RevolvingUtilizationOfUnsecuredLines":5,
                 "age":5,
                 "DebtRatio":4,
                 "MonthlyIncome":3,
                 "NumberOfOpenCreditLinesAndLoans":5}

 

#不能使用自动分箱的变量
hand_bins = {"NumberOfTime30-59DaysPastDueNotWorse":[0,1,2,13]
    ,"NumberOfTimes90DaysLate":[0,1,2,17]
    ,"NumberRealEstateLoansOrLines":[0,1,2,4,54]
    ,"NumberOfTime60-89DaysPastDueNotWorse":[0,1,2,8]
    ,"NumberOfDependents":[0,1,2,3]}

 

#保证区间覆盖使用 np.inf替换最大值,用-np.inf替换最小值
hand_bins = {k:[-np.inf,*v[:-1],np.inf] for k,v in hand_bins.items()}
hand_bins
Out[74]: 
{'NumberOfTime30-59DaysPastDueNotWorse': [-inf, 0, 1, 2, inf],
 'NumberOfTimes90DaysLate': [-inf, 0, 1, 2, inf],
 'NumberRealEstateLoansOrLines': [-inf, 0, 1, 2, 4, inf],
 'NumberOfTime60-89DaysPastDueNotWorse': [-inf, 0, 1, 2, inf],
 'NumberOfDependents': [-inf, 0, 1, 2, inf]}

 

接下来对所有特征按照选择的箱体个数和手写的分箱范围进行分箱
bins_of_col = {}
# 生成自动分箱的分箱区间和分箱后的 IV 值
for col in auto_col_bins:
    bins_df = graphforbestbin(model_data,col
                              ,"SeriousDlqin2yrs"
                              ,n=auto_col_bins[col]
                              #使用字典的性质来取出每个特征所对应的箱的数量
                              ,q=12
                              ,graph=False)
    bins_list = sorted(set(bins_df["min"]).union(bins_df["max"]))
    #保证区间覆盖使用 np.inf 替换最大值 -np.inf 替换最小值
    bins_list[0],bins_list[-1] = -np.inf,np.inf
    bins_of_col[col] = bins_list

bins_of_col.update(hand_bins)

bins_of_col
Out[76]: 
{'RevolvingUtilizationOfUnsecuredLines': [-inf,
  0.09916399825000001,
  0.3241848415833333,
  0.6110537703073564,
  0.9999998999999999,
  inf],
 'age': [-inf,
  45.68896056223215,
  54.96325362710011,
  63.0,
  69.40827351386359,
  inf],
 'DebtRatio': [-inf, 0.055944251601378225, 0.33361565816666666, 32.0, inf],
 'MonthlyIncome': [-inf, 0.22, 5833.0, inf],
 'NumberOfOpenCreditLinesAndLoans': [-inf,
  3.5863179118140547,
  9.66596928081409,
  11.0,
  12.950941095975505,
  inf],
 'NumberOfTime30-59DaysPastDueNotWorse': [-inf, 0, 1, 2, inf],
 'NumberOfTimes90DaysLate': [-inf, 0, 1, 2, inf],
 'NumberRealEstateLoansOrLines': [-inf, 0, 1, 2, 4, inf],
 'NumberOfTime60-89DaysPastDueNotWorse': [-inf, 0, 1, 2, inf],
 'NumberOfDependents': [-inf, 0, 1, 2, inf]}

 

3.4 计算各箱的WOE并映射到数据中

def get_woe(df,col,y,bins):
    df = df[[col,y]].copy()
    df["cut"] = pd.cut(df[col],bins)
    bins_df = df.groupby("cut")[y].value_counts().unstack()
    woe = bins_df["woe"] =np.log((bins_df[0]/bins_df[0].sum())/(bins_df[1]/bins_df[1].sum()))
    return woe


#将所有特征的WOE存储到字典当中
woeall = {}
for col in bins_of_col:
    woeall[col] = get_woe(model_data,col,"SeriousDlqin2yrs",bins_of_col[col])

woeall
Out[77]: 
{'RevolvingUtilizationOfUnsecuredLines': cut
 (-inf, 0.0992]     2.206724
 (0.0992, 0.324]    0.602023
 (0.324, 0.611]    -0.389316
 (0.611, 1.0]      -1.027200
 (1.0, inf]        -2.044288
 dtype: float64, 'age': cut
 (-inf, 45.689]     -0.418293
 (45.689, 54.963]   -0.237499
 (54.963, 63.0]      0.394675
 (63.0, 69.408]      0.858461
 (69.408, inf]       1.444452
 dtype: float64, 'DebtRatio': cut
 (-inf, 0.0559]     0.901068
 (0.0559, 0.334]    0.055626
 (0.334, 32.0]     -0.292042
 (32.0, inf]        0.193145
 dtype: float64, 'MonthlyIncome': cut
 (-inf, 0.22]      0.844512
 (0.22, 5833.0]   -0.247486
 (5833.0, inf]     0.234283
 dtype: float64, 'NumberOfOpenCreditLinesAndLoans': cut
 (-inf, 3.586]    -0.359898
 (3.586, 9.666]    0.040580
 (9.666, 11.0]     0.372690
 (11.0, 12.951]   -0.561171
 (12.951, inf]     0.216359
 dtype: float64, 'NumberOfTime30-59DaysPastDueNotWorse': cut
 (-inf, 0.0]    1.095369
 (0.0, 1.0]    -1.380808
 (1.0, 2.0]    -2.050939
 (2.0, inf]    -2.306884
 dtype: float64, 'NumberOfTimes90DaysLate': cut
 (-inf, 0.0]    0.690287
 (0.0, 1.0]    -2.570842
 (1.0, 2.0]    -2.962132
 (2.0, inf]    -3.147651
 dtype: float64, 'NumberRealEstateLoansOrLines': cut
 (-inf, 0.0]    0.213342
 (0.0, 1.0]    -0.007980
 (1.0, 2.0]    -0.116866
 (2.0, 4.0]    -0.413832
 (4.0, inf]    -0.827201
 dtype: float64, 'NumberOfTime60-89DaysPastDueNotWorse': cut
 (-inf, 0.0]    0.539335
 (0.0, 1.0]    -2.457792
 (1.0, 2.0]    -2.932266
 (2.0, inf]    -2.880735
 dtype: float64, 'NumberOfDependents': cut
 (-inf, 0.0]    0.629351
 (0.0, 1.0]    -0.584413
 (1.0, 2.0]    -0.529292
 (2.0, inf]    -0.474088
 dtype: float64}

 

把所有WOE映射到原始数据中
#不希望覆盖掉原本的数据,创建一个新的DataFrame,索引和原始数据model_data一模一样
model_woe = pd.DataFrame(index=model_data.index)

 

#对所有特征操作可以写成
for col in bins_of_col:
    model_woe[col] = pd.cut(model_data[col],bins_of_col[col]).map(woeall[col])

model_woe.head()
Out[79]: 
  RevolvingUtilizationOfUnsecuredLines  ... NumberOfDependents
0                             2.206724  ...           0.629351
1                             0.602023  ...           0.629351
2                            -2.044288  ...          -0.474088
3                             2.206724  ...           0.629351
4                            -1.027200  ...          -0.584413

[5 rows x 10 columns]

 

#将标签补充到数据中
model_woe["SeriousDlqin2yrs"] = model_data["SeriousDlqin2yrs"]


#这就是我们的建模数据了
model_woe.head()
Out[80]: 
  RevolvingUtilizationOfUnsecuredLines  ... SeriousDlqin2yrs
0                             2.206724  ...                0
1                             0.602023  ...                0
2                            -2.044288  ...                1
3                             2.206724  ...                0
4                            -1.027200  ...                1

[5 rows x 11 columns]

 

3.5 建模与模型验证
vali_woe = pd.DataFrame(index=vali_data.index)

for col in bins_of_col:
    vali_woe[col] = pd.cut(vali_data[col],bins_of_col[col]).map(woeall[col])

vali_woe["SeriousDlqin2yrs"] = vali_data["SeriousDlqin2yrs"]
vali_X = vali_woe.iloc[:,:-1]
vali_y = vali_woe.iloc[:,-1]

X = model_woe.iloc[:,:-1]
y = model_woe.iloc[:,-1]

 

#建模

from sklearn.linear_model import LogisticRegression as LR
lr = LR().fit(X,y)
lr.score(vali_X,vali_y)
H:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
Out[83]: 0.8638963338757538

lr.n_iter_
Out[84]: array([6], dtype=int32)

 

#ROC曲线上的结果
import scikitplot as skplt

vali_proba_df = pd.DataFrame(lr.predict_proba(vali_X))
skplt.metrics.plot_roc(vali_y, vali_proba_df,
plot_micro=False,figsize=(6,6),
plot_macro=False)
Out[87]: <matplotlib.axes._subplots.AxesSubplot at 0x29581588908>


 

3.6 制作评分卡

假设对数几率为 时设定的特定分数为600,PDO=20,那么对数几率为 时的分数就是620。带入以上线
性表达式,可以得到
B = 20/np.log(2)
A = 600 + B*np.log(1/60)
B,A
Out[88]: (28.85390081777927, 481.8621880878296)

 

#计算基础分

base_score = A - B*lr.intercept_
base_score
Out[89]: array([481.37158938])

file = "H:/程志伟/python//ScoreData.csv"

with open(file,"w") as fdata:
    fdata.write("base_score,{}\n".format(base_score))

for i,col in enumerate(X.columns):
    score = woeall[col] * (-B*lr.coef_[0][i])
    score.name = "Score"
    score.index.name = col
    score.to_csv(file,header=True,mode="a")

就可以在excel中得到每个变量的分箱,至此评分卡全部完成。

 

  • 0
    点赞
  • 8
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值