Summary of the UnionPay "Credit User Overdue Prediction" Algorithm Competition

Source: 老师 @ kesci.com; WeChat public account: Kaggle竞赛宝典
Original post: Summary of the UnionPay "Credit User Overdue Prediction" Algorithm Competition

Background

Personal credit underpins the credit of society as a whole: every economic activity in market transactions is closely tied to it. Once individual behavior goes unconstrained, personal defaults occur and can spread into collective defaults, so building a personal credit system matters enormously. Yet as the economy grows, the tension between increasingly important credit records and their frequent absence keeps sharpening, making a sound credit system an urgent need. With the rapid growth of personal micro-lending in recent years, preventing individual credit fraud and lowering the non-performing rate have become the first-order goals of such businesses. This competition aims to apply big data together with artificial intelligence and machine learning, mobilize society-wide enthusiasm for data-modeling innovation, and help financial institutions assess personal credit accurately, further strengthening their ability to manage credit risk.

The theme of this competition is "Open Integration, Building Credit Together", and the task is credit-user overdue prediction: contestants develop big-data models that accurately identify fraud and overdue risk among individual micro-loan applicants, further improving financial institutions' ability to prevent fraud and reduce non-performing rates.

Toolkit Imports & Data Loading & Data EDA

Importing the toolkit

import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.model_selection import train_test_split, KFold  # KFold lives in model_selection, not the removed sklearn.cross_validation
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier  # used by the modeling cells below

Loading the data

model_sample = pd.read_csv('./Data/model_sample.csv')
model_sample.set_index('user_id', inplace=True)
label = model_sample[['y']]                    # split off the target column
model_sample = model_sample.drop('y', axis=1)  # keep only the features
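
It is also worth glancing at the class balance right away, since it motivates the low decision threshold used later; a minimal sketch (assuming y = 1 marks an overdue user):

# class balance of the target (sketch; we assume y = 1 marks an overdue user)
print(label['y'].value_counts(normalize=True))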

Data EDA

  1. The data contain many missing values, which need special handling
  2. Most feature dimensions look reasonable; age and similar fields are within expected ranges, with nothing absurd. Features such as the number of loan applications and of successful applications (x_198, x_199) do contain some fairly extreme values (181, 132, etc.), but lacking prior information we treat them as plausible by default
  3. Only int and float columns are present, so the formats are simple and need no extra handling
  4. There are only 11,017 samples, a low-sample regime, so model choice must guard against overfitting
model_sample.head()

model_sample.describe()

model_sample.info()
<class 'pandas.core.frame.DataFrame'>
Index: 11017 entries, A00002 to A21941
Columns: 199 entries, x_001 to x_199
dtypes: float64(161), int64(38)
memory usage: 16.8+ MB

Every field here has a meaning; see "字段解释.xlsx" for the details of each column.


  • There are only 11,017 samples, a low-sample setting
model_sample.shape
(11017, 199)
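
To back observation 1 above, a quick missing-value scan (a minimal sketch over the columns as loaded):

# rank columns by missing ratio to quantify how severe the gaps are
missing_ratio = model_sample.isnull().mean().sort_values(ascending=False)
print(missing_ratio.head(10))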

Feature Construction

We now construct features, following two lines of thinking:

  1. Re-encode the original features according to the model's expressive power and what it needs (feature transformation)
  2. Combine existing features to build more features with representational power

Building feature set 1

Features based on model expressiveness

  1. Encode the identity and asset information as a combined 0-1 code to improve the model's representational power (see the toy sketch below)
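
A toy illustration of this combination encoding (the flags here are hypothetical; the real code below uses x_003 through x_019): each 0/1 flag contributes one bit, so 17 flags collapse into a single integer in [0, 2**17).

# three hypothetical binary flags combined into one integer code
flags = pd.DataFrame({'x_003': [1, 0, 1], 'x_004': [0, 1, 1], 'x_005': [1, 1, 0]})
code = sum(2 ** i * flags[col] for i, col in enumerate(flags.columns))
print(code.tolist())  # [5, 6, 3]: x_003 is the lowest bit, x_005 the highest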

Ratio features

  1. Debit-card ratio features (the share of each kind of debit card)
  2. Credit-card ratio features (the share of each kind of credit card)
  3. Bank-card ratio features (the share of each kind of bank card)
  4. Share of failed repayments
  5. Share of failed loan applications

Standard-deviation restoration features (reflecting the volatility of the information)

  1. Restore the standard-deviation columns in the data

Mean features

  1. Transaction amount per card (e.g. per credit card);
  2. Amount per transaction (e.g. per out-of-town transaction);
  3. Amount per repayment;
  4. Per-transaction means for travel, insurance, home improvement, finance, etc.;
  5. Average number of transactions per month;
  6. Transaction amount per month;
  7. Amount per disbursement; disbursements per institution; disbursement amount per institution;
  8. Average repayment amount per institution;
  9. Loan amount per institution;
  10. Other mean features

Trend features

  1. Trend in the number of loan-applying institutions: 90 days vs. 30 days, 180 vs. 90, and 180 vs. 30;
  2. The same trends for institutions where applications succeeded;
  3. The same trends for the number of loan applications
def get_features_middle(data):
    model_sample_strong_feature = data.copy()
    # combine the identity and asset 0-1 flags into a single code
    first_strong_features = ['x_003','x_004','x_005','x_006','x_007','x_008','x_009','x_010','x_011','x_012','x_013','x_014','x_015','x_016','x_017','x_018','x_019']
    res = 0
    for i in range(len(first_strong_features)):
        res += 2 ** i * data[first_strong_features[i]]

    model_sample_strong_feature['x_1_strong'] = res
    # debit-card ratio features
    model_sample_strong_feature['x_022/x_020'] = data['x_022'] / (data['x_020'] + 1e-10)
    ...
    model_sample_strong_feature['x_026/x_020'] = data['x_026'] / (data['x_020'] + 1e-10)

    # credit-card ratio features
    model_sample_strong_feature['x_028/x_021'] = data['x_028'] / (data['x_021'] + 1e-10)
    ...
    model_sample_strong_feature['x_032/x_021'] = data['x_032'] / (data['x_021'] + 1e-10)

    # bank-card ratio features
    model_sample_strong_feature['all_cards'] = (data['x_034'] + data['x_035'] + data['x_036'] + data['x_037'] + data['x_038'] + data['x_039'] + data['x_040']).values

    model_sample_strong_feature['x_034/all_cards'] = data['x_034'] / (model_sample_strong_feature['all_cards'] + 1e-10)
    ...
    model_sample_strong_feature['x_040/all_cards'] = data['x_040'] / (model_sample_strong_feature['all_cards'] + 1e-10)

    # standard-deviation restoration
    model_sample_strong_feature['x_043/x_044'] = data['x_043'] / (data['x_044'] + 1e-10)
    ...
    model_sample_strong_feature['x_126/x_127'] = data['x_126'] / (data['x_127'] + 1e-10)

    # mean features: amount per card (credit or other); amount per transaction (e.g. out-of-town);
    # amount per repayment; per-transaction means for travel, insurance, home improvement, finance;
    # average transactions per month; other meaningful means
    model_sample_strong_feature['x_045/x_041'] = data['x_045'] / (data['x_041'] + 1e-10)
    ...
    model_sample_strong_feature['x_130/x_128'] = data['x_130'] / (data['x_128'] + 1e-10)

    # amount per disbursement; disbursements per institution; disbursement amount per institution
    model_sample_strong_feature['x_133/x_134'] = data['x_133'] / (data['x_134'] + 1e-10)
    ...
    model_sample_strong_feature['x_144/x_142'] = data['x_144'] / (data['x_142'] + 1e-10)

    # mean disbursement per institution; share of failed repayments
    model_sample_strong_feature['x_151/x_149'] = data['x_151'] / (data['x_149'] + 1e-10)
    ...
    model_sample_strong_feature['x_185/x_180'] = data['x_185'] / (data['x_180'] + 1e-10)

    # trend features: loan-applying institutions, successfully-applying institutions and
    # application counts, each compared over 90-vs-30, 180-vs-90 and 180-vs-30 days
    model_sample_strong_feature['x_189/x_188'] = data['x_189'] / (data['x_188'] + 1e-10)
    ...
    model_sample_strong_feature['x_192/x_188'] = data['x_192'] / (data['x_188'] + 1e-10)

    model_sample_strong_feature = model_sample_strong_feature.fillna(-999)
    return model_sample_strong_feature

Building feature set 2

  • This set is essentially a subset of the one above and is used for fusion (it is the set fed to the 5-fold runs)
def get_features_final(data):
    model_sample_strong_feature = data.copy()
    
    first_strong_features = ['x_003','x_004','x_005','x_006','x_007','x_008','x_009','x_010','x_011','x_012','x_013','x_014','x_015','x_016','x_017','x_018','x_019']

    res = 0
    for i in range(len(first_strong_features)):
        res += 2 ** i * data[first_strong_features[i]] 
    model_sample_strong_feature['x_1_strong'] = res

    model_sample_strong_feature['x_022/x_020'] = data['x_022'] / (data['x_020'] + 1e-10)
    ...
    model_sample_strong_feature['x_032/x_021'] = data['x_032'] /  (data['x_021'] + 1e-10)
    
    
    model_sample_strong_feature['all_cards'] = (data['x_034']  + data['x_035'] + data['x_036'] + data['x_037'] + data['x_038'] + data['x_039'] + data['x_040']).values

    model_sample_strong_feature['x_034/all_cards'] = data['x_034'] / (model_sample_strong_feature['all_cards'] + 1e-10)
    ...
    model_sample_strong_feature['x_040/all_cards'] = data['x_040'] /  (model_sample_strong_feature['all_cards'] + 1e-10)

    model_sample_strong_feature['x_027/x_033'] = data['x_027'] / (data['x_033'] + 1e-10)  # ratio, matching the feature name
    ...
    model_sample_strong_feature['x_192/x_188'] = data['x_192'] / (data['x_188'] + 1e-10)

    
    model_sample_strong_feature = model_sample_strong_feature.fillna(-999)
    return model_sample_strong_feature 
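
As a quick sanity check (a sketch, not part of the original pipeline), one can compare the generated columns to see how close set 2 is to being a strict subset of set 1:

fea_middle = get_features_middle(model_sample)
fea_final = get_features_final(model_sample)
# columns built only in set 2 (e.g. 'x_027/x_033'); an empty set means a strict subset
print(set(fea_final.columns) - set(fea_middle.columns))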

Model Training

Building the metric

def get_score(y_pred, y_true):
    # competition-style report: accuracy, precision, recall and F1 in one line
    acc_ = accuracy_score(y_true=y_true, y_pred=y_pred)
    TP = np.sum((y_pred == 1) & (y_true == 1))
    precision = TP / np.sum(y_pred)   # predicted positives in the denominator
    recall = TP / np.sum(y_true)      # actual positives in the denominator
    print('TP: ', TP, '/', np.sum(y_true), 'all ', np.sum(y_pred),
          ' accuracy: ', acc_, ' precision: ', precision, ' recall: ', recall,
          ' F_score: ', 2 * precision * recall / (precision + recall),
          fbeta_score(y_true=y_true, y_pred=y_pred, beta=1))
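
A toy call (a sketch) showing the expected inputs: boolean or 0/1 predictions against 0/1 labels.

y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
get_score(y_pred, y_true)  # TP: 2 / 3, precision = recall = 2/3, F_score = 2/3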
    

Selecting the topN most important features

This guards against overfitting, and in our experience it can also bring a solid lift when used in fusion.

  • Because of training-time constraints, our final model did not include this kind of fusion
def get_top_features(feature,model,topN):
    feature_importance = pd.DataFrame({'feature':feature,'importance':model.feature_importance()})
    feature_importance = feature_importance.sort_values('importance',ascending=False)
    feature_importance = feature_importance.loc[feature_importance['importance'] > 0]
    if feature_importance.shape[0] >= topN: 
        return feature_importance['feature'][:topN]
    else:
        return feature_importance['feature']
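
A hypothetical usage sketch (the quick booster below is illustrative, not our competition configuration): train a small LightGBM model, then keep the 50 strongest features.

# quick throwaway booster just to obtain feature importances
booster = lgb.train({'objective': 'binary', 'verbose': -1},
                    lgb.Dataset(model_sample.fillna(-999), label['y']),
                    num_boost_round=50)
top_features = get_top_features(model_sample.columns, booster, topN=50)
print(top_features.head())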

Model training and validation

Since the raw data contain only about 11,000 samples, which is relatively few, we adopt two fusion strategies to guard against overfitting:

  1. training models on several different subsets of the data
  2. fusing multiple models
def N_Fold_Predict( train_fea , train_y,  test_fea, cv_ = 5):

    ###########################################################
    train_fea = train_fea.fillna(-1)
    test_fea = test_fea.fillna(-1)

    features_col = [c for c in train_fea.columns if c not in ['user_id','y']]
    X = train_fea[features_col] 
    X_pred = test_fea[features_col]
    
    pred_out_lgb = 0
    pred_out_gbdt = 0
    pred_out_rf = 0
    
    for cv in range(cv_):
        X_train, X_test, y_train, y_test = train_test_split(X, train_y, test_size=0.25, random_state=np.random.randint(1000)) 
        # create dataset for lightgbm
        lgb_train = lgb.Dataset(X_train, y_train)
        lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

        # specify your configurations as a dict
        params = {
            'task':'train',
            'boosting_type':'gbdt',
            'num_leaves': 31,
            'objective': 'binary', 
            'learning_rate': 0.05, 
            'bagging_freq': 2, 
            'max_bin':256,
            'num_threads': 32
        } 

        # train
        gbm = lgb.train(params,
                    lgb_train,
                    verbose_eval= 0,
                    num_boost_round=10000,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=100)

        lgb_pred = gbm.predict(X_pred, num_iteration=gbm.best_iteration)
        
        
        gbdt = GradientBoostingClassifier(n_estimators=250,learning_rate=0.01,max_depth=6,min_samples_leaf=5,min_samples_split=5)
        gbdt.fit(X_train, y_train)
        gbdt_pred = gbdt.predict_proba(X_pred)[:,1] 
        
        
        rf = RandomForestClassifier(n_estimators=500,max_depth=6,min_samples_leaf=5,min_samples_split=5)
        rf.fit(X_train, y_train)
        rf_pred = rf.predict_proba(X_pred)[:,1]  
        
        if cv == 0:
            pred_out_lgb = lgb_pred
            pred_out_gbdt = gbdt_pred
            pred_out_rf = rf_pred
        else:
            pred_out_lgb += lgb_pred
            pred_out_gbdt += gbdt_pred
            pred_out_rf += rf_pred
            
    pred_out_lgb = pred_out_lgb * 1.0 / cv_
    pred_out_gbdt = pred_out_gbdt * 1.0 / cv_
    pred_out_rf = pred_out_rf * 1.0 / cv_
    return pred_out_lgb, pred_out_gbdt, pred_out_rf
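
Note that, despite its name, N_Fold_Predict averages cv_ models trained on independent random 75/25 splits rather than on disjoint folds. For reference, a disjoint K-fold variant of the LightGBM branch could look like the sketch below (same parameters, but not the code we actually ran):

def kfold_predict_lgb(X, y, X_pred, n_splits=5):
    # disjoint K-fold alternative to the random resampling above (sketch)
    params = {'boosting_type': 'gbdt', 'num_leaves': 31,
              'objective': 'binary', 'learning_rate': 0.05}
    pred = 0
    for trn_idx, val_idx in KFold(n_splits=n_splits, shuffle=True, random_state=2019).split(X):
        lgb_train = lgb.Dataset(X.iloc[trn_idx], y[trn_idx])
        lgb_eval = lgb.Dataset(X.iloc[val_idx], y[val_idx], reference=lgb_train)
        gbm = lgb.train(params, lgb_train, num_boost_round=10000,
                        valid_sets=lgb_eval, early_stopping_rounds=100, verbose_eval=0)
        pred += gbm.predict(X_pred, num_iteration=gbm.best_iteration)
    return pred / n_splits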

Building the different data subsets

model_sample_strong_feature_middle = get_features_middle(model_sample)
model_sample_strong_feature_final = get_features_final(model_sample)
model_sample_strong_feature_middle = model_sample_strong_feature_middle.fillna(-999)
model_sample_strong_feature_final = model_sample_strong_feature_final.fillna(-999)
model_sample_ = model_sample.fillna(-999)

Training and testing multiple models

  1. To cope with the F metric we tune the decision threshold, settling on 0.215 as our final cut-off (the fusion weights could be tuned as well); a threshold-sweep sketch follows this list
  2. To avoid overfitting we fuse multiple models; we use simple weighted averaging here, since stacking is time-consuming and might exceed the 30-minute limit
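
The 0.215 cut-off was picked by hand; the same idea can be automated by sweeping a grid of thresholds on held-out probabilities, as in this minimal sketch (probs and y_val stand for hypothetical validation-set scores and labels):

from sklearn.metrics import f1_score

def best_threshold(probs, y_val, grid=np.arange(0.05, 0.50, 0.005)):
    # F1 at each candidate cut-off; return the best threshold and its score
    scores = [f1_score(y_val, probs >= t) for t in grid]
    return grid[int(np.argmax(scores))], max(scores)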

for rnd in [1,10,100,1000]:
    print('Random Seed is: ',rnd)  
    train_X,test_X, train_y, test_y = train_test_split(model_sample_strong_feature_final,label,test_size=0.2,random_state=rnd) 
    
    train_X_orig = model_sample_.loc[train_X.index]
    test_X_orig = model_sample_.loc[test_X.index]
    train_X_middle = model_sample_strong_feature_middle.loc[train_X.index]
    test_X_middle = model_sample_strong_feature_middle.loc[test_X.index]
    
    
    print('5 fold no feature engineering')
    pred_out_lgb, pred_out_gbdt, pred_out_rf = N_Fold_Predict(train_X_orig, train_y['y'].values, test_X_orig, cv_ = 3)
    pred =pred_out_rf >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb * 0.55 + 0.45 * pred_out_gbdt>= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt>= 0.215
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt) * 0.9 + 0.1 * pred_out_rf>= 0.215
    get_score(pred, test_y['y'].values)
    
    print('*' * 50)
    print('5 fold feature engineering middle')
    pred_out_lgb_middle, pred_out_gbdt_middle, pred_out_rf_middle = N_Fold_Predict(train_X_middle,train_y['y'].values, test_X_middle, cv_ = 3)
    pred = pred_out_rf_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_middle >= 0.215
    get_score(pred, test_y['y'].values)
    
    pred =pred_out_lgb_middle * 0.55 + 0.45 * pred_out_gbdt_middle>= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle>= 0.215
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.1 * pred_out_rf_middle >= 0.215
    get_score(pred, test_y['y'].values)
     
#     pred =(pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.05 * (pred_out_rf_middle + pred_out_rf)>= 0.215
#     get_score(pred, test_y['y'].values)
     
    
    print('*' * 50)
    print('5 fold feature engineering final')
    pred_out_lgb_final, pred_out_gbdt_final, pred_out_rf_final = N_Fold_Predict(train_X,train_y['y'].values, test_X, cv_ = 3)
    pred =pred_out_rf_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_final >= 0.215
    get_score(pred, test_y['y'].values)
    
    pred =pred_out_lgb_final * 0.55 + 0.45 * pred_out_gbdt_final>= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final>= 0.215
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final) * 0.9 + 0.1 * pred_out_rf_final>= 0.215
    get_score(pred, test_y['y'].values)
    
    print('*' * 50)
     
    print('Fire!')
    print('middle  and  original ')
    
    pred =((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5 )*0.55  + 0.45 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt))>= 0.215
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_middle * 0.5 + pred_out_lgb* 0.5 )*0.5  + 0.5 * (pred_out_gbdt_middle *0.5 + 0.5 * pred_out_gbdt ))>= 0.215
    get_score(pred, test_y['y'].values) 
     
    print('final and  original ')
    pred =((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5 )*0.55  + 0.45 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt))>= 0.215
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_final * 0.5 + pred_out_lgb* 0.5 )*0.5  + 0.5 * (pred_out_gbdt_final *0.5 + 0.5 * pred_out_gbdt ))>= 0.215
    get_score(pred, test_y['y'].values) 
     
    
    print('final and  middle and original ')
    pred =((pred_out_lgb_final * 0.3 + pred_out_lgb * 0.3 + 0.4 * pred_out_lgb_middle)*0.55  + 0.45 * (pred_out_gbdt_final * 0.3 + 0.3 * pred_out_gbdt + 0.4 * pred_out_gbdt_middle))>= 0.215
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.215
    get_score(pred, test_y['y'].values) 
    
    pred = 0.1 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final ) /3.0 + 0.9 * ((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.215
    get_score(pred, test_y['y'].values) 
    pred = 0.15 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final ) /3.0 + 0.85 * ((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.215
    get_score(pred, test_y['y'].values) 
    
   
Random Seed is:  1000
5 fold no feature engineering
TP:  297 / 404 all  871  accuracy:  0.691016333938294  precision:  0.3409873708381171  recall:  0.7351485148514851  F_score:  0.46588235294117647 0.46588235294117647
TP:  285 / 404 all  765  accuracy:  0.7282214156079855  precision:  0.37254901960784315  recall:  0.7054455445544554  F_score:  0.4875962360992301 0.4875962360992301
TP:  287 / 404 all  766  accuracy:  0.7295825771324864  precision:  0.37467362924281983  recall:  0.7103960396039604  F_score:  0.4905982905982905 0.4905982905982905
TP:  293 / 404 all  773  accuracy:  0.7318511796733213  precision:  0.37904269081500647  recall:  0.7252475247524752  F_score:  0.49787595581988103 0.49787595581988103
TP:  293 / 404 all  774  accuracy:  0.7313974591651543  precision:  0.3785529715762274  recall:  0.7252475247524752  F_score:  0.49745331069609505 0.49745331069609505
TP:  294 / 404 all  780  accuracy:  0.7295825771324864  precision:  0.3769230769230769  recall:  0.7277227722772277  F_score:  0.49662162162162166 0.49662162162162166
**************************************************
5 fold feature engineering middle
TP:  317 / 404 all  927  accuracy:  0.6837568058076225  precision:  0.3419633225458468  recall:  0.7846534653465347  F_score:  0.47633358377160034 0.47633358377160034
TP:  300 / 404 all  824  accuracy:  0.7150635208711433  precision:  0.3640776699029126  recall:  0.7425742574257426  F_score:  0.48859934853420195 0.48859934853420195
TP:  298 / 404 all  824  accuracy:  0.7132486388384754  precision:  0.3616504854368932  recall:  0.7376237623762376  F_score:  0.485342019543974 0.485342019543974
TP:  304 / 404 all  839  accuracy:  0.7118874773139746  precision:  0.36233611442193087  recall:  0.7524752475247525  F_score:  0.4891391794046661 0.4891391794046661
TP:  304 / 404 all  842  accuracy:  0.7105263157894737  precision:  0.36104513064133015  recall:  0.7524752475247525  F_score:  0.48796147672552165 0.48796147672552165
TP:  306 / 404 all  842  accuracy:  0.7123411978221416  precision:  0.36342042755344417  recall:  0.7574257425742574  F_score:  0.4911717495987159 0.4911717495987159
TP:  305 / 404 all  841  accuracy:  0.7118874773139746  precision:  0.3626634958382878  recall:  0.754950495049505  F_score:  0.48995983935742976 0.48995983935742976
**************************************************
5 fold feature engineering final
TP:  316 / 404 all  913  accuracy:  0.6892014519056261  precision:  0.34611171960569553  recall:  0.7821782178217822  F_score:  0.4798785117691724 0.4798785117691724
TP:  299 / 404 all  830  accuracy:  0.7114337568058077  precision:  0.3602409638554217  recall:  0.7400990099009901  F_score:  0.48460291734197725 0.48460291734197725
TP:  291 / 404 all  795  accuracy:  0.72005444646098  precision:  0.3660377358490566  recall:  0.7202970297029703  F_score:  0.48540450375312766 0.48540450375312766
TP:  294 / 404 all  808  accuracy:  0.7168784029038112  precision:  0.36386138613861385  recall:  0.7277227722772277  F_score:  0.4851485148514852 0.4851485148514852
TP:  296 / 404 all  809  accuracy:  0.7182395644283122  precision:  0.3658838071693449  recall:  0.7326732673267327  F_score:  0.48804616652926625 0.48804616652926625
TP:  300 / 404 all  820  accuracy:  0.7168784029038112  precision:  0.36585365853658536  recall:  0.7425742574257426  F_score:  0.49019607843137253 0.49019607843137253
**************************************************
Fire!
middle  and  original 
TP:  299 / 404 all  806  accuracy:  0.7223230490018149  precision:  0.3709677419354839  recall:  0.7400990099009901  F_score:  0.4942148760330578 0.4942148760330578
TP:  299 / 404 all  809  accuracy:  0.7209618874773139  precision:  0.3695920889987639  recall:  0.7400990099009901  F_score:  0.49299258037922505 0.49299258037922505
final and  original 
TP:  298 / 404 all  810  accuracy:  0.7196007259528131  precision:  0.36790123456790125  recall:  0.7376237623762376  F_score:  0.49093904448105447 0.49093904448105447
TP:  298 / 404 all  810  accuracy:  0.7196007259528131  precision:  0.36790123456790125  recall:  0.7376237623762376  F_score:  0.49093904448105447 0.49093904448105447
final and  middle and original 
TP:  301 / 404 all  819  accuracy:  0.7182395644283122  precision:  0.36752136752136755  recall:  0.745049504950495  F_score:  0.4922322158626328 0.4922322158626328
TP:  301 / 404 all  822  accuracy:  0.7168784029038112  precision:  0.3661800486618005  recall:  0.745049504950495  F_score:  0.49102773246329523 0.49102773246329523
TP:  302 / 404 all  828  accuracy:  0.7150635208711433  precision:  0.3647342995169082  recall:  0.7475247524752475  F_score:  0.4902597402597403 0.4902597402597403
TP:  303 / 404 all  838  accuracy:  0.7114337568058077  precision:  0.3615751789976134  recall:  0.75  F_score:  0.48792270531400966 0.48792270531400966
model_sample_strong_feature_middle = model_sample_strong_feature_middle.fillna(-999)
model_sample_strong_feature_final = model_sample_strong_feature_final.fillna(-999)
model_sample_ = model_sample.fillna(-999)
for rnd in [1,10,100,1000]:
    print('Random Seed is: ',rnd)  
    train_X,test_X, train_y, test_y = train_test_split(model_sample_strong_feature_final,label,test_size=0.2,random_state=rnd) 
    
    train_X_orig = model_sample_.loc[train_X.index]
    test_X_orig = model_sample_.loc[test_X.index]
    train_X_middle = model_sample_strong_feature_middle.loc[train_X.index]
    test_X_middle = model_sample_strong_feature_middle.loc[test_X.index]
    
    
    print('5 fold no feature engineering')
    pred_out_lgb, pred_out_gbdt, pred_out_rf = N_Fold_Predict(train_X_orig, train_y['y'].values, test_X_orig, cv_ = 3)
    pred =pred_out_rf >= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt >= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb >= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb * 0.55 + 0.45 * pred_out_gbdt>= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt>= 0.23
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt) * 0.9 + 0.1 * pred_out_rf>= 0.23
    get_score(pred, test_y['y'].values)
    
    print('*' * 50)
    print('5 fold feature engineering middle')
    pred_out_lgb_middle, pred_out_gbdt_middle, pred_out_rf_middle = N_Fold_Predict(train_X_middle,train_y['y'].values, test_X_middle, cv_ = 3)
    pred = pred_out_rf_middle >= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt_middle >= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_middle >= 0.23
    get_score(pred, test_y['y'].values)
    
    pred =pred_out_lgb_middle * 0.55 + 0.45 * pred_out_gbdt_middle>= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle>= 0.23
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.1 * pred_out_rf_middle >= 0.23
    get_score(pred, test_y['y'].values)
     
#     pred =(pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.05 * (pred_out_rf_middle + pred_out_rf)>= 0.23
#     get_score(pred, test_y['y'].values)
     
    
    print('*' * 50)
    print('5 fold feature engineering final')
    pred_out_lgb_final, pred_out_gbdt_final, pred_out_rf_final = N_Fold_Predict(train_X,train_y['y'].values, test_X, cv_ = 3)
    pred =pred_out_rf_final >= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt_final >= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_final >= 0.23
    get_score(pred, test_y['y'].values)
    
    pred =pred_out_lgb_final * 0.55 + 0.45 * pred_out_gbdt_final>= 0.23
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final>= 0.23
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final) * 0.9 + 0.1 * pred_out_rf_final>= 0.23
    get_score(pred, test_y['y'].values)
    
    print('*' * 50)
     
    print('Fire!')
    print('middle  and  original ')
    
    pred =((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5 )*0.55  + 0.45 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt))>= 0.23
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_middle * 0.5 + pred_out_lgb* 0.5 )*0.5  + 0.5 * (pred_out_gbdt_middle *0.5 + 0.5 * pred_out_gbdt ))>= 0.23
    get_score(pred, test_y['y'].values) 
     
    print('final and  original ')
    pred =((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5 )*0.55  + 0.45 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt))>= 0.23
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_final * 0.5 + pred_out_lgb* 0.5 )*0.5  + 0.5 * (pred_out_gbdt_final *0.5 + 0.5 * pred_out_gbdt ))>= 0.23
    get_score(pred, test_y['y'].values) 
     
    
    print('final and  middle and original ')
    pred =((pred_out_lgb_final * 0.3 + pred_out_lgb * 0.3 + 0.4 * pred_out_lgb_middle)*0.55  + 0.45 * (pred_out_gbdt_final * 0.3 + 0.3 * pred_out_gbdt + 0.4 * pred_out_gbdt_middle))>= 0.23
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.23
    get_score(pred, test_y['y'].values) 
    
    pred = 0.1 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final ) /3.0 + 0.9 * ((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.23
    get_score(pred, test_y['y'].values) 
    pred = 0.15 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final ) /3.0 + 0.85 * ((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.23
    get_score(pred, test_y['y'].values) 
    
   
Random Seed is:  1
5 fold no feature engineering
TP:  315 / 442 all  833  accuracy:  0.707350272232305  precision:  0.37815126050420167  recall:  0.7126696832579186  F_score:  0.4941176470588235 0.4941176470588235
TP:  286 / 442 all  721  accuracy:  0.7318511796733213  precision:  0.39667128987517336  recall:  0.6470588235294118  F_score:  0.4918314703353397 0.4918314703353397
TP:  290 / 442 all  728  accuracy:  0.7323049001814882  precision:  0.3983516483516483  recall:  0.6561085972850679  F_score:  0.4957264957264957 0.4957264957264957
TP:  290 / 442 all  722  accuracy:  0.73502722323049  precision:  0.40166204986149584  recall:  0.6561085972850679  F_score:  0.49828178694158065 0.49828178694158065
TP:  291 / 442 all  725  accuracy:  0.734573502722323  precision:  0.4013793103448276  recall:  0.6583710407239819  F_score:  0.4987146529562982 0.4987146529562982
TP:  292 / 442 all  731  accuracy:  0.7327586206896551  precision:  0.399452804377565  recall:  0.6606334841628959  F_score:  0.4978687127024723 0.4978687127024723
**************************************************
5 fold feature engineering middle
TP:  316 / 442 all  811  accuracy:  0.7182395644283122  precision:  0.38964241676942046  recall:  0.7149321266968326  F_score:  0.5043894652833201 0.5043894652833201
TP:  282 / 442 all  708  accuracy:  0.7341197822141561  precision:  0.3983050847457627  recall:  0.6380090497737556  F_score:  0.4904347826086956 0.4904347826086956
TP:  279 / 442 all  688  accuracy:  0.7404718693284936  precision:  0.4055232558139535  recall:  0.6312217194570136  F_score:  0.49380530973451336 0.49380530973451336
TP:  286 / 442 all  703  accuracy:  0.7400181488203267  precision:  0.406827880512091  recall:  0.6470588235294118  F_score:  0.4995633187772926 0.4995633187772926
TP:  281 / 442 all  701  accuracy:  0.7363883847549909  precision:  0.4008559201141227  recall:  0.6357466063348416  F_score:  0.4916885389326334 0.4916885389326334
TP:  285 / 442 all  710  accuracy:  0.735934664246824  precision:  0.4014084507042254  recall:  0.6447963800904978  F_score:  0.4947916666666667 0.4947916666666667
**************************************************
5 fold feature engineering final
TP:  322 / 442 all  846  accuracy:  0.7078039927404719  precision:  0.3806146572104019  recall:  0.7285067873303167  F_score:  0.5 0.5
TP:  287 / 442 all  726  accuracy:  0.7304900181488203  precision:  0.3953168044077135  recall:  0.6493212669683258  F_score:  0.4914383561643836 0.4914383561643836
TP:  287 / 442 all  711  accuracy:  0.7372958257713249  precision:  0.40365682137834036  recall:  0.6493212669683258  F_score:  0.4978317432784042 0.4978317432784042
TP:  290 / 442 all  725  accuracy:  0.7336660617059891  precision:  0.4  recall:  0.6561085972850679  F_score:  0.49700085689802914 0.49700085689802914
TP:  291 / 442 all  726  accuracy:  0.7341197822141561  precision:  0.40082644628099173  recall:  0.6583710407239819  F_score:  0.49828767123287665 0.49828767123287665
TP:  294 / 442 all  738  accuracy:  0.7313974591651543  precision:  0.3983739837398374  recall:  0.665158371040724  F_score:  0.49830508474576274 0.49830508474576274
**************************************************
Fire!
middle  and  original 
TP:  288 / 442 all  711  accuracy:  0.7382032667876588  precision:  0.4050632911392405  recall:  0.6515837104072398  F_score:  0.4995663486556808 0.4995663486556808
TP:  290 / 442 all  713  accuracy:  0.7391107078039928  precision:  0.4067321178120617  recall:  0.6561085972850679  F_score:  0.5021645021645021 0.5021645021645021
final and  original 
TP:  291 / 442 all  723  accuracy:  0.735480943738657  precision:  0.4024896265560166  recall:  0.6583710407239819  F_score:  0.4995708154506438 0.4995708154506438
TP:  291 / 442 all  723  accuracy:  0.735480943738657  precision:  0.4024896265560166  recall:  0.6583710407239819  F_score:  0.4995708154506438 0.4995708154506438
final and  middle and original 
TP:  290 / 442 all  727  accuracy:  0.7327586206896551  precision:  0.3988995873452545  recall:  0.6561085972850679  F_score:  0.49615055603079555 0.49615055603079555
TP:  291 / 442 all  727  accuracy:  0.7336660617059891  precision:  0.40027510316368636  recall:  0.6583710407239819  F_score:  0.49786142001710865 0.49786142001710865
TP:  293 / 442 all  734  accuracy:  0.7323049001814882  precision:  0.3991825613079019  recall:  0.6628959276018099  F_score:  0.49829931972789115 0.49829931972789115
TP:  294 / 442 all  737  accuracy:  0.7318511796733213  precision:  0.3989145183175034  recall:  0.665158371040724  F_score:  0.49872773536895676 0.49872773536895676
Random Seed is:  10
5 fold no feature engineering
TP:  323 / 438 all  859  accuracy:  0.7046279491833031  precision:  0.3760186263096624  recall:  0.7374429223744292  F_score:  0.49807247494217427 0.49807247494217427
TP:  296 / 438 all  735  accuracy:  0.7363883847549909  precision:  0.40272108843537413  recall:  0.6757990867579908  F_score:  0.5046888320545609 0.5046888320545609
TP:  308 / 438 all  782  accuracy:  0.7259528130671506  precision:  0.3938618925831202  recall:  0.7031963470319634  F_score:  0.5049180327868853 0.5049180327868853
TP:  304 / 438 all  763  accuracy:  0.7309437386569873  precision:  0.3984272608125819  recall:  0.6940639269406392  F_score:  0.5062447960033305 0.5062447960033305
TP:  303 / 438 all  759  accuracy:  0.7318511796733213  precision:  0.39920948616600793  recall:  0.6917808219178082  F_score:  0.506265664160401 0.506265664160401
TP:  305 / 438 all  769  accuracy:  0.7291288566243194  precision:  0.3966189856957087  recall:  0.6963470319634704  F_score:  0.5053852526926264 0.5053852526926264
**************************************************
5 fold feature engineering middle
TP:  344 / 438 all  916  accuracy:  0.6978221415607986  precision:  0.37554585152838427  recall:  0.7853881278538812  F_score:  0.5081240768094535 0.5081240768094535
TP:  310 / 438 all  783  accuracy:  0.7273139745916516  precision:  0.3959131545338442  recall:  0.7077625570776256  F_score:  0.5077805077805078 0.5077805077805078
TP:  325 / 438 all  837  accuracy:  0.7164246823956443  precision:  0.38829151732377537  recall:  0.7420091324200914  F_score:  0.5098039215686274 0.5098039215686274
TP:  320 / 438 all  815  accuracy:  0.7218693284936479  precision:  0.39263803680981596  recall:  0.730593607305936  F_score:  0.5107741420590584 0.5107741420590584
TP:  321 / 438 all  814  accuracy:  0.7232304900181489  precision:  0.39434889434889436  recall:  0.7328767123287672  F_score:  0.512779552715655 0.512779552715655
TP:  321 / 438 all  819  accuracy:  0.7209618874773139  precision:  0.39194139194139194  recall:  0.7328767123287672  F_score:  0.5107398568019094 0.5107398568019094
**************************************************
5 fold feature engineering final
TP:  337 / 438 all  880  accuracy:  0.7078039927404719  precision:  0.38295454545454544  recall:  0.769406392694064  F_score:  0.511380880121396 0.511380880121396
TP:  309 / 438 all  772  accuracy:  0.7313974591651543  precision:  0.40025906735751293  recall:  0.7054794520547946  F_score:  0.5107438016528926 0.5107438016528926
TP:  328 / 438 all  811  accuracy:  0.7309437386569873  precision:  0.40443896424167697  recall:  0.7488584474885844  F_score:  0.5252201761409128 0.5252201761409128
TP:  326 / 438 all  801  accuracy:  0.7336660617059891  precision:  0.4069912609238452  recall:  0.7442922374429224  F_score:  0.526230831315577 0.526230831315577
TP:  324 / 438 all  797  accuracy:  0.7336660617059891  precision:  0.4065244667503137  recall:  0.7397260273972602  F_score:  0.5246963562753036 0.5246963562753036
TP:  327 / 438 all  806  accuracy:  0.7323049001814882  precision:  0.4057071960297767  recall:  0.7465753424657534  F_score:  0.5257234726688104 0.5257234726688104
**************************************************
Fire!
middle  and  original 
TP:  313 / 438 all  778  accuracy:  0.7323049001814882  precision:  0.4023136246786632  recall:  0.7146118721461188  F_score:  0.5148026315789473 0.5148026315789473
TP:  311 / 438 all  775  accuracy:  0.7318511796733213  precision:  0.4012903225806452  recall:  0.7100456621004566  F_score:  0.5127782357790601 0.5127782357790601
final and  original 
TP:  317 / 438 all  780  accuracy:  0.73502722323049  precision:  0.4064102564102564  recall:  0.723744292237443  F_score:  0.5205254515599343 0.5205254515599343
TP:  315 / 438 all  778  accuracy:  0.7341197822141561  precision:  0.40488431876606684  recall:  0.7191780821917808  F_score:  0.5180921052631579 0.5180921052631579
final and  middle and original 
TP:  320 / 438 all  791  accuracy:  0.7327586206896551  precision:  0.404551201011378  recall:  0.730593607305936  F_score:  0.5207485760781123 0.5207485760781123
TP:  319 / 438 all  789  accuracy:  0.7327586206896551  precision:  0.40430925221799746  recall:  0.728310502283105  F_score:  0.5199674001629991 0.5199674001629991
TP:  326 / 438 all  799  accuracy:  0.734573502722323  precision:  0.40801001251564456  recall:  0.7442922374429224  F_score:  0.5270816491511722 0.5270816491511722
TP:  326 / 438 all  807  accuracy:  0.7309437386569873  precision:  0.40396530359355637  recall:  0.7442922374429224  F_score:  0.5236947791164658 0.5236947791164658
Random Seed is:  100
5 fold no feature engineering
TP:  312 / 430 all  851  accuracy:  0.7019056261343013  precision:  0.36662749706227965  recall:  0.7255813953488373  F_score:  0.48711943793911006 0.48711943793911006
TP:  292 / 430 all  742  accuracy:  0.7332123411978222  precision:  0.3935309973045822  recall:  0.6790697674418604  F_score:  0.4982935153583618 0.4982935153583618
TP:  300 / 430 all  774  accuracy:  0.7259528130671506  precision:  0.3875968992248062  recall:  0.6976744186046512  F_score:  0.4983388704318937 0.4983388704318937
TP:  297 / 430 all  755  accuracy:  0.7318511796733213  precision:  0.3933774834437086  recall:  0.6906976744186046  F_score:  0.5012658227848101 0.5012658227848101
TP:  298 / 430 all  755  accuracy:  0.7327586206896551  precision:  0.39470198675496687  recall:  0.6930232558139535  F_score:  0.5029535864978903 0.5029535864978903
TP:  301 / 430 all  766  accuracy:  0.7304900181488203  precision:  0.39295039164490864  recall:  0.7  F_score:  0.5033444816053512 0.5033444816053512
**************************************************
5 fold feature engineering middle
TP:  331 / 430 all  893  accuracy:  0.7000907441016334  precision:  0.3706606942889138  recall:  0.7697674418604651  F_score:  0.5003779289493575 0.5003779289493575
TP:  306 / 430 all  786  accuracy:  0.7259528130671506  precision:  0.3893129770992366  recall:  0.7116279069767442  F_score:  0.5032894736842105 0.5032894736842105
TP:  302 / 430 all  777  accuracy:  0.7264065335753176  precision:  0.3886743886743887  recall:  0.7023255813953488  F_score:  0.5004142502071252 0.5004142502071252
TP:  305 / 430 all  781  accuracy:  0.7273139745916516  precision:  0.3905249679897567  recall:  0.7093023255813954  F_score:  0.5037159372419487 0.5037159372419487
TP:  304 / 430 all  781  accuracy:  0.7264065335753176  precision:  0.3892445582586428  recall:  0.7069767441860465  F_score:  0.5020644095788604 0.5020644095788604
TP:  304 / 430 all  785  accuracy:  0.7245916515426497  precision:  0.3872611464968153  recall:  0.7069767441860465  F_score:  0.5004115226337449 0.5004115226337449
**************************************************
5 fold feature engineering final
TP:  326 / 430 all  876  accuracy:  0.7032667876588021  precision:  0.3721461187214612  recall:  0.7581395348837209  F_score:  0.49923430321592643 0.49923430321592643
TP:  296 / 430 all  758  accuracy:  0.7295825771324864  precision:  0.39050131926121373  recall:  0.6883720930232559  F_score:  0.4983164983164983 0.4983164983164983
TP:  295 / 430 all  754  accuracy:  0.7304900181488203  precision:  0.3912466843501326  recall:  0.686046511627907  F_score:  0.4983108108108108 0.4983108108108108
TP:  298 / 430 all  764  accuracy:  0.7286751361161524  precision:  0.3900523560209424  recall:  0.6930232558139535  F_score:  0.4991624790619766 0.4991624790619766
TP:  299 / 430 all  766  accuracy:  0.7286751361161524  precision:  0.39033942558746737  recall:  0.6953488372093023  F_score:  0.5 0.5
TP:  304 / 430 all  782  accuracy:  0.7259528130671506  precision:  0.3887468030690537  recall:  0.7069767441860465  F_score:  0.5016501650165016 0.5016501650165016
**************************************************
Fire!
middle  and  original 
TP:  298 / 430 all  772  accuracy:  0.7250453720508166  precision:  0.3860103626943005  recall:  0.6930232558139535  F_score:  0.49584026622296173 0.49584026622296173
TP:  297 / 430 all  768  accuracy:  0.7259528130671506  precision:  0.38671875  recall:  0.6906976744186046  F_score:  0.4958263772954925 0.4958263772954925
final and  original 
TP:  301 / 430 all  768  accuracy:  0.7295825771324864  precision:  0.3919270833333333  recall:  0.7  F_score:  0.5025041736227045 0.5025041736227045
TP:  302 / 430 all  766  accuracy:  0.7313974591651543  precision:  0.39425587467362927  recall:  0.7023255813953488  F_score:  0.5050167224080268 0.5050167224080268
final and  middle and original 
TP:  299 / 430 all  771  accuracy:  0.7264065335753176  precision:  0.38780804150453957  recall:  0.6953488372093023  F_score:  0.49791840133222315 0.49791840133222315
TP:  299 / 430 all  769  accuracy:  0.7273139745916516  precision:  0.38881664499349805  recall:  0.6953488372093023  F_score:  0.4987489574645537 0.4987489574645537
TP:  302 / 430 all  778  accuracy:  0.7259528130671506  precision:  0.38817480719794345  recall:  0.7023255813953488  F_score:  0.5 0.5
TP:  304 / 430 all  787  accuracy:  0.7236842105263158  precision:  0.386277001270648  recall:  0.7069767441860465  F_score:  0.49958915365653245 0.49958915365653245
Random Seed is:  1000
5 fold no feature engineering
TP:  292 / 404 all  833  accuracy:  0.7037205081669692  precision:  0.3505402160864346  recall:  0.7227722772277227  F_score:  0.4721099434114794 0.4721099434114794
TP:  281 / 404 all  745  accuracy:  0.7336660617059891  precision:  0.37718120805369126  recall:  0.6955445544554455  F_score:  0.4891209747606614 0.4891209747606614
TP:  278 / 404 all  749  accuracy:  0.7291288566243194  precision:  0.3711615487316422  recall:  0.6881188118811881  F_score:  0.4822202948829142 0.4822202948829142
TP:  280 / 404 all  749  accuracy:  0.7309437386569873  precision:  0.37383177570093457  recall:  0.693069306930693  F_score:  0.48568950563746743 0.48568950563746743
TP:  282 / 404 all  753  accuracy:  0.7309437386569873  precision:  0.3745019920318725  recall:  0.698019801980198  F_score:  0.48746758859118416 0.48746758859118416
TP:  283 / 404 all  755  accuracy:  0.7309437386569873  precision:  0.3748344370860927  recall:  0.7004950495049505  F_score:  0.4883520276100086 0.4883520276100086
**************************************************
5 fold feature engineering middle
TP:  306 / 404 all  870  accuracy:  0.6996370235934665  precision:  0.35172413793103446  recall:  0.7574257425742574  F_score:  0.4803767660910518 0.4803767660910518
TP:  275 / 404 all  770  accuracy:  0.7168784029038112  precision:  0.35714285714285715  recall:  0.6806930693069307  F_score:  0.46848381601362865 0.46848381601362865
TP:  276 / 404 all  759  accuracy:  0.7227767695099818  precision:  0.36363636363636365  recall:  0.6831683168316832  F_score:  0.47463456577815993 0.47463456577815993
TP:  287 / 404 all  785  accuracy:  0.7209618874773139  precision:  0.36560509554140125  recall:  0.7103960396039604  F_score:  0.48275862068965514 0.48275862068965514
TP:  285 / 404 all  782  accuracy:  0.720508166969147  precision:  0.36445012787723785  recall:  0.7054455445544554  F_score:  0.4806070826306914 0.4806070826306914
TP:  289 / 404 all  790  accuracy:  0.720508166969147  precision:  0.3658227848101266  recall:  0.7153465346534653  F_score:  0.4840871021775544 0.4840871021775544
**************************************************
5 fold feature engineering final
model_sample_strong_feature_middle = model_sample_strong_feature_middle.fillna(-999)
model_sample_strong_feature_final = model_sample_strong_feature_final.fillna(-999)
model_sample_ = model_sample.fillna(-999)
for rnd in [1,10,100,1000]:
    print('Random Seed is: ',rnd)  
    train_X,test_X, train_y, test_y = train_test_split(model_sample_strong_feature_final,label,test_size=0.2,random_state=rnd) 
    
    train_X_orig = model_sample_.loc[train_X.index]
    test_X_orig = model_sample_.loc[test_X.index]
    train_X_middle = model_sample_strong_feature_middle.loc[train_X.index]
    test_X_middle = model_sample_strong_feature_middle.loc[test_X.index]
    
    
    print('5 fold no feature engineering')
    pred_out_lgb, pred_out_gbdt, pred_out_rf = N_Fold_Predict(train_X_orig, train_y['y'].values, test_X_orig, cv_ = 3)
    pred =pred_out_rf >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb * 0.55 + 0.45 * pred_out_gbdt>= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt>= 0.215
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt) * 0.9 + 0.1 * pred_out_rf>= 0.215
    get_score(pred, test_y['y'].values)
    
    print('*' * 50)
    print('5 fold feature engineering middle')
    pred_out_lgb_middle, pred_out_gbdt_middle, pred_out_rf_middle = N_Fold_Predict(train_X_middle,train_y['y'].values, test_X_middle, cv_ = 3)
    pred = pred_out_rf_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_middle >= 0.215
    get_score(pred, test_y['y'].values)
    
    pred =pred_out_lgb_middle * 0.55 + 0.45 * pred_out_gbdt_middle>= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle>= 0.215
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.1 * pred_out_rf_middle >= 0.215
    get_score(pred, test_y['y'].values)
     
#     pred =(pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.05 * (pred_out_rf_middle + pred_out_rf)>= 0.215
#     get_score(pred, test_y['y'].values)
     
    
    print('*' * 50)
    print('5 fold feature engineering final')
    pred_out_lgb_final, pred_out_gbdt_final, pred_out_rf_final = N_Fold_Predict(train_X,train_y['y'].values, test_X, cv_ = 5)
    pred =pred_out_rf_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_gbdt_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_final >= 0.215
    get_score(pred, test_y['y'].values)
    
    pred =pred_out_lgb_final * 0.55 + 0.45 * pred_out_gbdt_final>= 0.215
    get_score(pred, test_y['y'].values)
    pred =pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final>= 0.215
    get_score(pred, test_y['y'].values)
    pred =(pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final) * 0.9 + 0.1 * pred_out_rf_final>= 0.215
    get_score(pred, test_y['y'].values)
    
    print('*' * 50)
     
    print('Fire!')
    print('middle  and  original ')
    
    pred =((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5 )*0.55  + 0.45 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt))>= 0.215
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_middle * 0.5 + pred_out_lgb* 0.5 )*0.5  + 0.5 * (pred_out_gbdt_middle *0.5 + 0.5 * pred_out_gbdt ))>= 0.215
    get_score(pred, test_y['y'].values) 
     
    print('final and  original ')
    pred =((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5 )*0.55  + 0.45 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt))>= 0.215
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_final * 0.5 + pred_out_lgb* 0.5 )*0.5  + 0.5 * (pred_out_gbdt_final *0.5 + 0.5 * pred_out_gbdt ))>= 0.215
    get_score(pred, test_y['y'].values) 
     
    
    print('final and  middle and original ')
    pred =((pred_out_lgb_final * 0.3 + pred_out_lgb * 0.3 + 0.4 * pred_out_lgb_middle)*0.55  + 0.45 * (pred_out_gbdt_final * 0.3 + 0.3 * pred_out_gbdt + 0.4 * pred_out_gbdt_middle))>= 0.215
    get_score(pred, test_y['y'].values)
    pred =((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.215
    get_score(pred, test_y['y'].values) 
    
    pred = 0.1 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final ) /3.0 + 0.9 * ((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.215
    get_score(pred, test_y['y'].values) 
    pred = 0.15 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final ) /3.0 + 0.85 * ((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.215
    get_score(pred, test_y['y'].values) 
    
    pred = 0.1 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final ) /3.0 + 0.9 * ((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.23
    get_score(pred, test_y['y'].values) 
    pred = 0.15 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final ) /3.0 + 0.85 * ((pred_out_lgb_final * 1.0 /3 + pred_out_lgb* 1.0 /3 +  pred_out_lgb_middle* 1.0 /3)*0.5  + 0.5 * (pred_out_gbdt_final * 1.0 /3 +  1.0 /3 * pred_out_gbdt + 1.0 /3 * pred_out_gbdt_middle))>= 0.23
    get_score(pred, test_y['y'].values) 
   
Random Seed is:  1
5 fold no feature engineering
TP:  340 / 442 all  923  accuracy:  0.6892014519056261  precision:  0.36836403033586135  recall:  0.7692307692307693  F_score:  0.4981684981684982 0.4981684981684982
TP:  312 / 442 all  801  accuracy:  0.719147005444646  precision:  0.3895131086142322  recall:  0.7058823529411765  F_score:  0.50201126307321 0.50201126307321
TP:  310 / 442 all  796  accuracy:  0.7196007259528131  precision:  0.38944723618090454  recall:  0.7013574660633484  F_score:  0.5008077544426495 0.5008077544426495
TP:  315 / 442 all  806  accuracy:  0.7196007259528131  precision:  0.39081885856079407  recall:  0.7126696832579186  F_score:  0.5048076923076924 0.5048076923076924
TP:  314 / 442 all  804  accuracy:  0.7196007259528131  precision:  0.39054726368159204  recall:  0.7104072398190046  F_score:  0.5040128410914928 0.5040128410914928
TP:  314 / 442 all  811  accuracy:  0.7164246823956443  precision:  0.3871763255240444  recall:  0.7104072398190046  F_score:  0.5011971268954509 0.5011971268954509
**************************************************
5 fold feature engineering middle
TP:  343 / 442 all  914  accuracy:  0.6960072595281307  precision:  0.37527352297593  recall:  0.7760180995475113  F_score:  0.5058997050147492 0.5058997050147492
TP:  318 / 442 all  820  accuracy:  0.7159709618874773  precision:  0.3878048780487805  recall:  0.7194570135746606  F_score:  0.5039619651347068 0.5039619651347068
TP:  307 / 442 all  779  accuracy:  0.7245916515426497  precision:  0.3940949935815148  recall:  0.6945701357466063  F_score:  0.5028665028665029 0.5028665028665029
TP:  311 / 442 all  800  accuracy:  0.7186932849364791  precision:  0.38875  recall:  0.7036199095022625  F_score:  0.500805152979066 0.500805152979066
TP:  312 / 442 all  803  accuracy:  0.7182395644283122  precision:  0.38854296388542964  recall:  0.7058823529411765  F_score:  0.5012048192771086 0.5012048192771086
TP:  316 / 442 all  814  accuracy:  0.7168784029038112  precision:  0.3882063882063882  recall:  0.7149321266968326  F_score:  0.5031847133757962 0.5031847133757962
**************************************************
5 fold feature engineering final
TP:  341 / 442 all  905  accuracy:  0.6982758620689655  precision:  0.37679558011049724  recall:  0.7714932126696833  F_score:  0.5063103192279139 0.5063103192279139
TP:  308 / 442 all  779  accuracy:  0.7254990925589837  precision:  0.39537869062901154  recall:  0.6968325791855203  F_score:  0.5045045045045045 0.5045045045045045
TP:  310 / 442 all  789  accuracy:  0.7227767695099818  precision:  0.3929024081115336  recall:  0.7013574660633484  F_score:  0.503655564581641 0.503655564581641
TP:  311 / 442 all  791  accuracy:  0.7227767695099818  precision:  0.393173198482933  recall:  0.7036199095022625  F_score:  0.5044606650446067 0.5044606650446067
TP:  311 / 442 all  791  accuracy:  0.7227767695099818  precision:  0.393173198482933  recall:  0.7036199095022625  F_score:  0.5044606650446067 0.5044606650446067
TP:  310 / 442 all  794  accuracy:  0.720508166969147  precision:  0.3904282115869018  recall:  0.7013574660633484  F_score:  0.5016181229773463 0.5016181229773463
**************************************************
Fire!
middle  and  original 
TP:  315 / 442 all  800  accuracy:  0.7223230490018149  precision:  0.39375  recall:  0.7126696832579186  F_score:  0.5072463768115942 0.5072463768115942
TP:  314 / 442 all  800  accuracy:  0.721415607985481  precision:  0.3925  recall:  0.7104072398190046  F_score:  0.5056360708534622 0.5056360708534622
final and  original 
TP:  313 / 442 all  797  accuracy:  0.7218693284936479  precision:  0.39272271016311167  recall:  0.7081447963800905  F_score:  0.5052461662631155 0.5052461662631155
TP:  312 / 442 all  797  accuracy:  0.7209618874773139  precision:  0.39146800501882056  recall:  0.7058823529411765  F_score:  0.5036319612590798 0.5036319612590798
final and  middle and original 
TP:  315 / 442 all  807  accuracy:  0.719147005444646  precision:  0.3903345724907063  recall:  0.7126696832579186  F_score:  0.5044035228182546 0.5044035228182546
TP:  314 / 442 all  804  accuracy:  0.7196007259528131  precision:  0.39054726368159204  recall:  0.7104072398190046  F_score:  0.5040128410914928 0.5040128410914928
TP:  318 / 442 all  811  accuracy:  0.72005444646098  precision:  0.3921085080147966  recall:  0.7194570135746606  F_score:  0.5075818036711892 0.5075818036711892
TP:  317 / 442 all  815  accuracy:  0.7173321234119783  precision:  0.3889570552147239  recall:  0.7171945701357466  F_score:  0.5043754972155926 0.5043754972155926
Random Seed is:  10
5 fold no feature engineering
TP:  332 / 438 all  914  accuracy:  0.6878402903811253  precision:  0.36323851203501095  recall:  0.7579908675799086  F_score:  0.4911242603550296 0.4911242603550296
TP:  319 / 438 all  816  accuracy:  0.720508166969147  precision:  0.3909313725490196  recall:  0.728310502283105  F_score:  0.5087719298245613 0.5087719298245613
TP:  328 / 438 all  849  accuracy:  0.7137023593466425  precision:  0.38633686690223795  recall:  0.7488584474885844  F_score:  0.5097125097125097 0.5097125097125097
TP:  326 / 438 all  837  accuracy:  0.7173321234119783  precision:  0.38948626045400236  recall:  0.7442922374429224  F_score:  0.5113725490196078 0.5113725490196078
TP:  326 / 438 all  838  accuracy:  0.7168784029038112  precision:  0.38902147971360385  recall:  0.7442922374429224  F_score:  0.5109717868338558 0.5109717868338558
TP:  326 / 438 all  846  accuracy:  0.7132486388384754  precision:  0.38534278959810875  recall:  0.7442922374429224  F_score:  0.5077881619937694 0.5077881619937694
**************************************************
5 fold feature engineering middle
TP:  347 / 438 all  931  accuracy:  0.6937386569872959  precision:  0.3727175080558539  recall:  0.7922374429223744  F_score:  0.5069393718042365 0.5069393718042365
TP:  324 / 438 all  817  accuracy:  0.7245916515426497  precision:  0.39657282741738065  recall:  0.7397260273972602  F_score:  0.5163346613545816 0.5163346613545816
TP:  337 / 438 all  857  accuracy:  0.7182395644283122  precision:  0.39323220536756126  recall:  0.769406392694064  F_score:  0.5204633204633206 0.5204633204633206
TP:  331 / 438 all  841  accuracy:  0.72005444646098  precision:  0.3935790725326992  recall:  0.7557077625570776  F_score:  0.5175918686473807 0.5175918686473807
TP:  332 / 438 all  841  accuracy:  0.7209618874773139  precision:  0.3947681331747919  recall:  0.7579908675799086  F_score:  0.5191555903049258 0.5191555903049258
TP:  338 / 438 all  852  accuracy:  0.721415607985481  precision:  0.3967136150234742  recall:  0.771689497716895  F_score:  0.524031007751938 0.524031007751938
**************************************************
5 fold feature engineering final
TP:  348 / 438 all  936  accuracy:  0.6923774954627949  precision:  0.3717948717948718  recall:  0.7945205479452054  F_score:  0.5065502183406113 0.5065502183406113
TP:  331 / 438 all  818  accuracy:  0.7304900181488203  precision:  0.40464547677261614  recall:  0.7557077625570776  F_score:  0.5270700636942675 0.5270700636942675
TP:  333 / 438 all  833  accuracy:  0.7254990925589837  precision:  0.3997599039615846  recall:  0.7602739726027398  F_score:  0.5239968528717546 0.5239968528717546
TP:  334 / 438 all  826  accuracy:  0.7295825771324864  precision:  0.4043583535108959  recall:  0.7625570776255708  F_score:  0.5284810126582279 0.5284810126582279
TP:  335 / 438 all  824  accuracy:  0.7313974591651543  precision:  0.4065533980582524  recall:  0.7648401826484018  F_score:  0.5309033280507132 0.5309033280507132
TP:  337 / 438 all  838  accuracy:  0.7268602540834845  precision:  0.4021479713603819  recall:  0.769406392694064  F_score:  0.5282131661442007 0.5282131661442007
**************************************************
Fire!
middle  and  original 
TP:  330 / 438 all  830  accuracy:  0.7241379310344828  precision:  0.39759036144578314  recall:  0.7534246575342466  F_score:  0.5205047318611987 0.5205047318611987
TP:  329 / 438 all  827  accuracy:  0.7245916515426497  precision:  0.3978234582829504  recall:  0.7511415525114156  F_score:  0.5201581027667985 0.5201581027667985
final and  original 
TP:  333 / 438 all  834  accuracy:  0.7250453720508166  precision:  0.39928057553956836  recall:  0.7602739726027398  F_score:  0.5235849056603774 0.5235849056603774
TP:  333 / 438 all  832  accuracy:  0.7259528130671506  precision:  0.40024038461538464  recall:  0.7602739726027398  F_score:  0.5244094488188976 0.5244094488188976
final and  middle and original 
TP:  333 / 438 all  827  accuracy:  0.7282214156079855  precision:  0.4026602176541717  recall:  0.7602739726027398  F_score:  0.5264822134387351 0.5264822134387351
TP:  335 / 438 all  825  accuracy:  0.7309437386569873  precision:  0.40606060606060607  recall:  0.7648401826484018  F_score:  0.5304829770387964 0.5304829770387964
TP:  336 / 438 all  831  accuracy:  0.7291288566243194  precision:  0.4043321299638989  recall:  0.7671232876712328  F_score:  0.5295508274231678 0.5295508274231678
TP:  337 / 438 all  841  accuracy:  0.7254990925589837  precision:  0.40071343638525564  recall:  0.769406392694064  F_score:  0.5269741985926505 0.5269741985926505
Random Seed is:  100
5 fold no feature engineering
TP:  327 / 430 all  905  accuracy:  0.691016333938294  precision:  0.36132596685082874  recall:  0.7604651162790698  F_score:  0.4898876404494382 0.4898876404494382
TP:  307 / 430 all  810  accuracy:  0.7159709618874773  precision:  0.3790123456790123  recall:  0.713953488372093  F_score:  0.4951612903225806 0.4951612903225806
TP:  308 / 430 all  801  accuracy:  0.7209618874773139  precision:  0.38451935081148564  recall:  0.7162790697674418  F_score:  0.5004061738424046 0.5004061738424046
TP:  309 / 430 all  807  accuracy:  0.719147005444646  precision:  0.3828996282527881  recall:  0.7186046511627907  F_score:  0.4995957962813258 0.4995957962813258
TP:  309 / 430 all  807  accuracy:  0.719147005444646  precision:  0.3828996282527881  recall:  0.7186046511627907  F_score:  0.4995957962813258 0.4995957962813258
TP:  311 / 430 all  815  accuracy:  0.7173321234119783  precision:  0.3815950920245399  recall:  0.7232558139534884  F_score:  0.4995983935742972 0.4995983935742972
**************************************************
5 fold feature engineering middle
TP:  333 / 430 all  923  accuracy:  0.6882940108892922  precision:  0.3607800650054171  recall:  0.7744186046511627  F_score:  0.49223946784922396 0.49223946784922396
TP:  308 / 430 all  825  accuracy:  0.7100725952813067  precision:  0.37333333333333335  recall:  0.7162790697674418  F_score:  0.4908366533864541 0.4908366533864541
TP:  297 / 430 all  772  accuracy:  0.7241379310344828  precision:  0.38471502590673573  recall:  0.6906976744186046  F_score:  0.4941763727121463 0.4941763727121463
TP:  307 / 430 all  793  accuracy:  0.7236842105263158  precision:  0.3871374527112232  recall:  0.713953488372093  F_score:  0.5020441537203597 0.5020441537203597
TP:  307 / 430 all  794  accuracy:  0.7232304900181489  precision:  0.3866498740554156  recall:  0.713953488372093  F_score:  0.5016339869281046 0.5016339869281046
TP:  308 / 430 all  806  accuracy:  0.7186932849364791  precision:  0.38213399503722084  recall:  0.7162790697674418  F_score:  0.49838187702265374 0.49838187702265374
**************************************************
5 fold feature engineering final
TP:  335 / 430 all  930  accuracy:  0.6869328493647913  precision:  0.3602150537634409  recall:  0.7790697674418605  F_score:  0.4926470588235293 0.4926470588235293
TP:  315 / 430 all  842  accuracy:  0.7087114337568058  precision:  0.37410926365795727  recall:  0.7325581395348837  F_score:  0.4952830188679246 0.4952830188679246
TP:  315 / 430 all  842  accuracy:  0.7087114337568058  precision:  0.37410926365795727  recall:  0.7325581395348837  F_score:  0.4952830188679246 0.4952830188679246
TP:  315 / 430 all  845  accuracy:  0.707350272232305  precision:  0.3727810650887574  recall:  0.7325581395348837  F_score:  0.4941176470588236 0.4941176470588236
TP:  315 / 430 all  845  accuracy:  0.707350272232305  precision:  0.3727810650887574  recall:  0.7325581395348837  F_score:  0.4941176470588236 0.4941176470588236
TP:  318 / 430 all  856  accuracy:  0.70508166969147  precision:  0.37149532710280375  recall:  0.7395348837209302  F_score:  0.4945567651632971 0.4945567651632971
**************************************************
Fire!
middle  and  original 
TP:  305 / 430 all  797  accuracy:  0.72005444646098  precision:  0.38268506900878296  recall:  0.7093023255813954  F_score:  0.49714751426242876 0.49714751426242876
TP:  307 / 430 all  801  accuracy:  0.72005444646098  precision:  0.383270911360799  recall:  0.713953488372093  F_score:  0.4987814784727863 0.4987814784727863
final and  original 
TP:  311 / 430 all  825  accuracy:  0.7127949183303085  precision:  0.37696969696969695  recall:  0.7232558139534884  F_score:  0.4956175298804781 0.4956175298804781
TP:  312 / 430 all  825  accuracy:  0.7137023593466425  precision:  0.3781818181818182  recall:  0.7255813953488373  F_score:  0.49721115537848604 0.49721115537848604
final and  middle and original 
TP:  307 / 430 all  815  accuracy:  0.7137023593466425  precision:  0.37668711656441717  recall:  0.713953488372093  F_score:  0.4931726907630522 0.4931726907630522
TP:  307 / 430 all  814  accuracy:  0.7141560798548094  precision:  0.37714987714987713  recall:  0.713953488372093  F_score:  0.4935691318327974 0.4935691318327974
TP:  311 / 430 all  824  accuracy:  0.7132486388384754  precision:  0.3774271844660194  recall:  0.7232558139534884  F_score:  0.49601275917065385 0.49601275917065385
TP:  313 / 430 all  833  accuracy:  0.7109800362976406  precision:  0.375750300120048  recall:  0.727906976744186  F_score:  0.49564528899445764 0.49564528899445764
Random Seed is:  1000
5 fold no feature engineering
TP:  304 / 404 all  901  accuracy:  0.6837568058076225  precision:  0.3374028856825749  recall:  0.7524752475247525  F_score:  0.4659003831417624 0.4659003831417624
TP:  294 / 404 all  801  accuracy:  0.72005444646098  precision:  0.36704119850187267  recall:  0.7277227722772277  F_score:  0.4879668049792531 0.4879668049792531
TP:  290 / 404 all  769  accuracy:  0.7309437386569873  precision:  0.37711313394018203  recall:  0.7178217821782178  F_score:  0.49445865302642794 0.49445865302642794
TP:  290 / 404 all  786  accuracy:  0.7232304900181489  precision:  0.36895674300254455  recall:  0.7178217821782178  F_score:  0.4873949579831934 0.4873949579831934
TP:  289 / 404 all  788  accuracy:  0.721415607985481  precision:  0.366751269035533  recall:  0.7153465346534653  F_score:  0.4848993288590604 0.4848993288590604
TP:  290 / 404 all  796  accuracy:  0.7186932849364791  precision:  0.36432160804020103  recall:  0.7178217821782178  F_score:  0.48333333333333345 0.48333333333333345
**************************************************
5 fold feature engineering middle
TP:  315 / 404 all  924  accuracy:  0.6833030852994555  precision:  0.3409090909090909  recall:  0.7797029702970297  F_score:  0.4743975903614458 0.4743975903614458
TP:  298 / 404 all  831  accuracy:  0.7100725952813067  precision:  0.358604091456077  recall:  0.7376237623762376  F_score:  0.48259109311740883 0.48259109311740883
TP:  291 / 404 all  795  accuracy:  0.72005444646098  precision:  0.3660377358490566  recall:  0.7202970297029703  F_score:  0.48540450375312766 0.48540450375312766
TP:  295 / 404 all  820  accuracy:  0.7123411978221416  precision:  0.3597560975609756  recall:  0.7301980198019802  F_score:  0.4820261437908497 0.4820261437908497
TP:  297 / 404 all  822  accuracy:  0.7132486388384754  precision:  0.3613138686131387  recall:  0.7351485148514851  F_score:  0.4845024469820554 0.4845024469820554
TP:  298 / 404 all  833  accuracy:  0.7091651542649727  precision:  0.3577430972388956  recall:  0.7376237623762376  F_score:  0.48181083265966046 0.48181083265966046
**************************************************
5 fold feature engineering final
TP:  316 / 404 all  919  accuracy:  0.6864791288566243  precision:  0.3438520130576714  recall:  0.7821782178217822  F_score:  0.47770219198790626 0.47770219198790626
TP:  296 / 404 all  825  accuracy:  0.7109800362976406  precision:  0.35878787878787877  recall:  0.7326732673267327  F_score:  0.4816924328722539 0.4816924328722539
TP:  301 / 404 all  813  accuracy:  0.7209618874773139  precision:  0.37023370233702335  recall:  0.745049504950495  F_score:  0.49465899753492193 0.49465899753492193
TP:  302 / 404 all  825  accuracy:  0.7164246823956443  precision:  0.3660606060606061  recall:  0.7475247524752475  F_score:  0.49145646867371856 0.49145646867371856
TP:  300 / 404 all  824  accuracy:  0.7150635208711433  precision:  0.3640776699029126  recall:  0.7425742574257426  F_score:  0.48859934853420195 0.48859934853420195
TP:  301 / 404 all  831  accuracy:  0.7127949183303085  precision:  0.3622141997593261  recall:  0.745049504950495  F_score:  0.4874493927125506 0.4874493927125506
**************************************************
Fire!
middle  and  original 
TP:  294 / 404 all  809  accuracy:  0.7164246823956443  precision:  0.36341161928306553  recall:  0.7277227722772277  F_score:  0.4847485572959604 0.4847485572959604
TP:  294 / 404 all  810  accuracy:  0.7159709618874773  precision:  0.362962962962963  recall:  0.7277227722772277  F_score:  0.48434925864909395 0.48434925864909395
final and  original 
TP:  298 / 404 all  809  accuracy:  0.72005444646098  precision:  0.3683559950556242  recall:  0.7376237623762376  F_score:  0.49134377576257215 0.49134377576257215
TP:  298 / 404 all  811  accuracy:  0.719147005444646  precision:  0.36744759556103573  recall:  0.7376237623762376  F_score:  0.4905349794238683 0.4905349794238683
final and  middle and original 
TP:  300 / 404 all  816  accuracy:  0.7186932849364791  precision:  0.36764705882352944  recall:  0.7425742574257426  F_score:  0.4918032786885246 0.4918032786885246
TP:  302 / 404 all  821  accuracy:  0.7182395644283122  precision:  0.36784409257003653  recall:  0.7475247524752475  F_score:  0.49306122448979584 0.49306122448979584
TP:  303 / 404 all  830  accuracy:  0.7150635208711433  precision:  0.3650602409638554  recall:  0.75  F_score:  0.4910858995137763 0.4910858995137763
TP:  304 / 404 all  837  accuracy:  0.7127949183303085  precision:  0.3632019115890084  recall:  0.7524752475247525  F_score:  0.4899274778404513 0.4899274778404513
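
For reference, metric lines like the ones above can be reproduced from a fold's binary predictions as in the following minimal sketch (our reconstruction, not the original script; y_true and y_pred are placeholder arrays, and we assume the second number printed after F_score is simply sklearn's fbeta_score with beta=1):

import numpy as np
from sklearn.metrics import fbeta_score

def report(y_true, y_pred):
    # Log format: "TP:  tp / actual-positives  all  predicted-positives  ..."
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    pos_true = int((y_true == 1).sum())
    pos_pred = int((y_pred == 1).sum())
    accuracy = float((y_true == y_pred).mean())
    precision = tp / pos_pred          # assumes at least one predicted positive
    recall = tp / pos_true
    f1 = 2 * precision * recall / (precision + recall)
    print('TP: ', tp, '/', pos_true, 'all ', pos_pred,
          ' accuracy: ', accuracy, ' precision: ', precision,
          ' recall: ', recall, ' F_score: ', f1,
          fbeta_score(y_true, y_pred, beta=1))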

Experiment Summary

From the experiments above we can draw the following conclusions:

  1. With ensembling, our model achieves a stable improvement on every training set (and online and offline results stay consistent);
  2. The results fluctuate noticeably, roughly between 0.49 and 0.52 in F-score; with this little data the randomness is large, so the fluctuation is reasonable;
  3. The experiments confirm that combining multiple training subsets with N-fold cross-validation is not only better than any single model but also much more stable (the ensemble gains roughly 0.01-0.02 points); a minimal sketch of the fold-averaging step follows.
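
As an illustration of point 3, here is a minimal sketch of fold averaging (our own example, not the original code: the model choice and the names X, y, X_test are placeholders, and we use sklearn.model_selection rather than the deprecated cross_validation module):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier

def kfold_average(X, y, X_test, n_splits=5, seed=1):
    """Average test-set probabilities over K fold-wise models."""
    test_prob = np.zeros(len(X_test))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, _ in kf.split(X):
        clf = GradientBoostingClassifier(random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        test_prob += clf.predict_proba(X_test)[:, 1]
    return test_prob / n_splits  # threshold this averaged probability afterwards

Averaging over folds (and over seeds, as in the logs above with seeds 100 and 1000) smooths out the variance that a single model trained on only 11,017 samples inevitably shows.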

Summary and Outlook

Summary

Difficulties of this competition and our solutions

The features provided in this competition were of fairly good quality, but the number of samples was small, so our effort centered on three questions:

  1. How to mine better features;
  2. How to improve both the performance and the robustness of the model on a small dataset;
  3. How to optimize an objective function (the F-score) that cannot be differentiated directly.

For these three questions we adopted the following solutions:

  1. We extracted as many meaningful features as possible, arriving at five new classes of features: (1) features that raise the model's expressive power; (2) ratio features; (3) standard-deviation restoration features (reflecting the volatility of the information); (4) mean features; (5) trend features. See the feature-construction section for details.
  2. We used a multi-model ensemble to raise performance and robustness, built from two modules: fusing models trained on multiple different training sets, and fusing multiple different model types.
  3. There are three common ways to optimize the F-score: reweighting, i.e. recasting it as a class-imbalance problem; optimizing a lower- or upper-bound surrogate of the F-score; and setting a decision threshold. For simplicity we chose thresholding directly (a sketch follows this list).
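
The thresholding step in point 3 can be as simple as the following sketch (our illustration: scan candidate cut-offs on out-of-fold probabilities and keep the one that maximises the F-score; the function name and grid values are hypothetical):

import numpy as np
from sklearn.metrics import fbeta_score

def best_threshold(y_true, prob, grid=np.arange(0.05, 0.95, 0.01)):
    # F-score (beta=1) for each candidate threshold; pick the argmax
    scores = [fbeta_score(y_true, (prob >= t).astype(int), beta=1) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), float(scores[best])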

Solution summary

Strengths of our approach:

  1. Robust, stable, and reasonably strong performance;
  2. A fairly complete modelling architecture that delivers steady improvements;
  3. A useful, reusable line of thinking for feature construction.

Outlook

This work can still be improved; we summarize the directions as follows:

  1. Expand the dataset wherever possible (the fundamental fix);
  2. Tune the model hyperparameters: on a small dataset the parameters have a relatively large influence, so tuning often brings sizeable gains;
  3. Extract more high-quality features, e.g. difference features such as the gap between loans and repayments, which we believe would help the model from multiple angles (a small illustration follows).
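
As a concrete example of such a difference feature (column names taken from the EDA, where x_198 and x_199 are the counts of loan applications and of successful applications; the new column name is ours, purely for illustration):

# applications that did not succeed, as a simple difference feature
model_sample['x_198_minus_x_199'] = model_sample['x_198'] - model_sample['x_199']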
