Source: kesci.com; WeChat official account: Kaggle竞赛宝典
Original article: Summary of the UnionPay "Credit User Overdue Prediction" Algorithm Competition
Background
Personal credit is the foundation of society-wide credit: every economic activity in market transactions is closely tied to it. Once individual behavior goes unconstrained, personal defaults occur, and these can escalate into collective defaults. Building a personal credit system is therefore critically important. Yet as the economy develops, the tension between increasingly important credit records and the widespread lack of such records keeps intensifying, making a sound credit system an urgent need. With the rapid growth of personal micro-lending in recent years, preventing individual credit fraud and lowering the non-performing rate have become the primary goals of such businesses. This competition aims to apply big data, artificial intelligence, and machine learning to mobilize society-wide modeling innovation, helping financial institutions assess personal credit accurately and further strengthen their credit-risk controls.
The theme of this competition is "Open Integration, Building Credit Together", and the task is "Credit User Overdue Prediction": contestants develop big-data algorithm models that accurately identify fraud and overdue risk in individual micro-loan applications, further improving financial institutions' ability to prevent fraud and reduce the non-performing rate.
Imports, Data Loading & EDA
Importing packages
# Note: the original imported KFold from sklearn.cross_validation, a module
# that has since been removed; it now lives in sklearn.model_selection.
# numpy, lightgbm, and the sklearn estimators/metrics used later are added here.
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.model_selection import train_test_split, KFold
Loading the data
model_sample = pd.read_csv('./Data/model_sample.csv')
model_sample.set_index('user_id',inplace=True)
label = model_sample[['y']]
model_sample = model_sample.drop('y',axis=1)
Data EDA
- The data contains many missing values, which need special handling.
- Most feature dimensions look reasonable (age and the like are all within expected ranges, with nothing absurd). Features such as the number of loan applications and successful loan applications (x_198, x_199) do contain some fairly extreme values (181, 132, etc.), but lacking prior information we treat them as valid by default.
- The data contains only int and float columns, so the formats are simple and need no extra handling.
- There are only 11,017 samples, a low-sample setting, so overfitting must be kept in mind when choosing models.
model_sample.head()
model_sample.describe()
model_sample.info()
<class 'pandas.core.frame.DataFrame'>
Index: 11017 entries, A00002 to A21941
Columns: 199 entries, x_001 to x_199
dtypes: float64(161), int64(38)
memory usage: 16.8+ MB
Every field here has a defined meaning; see "字段解释.xlsx" (the field dictionary) for details.
- With only 11,017 samples, this is a low-sample setting
model_sample.shape
(11017, 199)
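The missingness noted in the EDA can be quantified with a one-liner. The frame below is a tiny synthetic stand-in for model_sample (the real data is not bundled with this article):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for model_sample; isnull().mean() gives the
# fraction of missing values per column.
df = pd.DataFrame({'x_001': [1.0, np.nan, 3.0, np.nan],
                   'x_002': [0, 1, 1, 0]})
missing_ratio = df.isnull().mean().sort_values(ascending=False)
print(missing_ratio)  # x_001 -> 0.5, x_002 -> 0.0
```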
Feature Construction
We construct features along two lines:
- Re-encode the existing features (feature transformation), guided by what the model can express and what kinds of features it needs
- Combine features with one another to build more expressive features
Feature set 1
Features based on model expressiveness
- Encode the identity and asset information: combine the 0/1 flags into a single code (to increase the model's expressive power)
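The 0/1 combination encoding can be illustrated on a toy frame: n binary flags are packed into one integer via powers of two, so every distinct flag combination gets a unique code (the column names below are made up for illustration):

```python
import pandas as pd

# Three hypothetical 0/1 flags packed into a single integer code.
flags = pd.DataFrame({'flag_a': [1, 0, 1],
                      'flag_b': [0, 0, 1],
                      'flag_c': [1, 1, 0]})
code = 0
for i, col in enumerate(flags.columns):
    code += 2 ** i * flags[col]
print(list(code))  # [5, 4, 3]
```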
Ratio features
- Debit-card ratio features (the share of each debit-card type)
- Credit-card ratio features (the share of each credit-card type)
- Bank-card ratio features (the share of each bank-card type)
- Share of failed repayments
- Share of failed loan applications
Standard-deviation restoration features (capturing volatility)
- Restore the standard deviations stored in the data
Mean features
- Transaction amount per card (e.g. per credit card)
- Amount per transaction (e.g. per out-of-town transaction)
- Amount per repayment
- Per-transaction means for business travel, insurance, home improvement, finance, etc.
- Average number of transactions per month
- Transaction amount per month
- Amount per loan disbursement; number and amount of disbursements per institution
- Average repayment amount per institution
- Loan amount per institution
- Other mean features
Trend features
- Trend of loan-application institutions: 90-day vs 30-day, 180-day vs 90-day, 180-day vs 30-day
- Trend of successfully-applied loan institutions: 90-day vs 30-day, 180-day vs 90-day, 180-day vs 30-day
- Trend of loan-application counts: 90-day vs 30-day, 180-day vs 90-day, 180-day vs 30-day
def get_features_middle(data):
    model_sample_strong_feature = data.copy()
    # Encode the identity and asset 0/1 flags into a single combined code
    first_strong_features = ['x_003','x_004','x_005','x_006','x_007','x_008','x_009','x_010','x_011','x_012','x_013','x_014','x_015','x_016','x_017','x_018','x_019']
    res = 0
    for i in range(len(first_strong_features)):
        res += 2 ** i * data[first_strong_features[i]]
    model_sample_strong_feature['x_1_strong'] = res
    # Debit-card ratio features
    model_sample_strong_feature['x_022/x_020'] = data['x_022'] / (data['x_020'] + 1e-10)
    ...
    model_sample_strong_feature['x_026/x_020'] = data['x_026'] / (data['x_020'] + 1e-10)
    # Credit-card ratio features
    model_sample_strong_feature['x_028/x_021'] = data['x_028'] / (data['x_021'] + 1e-10)
    ...
    model_sample_strong_feature['x_032/x_021'] = data['x_032'] / (data['x_021'] + 1e-10)
    # Bank-card ratio features
    model_sample_strong_feature['all_cards'] = (data['x_034'] + data['x_035'] + data['x_036'] + data['x_037'] + data['x_038'] + data['x_039'] + data['x_040']).values
    model_sample_strong_feature['x_034/all_cards'] = data['x_034'] / (model_sample_strong_feature['all_cards'] + 1e-10)
    ...
    model_sample_strong_feature['x_040/all_cards'] = data['x_040'] / (model_sample_strong_feature['all_cards'] + 1e-10)
    # Standard-deviation restoration
    model_sample_strong_feature['x_043/x_044'] = data['x_043'] / (data['x_044'] + 1e-10)
    ...
    model_sample_strong_feature['x_126/x_127'] = data['x_126'] / (data['x_127'] + 1e-10)
    # Mean features: amount per card (credit or other); amount per transaction (e.g. out-of-town); amount per repayment; means for business travel, insurance, home improvement, finance; average monthly transaction count; other meaningful means
    model_sample_strong_feature['x_045/x_041'] = data['x_045'] / (data['x_041'] + 1e-10)
    ...
    model_sample_strong_feature['x_130/x_128'] = data['x_130'] / (data['x_128'] + 1e-10)
    # Amount per disbursement; disbursement count and amount per institution
    model_sample_strong_feature['x_133/x_134'] = data['x_133'] / (data['x_134'] + 1e-10)
    ...
    model_sample_strong_feature['x_144/x_142'] = data['x_144'] / (data['x_142'] + 1e-10)
    # Mean disbursement per institution; share of failed repayments
    model_sample_strong_feature['x_151/x_149'] = data['x_151'] / (data['x_149'] + 1e-10)
    ...
    model_sample_strong_feature['x_185/x_180'] = data['x_185'] / (data['x_180'] + 1e-10)
    # Trend features (90d vs 30d, 180d vs 90d, 180d vs 30d) for loan-application institutions, successful-application institutions, and application counts
    model_sample_strong_feature['x_189/x_188'] = data['x_189'] / (data['x_188'] + 1e-10)
    ...
    model_sample_strong_feature['x_192/x_188'] = data['x_192'] / (data['x_188'] + 1e-10)
    model_sample_strong_feature = model_sample_strong_feature.fillna(-999)
    return model_sample_strong_feature
Feature set 2
- This set is a subset of the one above, used for model fusion (in the 5-fold runs)
def get_features_final(data):
    model_sample_strong_feature = data.copy()
    first_strong_features = ['x_003','x_004','x_005','x_006','x_007','x_008','x_009','x_010','x_011','x_012','x_013','x_014','x_015','x_016','x_017','x_018','x_019']
    res = 0
    for i in range(len(first_strong_features)):
        res += 2 ** i * data[first_strong_features[i]]
    model_sample_strong_feature['x_1_strong'] = res
    model_sample_strong_feature['x_022/x_020'] = data['x_022'] / (data['x_020'] + 1e-10)
    ...
    model_sample_strong_feature['x_032/x_021'] = data['x_032'] / (data['x_021'] + 1e-10)
    model_sample_strong_feature['all_cards'] = (data['x_034'] + data['x_035'] + data['x_036'] + data['x_037'] + data['x_038'] + data['x_039'] + data['x_040']).values
    model_sample_strong_feature['x_034/all_cards'] = data['x_034'] / (model_sample_strong_feature['all_cards'] + 1e-10)
    ...
    model_sample_strong_feature['x_040/all_cards'] = data['x_040'] / (model_sample_strong_feature['all_cards'] + 1e-10)
    # The column name indicates a ratio; the original used '-' here, which
    # looks like a typo given the 1e-10 term in the denominator.
    model_sample_strong_feature['x_027/x_033'] = data['x_027'] / (data['x_033'] + 1e-10)
    ...
    model_sample_strong_feature['x_192/x_188'] = data['x_192'] / (data['x_188'] + 1e-10)
    model_sample_strong_feature = model_sample_strong_feature.fillna(-999)
    return model_sample_strong_feature
Model Training
Building the evaluation metric
def get_score(y_pred, y_true):
    acc_ = accuracy_score(y_true=y_true, y_pred=y_pred)
    TP = np.sum((y_pred == 1) & (y_true == 1))
    precision = TP / np.sum(y_pred)
    recall = TP / np.sum(y_true)
    print('TP: ', TP, '/', np.sum(y_true), 'all ', np.sum(y_pred),
          ' accuracy: ', acc_, ' precision: ', precision, ' recall: ', recall,
          ' F_score: ', 2 * precision * recall / (precision + recall),
          fbeta_score(y_true=y_true, y_pred=y_pred, beta=1))
Selecting the top-N most important features
This guards against overfitting, and in our experience fusing models trained on such feature subsets can also bring a decent lift.
- Due to training-time constraints, our final model did not include this type of fusion.
def get_top_features(feature, model, topN):
    feature_importance = pd.DataFrame({'feature': feature, 'importance': model.feature_importance()})
    feature_importance = feature_importance.sort_values('importance', ascending=False)
    feature_importance = feature_importance.loc[feature_importance['importance'] > 0]
    if feature_importance.shape[0] >= topN:
        return feature_importance['feature'][:topN]
    else:
        return feature_importance['feature']
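As a quick sanity check of get_top_features, here is a minimal standalone run. StubBooster is a hypothetical stand-in that mocks only the feature_importance() method of a trained LightGBM Booster; the function definition is repeated so the snippet runs on its own:

```python
import pandas as pd

# Hypothetical stand-in for a trained LightGBM Booster.
class StubBooster:
    def __init__(self, importances):
        self._imp = importances
    def feature_importance(self):
        return self._imp

def get_top_features(feature, model, topN):
    feature_importance = pd.DataFrame({'feature': feature, 'importance': model.feature_importance()})
    feature_importance = feature_importance.sort_values('importance', ascending=False)
    feature_importance = feature_importance.loc[feature_importance['importance'] > 0]
    if feature_importance.shape[0] >= topN:
        return feature_importance['feature'][:topN]
    return feature_importance['feature']

# x_002 is dropped (zero importance); the rest come back sorted by importance.
top2 = get_top_features(['x_001', 'x_002', 'x_003'], StubBooster([5, 0, 12]), 2)
print(list(top2))  # ['x_003', 'x_001']
```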
Training and validation
The raw data has only about 11,000 rows, which is relatively little. To keep models from overfitting, we adopt two fusion strategies:
- Train models on several different data subsets
- Fuse several different models
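A minimal sketch of the weighted-average fusion used throughout this section. The probability vectors below are illustrative, not real model outputs; only the 0.215 cutoff matches the writeup:

```python
import numpy as np

p_lgb  = np.array([0.30, 0.70, 0.10])   # hypothetical LightGBM probabilities
p_gbdt = np.array([0.20, 0.80, 0.25])   # hypothetical GBDT probabilities
p_rf   = np.array([0.40, 0.60, 0.05])   # hypothetical RF probabilities

# Blend LightGBM and GBDT equally, then mix in a small RF contribution,
# and threshold the blended score to get hard predictions.
blend = (0.5 * p_lgb + 0.5 * p_gbdt) * 0.9 + 0.1 * p_rf
pred = blend >= 0.215
print(pred)  # [ True  True False]
```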
def N_Fold_Predict(train_fea, train_y, test_fea, cv_=5):
    # Note: despite the name, this runs cv_ independent random 75/25 splits
    # (bagging-style averaging), not a true K-fold partition.
    train_fea = train_fea.fillna(-1)
    test_fea = test_fea.fillna(-1)
    features_col = [c for c in train_fea.columns if c not in ['user_id', 'y']]
    X = train_fea[features_col]
    X_pred = test_fea[features_col]
    pred_out_lgb = 0
    pred_out_gbdt = 0
    pred_out_rf = 0
    for cv in range(cv_):
        X_train, X_test, y_train, y_test = train_test_split(X, train_y, test_size=0.25, random_state=np.random.randint(1000))
        # create dataset for lightgbm
        lgb_train = lgb.Dataset(X_train, y_train)
        lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
        # specify your configurations as a dict
        params = {
            'task': 'train',
            'boosting_type': 'gbdt',
            'num_leaves': 31,
            'objective': 'binary',
            'learning_rate': 0.05,
            'bagging_freq': 2,
            'max_bin': 256,
            'num_threads': 32
        }
        # train
        gbm = lgb.train(params,
                        lgb_train,
                        verbose_eval=0,
                        num_boost_round=10000,
                        valid_sets=lgb_eval,
                        early_stopping_rounds=100)
        lgb_pred = gbm.predict(X_pred, num_iteration=gbm.best_iteration)
        gbdt = GradientBoostingClassifier(n_estimators=250, learning_rate=0.01, max_depth=6, min_samples_leaf=5, min_samples_split=5)
        gbdt.fit(X_train, y_train)
        gbdt_pred = gbdt.predict_proba(X_pred)[:, 1]
        rf = RandomForestClassifier(n_estimators=500, max_depth=6, min_samples_leaf=5, min_samples_split=5)
        rf.fit(X_train, y_train)
        rf_pred = rf.predict_proba(X_pred)[:, 1]
        if cv == 0:
            pred_out_lgb = lgb_pred
            pred_out_gbdt = gbdt_pred
            pred_out_rf = rf_pred
        else:
            pred_out_lgb += lgb_pred
            pred_out_gbdt += gbdt_pred
            pred_out_rf += rf_pred
    pred_out_lgb = pred_out_lgb * 1.0 / cv_
    pred_out_gbdt = pred_out_gbdt * 1.0 / cv_
    pred_out_rf = pred_out_rf * 1.0 / cv_
    return pred_out_lgb, pred_out_gbdt, pred_out_rf
Building the different data subsets
model_sample_strong_feature_middle = get_features_middle(model_sample)
model_sample_strong_feature_final = get_features_final(model_sample)
model_sample_strong_feature_middle = model_sample_strong_feature_middle.fillna(-999)
model_sample_strong_feature_final = model_sample_strong_feature_final.fillna(-999)
model_sample_ = model_sample.fillna(-999)
Training and testing multiple models
- To handle the F-score metric, we adjust the decision threshold; 0.215 is the final cutoff chosen here (the blend weights could be tuned further as well)
- To avoid overfitting, we fuse several models, using simple weighted averaging (stacking is time-consuming and could exceed the 30-minute limit)
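A hypothetical sketch of how a cutoff such as 0.215 can be chosen: scan candidate thresholds on held-out probabilities and keep the one with the best F1. The labels and scores below are synthetic; the real tuning would use validation predictions:

```python
import numpy as np

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1000)            # synthetic labels
proba = np.clip(0.35 * y_true + 0.6 * rng.random(1000), 0, 1)  # toy scores

def f1_at(y, p, t):
    # F1 of the hard predictions obtained by thresholding p at t.
    pred = (p >= t).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    if tp == 0:
        return 0.0
    precision = tp / pred.sum()
    recall = tp / y.sum()
    return 2 * precision * recall / (precision + recall)

best_t = max(np.arange(0.05, 0.95, 0.005), key=lambda t: f1_at(y_true, proba, t))
print(best_t, f1_at(y_true, proba, best_t))
```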
for rnd in [1, 10, 100, 1000]:
    print('Random Seed is: ', rnd)
    train_X, test_X, train_y, test_y = train_test_split(model_sample_strong_feature_final, label, test_size=0.2, random_state=rnd)
    train_X_orig = model_sample_.loc[train_X.index]
    test_X_orig = model_sample_.loc[test_X.index]
    train_X_middle = model_sample_strong_feature_middle.loc[train_X.index]
    test_X_middle = model_sample_strong_feature_middle.loc[test_X.index]
    print('5 fold no feature engineering')
    pred_out_lgb, pred_out_gbdt, pred_out_rf = N_Fold_Predict(train_X_orig, train_y['y'].values, test_X_orig, cv_=3)
    pred = pred_out_rf >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb * 0.55 + 0.45 * pred_out_gbdt >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt >= 0.215
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt) * 0.9 + 0.1 * pred_out_rf >= 0.215
    get_score(pred, test_y['y'].values)
    print('*' * 50)
    print('5 fold feature engineering middle')
    pred_out_lgb_middle, pred_out_gbdt_middle, pred_out_rf_middle = N_Fold_Predict(train_X_middle, train_y['y'].values, test_X_middle, cv_=3)
    pred = pred_out_rf_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle * 0.55 + 0.45 * pred_out_gbdt_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.1 * pred_out_rf_middle >= 0.215
    get_score(pred, test_y['y'].values)
    # pred = (pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.05 * (pred_out_rf_middle + pred_out_rf) >= 0.215
    # get_score(pred, test_y['y'].values)
    print('*' * 50)
    print('5 fold feature engineering final')
    pred_out_lgb_final, pred_out_gbdt_final, pred_out_rf_final = N_Fold_Predict(train_X, train_y['y'].values, test_X, cv_=3)
    pred = pred_out_rf_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final * 0.55 + 0.45 * pred_out_gbdt_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final) * 0.9 + 0.1 * pred_out_rf_final >= 0.215
    get_score(pred, test_y['y'].values)
    print('*' * 50)
    print('Fire!')
    print('middle and original ')
    pred = ((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5) * 0.55 + 0.45 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5) * 0.5 + 0.5 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt)) >= 0.215
    get_score(pred, test_y['y'].values)
    print('final and original ')
    pred = ((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5) * 0.55 + 0.45 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5) * 0.5 + 0.5 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt)) >= 0.215
    get_score(pred, test_y['y'].values)
    print('final and middle and original ')
    pred = ((pred_out_lgb_final * 0.3 + pred_out_lgb * 0.3 + 0.4 * pred_out_lgb_middle) * 0.55 + 0.45 * (pred_out_gbdt_final * 0.3 + 0.3 * pred_out_gbdt + 0.4 * pred_out_gbdt_middle)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_final / 3 + pred_out_lgb / 3 + pred_out_lgb_middle / 3) * 0.5 + 0.5 * (pred_out_gbdt_final / 3 + pred_out_gbdt / 3 + pred_out_gbdt_middle / 3)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = 0.1 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final) / 3.0 + 0.9 * ((pred_out_lgb_final / 3 + pred_out_lgb / 3 + pred_out_lgb_middle / 3) * 0.5 + 0.5 * (pred_out_gbdt_final / 3 + pred_out_gbdt / 3 + pred_out_gbdt_middle / 3)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = 0.15 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final) / 3.0 + 0.85 * ((pred_out_lgb_final / 3 + pred_out_lgb / 3 + pred_out_lgb_middle / 3) * 0.5 + 0.5 * (pred_out_gbdt_final / 3 + pred_out_gbdt / 3 + pred_out_gbdt_middle / 3)) >= 0.215
    get_score(pred, test_y['y'].values)
Random Seed is: 1000
5 fold no feature engineering
TP: 297 / 404 all 871 accuracy: 0.691016333938294 precision: 0.3409873708381171 recall: 0.7351485148514851 F_score: 0.46588235294117647 0.46588235294117647
TP: 285 / 404 all 765 accuracy: 0.7282214156079855 precision: 0.37254901960784315 recall: 0.7054455445544554 F_score: 0.4875962360992301 0.4875962360992301
TP: 287 / 404 all 766 accuracy: 0.7295825771324864 precision: 0.37467362924281983 recall: 0.7103960396039604 F_score: 0.4905982905982905 0.4905982905982905
TP: 293 / 404 all 773 accuracy: 0.7318511796733213 precision: 0.37904269081500647 recall: 0.7252475247524752 F_score: 0.49787595581988103 0.49787595581988103
TP: 293 / 404 all 774 accuracy: 0.7313974591651543 precision: 0.3785529715762274 recall: 0.7252475247524752 F_score: 0.49745331069609505 0.49745331069609505
TP: 294 / 404 all 780 accuracy: 0.7295825771324864 precision: 0.3769230769230769 recall: 0.7277227722772277 F_score: 0.49662162162162166 0.49662162162162166
**************************************************
5 fold feature engineering middle
TP: 317 / 404 all 927 accuracy: 0.6837568058076225 precision: 0.3419633225458468 recall: 0.7846534653465347 F_score: 0.47633358377160034 0.47633358377160034
TP: 300 / 404 all 824 accuracy: 0.7150635208711433 precision: 0.3640776699029126 recall: 0.7425742574257426 F_score: 0.48859934853420195 0.48859934853420195
TP: 298 / 404 all 824 accuracy: 0.7132486388384754 precision: 0.3616504854368932 recall: 0.7376237623762376 F_score: 0.485342019543974 0.485342019543974
TP: 304 / 404 all 839 accuracy: 0.7118874773139746 precision: 0.36233611442193087 recall: 0.7524752475247525 F_score: 0.4891391794046661 0.4891391794046661
TP: 304 / 404 all 842 accuracy: 0.7105263157894737 precision: 0.36104513064133015 recall: 0.7524752475247525 F_score: 0.48796147672552165 0.48796147672552165
TP: 306 / 404 all 842 accuracy: 0.7123411978221416 precision: 0.36342042755344417 recall: 0.7574257425742574 F_score: 0.4911717495987159 0.4911717495987159
TP: 305 / 404 all 841 accuracy: 0.7118874773139746 precision: 0.3626634958382878 recall: 0.754950495049505 F_score: 0.48995983935742976 0.48995983935742976
**************************************************
5 fold feature engineering final
TP: 316 / 404 all 913 accuracy: 0.6892014519056261 precision: 0.34611171960569553 recall: 0.7821782178217822 F_score: 0.4798785117691724 0.4798785117691724
TP: 299 / 404 all 830 accuracy: 0.7114337568058077 precision: 0.3602409638554217 recall: 0.7400990099009901 F_score: 0.48460291734197725 0.48460291734197725
TP: 291 / 404 all 795 accuracy: 0.72005444646098 precision: 0.3660377358490566 recall: 0.7202970297029703 F_score: 0.48540450375312766 0.48540450375312766
TP: 294 / 404 all 808 accuracy: 0.7168784029038112 precision: 0.36386138613861385 recall: 0.7277227722772277 F_score: 0.4851485148514852 0.4851485148514852
TP: 296 / 404 all 809 accuracy: 0.7182395644283122 precision: 0.3658838071693449 recall: 0.7326732673267327 F_score: 0.48804616652926625 0.48804616652926625
TP: 300 / 404 all 820 accuracy: 0.7168784029038112 precision: 0.36585365853658536 recall: 0.7425742574257426 F_score: 0.49019607843137253 0.49019607843137253
**************************************************
Fire!
middle and original
TP: 299 / 404 all 806 accuracy: 0.7223230490018149 precision: 0.3709677419354839 recall: 0.7400990099009901 F_score: 0.4942148760330578 0.4942148760330578
TP: 299 / 404 all 809 accuracy: 0.7209618874773139 precision: 0.3695920889987639 recall: 0.7400990099009901 F_score: 0.49299258037922505 0.49299258037922505
final and original
TP: 298 / 404 all 810 accuracy: 0.7196007259528131 precision: 0.36790123456790125 recall: 0.7376237623762376 F_score: 0.49093904448105447 0.49093904448105447
TP: 298 / 404 all 810 accuracy: 0.7196007259528131 precision: 0.36790123456790125 recall: 0.7376237623762376 F_score: 0.49093904448105447 0.49093904448105447
final and middle and original
TP: 301 / 404 all 819 accuracy: 0.7182395644283122 precision: 0.36752136752136755 recall: 0.745049504950495 F_score: 0.4922322158626328 0.4922322158626328
TP: 301 / 404 all 822 accuracy: 0.7168784029038112 precision: 0.3661800486618005 recall: 0.745049504950495 F_score: 0.49102773246329523 0.49102773246329523
TP: 302 / 404 all 828 accuracy: 0.7150635208711433 precision: 0.3647342995169082 recall: 0.7475247524752475 F_score: 0.4902597402597403 0.4902597402597403
TP: 303 / 404 all 838 accuracy: 0.7114337568058077 precision: 0.3615751789976134 recall: 0.75 F_score: 0.48792270531400966 0.48792270531400966
model_sample_strong_feature_middle = model_sample_strong_feature_middle.fillna(-999)
model_sample_strong_feature_final = model_sample_strong_feature_final.fillna(-999)
model_sample_ = model_sample.fillna(-999)
The same pipeline is then evaluated with a higher decision threshold of 0.23:
for rnd in [1, 10, 100, 1000]:
    print('Random Seed is: ', rnd)
    train_X, test_X, train_y, test_y = train_test_split(model_sample_strong_feature_final, label, test_size=0.2, random_state=rnd)
    train_X_orig = model_sample_.loc[train_X.index]
    test_X_orig = model_sample_.loc[test_X.index]
    train_X_middle = model_sample_strong_feature_middle.loc[train_X.index]
    test_X_middle = model_sample_strong_feature_middle.loc[test_X.index]
    print('5 fold no feature engineering')
    pred_out_lgb, pred_out_gbdt, pred_out_rf = N_Fold_Predict(train_X_orig, train_y['y'].values, test_X_orig, cv_=3)
    pred = pred_out_rf >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb * 0.55 + 0.45 * pred_out_gbdt >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt >= 0.23
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt) * 0.9 + 0.1 * pred_out_rf >= 0.23
    get_score(pred, test_y['y'].values)
    print('*' * 50)
    print('5 fold feature engineering middle')
    pred_out_lgb_middle, pred_out_gbdt_middle, pred_out_rf_middle = N_Fold_Predict(train_X_middle, train_y['y'].values, test_X_middle, cv_=3)
    pred = pred_out_rf_middle >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt_middle >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle * 0.55 + 0.45 * pred_out_gbdt_middle >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle >= 0.23
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.1 * pred_out_rf_middle >= 0.23
    get_score(pred, test_y['y'].values)
    # pred = (pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.05 * (pred_out_rf_middle + pred_out_rf) >= 0.23
    # get_score(pred, test_y['y'].values)
    print('*' * 50)
    print('5 fold feature engineering final')
    pred_out_lgb_final, pred_out_gbdt_final, pred_out_rf_final = N_Fold_Predict(train_X, train_y['y'].values, test_X, cv_=3)
    pred = pred_out_rf_final >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt_final >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final * 0.55 + 0.45 * pred_out_gbdt_final >= 0.23
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final >= 0.23
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final) * 0.9 + 0.1 * pred_out_rf_final >= 0.23
    get_score(pred, test_y['y'].values)
    print('*' * 50)
    print('Fire!')
    print('middle and original ')
    pred = ((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5) * 0.55 + 0.45 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt)) >= 0.23
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5) * 0.5 + 0.5 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt)) >= 0.23
    get_score(pred, test_y['y'].values)
    print('final and original ')
    pred = ((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5) * 0.55 + 0.45 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt)) >= 0.23
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5) * 0.5 + 0.5 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt)) >= 0.23
    get_score(pred, test_y['y'].values)
    print('final and middle and original ')
    pred = ((pred_out_lgb_final * 0.3 + pred_out_lgb * 0.3 + 0.4 * pred_out_lgb_middle) * 0.55 + 0.45 * (pred_out_gbdt_final * 0.3 + 0.3 * pred_out_gbdt + 0.4 * pred_out_gbdt_middle)) >= 0.23
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_final / 3 + pred_out_lgb / 3 + pred_out_lgb_middle / 3) * 0.5 + 0.5 * (pred_out_gbdt_final / 3 + pred_out_gbdt / 3 + pred_out_gbdt_middle / 3)) >= 0.23
    get_score(pred, test_y['y'].values)
    pred = 0.1 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final) / 3.0 + 0.9 * ((pred_out_lgb_final / 3 + pred_out_lgb / 3 + pred_out_lgb_middle / 3) * 0.5 + 0.5 * (pred_out_gbdt_final / 3 + pred_out_gbdt / 3 + pred_out_gbdt_middle / 3)) >= 0.23
    get_score(pred, test_y['y'].values)
    pred = 0.15 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final) / 3.0 + 0.85 * ((pred_out_lgb_final / 3 + pred_out_lgb / 3 + pred_out_lgb_middle / 3) * 0.5 + 0.5 * (pred_out_gbdt_final / 3 + pred_out_gbdt / 3 + pred_out_gbdt_middle / 3)) >= 0.23
    get_score(pred, test_y['y'].values)
Random Seed is: 1
5 fold no feature engineering
TP: 315 / 442 all 833 accuracy: 0.707350272232305 precision: 0.37815126050420167 recall: 0.7126696832579186 F_score: 0.4941176470588235 0.4941176470588235
TP: 286 / 442 all 721 accuracy: 0.7318511796733213 precision: 0.39667128987517336 recall: 0.6470588235294118 F_score: 0.4918314703353397 0.4918314703353397
TP: 290 / 442 all 728 accuracy: 0.7323049001814882 precision: 0.3983516483516483 recall: 0.6561085972850679 F_score: 0.4957264957264957 0.4957264957264957
TP: 290 / 442 all 722 accuracy: 0.73502722323049 precision: 0.40166204986149584 recall: 0.6561085972850679 F_score: 0.49828178694158065 0.49828178694158065
TP: 291 / 442 all 725 accuracy: 0.734573502722323 precision: 0.4013793103448276 recall: 0.6583710407239819 F_score: 0.4987146529562982 0.4987146529562982
TP: 292 / 442 all 731 accuracy: 0.7327586206896551 precision: 0.399452804377565 recall: 0.6606334841628959 F_score: 0.4978687127024723 0.4978687127024723
**************************************************
5 fold feature engineering middle
TP: 316 / 442 all 811 accuracy: 0.7182395644283122 precision: 0.38964241676942046 recall: 0.7149321266968326 F_score: 0.5043894652833201 0.5043894652833201
TP: 282 / 442 all 708 accuracy: 0.7341197822141561 precision: 0.3983050847457627 recall: 0.6380090497737556 F_score: 0.4904347826086956 0.4904347826086956
TP: 279 / 442 all 688 accuracy: 0.7404718693284936 precision: 0.4055232558139535 recall: 0.6312217194570136 F_score: 0.49380530973451336 0.49380530973451336
TP: 286 / 442 all 703 accuracy: 0.7400181488203267 precision: 0.406827880512091 recall: 0.6470588235294118 F_score: 0.4995633187772926 0.4995633187772926
TP: 281 / 442 all 701 accuracy: 0.7363883847549909 precision: 0.4008559201141227 recall: 0.6357466063348416 F_score: 0.4916885389326334 0.4916885389326334
TP: 285 / 442 all 710 accuracy: 0.735934664246824 precision: 0.4014084507042254 recall: 0.6447963800904978 F_score: 0.4947916666666667 0.4947916666666667
**************************************************
5 fold feature engineering final
TP: 322 / 442 all 846 accuracy: 0.7078039927404719 precision: 0.3806146572104019 recall: 0.7285067873303167 F_score: 0.5 0.5
TP: 287 / 442 all 726 accuracy: 0.7304900181488203 precision: 0.3953168044077135 recall: 0.6493212669683258 F_score: 0.4914383561643836 0.4914383561643836
TP: 287 / 442 all 711 accuracy: 0.7372958257713249 precision: 0.40365682137834036 recall: 0.6493212669683258 F_score: 0.4978317432784042 0.4978317432784042
TP: 290 / 442 all 725 accuracy: 0.7336660617059891 precision: 0.4 recall: 0.6561085972850679 F_score: 0.49700085689802914 0.49700085689802914
TP: 291 / 442 all 726 accuracy: 0.7341197822141561 precision: 0.40082644628099173 recall: 0.6583710407239819 F_score: 0.49828767123287665 0.49828767123287665
TP: 294 / 442 all 738 accuracy: 0.7313974591651543 precision: 0.3983739837398374 recall: 0.665158371040724 F_score: 0.49830508474576274 0.49830508474576274
**************************************************
Fire!
middle and original
TP: 288 / 442 all 711 accuracy: 0.7382032667876588 precision: 0.4050632911392405 recall: 0.6515837104072398 F_score: 0.4995663486556808 0.4995663486556808
TP: 290 / 442 all 713 accuracy: 0.7391107078039928 precision: 0.4067321178120617 recall: 0.6561085972850679 F_score: 0.5021645021645021 0.5021645021645021
final and original
TP: 291 / 442 all 723 accuracy: 0.735480943738657 precision: 0.4024896265560166 recall: 0.6583710407239819 F_score: 0.4995708154506438 0.4995708154506438
TP: 291 / 442 all 723 accuracy: 0.735480943738657 precision: 0.4024896265560166 recall: 0.6583710407239819 F_score: 0.4995708154506438 0.4995708154506438
final and middle and original
TP: 290 / 442 all 727 accuracy: 0.7327586206896551 precision: 0.3988995873452545 recall: 0.6561085972850679 F_score: 0.49615055603079555 0.49615055603079555
TP: 291 / 442 all 727 accuracy: 0.7336660617059891 precision: 0.40027510316368636 recall: 0.6583710407239819 F_score: 0.49786142001710865 0.49786142001710865
TP: 293 / 442 all 734 accuracy: 0.7323049001814882 precision: 0.3991825613079019 recall: 0.6628959276018099 F_score: 0.49829931972789115 0.49829931972789115
TP: 294 / 442 all 737 accuracy: 0.7318511796733213 precision: 0.3989145183175034 recall: 0.665158371040724 F_score: 0.49872773536895676 0.49872773536895676
Random Seed is: 10
5 fold no feature engineering
TP: 323 / 438 all 859 accuracy: 0.7046279491833031 precision: 0.3760186263096624 recall: 0.7374429223744292 F_score: 0.49807247494217427 0.49807247494217427
TP: 296 / 438 all 735 accuracy: 0.7363883847549909 precision: 0.40272108843537413 recall: 0.6757990867579908 F_score: 0.5046888320545609 0.5046888320545609
TP: 308 / 438 all 782 accuracy: 0.7259528130671506 precision: 0.3938618925831202 recall: 0.7031963470319634 F_score: 0.5049180327868853 0.5049180327868853
TP: 304 / 438 all 763 accuracy: 0.7309437386569873 precision: 0.3984272608125819 recall: 0.6940639269406392 F_score: 0.5062447960033305 0.5062447960033305
TP: 303 / 438 all 759 accuracy: 0.7318511796733213 precision: 0.39920948616600793 recall: 0.6917808219178082 F_score: 0.506265664160401 0.506265664160401
TP: 305 / 438 all 769 accuracy: 0.7291288566243194 precision: 0.3966189856957087 recall: 0.6963470319634704 F_score: 0.5053852526926264 0.5053852526926264
**************************************************
5 fold feature engineering middle
TP: 344 / 438 all 916 accuracy: 0.6978221415607986 precision: 0.37554585152838427 recall: 0.7853881278538812 F_score: 0.5081240768094535 0.5081240768094535
TP: 310 / 438 all 783 accuracy: 0.7273139745916516 precision: 0.3959131545338442 recall: 0.7077625570776256 F_score: 0.5077805077805078 0.5077805077805078
TP: 325 / 438 all 837 accuracy: 0.7164246823956443 precision: 0.38829151732377537 recall: 0.7420091324200914 F_score: 0.5098039215686274 0.5098039215686274
TP: 320 / 438 all 815 accuracy: 0.7218693284936479 precision: 0.39263803680981596 recall: 0.730593607305936 F_score: 0.5107741420590584 0.5107741420590584
TP: 321 / 438 all 814 accuracy: 0.7232304900181489 precision: 0.39434889434889436 recall: 0.7328767123287672 F_score: 0.512779552715655 0.512779552715655
TP: 321 / 438 all 819 accuracy: 0.7209618874773139 precision: 0.39194139194139194 recall: 0.7328767123287672 F_score: 0.5107398568019094 0.5107398568019094
**************************************************
5 fold feature engineering final
TP: 337 / 438 all 880 accuracy: 0.7078039927404719 precision: 0.38295454545454544 recall: 0.769406392694064 F_score: 0.511380880121396 0.511380880121396
TP: 309 / 438 all 772 accuracy: 0.7313974591651543 precision: 0.40025906735751293 recall: 0.7054794520547946 F_score: 0.5107438016528926 0.5107438016528926
TP: 328 / 438 all 811 accuracy: 0.7309437386569873 precision: 0.40443896424167697 recall: 0.7488584474885844 F_score: 0.5252201761409128 0.5252201761409128
TP: 326 / 438 all 801 accuracy: 0.7336660617059891 precision: 0.4069912609238452 recall: 0.7442922374429224 F_score: 0.526230831315577 0.526230831315577
TP: 324 / 438 all 797 accuracy: 0.7336660617059891 precision: 0.4065244667503137 recall: 0.7397260273972602 F_score: 0.5246963562753036 0.5246963562753036
TP: 327 / 438 all 806 accuracy: 0.7323049001814882 precision: 0.4057071960297767 recall: 0.7465753424657534 F_score: 0.5257234726688104 0.5257234726688104
**************************************************
Fire!
middle and original
TP: 313 / 438 all 778 accuracy: 0.7323049001814882 precision: 0.4023136246786632 recall: 0.7146118721461188 F_score: 0.5148026315789473 0.5148026315789473
TP: 311 / 438 all 775 accuracy: 0.7318511796733213 precision: 0.4012903225806452 recall: 0.7100456621004566 F_score: 0.5127782357790601 0.5127782357790601
final and original
TP: 317 / 438 all 780 accuracy: 0.73502722323049 precision: 0.4064102564102564 recall: 0.723744292237443 F_score: 0.5205254515599343 0.5205254515599343
TP: 315 / 438 all 778 accuracy: 0.7341197822141561 precision: 0.40488431876606684 recall: 0.7191780821917808 F_score: 0.5180921052631579 0.5180921052631579
final and middle and original
TP: 320 / 438 all 791 accuracy: 0.7327586206896551 precision: 0.404551201011378 recall: 0.730593607305936 F_score: 0.5207485760781123 0.5207485760781123
TP: 319 / 438 all 789 accuracy: 0.7327586206896551 precision: 0.40430925221799746 recall: 0.728310502283105 F_score: 0.5199674001629991 0.5199674001629991
TP: 326 / 438 all 799 accuracy: 0.734573502722323 precision: 0.40801001251564456 recall: 0.7442922374429224 F_score: 0.5270816491511722 0.5270816491511722
TP: 326 / 438 all 807 accuracy: 0.7309437386569873 precision: 0.40396530359355637 recall: 0.7442922374429224 F_score: 0.5236947791164658 0.5236947791164658
Random Seed is: 100
5 fold no feature engineering
TP: 312 / 430 all 851 accuracy: 0.7019056261343013 precision: 0.36662749706227965 recall: 0.7255813953488373 F_score: 0.48711943793911006 0.48711943793911006
TP: 292 / 430 all 742 accuracy: 0.7332123411978222 precision: 0.3935309973045822 recall: 0.6790697674418604 F_score: 0.4982935153583618 0.4982935153583618
TP: 300 / 430 all 774 accuracy: 0.7259528130671506 precision: 0.3875968992248062 recall: 0.6976744186046512 F_score: 0.4983388704318937 0.4983388704318937
TP: 297 / 430 all 755 accuracy: 0.7318511796733213 precision: 0.3933774834437086 recall: 0.6906976744186046 F_score: 0.5012658227848101 0.5012658227848101
TP: 298 / 430 all 755 accuracy: 0.7327586206896551 precision: 0.39470198675496687 recall: 0.6930232558139535 F_score: 0.5029535864978903 0.5029535864978903
TP: 301 / 430 all 766 accuracy: 0.7304900181488203 precision: 0.39295039164490864 recall: 0.7 F_score: 0.5033444816053512 0.5033444816053512
**************************************************
5 fold feature engineering middle
TP: 331 / 430 all 893 accuracy: 0.7000907441016334 precision: 0.3706606942889138 recall: 0.7697674418604651 F_score: 0.5003779289493575 0.5003779289493575
TP: 306 / 430 all 786 accuracy: 0.7259528130671506 precision: 0.3893129770992366 recall: 0.7116279069767442 F_score: 0.5032894736842105 0.5032894736842105
TP: 302 / 430 all 777 accuracy: 0.7264065335753176 precision: 0.3886743886743887 recall: 0.7023255813953488 F_score: 0.5004142502071252 0.5004142502071252
TP: 305 / 430 all 781 accuracy: 0.7273139745916516 precision: 0.3905249679897567 recall: 0.7093023255813954 F_score: 0.5037159372419487 0.5037159372419487
TP: 304 / 430 all 781 accuracy: 0.7264065335753176 precision: 0.3892445582586428 recall: 0.7069767441860465 F_score: 0.5020644095788604 0.5020644095788604
TP: 304 / 430 all 785 accuracy: 0.7245916515426497 precision: 0.3872611464968153 recall: 0.7069767441860465 F_score: 0.5004115226337449 0.5004115226337449
**************************************************
5 fold feature engineering final
TP: 326 / 430 all 876 accuracy: 0.7032667876588021 precision: 0.3721461187214612 recall: 0.7581395348837209 F_score: 0.49923430321592643 0.49923430321592643
TP: 296 / 430 all 758 accuracy: 0.7295825771324864 precision: 0.39050131926121373 recall: 0.6883720930232559 F_score: 0.4983164983164983 0.4983164983164983
TP: 295 / 430 all 754 accuracy: 0.7304900181488203 precision: 0.3912466843501326 recall: 0.686046511627907 F_score: 0.4983108108108108 0.4983108108108108
TP: 298 / 430 all 764 accuracy: 0.7286751361161524 precision: 0.3900523560209424 recall: 0.6930232558139535 F_score: 0.4991624790619766 0.4991624790619766
TP: 299 / 430 all 766 accuracy: 0.7286751361161524 precision: 0.39033942558746737 recall: 0.6953488372093023 F_score: 0.5 0.5
TP: 304 / 430 all 782 accuracy: 0.7259528130671506 precision: 0.3887468030690537 recall: 0.7069767441860465 F_score: 0.5016501650165016 0.5016501650165016
**************************************************
Fire!
middle and original
TP: 298 / 430 all 772 accuracy: 0.7250453720508166 precision: 0.3860103626943005 recall: 0.6930232558139535 F_score: 0.49584026622296173 0.49584026622296173
TP: 297 / 430 all 768 accuracy: 0.7259528130671506 precision: 0.38671875 recall: 0.6906976744186046 F_score: 0.4958263772954925 0.4958263772954925
final and original
TP: 301 / 430 all 768 accuracy: 0.7295825771324864 precision: 0.3919270833333333 recall: 0.7 F_score: 0.5025041736227045 0.5025041736227045
TP: 302 / 430 all 766 accuracy: 0.7313974591651543 precision: 0.39425587467362927 recall: 0.7023255813953488 F_score: 0.5050167224080268 0.5050167224080268
final and middle and original
TP: 299 / 430 all 771 accuracy: 0.7264065335753176 precision: 0.38780804150453957 recall: 0.6953488372093023 F_score: 0.49791840133222315 0.49791840133222315
TP: 299 / 430 all 769 accuracy: 0.7273139745916516 precision: 0.38881664499349805 recall: 0.6953488372093023 F_score: 0.4987489574645537 0.4987489574645537
TP: 302 / 430 all 778 accuracy: 0.7259528130671506 precision: 0.38817480719794345 recall: 0.7023255813953488 F_score: 0.5 0.5
TP: 304 / 430 all 787 accuracy: 0.7236842105263158 precision: 0.386277001270648 recall: 0.7069767441860465 F_score: 0.49958915365653245 0.49958915365653245
Random Seed is: 1000
5 fold no feature engineering
TP: 292 / 404 all 833 accuracy: 0.7037205081669692 precision: 0.3505402160864346 recall: 0.7227722772277227 F_score: 0.4721099434114794 0.4721099434114794
TP: 281 / 404 all 745 accuracy: 0.7336660617059891 precision: 0.37718120805369126 recall: 0.6955445544554455 F_score: 0.4891209747606614 0.4891209747606614
TP: 278 / 404 all 749 accuracy: 0.7291288566243194 precision: 0.3711615487316422 recall: 0.6881188118811881 F_score: 0.4822202948829142 0.4822202948829142
TP: 280 / 404 all 749 accuracy: 0.7309437386569873 precision: 0.37383177570093457 recall: 0.693069306930693 F_score: 0.48568950563746743 0.48568950563746743
TP: 282 / 404 all 753 accuracy: 0.7309437386569873 precision: 0.3745019920318725 recall: 0.698019801980198 F_score: 0.48746758859118416 0.48746758859118416
TP: 283 / 404 all 755 accuracy: 0.7309437386569873 precision: 0.3748344370860927 recall: 0.7004950495049505 F_score: 0.4883520276100086 0.4883520276100086
**************************************************
5 fold feature engineering middle
TP: 306 / 404 all 870 accuracy: 0.6996370235934665 precision: 0.35172413793103446 recall: 0.7574257425742574 F_score: 0.4803767660910518 0.4803767660910518
TP: 275 / 404 all 770 accuracy: 0.7168784029038112 precision: 0.35714285714285715 recall: 0.6806930693069307 F_score: 0.46848381601362865 0.46848381601362865
TP: 276 / 404 all 759 accuracy: 0.7227767695099818 precision: 0.36363636363636365 recall: 0.6831683168316832 F_score: 0.47463456577815993 0.47463456577815993
TP: 287 / 404 all 785 accuracy: 0.7209618874773139 precision: 0.36560509554140125 recall: 0.7103960396039604 F_score: 0.48275862068965514 0.48275862068965514
TP: 285 / 404 all 782 accuracy: 0.720508166969147 precision: 0.36445012787723785 recall: 0.7054455445544554 F_score: 0.4806070826306914 0.4806070826306914
TP: 289 / 404 all 790 accuracy: 0.720508166969147 precision: 0.3658227848101266 recall: 0.7153465346534653 F_score: 0.4840871021775544 0.4840871021775544
**************************************************
5 fold feature engineering final
model_sample_strong_feature_middle = model_sample_strong_feature_middle.fillna(-999)
model_sample_strong_feature_final = model_sample_strong_feature_final.fillna(-999)
model_sample_ = model_sample.fillna(-999)

for rnd in [1, 10, 100, 1000]:
    print('Random Seed is: ', rnd)
    # Hold out 20% as a test set; the same user index then selects the
    # matching rows from the original and the "middle" feature sets
    train_X, test_X, train_y, test_y = train_test_split(model_sample_strong_feature_final, label, test_size=0.2, random_state=rnd)
    train_X_orig = model_sample_.loc[train_X.index]
    test_X_orig = model_sample_.loc[test_X.index]
    train_X_middle = model_sample_strong_feature_middle.loc[train_X.index]
    test_X_middle = model_sample_strong_feature_middle.loc[test_X.index]

    # --- original features only (note: cv_=3 here despite the printed label) ---
    print('5 fold no feature engineering')
    pred_out_lgb, pred_out_gbdt, pred_out_rf = N_Fold_Predict(train_X_orig, train_y['y'].values, test_X_orig, cv_=3)
    pred = pred_out_rf >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb * 0.55 + 0.45 * pred_out_gbdt >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt >= 0.215
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb * 0.5 + 0.5 * pred_out_gbdt) * 0.9 + 0.1 * pred_out_rf >= 0.215
    get_score(pred, test_y['y'].values)
    print('*' * 50)

    # --- "middle" engineered feature set ---
    print('5 fold feature engineering middle')
    pred_out_lgb_middle, pred_out_gbdt_middle, pred_out_rf_middle = N_Fold_Predict(train_X_middle, train_y['y'].values, test_X_middle, cv_=3)
    pred = pred_out_rf_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle * 0.55 + 0.45 * pred_out_gbdt_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle >= 0.215
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.1 * pred_out_rf_middle >= 0.215
    get_score(pred, test_y['y'].values)
    # pred = (pred_out_lgb_middle * 0.5 + 0.5 * pred_out_gbdt_middle) * 0.9 + 0.05 * (pred_out_rf_middle + pred_out_rf) >= 0.215
    # get_score(pred, test_y['y'].values)
    print('*' * 50)

    # --- "final" engineered feature set ---
    print('5 fold feature engineering final')
    pred_out_lgb_final, pred_out_gbdt_final, pred_out_rf_final = N_Fold_Predict(train_X, train_y['y'].values, test_X, cv_=5)
    pred = pred_out_rf_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_gbdt_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final * 0.55 + 0.45 * pred_out_gbdt_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final >= 0.215
    get_score(pred, test_y['y'].values)
    pred = (pred_out_lgb_final * 0.5 + 0.5 * pred_out_gbdt_final) * 0.9 + 0.1 * pred_out_rf_final >= 0.215
    get_score(pred, test_y['y'].values)
    print('*' * 50)

    # --- blends across feature sets ---
    print('Fire!')
    print('middle and original ')
    pred = ((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5) * 0.55 + 0.45 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_middle * 0.5 + pred_out_lgb * 0.5) * 0.5 + 0.5 * (pred_out_gbdt_middle * 0.5 + 0.5 * pred_out_gbdt)) >= 0.215
    get_score(pred, test_y['y'].values)
    print('final and original ')
    pred = ((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5) * 0.55 + 0.45 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_final * 0.5 + pred_out_lgb * 0.5) * 0.5 + 0.5 * (pred_out_gbdt_final * 0.5 + 0.5 * pred_out_gbdt)) >= 0.215
    get_score(pred, test_y['y'].values)
    print('final and middle and original ')
    pred = ((pred_out_lgb_final * 0.3 + pred_out_lgb * 0.3 + 0.4 * pred_out_lgb_middle) * 0.55 + 0.45 * (pred_out_gbdt_final * 0.3 + 0.3 * pred_out_gbdt + 0.4 * pred_out_gbdt_middle)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = ((pred_out_lgb_final / 3.0 + pred_out_lgb / 3.0 + pred_out_lgb_middle / 3.0) * 0.5 + 0.5 * (pred_out_gbdt_final / 3.0 + pred_out_gbdt / 3.0 + pred_out_gbdt_middle / 3.0)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = 0.1 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final) / 3.0 + 0.9 * ((pred_out_lgb_final / 3.0 + pred_out_lgb / 3.0 + pred_out_lgb_middle / 3.0) * 0.5 + 0.5 * (pred_out_gbdt_final / 3.0 + pred_out_gbdt / 3.0 + pred_out_gbdt_middle / 3.0)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = 0.15 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final) / 3.0 + 0.85 * ((pred_out_lgb_final / 3.0 + pred_out_lgb / 3.0 + pred_out_lgb_middle / 3.0) * 0.5 + 0.5 * (pred_out_gbdt_final / 3.0 + pred_out_gbdt / 3.0 + pred_out_gbdt_middle / 3.0)) >= 0.215
    get_score(pred, test_y['y'].values)
    pred = 0.1 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final) / 3.0 + 0.9 * ((pred_out_lgb_final / 3.0 + pred_out_lgb / 3.0 + pred_out_lgb_middle / 3.0) * 0.5 + 0.5 * (pred_out_gbdt_final / 3.0 + pred_out_gbdt / 3.0 + pred_out_gbdt_middle / 3.0)) >= 0.23
    get_score(pred, test_y['y'].values)
    pred = 0.15 * (pred_out_rf_middle + pred_out_rf + pred_out_rf_final) / 3.0 + 0.85 * ((pred_out_lgb_final / 3.0 + pred_out_lgb / 3.0 + pred_out_lgb_middle / 3.0) * 0.5 + 0.5 * (pred_out_gbdt_final / 3.0 + pred_out_gbdt / 3.0 + pred_out_gbdt_middle / 3.0)) >= 0.23
    get_score(pred, test_y['y'].values)
Random Seed is: 1
5 fold no feature engineering
TP: 340 / 442 all 923 accuracy: 0.6892014519056261 precision: 0.36836403033586135 recall: 0.7692307692307693 F_score: 0.4981684981684982 0.4981684981684982
TP: 312 / 442 all 801 accuracy: 0.719147005444646 precision: 0.3895131086142322 recall: 0.7058823529411765 F_score: 0.50201126307321 0.50201126307321
TP: 310 / 442 all 796 accuracy: 0.7196007259528131 precision: 0.38944723618090454 recall: 0.7013574660633484 F_score: 0.5008077544426495 0.5008077544426495
TP: 315 / 442 all 806 accuracy: 0.7196007259528131 precision: 0.39081885856079407 recall: 0.7126696832579186 F_score: 0.5048076923076924 0.5048076923076924
TP: 314 / 442 all 804 accuracy: 0.7196007259528131 precision: 0.39054726368159204 recall: 0.7104072398190046 F_score: 0.5040128410914928 0.5040128410914928
TP: 314 / 442 all 811 accuracy: 0.7164246823956443 precision: 0.3871763255240444 recall: 0.7104072398190046 F_score: 0.5011971268954509 0.5011971268954509
**************************************************
5 fold feature engineering middle
TP: 343 / 442 all 914 accuracy: 0.6960072595281307 precision: 0.37527352297593 recall: 0.7760180995475113 F_score: 0.5058997050147492 0.5058997050147492
TP: 318 / 442 all 820 accuracy: 0.7159709618874773 precision: 0.3878048780487805 recall: 0.7194570135746606 F_score: 0.5039619651347068 0.5039619651347068
TP: 307 / 442 all 779 accuracy: 0.7245916515426497 precision: 0.3940949935815148 recall: 0.6945701357466063 F_score: 0.5028665028665029 0.5028665028665029
TP: 311 / 442 all 800 accuracy: 0.7186932849364791 precision: 0.38875 recall: 0.7036199095022625 F_score: 0.500805152979066 0.500805152979066
TP: 312 / 442 all 803 accuracy: 0.7182395644283122 precision: 0.38854296388542964 recall: 0.7058823529411765 F_score: 0.5012048192771086 0.5012048192771086
TP: 316 / 442 all 814 accuracy: 0.7168784029038112 precision: 0.3882063882063882 recall: 0.7149321266968326 F_score: 0.5031847133757962 0.5031847133757962
**************************************************
5 fold feature engineering final
TP: 341 / 442 all 905 accuracy: 0.6982758620689655 precision: 0.37679558011049724 recall: 0.7714932126696833 F_score: 0.5063103192279139 0.5063103192279139
TP: 308 / 442 all 779 accuracy: 0.7254990925589837 precision: 0.39537869062901154 recall: 0.6968325791855203 F_score: 0.5045045045045045 0.5045045045045045
TP: 310 / 442 all 789 accuracy: 0.7227767695099818 precision: 0.3929024081115336 recall: 0.7013574660633484 F_score: 0.503655564581641 0.503655564581641
TP: 311 / 442 all 791 accuracy: 0.7227767695099818 precision: 0.393173198482933 recall: 0.7036199095022625 F_score: 0.5044606650446067 0.5044606650446067
TP: 311 / 442 all 791 accuracy: 0.7227767695099818 precision: 0.393173198482933 recall: 0.7036199095022625 F_score: 0.5044606650446067 0.5044606650446067
TP: 310 / 442 all 794 accuracy: 0.720508166969147 precision: 0.3904282115869018 recall: 0.7013574660633484 F_score: 0.5016181229773463 0.5016181229773463
**************************************************
Fire!
middle and original
TP: 315 / 442 all 800 accuracy: 0.7223230490018149 precision: 0.39375 recall: 0.7126696832579186 F_score: 0.5072463768115942 0.5072463768115942
TP: 314 / 442 all 800 accuracy: 0.721415607985481 precision: 0.3925 recall: 0.7104072398190046 F_score: 0.5056360708534622 0.5056360708534622
final and original
TP: 313 / 442 all 797 accuracy: 0.7218693284936479 precision: 0.39272271016311167 recall: 0.7081447963800905 F_score: 0.5052461662631155 0.5052461662631155
TP: 312 / 442 all 797 accuracy: 0.7209618874773139 precision: 0.39146800501882056 recall: 0.7058823529411765 F_score: 0.5036319612590798 0.5036319612590798
final and middle and original
TP: 315 / 442 all 807 accuracy: 0.719147005444646 precision: 0.3903345724907063 recall: 0.7126696832579186 F_score: 0.5044035228182546 0.5044035228182546
TP: 314 / 442 all 804 accuracy: 0.7196007259528131 precision: 0.39054726368159204 recall: 0.7104072398190046 F_score: 0.5040128410914928 0.5040128410914928
TP: 318 / 442 all 811 accuracy: 0.72005444646098 precision: 0.3921085080147966 recall: 0.7194570135746606 F_score: 0.5075818036711892 0.5075818036711892
TP: 317 / 442 all 815 accuracy: 0.7173321234119783 precision: 0.3889570552147239 recall: 0.7171945701357466 F_score: 0.5043754972155926 0.5043754972155926
Random Seed is: 10
5 fold no feature engineering
TP: 332 / 438 all 914 accuracy: 0.6878402903811253 precision: 0.36323851203501095 recall: 0.7579908675799086 F_score: 0.4911242603550296 0.4911242603550296
TP: 319 / 438 all 816 accuracy: 0.720508166969147 precision: 0.3909313725490196 recall: 0.728310502283105 F_score: 0.5087719298245613 0.5087719298245613
TP: 328 / 438 all 849 accuracy: 0.7137023593466425 precision: 0.38633686690223795 recall: 0.7488584474885844 F_score: 0.5097125097125097 0.5097125097125097
TP: 326 / 438 all 837 accuracy: 0.7173321234119783 precision: 0.38948626045400236 recall: 0.7442922374429224 F_score: 0.5113725490196078 0.5113725490196078
TP: 326 / 438 all 838 accuracy: 0.7168784029038112 precision: 0.38902147971360385 recall: 0.7442922374429224 F_score: 0.5109717868338558 0.5109717868338558
TP: 326 / 438 all 846 accuracy: 0.7132486388384754 precision: 0.38534278959810875 recall: 0.7442922374429224 F_score: 0.5077881619937694 0.5077881619937694
**************************************************
5 fold feature engineering middle
TP: 347 / 438 all 931 accuracy: 0.6937386569872959 precision: 0.3727175080558539 recall: 0.7922374429223744 F_score: 0.5069393718042365 0.5069393718042365
TP: 324 / 438 all 817 accuracy: 0.7245916515426497 precision: 0.39657282741738065 recall: 0.7397260273972602 F_score: 0.5163346613545816 0.5163346613545816
TP: 337 / 438 all 857 accuracy: 0.7182395644283122 precision: 0.39323220536756126 recall: 0.769406392694064 F_score: 0.5204633204633206 0.5204633204633206
TP: 331 / 438 all 841 accuracy: 0.72005444646098 precision: 0.3935790725326992 recall: 0.7557077625570776 F_score: 0.5175918686473807 0.5175918686473807
TP: 332 / 438 all 841 accuracy: 0.7209618874773139 precision: 0.3947681331747919 recall: 0.7579908675799086 F_score: 0.5191555903049258 0.5191555903049258
TP: 338 / 438 all 852 accuracy: 0.721415607985481 precision: 0.3967136150234742 recall: 0.771689497716895 F_score: 0.524031007751938 0.524031007751938
**************************************************
5 fold feature engineering final
TP: 348 / 438 all 936 accuracy: 0.6923774954627949 precision: 0.3717948717948718 recall: 0.7945205479452054 F_score: 0.5065502183406113 0.5065502183406113
TP: 331 / 438 all 818 accuracy: 0.7304900181488203 precision: 0.40464547677261614 recall: 0.7557077625570776 F_score: 0.5270700636942675 0.5270700636942675
TP: 333 / 438 all 833 accuracy: 0.7254990925589837 precision: 0.3997599039615846 recall: 0.7602739726027398 F_score: 0.5239968528717546 0.5239968528717546
TP: 334 / 438 all 826 accuracy: 0.7295825771324864 precision: 0.4043583535108959 recall: 0.7625570776255708 F_score: 0.5284810126582279 0.5284810126582279
TP: 335 / 438 all 824 accuracy: 0.7313974591651543 precision: 0.4065533980582524 recall: 0.7648401826484018 F_score: 0.5309033280507132 0.5309033280507132
TP: 337 / 438 all 838 accuracy: 0.7268602540834845 precision: 0.4021479713603819 recall: 0.769406392694064 F_score: 0.5282131661442007 0.5282131661442007
**************************************************
Fire!
middle and original
TP: 330 / 438 all 830 accuracy: 0.7241379310344828 precision: 0.39759036144578314 recall: 0.7534246575342466 F_score: 0.5205047318611987 0.5205047318611987
TP: 329 / 438 all 827 accuracy: 0.7245916515426497 precision: 0.3978234582829504 recall: 0.7511415525114156 F_score: 0.5201581027667985 0.5201581027667985
final and original
TP: 333 / 438 all 834 accuracy: 0.7250453720508166 precision: 0.39928057553956836 recall: 0.7602739726027398 F_score: 0.5235849056603774 0.5235849056603774
TP: 333 / 438 all 832 accuracy: 0.7259528130671506 precision: 0.40024038461538464 recall: 0.7602739726027398 F_score: 0.5244094488188976 0.5244094488188976
final and middle and original
TP: 333 / 438 all 827 accuracy: 0.7282214156079855 precision: 0.4026602176541717 recall: 0.7602739726027398 F_score: 0.5264822134387351 0.5264822134387351
TP: 335 / 438 all 825 accuracy: 0.7309437386569873 precision: 0.40606060606060607 recall: 0.7648401826484018 F_score: 0.5304829770387964 0.5304829770387964
TP: 336 / 438 all 831 accuracy: 0.7291288566243194 precision: 0.4043321299638989 recall: 0.7671232876712328 F_score: 0.5295508274231678 0.5295508274231678
TP: 337 / 438 all 841 accuracy: 0.7254990925589837 precision: 0.40071343638525564 recall: 0.769406392694064 F_score: 0.5269741985926505 0.5269741985926505
Random Seed is: 100
5 fold no feature engineering
TP: 327 / 430 all 905 accuracy: 0.691016333938294 precision: 0.36132596685082874 recall: 0.7604651162790698 F_score: 0.4898876404494382 0.4898876404494382
TP: 307 / 430 all 810 accuracy: 0.7159709618874773 precision: 0.3790123456790123 recall: 0.713953488372093 F_score: 0.4951612903225806 0.4951612903225806
TP: 308 / 430 all 801 accuracy: 0.7209618874773139 precision: 0.38451935081148564 recall: 0.7162790697674418 F_score: 0.5004061738424046 0.5004061738424046
TP: 309 / 430 all 807 accuracy: 0.719147005444646 precision: 0.3828996282527881 recall: 0.7186046511627907 F_score: 0.4995957962813258 0.4995957962813258
TP: 309 / 430 all 807 accuracy: 0.719147005444646 precision: 0.3828996282527881 recall: 0.7186046511627907 F_score: 0.4995957962813258 0.4995957962813258
TP: 311 / 430 all 815 accuracy: 0.7173321234119783 precision: 0.3815950920245399 recall: 0.7232558139534884 F_score: 0.4995983935742972 0.4995983935742972
**************************************************
5 fold feature engineering middle
TP: 333 / 430 all 923 accuracy: 0.6882940108892922 precision: 0.3607800650054171 recall: 0.7744186046511627 F_score: 0.49223946784922396 0.49223946784922396
TP: 308 / 430 all 825 accuracy: 0.7100725952813067 precision: 0.37333333333333335 recall: 0.7162790697674418 F_score: 0.4908366533864541 0.4908366533864541
TP: 297 / 430 all 772 accuracy: 0.7241379310344828 precision: 0.38471502590673573 recall: 0.6906976744186046 F_score: 0.4941763727121463 0.4941763727121463
TP: 307 / 430 all 793 accuracy: 0.7236842105263158 precision: 0.3871374527112232 recall: 0.713953488372093 F_score: 0.5020441537203597 0.5020441537203597
TP: 307 / 430 all 794 accuracy: 0.7232304900181489 precision: 0.3866498740554156 recall: 0.713953488372093 F_score: 0.5016339869281046 0.5016339869281046
TP: 308 / 430 all 806 accuracy: 0.7186932849364791 precision: 0.38213399503722084 recall: 0.7162790697674418 F_score: 0.49838187702265374 0.49838187702265374
**************************************************
5 fold feature engineering final
TP: 335 / 430 all 930 accuracy: 0.6869328493647913 precision: 0.3602150537634409 recall: 0.7790697674418605 F_score: 0.4926470588235293 0.4926470588235293
TP: 315 / 430 all 842 accuracy: 0.7087114337568058 precision: 0.37410926365795727 recall: 0.7325581395348837 F_score: 0.4952830188679246 0.4952830188679246
TP: 315 / 430 all 842 accuracy: 0.7087114337568058 precision: 0.37410926365795727 recall: 0.7325581395348837 F_score: 0.4952830188679246 0.4952830188679246
TP: 315 / 430 all 845 accuracy: 0.707350272232305 precision: 0.3727810650887574 recall: 0.7325581395348837 F_score: 0.4941176470588236 0.4941176470588236
TP: 315 / 430 all 845 accuracy: 0.707350272232305 precision: 0.3727810650887574 recall: 0.7325581395348837 F_score: 0.4941176470588236 0.4941176470588236
TP: 318 / 430 all 856 accuracy: 0.70508166969147 precision: 0.37149532710280375 recall: 0.7395348837209302 F_score: 0.4945567651632971 0.4945567651632971
**************************************************
Fire!
middle and original
TP: 305 / 430 all 797 accuracy: 0.72005444646098 precision: 0.38268506900878296 recall: 0.7093023255813954 F_score: 0.49714751426242876 0.49714751426242876
TP: 307 / 430 all 801 accuracy: 0.72005444646098 precision: 0.383270911360799 recall: 0.713953488372093 F_score: 0.4987814784727863 0.4987814784727863
final and original
TP: 311 / 430 all 825 accuracy: 0.7127949183303085 precision: 0.37696969696969695 recall: 0.7232558139534884 F_score: 0.4956175298804781 0.4956175298804781
TP: 312 / 430 all 825 accuracy: 0.7137023593466425 precision: 0.3781818181818182 recall: 0.7255813953488373 F_score: 0.49721115537848604 0.49721115537848604
final and middle and original
TP: 307 / 430 all 815 accuracy: 0.7137023593466425 precision: 0.37668711656441717 recall: 0.713953488372093 F_score: 0.4931726907630522 0.4931726907630522
TP: 307 / 430 all 814 accuracy: 0.7141560798548094 precision: 0.37714987714987713 recall: 0.713953488372093 F_score: 0.4935691318327974 0.4935691318327974
TP: 311 / 430 all 824 accuracy: 0.7132486388384754 precision: 0.3774271844660194 recall: 0.7232558139534884 F_score: 0.49601275917065385 0.49601275917065385
TP: 313 / 430 all 833 accuracy: 0.7109800362976406 precision: 0.375750300120048 recall: 0.727906976744186 F_score: 0.49564528899445764 0.49564528899445764
Random Seed is: 1000
5 fold no feature engineering
TP: 304 / 404 all 901 accuracy: 0.6837568058076225 precision: 0.3374028856825749 recall: 0.7524752475247525 F_score: 0.4659003831417624 0.4659003831417624
TP: 294 / 404 all 801 accuracy: 0.72005444646098 precision: 0.36704119850187267 recall: 0.7277227722772277 F_score: 0.4879668049792531 0.4879668049792531
TP: 290 / 404 all 769 accuracy: 0.7309437386569873 precision: 0.37711313394018203 recall: 0.7178217821782178 F_score: 0.49445865302642794 0.49445865302642794
TP: 290 / 404 all 786 accuracy: 0.7232304900181489 precision: 0.36895674300254455 recall: 0.7178217821782178 F_score: 0.4873949579831934 0.4873949579831934
TP: 289 / 404 all 788 accuracy: 0.721415607985481 precision: 0.366751269035533 recall: 0.7153465346534653 F_score: 0.4848993288590604 0.4848993288590604
TP: 290 / 404 all 796 accuracy: 0.7186932849364791 precision: 0.36432160804020103 recall: 0.7178217821782178 F_score: 0.48333333333333345 0.48333333333333345
**************************************************
5 fold feature engineering middle
TP: 315 / 404 all 924 accuracy: 0.6833030852994555 precision: 0.3409090909090909 recall: 0.7797029702970297 F_score: 0.4743975903614458 0.4743975903614458
TP: 298 / 404 all 831 accuracy: 0.7100725952813067 precision: 0.358604091456077 recall: 0.7376237623762376 F_score: 0.48259109311740883 0.48259109311740883
TP: 291 / 404 all 795 accuracy: 0.72005444646098 precision: 0.3660377358490566 recall: 0.7202970297029703 F_score: 0.48540450375312766 0.48540450375312766
TP: 295 / 404 all 820 accuracy: 0.7123411978221416 precision: 0.3597560975609756 recall: 0.7301980198019802 F_score: 0.4820261437908497 0.4820261437908497
TP: 297 / 404 all 822 accuracy: 0.7132486388384754 precision: 0.3613138686131387 recall: 0.7351485148514851 F_score: 0.4845024469820554 0.4845024469820554
TP: 298 / 404 all 833 accuracy: 0.7091651542649727 precision: 0.3577430972388956 recall: 0.7376237623762376 F_score: 0.48181083265966046 0.48181083265966046
**************************************************
5 fold feature engineering final
TP: 316 / 404 all 919 accuracy: 0.6864791288566243 precision: 0.3438520130576714 recall: 0.7821782178217822 F_score: 0.47770219198790626 0.47770219198790626
TP: 296 / 404 all 825 accuracy: 0.7109800362976406 precision: 0.35878787878787877 recall: 0.7326732673267327 F_score: 0.4816924328722539 0.4816924328722539
TP: 301 / 404 all 813 accuracy: 0.7209618874773139 precision: 0.37023370233702335 recall: 0.745049504950495 F_score: 0.49465899753492193 0.49465899753492193
TP: 302 / 404 all 825 accuracy: 0.7164246823956443 precision: 0.3660606060606061 recall: 0.7475247524752475 F_score: 0.49145646867371856 0.49145646867371856
TP: 300 / 404 all 824 accuracy: 0.7150635208711433 precision: 0.3640776699029126 recall: 0.7425742574257426 F_score: 0.48859934853420195 0.48859934853420195
TP: 301 / 404 all 831 accuracy: 0.7127949183303085 precision: 0.3622141997593261 recall: 0.745049504950495 F_score: 0.4874493927125506 0.4874493927125506
**************************************************
Fire!
middle and original
TP: 294 / 404 all 809 accuracy: 0.7164246823956443 precision: 0.36341161928306553 recall: 0.7277227722772277 F_score: 0.4847485572959604 0.4847485572959604
TP: 294 / 404 all 810 accuracy: 0.7159709618874773 precision: 0.362962962962963 recall: 0.7277227722772277 F_score: 0.48434925864909395 0.48434925864909395
final and original
TP: 298 / 404 all 809 accuracy: 0.72005444646098 precision: 0.3683559950556242 recall: 0.7376237623762376 F_score: 0.49134377576257215 0.49134377576257215
TP: 298 / 404 all 811 accuracy: 0.719147005444646 precision: 0.36744759556103573 recall: 0.7376237623762376 F_score: 0.4905349794238683 0.4905349794238683
final and middle and original
TP: 300 / 404 all 816 accuracy: 0.7186932849364791 precision: 0.36764705882352944 recall: 0.7425742574257426 F_score: 0.4918032786885246 0.4918032786885246
TP: 302 / 404 all 821 accuracy: 0.7182395644283122 precision: 0.36784409257003653 recall: 0.7475247524752475 F_score: 0.49306122448979584 0.49306122448979584
TP: 303 / 404 all 830 accuracy: 0.7150635208711433 precision: 0.3650602409638554 recall: 0.75 F_score: 0.4910858995137763 0.4910858995137763
TP: 304 / 404 all 837 accuracy: 0.7127949183303085 precision: 0.3632019115890084 recall: 0.7524752475247525 F_score: 0.4899274778404513 0.4899274778404513
Experiment Summary
From the experiments above we can draw the following conclusions:
- With ensembling, the model obtains a stable improvement on every training split (offline and online results are consistent);
- The scores fluctuate noticeably, roughly between 0.49 and 0.52; with so few samples this degree of randomness is to be expected;
- The experiments show that combining multiple feature subsets with N-fold cross-validated models is not only more accurate than any single model but also far more stable (ensembling adds roughly 0.01-0.02 to the score).
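The N-fold ensembling referred to above is done by the N_Fold_Predict helper defined earlier in the notebook. As a rough, self-contained sketch of the idea (the real helper also trains LightGBM and takes different parameters; the models, names, and defaults below are assumptions): each base model is fit on every training fold and its test-set probabilities are averaged across folds.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold

def n_fold_predict(train_X, train_y, test_X, cv_=5, seed=2018):
    """Sketch of an N_Fold_Predict-style helper: fit each base model on
    the training portion of every fold and average its test predictions.
    (GBDT and RF only here, to stay self-contained; the notebook's real
    helper also includes LightGBM.)"""
    models = {
        'gbdt': GradientBoostingClassifier(random_state=seed),
        'rf': RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    preds = {name: np.zeros(len(test_X)) for name in models}
    kf = KFold(n_splits=cv_, shuffle=True, random_state=seed)
    for tr_idx, _ in kf.split(train_X):
        for name, model in models.items():
            model.fit(train_X[tr_idx], train_y[tr_idx])
            # Average the positive-class probability over the cv_ folds
            preds[name] += model.predict_proba(test_X)[:, 1] / cv_
    return preds['gbdt'], preds['rf']
```

Averaging over folds is what makes the final prediction less sensitive to any single train/validation split, which matches the stability observed in the logs.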
Summary and Outlook
Summary
Challenges of this competition and our solutions
The features in this competition's data were relatively clean, but the number of samples was small, so our work focused on three problems:
- How to mine better features
- How to improve both the performance and the robustness of the model on a small dataset
- How to optimize an objective function (F-score) that cannot be differentiated directly
For these three problems, our solutions were as follows:
- We extracted as many meaningful features as possible, arriving at five new feature classes: 1. features that increase the model's expressive power; 2. ratio features; 3. restored standard-deviation features (reflecting volatility); 4. mean features; 5. trend features. See the feature-construction section for details.
- We ensemble multiple models to improve performance and robustness at the same time; the ensemble consists of two parts: fusing models trained on different feature sets, and fusing different model types.
- There are three common ways to optimize F-score: reweighting, i.e. treating it as a class-imbalance problem; optimizing a lower- or upper-bound surrogate of the F-score; and tuning a decision threshold. For simplicity, we went with the thresholding approach.
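The thresholding approach shows up in the code above as the fixed cut-off 0.215. A hedged sketch of how such a threshold could be chosen (the function name and grid are hypothetical; the notebook does not show how 0.215 was picked): scan candidate thresholds on held-out probabilities and keep the one maximizing F1, since F-score itself is not differentiable.

```python
import numpy as np

def best_f1_threshold(proba, y_true, grid=None):
    """Grid-search the decision threshold that maximizes F1 on
    out-of-fold probabilities (a simple workaround for the fact
    that F-score cannot be optimized by gradient descent)."""
    proba = np.asarray(proba, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 181)  # step 0.005
    best_t, best_f = 0.5, -1.0
    for t in grid:
        pred = proba >= t
        tp = np.sum(pred & (y_true == 1))
        prec = tp / max(pred.sum(), 1)
        rec = tp / max((y_true == 1).sum(), 1)
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f:
            best_t, best_f = t, f1
    return best_t, best_f
```

Choosing the threshold on out-of-fold predictions rather than the test set avoids leaking the evaluation data into the decision rule.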
Solution Summary
Strengths of the approach:
- Robust, stable, and reasonably strong performance;
- A fairly complete model architecture that delivers steady improvements;
- A useful set of ideas for feature construction.
Outlook
The work could be improved further in the following ways:
- Expand the dataset wherever possible (the fundamental fix);
- Tune the model hyperparameters: on a small dataset the parameters have a relatively large effect, so tuning often brings sizable gains;
- Extract more high-quality features, for example difference features such as the gap between loans and repayments; we believe such features would help the model from several angles.
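The difference features mentioned in the last point could be built the same way as the ratio and mean features earlier. A small illustrative sketch (the column names are hypothetical placeholders; the real ones come from "字段解释.xlsx"):

```python
import pandas as pd

# Hypothetical columns standing in for the anonymized x_001..x_199 fields
df = pd.DataFrame({
    'loan_amount':  [1000.0, 5000.0, 2000.0],   # total amount borrowed
    'repay_amount': [ 800.0, 5000.0,  500.0],   # total amount repaid
    'apply_cnt':    [5, 2, 10],                 # loan applications made
    'success_cnt':  [3, 2, 4],                  # applications granted
})

# Difference features: the gap between borrowing and repayment, and
# between applications made and applications granted
df['loan_repay_diff'] = df['loan_amount'] - df['repay_amount']
df['apply_fail_diff'] = df['apply_cnt'] - df['success_cnt']
```

A large positive loan_repay_diff or apply_fail_diff plausibly signals higher default risk, which is why such gaps may add information beyond the raw columns.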