[Data Mining] Financial Risk Control Task05: Model Ensembling

Latest recommended article published on 2024-02-02 14:38:50

一一张xi

Category: Data Mining

Article link: https://blog.csdn.net/a8689756/article/details/108836119

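Model ensembling (model fusion) combines different models to improve machine learning performance. The approach is widely used in machine learning competitions. Common methods include averaging, voting, combined approaches, stacking, blending, boosting, and bagging.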

5. Model ensembling methods

5.1 Averaging

5.1.1 Simple averaging

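Simple averaging fuses the results directly by taking the mean of several predictions. pre1 through pren are the predictions of n models, and they are combined with equal weight: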

pre = (pre1 + pre2 + pre3 + ... + pren) / n
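The formula above can be sketched in a few lines of plain Python. The function name `simple_average` and the sample probabilities are made up for illustration; in practice pre1 through pren would be the prediction arrays of your own models.

```python
def simple_average(predictions):
    """Average n equally long lists of predictions element-wise."""
    n = len(predictions)
    if n == 0:
        raise ValueError("need at least one prediction list")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all prediction lists must have the same length")
    # zip(*predictions) groups the n predictions made for each sample
    return [sum(values) / n for values in zip(*predictions)]


# Hypothetical predicted default probabilities from three models
pre1 = [0.2, 0.8, 0.5]
pre2 = [0.4, 0.6, 0.5]
pre3 = [0.3, 0.7, 0.8]

pre = simple_average([pre1, pre2, pre3])
print(pre)
```

Each output value is the mean of the three models' predictions for that sample.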

5.1.2 Weighted averaging

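Weighted averaging usually sets the weights according to how accurate each model was in earlier evaluation: the more accurate a model, the higher its weight. For example: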

pre = 0.3pre1 + 0.3pre2 + 0.4pre3
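A minimal sketch of the same idea in plain Python follows. The helper `weighted_average` and the sample values are hypothetical. Dividing by the sum of the weights is an added safeguard so that the weights do not have to sum to exactly 1.

```python
def weighted_average(predictions, weights):
    """Combine several prediction lists using one weight per model."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction list")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalise by the weight total so the result stays on the same scale
    return [
        sum(w * v for w, v in zip(weights, values)) / total
        for values in zip(*predictions)
    ]


# Hypothetical predicted probabilities from three models
pre1 = [0.2, 0.8]
pre2 = [0.4, 0.6]
pre3 = [0.3, 0.7]

pre = weighted_average([pre1, pre2, pre3], weights=[0.3, 0.3, 0.4])
print(pre)
```

In practice the weights could come from each model's validation score, as long as the scores are measured on data the models were not trained on.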

5.2 Voting

5.2.1 Simple voting

from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

clf1=LogisticRegression(random_state=1)
'''
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=1, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
'''
clf2=RandomForestClassifier(random_state=1)
'''
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators='warn',
                       n_jobs=None, oob_score=False, random_state=1, verbose=0,
                       warm_start=False)
'''
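clf3=XGBClassifier(learning_rate=0.1,  # shrinkage factor
                   n_estimators=150, # number of base learners
                   max_depth=4, # maximum depth of a single tree
                   min_child_weight=2, # minimum sum of sample weights a child node needs for further splitting
                   subsample=0.7, # fraction of the training samples used per tree, helps prevent overfitting
                   objective='binary:logistic') # type of learning task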

'''
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None, gamma=None,
              gpu_id=None, importance_type='gain', interaction_constraints=None,
              learning_rate=0.1, max_delta_step=None, max_depth=4,
              min_child_weight=2, missing=nan, monotone_constraints=None,
              n_estimators=150, n_jobs=None, num_parallel_tree=None,
              objective='binary:logistic', random_state=None, reg_alpha=None,
              reg_lambda=None, scale_pos_weight=None, subsample=0.7,
              tree_method=None, validate_parameters=None, verbosity=None)
'''

# x_train, y_train and x_test are assumed to have been prepared beforehand
# voting defaults to 'hard', i.e. a majority vote over the predicted labels
vclf=VotingClassifier(estimators=[('lr',clf1),('rf',clf2),('xgb',clf3)])
vclf=vclf.fit(x_train,y_train)
print(vclf.predict(x_test))
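To make the voting rule itself concrete, here is a small pure-Python sketch of a majority vote over the labels predicted by three models. It is an illustration of the idea, not scikit-learn's implementation. The function name `hard_vote` and the label lists are hypothetical.

```python
from collections import Counter


def hard_vote(label_lists):
    """Return the majority label per sample across several models."""
    voted = []
    for labels in zip(*label_lists):
        # most_common(1) yields the label with the highest count.
        # With three voters and two classes a tie cannot occur.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical predicted labels for four samples
lr_pred = [0, 1, 1, 0]
rf_pred = [0, 1, 0, 0]
xgb_pred = [1, 1, 0, 0]

result = hard_vote([lr_pred, rf_pred, xgb_pred])
print(result)
```

Each sample receives the label that at least two of the three models agree on.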

5.2.2 Weighted voting

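Pass voting='soft' and weights=[2, 1, 1] to VotingClassifier. With soft voting the classifier averages the predicted class probabilities instead of counting labels, and weights sets the weight of each base model.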

from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = XGBClassifier(learning_rate=0.1, n_estimators=150, max_depth=4, min_child_weight=2, subsample=0.7, objective='binary:logistic')

vclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('xgb', clf3)], voting='soft', weights=[2, 1, 1])
vclf = vclf.fit(x_train, y_train)
print(vclf.predict(x_test))
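The effect of the weights can be seen in a hand-rolled sketch of soft voting: take the weighted average of the class probabilities from each model, then pick the class with the highest average. The function `soft_vote` and the probabilities below are hypothetical and only illustrate the rule.

```python
def soft_vote(proba_lists, weights):
    """proba_lists[m][i] holds model m's class probabilities for sample i."""
    total = sum(weights)
    voted = []
    for sample_probas in zip(*proba_lists):
        n_classes = len(sample_probas[0])
        # Weighted average probability for each class
        avg = [
            sum(w * p[c] for w, p in zip(weights, sample_probas)) / total
            for c in range(n_classes)
        ]
        # Predict the class with the highest averaged probability
        voted.append(max(range(n_classes), key=avg.__getitem__))
    return voted


# Hypothetical [P(class 0), P(class 1)] for two samples
lr_proba = [[0.9, 0.1], [0.2, 0.8]]
rf_proba = [[0.3, 0.7], [0.6, 0.4]]
xgb_proba = [[0.2, 0.8], [0.55, 0.45]]

weighted = soft_vote([lr_proba, rf_proba, xgb_proba], weights=[2, 1, 1])
equal = soft_vote([lr_proba, rf_proba, xgb_proba], weights=[1, 1, 1])
print(weighted, equal)
```

With weights [2, 1, 1] the confident first model decides both samples, whereas equal weights give a different label for the first sample.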

5.3 stacking

5.3.1 Principle

Stacking takes the predictions produced by several base learners and uses them as a new training set to train another learner.

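In stacking, the individual learners are called primary learners, and the learner that combines them is called the secondary learner or meta-learner. The data used to train the secondary learner is called the secondary training set. It is obtained by applying the primary learners to the training set.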
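The following is a minimal sketch of this idea with scikit-learn, under several assumptions that are not from the text above: the data is synthetic (make_classification), the primary learners are a logistic regression and a random forest, the meta-learner is a logistic regression, and the secondary training set is built from 5-fold out-of-fold probabilities via cross_val_predict. Out-of-fold prediction is a common way to keep the meta-learner from seeing predictions made on data the primary learners were fitted on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic binary classification data standing in for the real training set
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Primary (first-level) learners
base_learners = [
    LogisticRegression(max_iter=1000, random_state=1),
    RandomForestClassifier(n_estimators=50, random_state=1),
]

# Secondary training set: one column of out-of-fold P(class 1) per learner
meta_train = np.column_stack([
    cross_val_predict(clf, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for clf in base_learners
])

# Refit each primary learner on the full training set, then build the
# matching secondary features for the test set
meta_test = np.column_stack([
    clf.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for clf in base_learners
])

# Secondary learner (meta-learner) trained on the primary learners' outputs
meta_learner = LogisticRegression()
meta_learner.fit(meta_train, y_train)
stacked_pred = meta_learner.predict(meta_test)
print(stacked_pred.shape)
```

The meta-learner here sees only two features per sample, one per primary learner. Adding more primary learners adds more columns to the secondary training set.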