In stacking, every base model must split the training data and run the same cross-validated training loop, so writing that logic separately for each model would be highly redundant. Here is a simple way to implement stacking instead: write a parent class that implements the cross-validated (out-of-fold) training routine once, since that part is identical for every model, and declare two methods, train and predict. Because the base models differ, the concrete implementations of these two methods differ as well, so they are left to the subclasses. The walkthrough below uses Python.
import numpy as np
from sklearn.model_selection import KFold

class BasicModel(object):
    """Parent class of basic models"""

    def train(self, x_train, y_train, x_val, y_val):
        """return a trained model and eval metric of validation data"""
        pass

    def predict(self, model, x_test):
        """return the predicted result of test data"""
        pass

    def get_oof(self, x_train, y_train, x_test, n_folds=5):
        """K-fold stacking"""
        num_train, num_test = x_train.shape[0], x_test.shape[0]
        oof_train = np.zeros((num_train,))
        oof_test = np.zeros((num_test,))
        oof_test_all_fold = np.zeros((num_test, n_folds))
        aucs = []
        # shuffle=True is required when random_state is set (scikit-learn >= 0.24)
        KF = KFold(n_splits=n_folds, shuffle=True, random_state=2017)
        for i, (train_index, val_index) in enumerate(KF.split(x_train)):
print(&#