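Tianchi Data Mining Competition - Heartbeat Signal Classification 05 - Model Ensembling

Model Ensembling

1. Simple weighted fusion:

  • Regression (or class probabilities): arithmetic-mean fusion, geometric-mean fusion
  • Classification: voting
  • Combined: rank fusion, log fusion

2. Stacking/blending:

  • Build multi-layer models and fit a further model on the predictions of the earlier layers

3. Boosting/bagging:

  • Ensemble methods built from many trees

I. Fusion for regression and classification probabilities:

1. Simple weighted averaging: fuse the results directly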

import numpy as np
import pandas as pd
from sklearn import metrics

# Generate some simple sample data; test_prei is the i-th model's prediction
test_pre0 = [1.3,3.2,2.5,6.2]
test_pre1 = [0.7,3.1,2.2,5.9]
test_pre2 = [1.1,2.7,2.3,6.0]

# y_test_true holds the true values
y_test_true = [1,3,2,6]

# Define the weighted-average function
def Weighted_method(test_pre0,test_pre1,test_pre2,w=[1/3,1/3,1/3]):
    Weighted_result = w[0]*pd.Series(test_pre0)+w[1]*pd.Series(test_pre1)+w[2]*pd.Series(test_pre2)
    return Weighted_result

# MAE of each model's predictions
print('Pred0 MAE:',metrics.mean_absolute_error(y_test_true,test_pre0))
print('Pred1 MAE:',metrics.mean_absolute_error(y_test_true,test_pre1))
print('Pred2 MAE:',metrics.mean_absolute_error(y_test_true,test_pre2))

# MAE of the weighted average
w = [0.3,0.4,0.3] # weights
Weighted_pre = Weighted_method(test_pre0,test_pre1,test_pre2,w)
print('Weighted_pre MAE:',metrics.mean_absolute_error(y_test_true,Weighted_pre))
Pred0 MAE: 0.3000000000000001
Pred1 MAE: 0.175
Pred2 MAE: 0.17499999999999993
Weighted_pre MAE: 0.08750000000000024
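  • pandas has two main data structures, Series and DataFrame. A Series is a one-dimensional array built on NumPy's ndarray
  • metrics.mean_absolute_error: mean absolute error (MAE)

The MAE of Weighted_pre (0.0875) is clearly lower than that of any single model (0.175 at best); this is simple weighted averaging.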
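The weights above were chosen by hand. A minimal sketch of picking them by grid search follows, reusing the toy predictions from the example; `search_weights` and its `step` parameter are illustrative names, not part of the original post. In practice the search should run on a held-out validation set, since weights tuned on the evaluation data will look better than they are.

```python
import itertools
import numpy as np

# Toy predictions (one row per model) and targets copied from the example above.
preds = np.array([
    [1.3, 3.2, 2.5, 6.2],
    [0.7, 3.1, 2.2, 5.9],
    [1.1, 2.7, 2.3, 6.0],
])
y_true = np.array([1, 3, 2, 6])

def mae(y, p):
    """Mean absolute error between two 1-D arrays."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(p))))

def search_weights(preds, y_true, step=0.1):
    """Try every non-negative weight triple on a grid that sums to 1; keep the best."""
    n_steps = int(round(1 / step))
    best_w, best_mae = None, float("inf")
    for i, j in itertools.product(range(n_steps + 1), repeat=2):
        k = n_steps - i - j
        if k < 0:
            continue  # the three weights must sum to 1
        w = np.array([i, j, k]) / n_steps
        score = mae(y_true, w @ preds)
        if score < best_mae:
            best_w, best_mae = w, score
    return best_w, best_mae

best_w, best_mae = search_weights(preds, y_true)
print("best weights:", best_w, "MAE:", best_mae)
```

The grid contains the hand-picked weights [0.3, 0.4, 0.3], so the search can only match or improve on their MAE.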

  • Two special cases follow: the mean and the median of the predictions
# Define the mean fusion function
def Mean_method(test_pre0,test_pre1,test_pre2):
    Mean_result = pd.concat([pd.Series(test_pre0),pd.Series(test_pre1),pd.Series(test_pre2)],axis=1).mean(axis=1)
    return Mean_result

Mean_pre = Mean_method(test_pre0,test_pre1,test_pre2)
print('Mean_pre MAE:',metrics.mean_absolute_error(y_test_true,Mean_pre))

# Define the median fusion function
def Median_method(test_pre0,test_pre1,test_pre2):
    Median_result = pd.concat([pd.Series(test_pre0),pd.Series(test_pre1),pd.Series(test_pre2)],axis=1).median(axis=1)
    return Median_result

Median_pre = Median_method(test_pre0,test_pre1,test_pre2)
print('Median_pre MAE:',metrics.mean_absolute_error(y_test_true,Median_pre))
Mean_pre MAE: 0.10000000000000026
Median_pre MAE: 0.125
  • pandas.concat is a pandas function that joins data along a chosen axis
  • mean: the average; median: the middle value
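The overview also lists geometric-mean, rank and log fusion, which the examples above do not cover. The sketch below shows one way to compute each on the same toy predictions. Two assumptions are made here: "log fusion" is read as averaging in log1p space, and ties in the rank step are ignored.

```python
import numpy as np

# Toy predictions copied from the example above (one row per model).
preds = np.array([
    [1.3, 3.2, 2.5, 6.2],
    [0.7, 3.1, 2.2, 5.9],
    [1.1, 2.7, 2.3, 6.0],
])

# Geometric mean: exponentiate the mean of the logs (needs strictly positive values).
geo_mean = np.exp(np.log(preds).mean(axis=0))

# Log fusion (assumed meaning): average in log1p space, then map back with expm1.
log_fused = np.expm1(np.log1p(preds).mean(axis=0))

# Rank fusion: turn each model's predictions into ranks 1..n, then average the ranks.
# The double argsort gives ranks without any special handling of ties.
ranks = preds.argsort(axis=1).argsort(axis=1) + 1
rank_mean = ranks.mean(axis=0)

print("geometric mean:", geo_mean)
print("log fusion:    ", log_fused)
print("mean rank:     ", rank_mean)
```

Rank fusion discards the scale of the predictions, so it is mostly useful when the metric depends only on ordering, such as AUC.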

2. Stacking fusion (regression)

from sklearn import linear_model
def Stacking_method(train_reg0,train_reg1,train_reg2,y_train_true,test_pre0,test_pre1,test_pre2,model_L2= linear_model.LinearRegression()):
    model_L2.fit(pd.concat([pd.Series(train_reg0),pd.Series(train_reg1),pd.Series(train_reg2)],axis=1).values,y_train_true)
    Stacking_result = model_L2.predict(pd.concat([pd.Series(test_pre0),pd.Series(test_pre1),pd.Series(test_pre2)],axis=1).values)
    return Stacking_result

# Generate some simple sample data; train_regi is the i-th model's prediction on the training set
train_reg0 = [3.4, 8.0, 9.3, 5.1]
train_reg1 = [2.9, 8.2, 9.0, 4.8]
train_reg2 = [3.2, 7.9, 9.2, 5.1]
# y_train_true holds the true training targets
y_train_true = [3, 8, 9, 5] 

test_pre0 = [1.3, 3.2, 2.5, 6.2]
test_pre1 = [0.7, 3.1, 2.2, 5.9]
test_pre2 = [1.1, 2.7, 2.3, 6.0]

# y_test_true holds the true test targets
y_test_true = [1,3,2,6]

model_L2= linear_model.LinearRegression()
Stacking_pre = Stacking_method(train_reg0,train_reg1,train_reg2,y_train_true,test_pre0,test_pre1,test_pre2,model_L2)
print('Stacking_pre MAE:',metrics.mean_absolute_error(y_test_true, Stacking_pre))
Stacking_pre MAE: 0.2010357815442569
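  • sklearn.linear_model.LinearRegression is an estimator: it learns from observed values in order to predict results

The stacked result is worse than the earlier fusions (MAE 0.201, versus 0.0875 for the weighted average). The second-level model in stacking should not be too complex, otherwise it overfits the training set and generalizes poorly. With only four training rows here, even a linear regression overfits.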
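The overfitting can be checked directly. The second-level linear regression has four free parameters (three coefficients and an intercept) and is fitted on four rows, so it can reproduce the training targets exactly. The sketch below refits the same toy data and compares training and test MAE; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# First-level predictions from the example above, one column per model.
X_train = np.column_stack([
    [3.4, 8.0, 9.3, 5.1],
    [2.9, 8.2, 9.0, 4.8],
    [3.2, 7.9, 9.2, 5.1],
])
y_train = np.array([3, 8, 9, 5])
X_test = np.column_stack([
    [1.3, 3.2, 2.5, 6.2],
    [0.7, 3.1, 2.2, 5.9],
    [1.1, 2.7, 2.3, 6.0],
])
y_test = np.array([1, 3, 2, 6])

meta = LinearRegression().fit(X_train, y_train)
train_mae = mean_absolute_error(y_train, meta.predict(X_train))
test_mae = mean_absolute_error(y_test, meta.predict(X_test))

# Four parameters fitted on four rows: the training error is numerically zero.
print("train MAE:", train_mae)
print("test MAE: ", test_mae)
```

A training error of essentially zero next to a much larger test error is the signature of an overfitted meta-model; more second-level training rows, or a more constrained meta-model, would be the usual remedies.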

II. Fusion for classification models

import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_blobs
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score,roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
import warnings
warnings.filterwarnings('ignore')

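1. Voting

Voting comes in two forms, soft voting and hard voting, and both follow the majority-rule idea: hard voting counts the predicted labels, while soft voting averages the predicted class probabilities.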
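The difference between the two forms shows up when one confident model disagrees with two hesitant ones. The sketch below uses made-up class probabilities (hypothetical numbers, not from any model in this post) to compute both votes with NumPy.

```python
import numpy as np

# Hypothetical class probabilities from three models for two samples and two classes.
# Shape: (n_models, n_samples, n_classes)
probas = np.array([
    [[0.90, 0.10], [0.40, 0.60]],
    [[0.45, 0.55], [0.45, 0.55]],
    [[0.45, 0.55], [0.90, 0.10]],
])
n_classes = probas.shape[2]

# Hard voting: each model casts one vote for its most likely class,
# and the class with the most votes wins.
votes = probas.argmax(axis=2)  # shape (n_models, n_samples)
hard = np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes.T])

# Soft voting: average the probabilities across models, then take the argmax.
soft = probas.mean(axis=0).argmax(axis=1)

print("hard vote:", hard)
print("soft vote:", soft)
```

In both samples two of the three models lean weakly towards class 1, so the hard vote is class 1, while the averaged probabilities favour class 0. Soft voting therefore needs models whose probabilities are reasonably calibrated.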

'''Hard voting: the models vote directly, with no weighting by relative importance; the class with the most votes is the final prediction'''
iris = datasets.load_iris()

x = iris.data
y = iris.target
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3)

# objective is left unset: LGBMClassifier selects a multiclass objective for the three iris classes
clf1 = lgb.LGBMClassifier(learning_rate=0.1,n_estimators=150,max_depth=3,min_child_weight=2)
# min_samples_leaf=63 is too large for iris: each CV training fold has only 120 rows, so no split can
# leave 63 samples in both children. Every tree stays a single leaf, hence the 0.33 accuracy below.
clf2 = RandomForestClassifier(n_estimators=200,max_depth=10,min_samples_split=10,min_samples_leaf=63,oob_score=True)
clf3 = SVC(C=0.1)

# Hard voting
eclf = VotingClassifier(estimators=[('lgb', clf1), ('rf', clf2), ('svc', clf3)], voting='hard')
for clf, label in zip([clf1, clf2, clf3, eclf], ['LGB', 'Random Forest', 'SVM', 'Ensemble']):
    scores = cross_val_score(clf, x, y, cv=5, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))
Accuracy: 0.94 (+/- 0.05) [LGB]
Accuracy: 0.33 (+/- 0.00) [Random Forest]
Accuracy: 0.92 (+/- 0.03) [SVM]
Accuracy: 0.93 (+/- 0.03) [Ensemble]

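2. Stacking/Blending for classification

  • Stacking is a layered ensemble framework. In two-level stacking, the first level consists of several base learners trained on the original dataset; the second level is trained on the outputs of the first-level learners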
'''5-Fold Stacking'''
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier,GradientBoostingClassifier
import pandas as pd
# Build the training data
data_0 = iris.data
data = data_0[:100,:]

target_0 = iris.target
target = target_0[:100]

# Single models used in the fusion
clfs = [LogisticRegression(solver='lbfgs'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]
 
# Hold out part of the data as a test set
X, X_predict, y, y_predict = train_test_split(data, target, test_size=0.3, random_state=2020)

dataset_blend_train = np.zeros((X.shape[0], len(clfs)))
dataset_blend_test = np.zeros((X_predict.shape[0], len(clfs)))

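# 5-fold stacking
n_splits = 5
skf = StratifiedKFold(n_splits)
# split() returns a one-shot generator; store the folds in a list so every model reuses them
folds = list(skf.split(X, y))

for j, clf in enumerate(clfs):
    # Train each single model in turn
    dataset_blend_test_j = np.zeros((X_predict.shape[0], n_splits))
    for i, (train, test) in enumerate(folds):
        # 5-fold cross training: fit on the other folds and predict fold i; these out-of-fold predictions become the new feature for fold i.
        X_train, y_train, X_test, y_test = X[train], y[train], X[test], y[test]
        clf.fit(X_train, y_train)
        y_submission = clf.predict_proba(X_test)[:, 1]
        dataset_blend_train[test, j] = y_submission
        dataset_blend_test_j[:, i] = clf.predict_proba(X_predict)[:, 1]
    # For the test set, use the mean of the k fold models' predictions as the new feature.
    dataset_blend_test[:, j] = dataset_blend_test_j.mean(1)
    print("val auc Score: %f" % roc_auc_score(y_predict, dataset_blend_test[:, j]))

clf = LogisticRegression(solver='lbfgs')
clf.fit(dataset_blend_train, y)
y_submission = clf.predict_proba(dataset_blend_test)[:, 1]

print("Val auc Score of Stacking: %f" % (roc_auc_score(y_predict, y_submission)))
val auc Score: 1.000000
val auc Score: 0.500000
val auc Score: 0.500000
val auc Score: 0.500000
val auc Score: 0.500000
Val auc Score of Stacking: 1.000000
  • The 0.5 scores for models 2 to 5 in this output come from a run that looped over skf.split(X, y) directly. That call returns a generator, which is exhausted after the first model, so the inner loop never ran for the remaining models and their feature columns stayed at zero. With the folds stored in a list, as in the code above, every model is trained on all five folds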
  • Ensemble learning, stacking: https://blog.csdn.net/rocling/article/details/93659276
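The manual fold loop can also be delegated to scikit-learn. The sketch below uses `sklearn.ensemble.StackingClassifier` (available from scikit-learn 0.22) on the same two-class slice of iris; the choice of base models and their settings is an assumption for illustration, not the setup used in this post.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# First 100 iris rows: classes 0 and 1 only, as in the example above.
iris = load_iris()
X, y = iris.data[:100], iris.target[:100]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=2020, stratify=y
)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # second-level model
    cv=5,                                  # out-of-fold predictions feed the second level
    stack_method="predict_proba",
)
stack.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print("stacking AUC:", auc)
```

One difference from the manual loop: after computing the out-of-fold features, `StackingClassifier` refits each base model on the full training set for test-time prediction, rather than averaging the fold models.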

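  • Blending is another multi-layer fusion scheme. The original training set is first split into a new training set and a holdout set. In the first layer, several models are trained on the new training set and used to predict both the holdout set and the test set. In the second layer, a model is trained on the holdout set with the first-layer predictions as features, and it then predicts the test set from the first-layer test predictions

Advantages: simpler than stacking, and it avoids information leakage because the generalizers and the stacker use different data

Disadvantages: less data is used at each stage, so the blender may overfit; stacking with cross-validation is more robust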

'''Blending'''
 
# Build the training data
data_0 = iris.data
data = data_0[:100,:]

target_0 = iris.target
target = target_0[:100]
 
# Single models used in the fusion
clfs = [LogisticRegression(solver='lbfgs'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        #ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]

# Hold out part of the data as a test set
X, X_predict, y, y_predict = train_test_split(data, target, test_size=0.3, random_state=2020)

# Split the training data into two parts, d1 and d2
X_d1, X_d2, y_d1, y_d2 = train_test_split(X, y, test_size=0.5, random_state=2020)
dataset_d1 = np.zeros((X_d2.shape[0], len(clfs)))
dataset_d2 = np.zeros((X_predict.shape[0], len(clfs)))
 
for j, clf in enumerate(clfs):
    # Train each single model in turn
    clf.fit(X_d1, y_d1)
    y_submission = clf.predict_proba(X_d2)[:, 1]
    dataset_d1[:, j] = y_submission
    # For the test set, use each model's predictions directly as the new features.
    dataset_d2[:, j] = clf.predict_proba(X_predict)[:, 1]
    print("val auc Score: %f" % roc_auc_score(y_predict, dataset_d2[:, j]))

# Model used for the fusion
clf = GradientBoostingClassifier(learning_rate=0.02, subsample=0.5, max_depth=6, n_estimators=30)
clf.fit(dataset_d1, y_d2)
y_submission = clf.predict_proba(dataset_d2)[:, 1]
print("Val auc Score of Blending: %f" % (roc_auc_score(y_predict, y_submission)))
val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
Val auc Score of Blending: 1.000000
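  • Ensemble learning, blending: https://yishuihancheng.blog.csdn.net/article/details/112554257

III. Other methods

Fit models on the features, transform their predictions, and add the transformed predictions as new features alongside the original ones before fitting a final model.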

def Ensemble_add_feature(train,test,target,clfs):
    # Two new feature columns per model: prediction squared and exp(prediction).
    # Only the new columns are returned; concatenate them with the original features if needed.
    train_ = np.zeros((train.shape[0],len(clfs)*2))
    test_ = np.zeros((test.shape[0],len(clfs)*2))

    for j,clf in enumerate(clfs):
        # Fit each single model on the full training set, then predict train and test
        clf.fit(train,target)
        y_train = clf.predict(train)
        y_test = clf.predict(test)

        # Generate the new features; model j owns columns 2j and 2j+1
        train_[:,j*2] = y_train**2
        test_[:,j*2] = y_test**2
        train_[:, j*2+1] = np.exp(y_train)
        test_[:, j*2+1] = np.exp(y_test)
        print('Method ',j)

    train_ = pd.DataFrame(train_)
    test_ = pd.DataFrame(test_)
    return train_,test_
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()

data_0 = iris.data
data = data_0[:100,:]

target_0 = iris.target
target = target_0[:100]

x_train,x_test,y_train,y_test=train_test_split(data,target,test_size=0.3)
x_train = pd.DataFrame(x_train) ; x_test = pd.DataFrame(x_test)

# Single models used in the fusion
clfs = [LogisticRegression(),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]

New_train,New_test = Ensemble_add_feature(x_train,x_test,y_train,clfs)

clf = LogisticRegression()
# clf = GradientBoostingClassifier(learning_rate=0.02, subsample=0.5, max_depth=6, n_estimators=30)
clf.fit(New_train, y_train)
y_emb = clf.predict_proba(New_test)[:, 1]

print("Val auc Score of stacking: %f" % (roc_auc_score(y_test, y_emb)))
Method  0
Method  1
Method  2
Method  3
Method  4
Val auc Score of stacking: 1.000000

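IV. Work on this competition

1. Preparation

Load and preprocess the data -> split into training and validation sets -> build the single models (random forest, LightGBM, NN) -> generate predictions for the demonstration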

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

import itertools
import matplotlib.gridspec as gridspec
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB 
from sklearn.ensemble import RandomForestClassifier,RandomForestRegressor
# from mlxtend.classifier import StackingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
# from mlxtend.plotting import plot_learning_curves
# from mlxtend.plotting import plot_decision_regions

from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
import lightgbm as lgb
from sklearn.neural_network import MLPClassifier,MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error
'''Reduce memory usage'''
def reduce_mem_usage(df):
    start_mem = df.memory_usage().sum() / 1024**2 
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))
    
    for col in df.columns:
        col_type = df[col].dtype
        
        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            df[col] = df[col].astype('category')

    end_mem = df.memory_usage().sum() / 1024**2 
    print('Memory usage after optimization is: {:.2f} MB'.format(end_mem))
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))
    
    return df
# Read the data
train = pd.read_csv('train.csv')
test = pd.read_csv('testA.csv')

# Simple preprocessing
train_list = []
for items in train.values:
    train_list.append([items[0]] + [float(i) for i in items[1].split(',')] + [items[2]])
    
test_list = []
for items in test.values:
    test_list.append([items[0]] + [float(i) for i in items[1].split(',')])

train = pd.DataFrame(np.array(train_list))
test = pd.DataFrame(np.array(test_list))

# The id column is not counted as a feature
features = ['s_'+str(i) for i in range(len(train_list[0])-2)] 
train.columns = ['id'] + features + ['label']
test.columns = ['id'] + features

train = reduce_mem_usage(train)
test = reduce_mem_usage(test)
Memory usage of dataframe is 157.93 MB
Memory usage after optimization is: 39.67 MB
Decreased by 74.9%
Memory usage of dataframe is 31.43 MB
Memory usage after optimization is: 7.90 MB
Decreased by 74.9%
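A reduction of about 75% is what a move from float64 (8 bytes per value) to float16 (2 bytes per value) gives. The price is precision: float16 keeps roughly three significant decimal digits. The sketch below illustrates the trade-off on a tiny hypothetical column; the values are made up for the demonstration and are not taken from train.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical values; 2049.0 is chosen because float16 cannot represent it exactly.
df = pd.DataFrame({"s_0": [0.9912297, 0.0001234, 2049.0]})
as_f16 = df["s_0"].astype(np.float16)

bytes_f64 = df["s_0"].memory_usage(index=False)   # 3 values x 8 bytes
bytes_f16 = as_f16.memory_usage(index=False)      # 3 values x 2 bytes

# Largest absolute change introduced by the downcast.
max_abs_err = float((df["s_0"] - as_f16.astype(np.float64)).abs().max())

print("bytes as float64:", bytes_f64)
print("bytes as float16:", bytes_f16)
print("values after downcast:", as_f16.tolist())
print("largest absolute error:", max_abs_err)
```

For signals that are already normalized to a small range the loss is usually tolerable, but it is worth checking before downcasting columns that feed distance-based or linear models.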
# Separate features and labels
X_train = train.drop(['id','label'], axis=1)
y_train = train['label']

# Test set
X_test = test.drop(['id'], axis=1)

# For a first run, use a subset of the data to speed things up
X_train = X_train.iloc[:50000,:20]
y_train = y_train.iloc[:50000]
X_test = X_test.iloc[:,:20]

# Split into training and validation sets (7:3)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3)
# Single-model builders
def build_model_rf(X_train,y_train):
    model = RandomForestRegressor(n_estimators = 100)
    model.fit(X_train, y_train)
    return model

def build_model_lgb(X_train,y_train):
    model = lgb.LGBMRegressor(num_leaves=63,learning_rate = 0.1,n_estimators = 100)
    model.fit(X_train, y_train)
    return model

def build_model_nn(X_train,y_train):
    model = MLPRegressor(alpha=1e-05, hidden_layer_sizes=(5, 2), random_state=1,solver='lbfgs')
    model.fit(X_train, y_train)
    return model
# Train the three single models
# The single models are not tuned, so they are weak learners and may not perform well.

print('predict rf...')
model_rf = build_model_rf(X_train,y_train)
val_rf = model_rf.predict(X_val)
subA_rf = model_rf.predict(X_test)

print('predict lgb...')
model_lgb = build_model_lgb(X_train,y_train)
val_lgb = model_lgb.predict(X_val)
subA_lgb = model_lgb.predict(X_test)  # use the LightGBM model, not model_rf

print('predict NN...')
model_nn = build_model_nn(X_train,y_train)
val_nn = model_nn.predict(X_val)
subA_nn = model_nn.predict(X_test)  # use the NN model, not model_rf
predict rf...
predict lgb...
predict NN...

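2. Weighted fusion

Without a weight vector, this reduces to mean fusion.
The weights can be customized. Three single models are fused here, and the size of the weight vector can be changed.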

# Weighted fusion; with the default w this is mean fusion
def Weighted_method(test_pre1,test_pre2,test_pre3,w=[1/3,1/3,1/3]):
    Weighted_result = w[0]*pd.Series(test_pre1)+w[1]*pd.Series(test_pre2)+w[2]*pd.Series(test_pre3)
    return Weighted_result

# Initial weights; these can be customized, and here they are set arbitrarily
w = [0.2, 0.3, 0.5]

val_pre = Weighted_method(val_rf,val_lgb,val_nn,w)
MAE_Weighted = mean_absolute_error(y_val,val_pre)
print('MAE of Weighted of val:',MAE_Weighted)
MAE of Weighted of val: 0.24284514701931437
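Weighted_method is hard-wired to three models. A minimal sketch of a version that accepts any number of models follows; `weighted_fusion` is an illustrative name, and unlike the original function it normalizes the weights, which is a design choice made here rather than something the post prescribes.

```python
import numpy as np

def weighted_fusion(preds, weights=None):
    """Fuse predictions of shape (n_models, n_samples); equal weights if none are given."""
    preds = np.asarray(preds, dtype=float)
    if weights is None:
        return preds.mean(axis=0)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (preds.shape[0],):
        raise ValueError("need exactly one weight per model")
    # np.average divides by the sum of the weights, so they need not sum to 1.
    return np.average(preds, axis=0, weights=weights)

# Toy predictions from Section I, one row per model.
toy = [
    [1.3, 3.2, 2.5, 6.2],
    [0.7, 3.1, 2.2, 5.9],
    [1.1, 2.7, 2.3, 6.0],
]
fused = weighted_fusion(toy, [0.3, 0.4, 0.3])
print(fused)
```

With the validation predictions from this section the call would look like `weighted_fusion([val_rf, val_lgb, val_nn], w)`.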

Fuse the single models' test-set predictions into the ensemble result:

# Fuse the test-set predictions
subA = Weighted_method(subA_rf,subA_lgb,subA_nn,w)

# Write the submission file
sub = pd.DataFrame()
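# SaleID and price are placeholder column names; rename them to match the competition's submission format
sub['SaleID'] = X_test.index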
sub['price'] = subA
sub.to_csv('./sub_Weighted.csv',index=False)

3. Stacking fusion

# First level
train_rf_pred = model_rf.predict(X_train)
train_lgb_pred = model_lgb.predict(X_train)
train_nn_pred = model_nn.predict(X_train)

stacking_X_train = pd.DataFrame()
stacking_X_train['Method_1'] = train_rf_pred
stacking_X_train['Method_2'] = train_lgb_pred
stacking_X_train['Method_3'] = train_nn_pred

stacking_X_val = pd.DataFrame()
stacking_X_val['Method_1'] = val_rf
stacking_X_val['Method_2'] = val_lgb
stacking_X_val['Method_3'] = val_nn

stacking_X_test = pd.DataFrame()
stacking_X_test['Method_1'] = subA_rf
stacking_X_test['Method_2'] = subA_lgb
stacking_X_test['Method_3'] = subA_nn
stacking_X_test.head()
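   Method_1  Method_2  Method_3
0      0.00      0.00      0.00
1      1.78      1.78      1.78
2      2.98      2.98      2.98
3      0.00      0.00      0.00
4      0.00      0.00      0.00
The three columns are identical in the run shown here because all three test-set predictions (subA_rf, subA_lgb and subA_nn) were produced by model_rf.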
# Second level: a random forest (despite the "lr" in the variable name)
model_lr_stacking = build_model_rf(stacking_X_train,y_train)

## Training set
train_pre_Stacking = model_lr_stacking.predict(stacking_X_train)
print('MAE of stacking:',mean_absolute_error(y_train,train_pre_Stacking))

## Validation set
val_pre_Stacking = model_lr_stacking.predict(stacking_X_val)
print('MAE of stacking:',mean_absolute_error(y_val,val_pre_Stacking))

## Test set
print('Predict stacking...')
subA_Stacking = model_lr_stacking.predict(stacking_X_test)
MAE of stacking: 0.0017582857142857137
MAE of stacking: 0.08497199999999999
Predict stacking...
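The gap between the two MAE values above (0.0018 on the training set, 0.085 on the validation set) is expected: the first-level training features are in-sample predictions, which are far more accurate than anything the models produce on unseen data, so the second level learns to trust them too much. A common remedy is to build the first-level training features from out-of-fold predictions. The sketch below shows the idea with `cross_val_predict` on synthetic data from `make_regression`; the data and model settings are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_predict

# Synthetic regression data standing in for the competition features.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# In-sample predictions: the model has already seen every row it predicts.
in_sample = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y).predict(X)

# Out-of-fold predictions: each row is predicted by a model that never saw it.
oof = cross_val_predict(
    RandomForestRegressor(n_estimators=20, random_state=0), X, y, cv=5
)

in_sample_mae = mean_absolute_error(y, in_sample)
oof_mae = mean_absolute_error(y, oof)
print("in-sample MAE:   ", in_sample_mae)
print("out-of-fold MAE: ", oof_mae)
```

Out-of-fold predictions have an error level comparable to what the first-level models achieve on validation and test data, so a second-level model trained on them sees features with the same distribution at training and prediction time.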

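V. Summary

Model ensembling can be applied at three levels:

  • **Result-level fusion:** for example, weighting models by their scores. It works best when the models score similarly but their predictions differ substantially

  • **Feature-level fusion:** borrow feature-engineering ideas from one another; for example, when training several models of the same type, the features can be split among them

  • **Model-level fusion:** this involves stacking and architecture design, for example adding a stacking layer or feeding some models' outputs in as features. The fused models should preferably be of different types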
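The condition for result-level fusion (similar scores, different predictions) can be checked before fusing. A minimal sketch, reusing the toy predictions from Section I: compare the per-model MAE, then look at how strongly the models' errors are correlated. Low or negative error correlation suggests that averaging will cancel part of the error.

```python
import numpy as np

# Toy predictions and targets from Section I, one row per model.
preds = np.array([
    [1.3, 3.2, 2.5, 6.2],
    [0.7, 3.1, 2.2, 5.9],
    [1.1, 2.7, 2.3, 6.0],
])
y_true = np.array([1, 3, 2, 6])

# Similar scores: MAE of each model.
mae_per_model = np.abs(preds - y_true).mean(axis=1)

# Different results: correlation between predictions, and between residuals.
pred_corr = np.corrcoef(preds)               # rows are treated as variables (models)
residual_corr = np.corrcoef(preds - y_true)

print("MAE per model:", mae_per_model)
print("prediction correlation:\n", pred_corr.round(3))
print("residual correlation:\n", residual_corr.round(3))
```

Prediction correlation is almost always high because every model tracks the target, so the residual correlation is usually the more informative of the two matrices.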
