1. Overview
Resources
Neural network models tend to have high variance. Ensembling multiple models reduces that variance; for the ensemble to be effective, each member must be a good model in its own right but make different errors from the others, which makes the combined result more robust.
Main reference: https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/ , which includes many code implementations of ensembling.
Chinese-language reference: https://www.cnblogs.com/szxspark/p/10144913.html
Some simple Keras ensemble examples: https://machinelearningmastery.com/?s=ensemble&post_type=post&submit=Search
2. Approaches
Train several models of the same configuration from different random initializations, collect all their outputs, and average them. The number of ensemble members is usually kept small, for two reasons: the computational cost, and the fact that the gains from adding more models do not keep growing.
1) Train on different data
Split the data with k-fold; each subset of the data trains one model, yielding multiple models to ensemble.
Resample the dataset with replacement and train each member on its own resampled copy (called bootstrap aggregation, or bagging).
Sampling without replacement (drawn examples are not returned to the pool) can also be used.
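The resampling idea above (bootstrap aggregation) can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the referenced tutorial; `bootstrap_sample` and the toy data are my own names:

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw one bootstrap sample: same size as the data, drawn with replacement."""
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

# toy data: 10 points, 2 features, 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = rng.integers(0, 3, size=10)

# each ensemble member would be trained on its own resampled copy of the data
samples = [bootstrap_sample(X, y, rng) for _ in range(5)]
```

Because sampling is with replacement, each resampled copy leaves out roughly a third of the original examples on average, which is what pushes the members toward making different errors.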
2) Vary the model to reduce variance
Same model and data, but trained from different random initializations, then ensembled (this reduces variance, but generalization error does not improve much, since every member learns the same type of mapping).
Different models: different hidden layers, different learning rates, different regularization schemes, etc.
Save snapshots of one model at different points during training (oscillation can be injected in between, e.g. a cyclic learning rate such as stochastic gradient descent with warm restarts (SGDR)).
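The cyclic learning rate behind SGDR-style snapshot ensembles can be sketched as below. The function name and parameterization are my own; the shape (cosine decay from `lr_max` toward zero, then an abrupt restart) follows the warm-restart idea, and saving the model at the end of each cycle yields one ensemble member per cycle:

```python
import math

def cosine_annealing_lr(epoch, n_epochs, n_cycles, lr_max):
    """Cosine-annealed learning rate with warm restarts: within each cycle the
    rate decays from lr_max to near zero, then jumps back up at the restart."""
    epochs_per_cycle = n_epochs // n_cycles
    pos = epoch % epochs_per_cycle       # position inside the current cycle
    frac = pos / epochs_per_cycle        # 0 at a restart, approaching 1 at cycle end
    return lr_max / 2 * (math.cos(math.pi * frac) + 1)

# e.g. 100 epochs, 5 cycles: the rate restarts at lr_max every 20 epochs
lr_start = cosine_annealing_lr(0, 100, 5, 0.01)    # start of first cycle
lr_mid = cosine_annealing_lr(10, 100, 5, 0.01)     # middle of first cycle
lr_restart = cosine_annealing_lr(20, 100, 5, 0.01) # warm restart
```

In Keras this schedule would typically be wired in through a `LearningRateScheduler` callback, with a second callback saving the model at each cycle boundary.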
3) Combination methods
Direct averaging
Weighted averaging on a dev set, with each weight determined by that member's performance on the dev set
Train a new model on the members' outputs (stacking)
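The dev-set-weighted averaging option above can be sketched with toy numbers. The probability arrays and accuracies here are made up for illustration; the `tensordot` weighted sum is the same trick used in the full code example later in this section:

```python
import numpy as np

# toy predicted class probabilities: 3 members, 2 dev examples, 3 classes
yhats = np.array([
    [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]],
    [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]],
    [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3]],
])
dev_acc = np.array([0.6, 0.7, 0.9])  # each member's accuracy on the dev set
weights = dev_acc / dev_acc.sum()    # normalize so the weights sum to 1
# weighted sum over the member axis, then argmax over classes
summed = np.tensordot(yhats, weights, axes=((0,), (0,)))
pred = summed.argmax(axis=1)
```

Better members contribute more to the vote, but every member still contributes something, which is the middle ground between plain averaging and picking the single best model.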
Some practical approaches
1. For weighted-average ensembling of classifiers, there are two ways to find the weights. One is exhaustive search, e.g. Python's itertools.product(A, repeat=num_models), which enumerates every combination of the candidate values in A.
This is slow and hard to parallelize: the number of combinations grows exponentially with the number of models and candidate values.
The other is to use a search algorithm, such as differential evolution or PSO.
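The exhaustive search can be sketched as follows. `toy_score` here is a made-up stand-in for evaluating the weighted ensemble on a hold-out set; a real run would plug in something like the `evaluate_ensemble` function from the code below:

```python
from itertools import product
import numpy as np

# candidate weight values for each of 3 members
grid = [0.0, 0.5, 1.0]
n_members = 3

def toy_score(weights):
    # stand-in for hold-out evaluation of the weighted ensemble;
    # here we simply prefer weights that normalize close to (0.2, 0.3, 0.5)
    w = np.array(weights)
    if w.sum() == 0:
        return float('-inf')  # all-zero weights are invalid
    w = w / w.sum()
    return -np.abs(w - np.array([0.2, 0.3, 0.5])).sum()

# exhaustive search: 3**3 = 27 combinations (exponential in the member count)
best = max(product(grid, repeat=n_members), key=toy_score)
```

Even this tiny grid shows the scaling problem: with 10 candidate values and 5 members there would already be 100,000 combinations to evaluate.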
```python
# global optimization to find coefficients for weighted ensemble on blobs problem
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from numpy import array
from numpy import argmax
from numpy import tensordot
from numpy.linalg import norm
from scipy.optimize import differential_evolution

# fit model on dataset
def fit_model(trainX, trainy):
    trainy_enc = to_categorical(trainy)
    # define model
    model = Sequential()
    model.add(Dense(25, input_dim=2, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit model
    model.fit(trainX, trainy_enc, epochs=500, verbose=0)
    return model

# make an ensemble prediction for multi-class classification
def ensemble_predictions(members, weights, testX):
    # make predictions
    yhats = [model.predict(testX) for model in members]
    yhats = array(yhats)
    # weighted sum across ensemble members
    summed = tensordot(yhats, weights, axes=((0), (0)))
    # argmax across classes
    result = argmax(summed, axis=1)
    return result

# evaluate a specific set of weighted members in an ensemble
def evaluate_ensemble(members, weights, testX, testy):
    # make prediction
    yhat = ensemble_predictions(members, weights, testX)
    # calculate accuracy
    return accuracy_score(testy, yhat)

# normalize a vector to have unit norm
def normalize(weights):
    # calculate l1 vector norm
    result = norm(weights, 1)
    # check for a vector of all zeros
    if result == 0.0:
        return weights
    # return normalized vector (unit norm)
    return weights / result

# loss function for optimization process, designed to be minimized
def loss_function(weights, members, testX, testy):
    # normalize weights
    normalized = normalize(weights)
    # calculate error rate
    return 1.0 - evaluate_ensemble(members, normalized, testX, testy)

# generate 2d classification dataset
X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)
# split into train and test
n_train = 100
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
print(trainX.shape, testX.shape)
# fit all models
n_members = 5
members = [fit_model(trainX, trainy) for _ in range(n_members)]
# evaluate each single model on the test set
testy_enc = to_categorical(testy)
for i in range(n_members):
    _, test_acc = members[i].evaluate(testX, testy_enc, verbose=0)
    print('Model %d: %.3f' % (i + 1, test_acc))
# evaluate averaging ensemble (equal weights)
weights = [1.0 / n_members for _ in range(n_members)]
score = evaluate_ensemble(members, weights, testX, testy)
print('Equal Weights Score: %.3f' % score)
# define bounds on each weight
bound_w = [(0.0, 1.0) for _ in range(n_members)]
# arguments to the loss function
search_arg = (members, testX, testy)
# global optimization of ensemble weights
result = differential_evolution(loss_function, bound_w, search_arg, maxiter=80, tol=1e-7)
# get the chosen weights
weights = normalize(result['x'])
print('Optimized Weights: %s' % weights)
# evaluate chosen weights
score = evaluate_ensemble(members, weights, testX, testy)
print('Optimized Weights Score: %.3f' % score)
```
2. For a stacking ensemble, split the dataset with k-fold and train multiple models. For each model, discard its training folds and keep its validation fold; run all the validation folds through the trained models to build a new dataset, then train a meta-model (logistic regression, etc.) on this new data and the corresponding labels.
```python
# stacked generalization with linear meta model on blobs dataset
# (assumes the member models were already trained and saved to models/model_*.h5)
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from keras.models import load_model
from keras.utils import to_categorical
from numpy import dstack

# load models from file
def load_all_models(n_models):
    all_models = list()
    for i in range(n_models):
        # define filename for this ensemble member
        filename = 'models/model_' + str(i + 1) + '.h5'
        # load model from file
        model = load_model(filename)
        # add to list of members
        all_models.append(model)
        print('>loaded %s' % filename)
    return all_models

# create stacked model input dataset as outputs from the ensemble
def stacked_dataset(members, inputX):
    stackX = None
    for model in members:
        # make prediction
        yhat = model.predict(inputX, verbose=0)
        # stack predictions into [rows, probabilities, members]
        if stackX is None:
            stackX = yhat
        else:
            stackX = dstack((stackX, yhat))
    # flatten predictions to [rows, members x probabilities]
    stackX = stackX.reshape((stackX.shape[0], stackX.shape[1] * stackX.shape[2]))
    return stackX

# fit a model based on the outputs from the ensemble members
def fit_stacked_model(members, inputX, inputy):
    # create dataset using ensemble
    stackedX = stacked_dataset(members, inputX)
    # fit standalone model
    model = LogisticRegression()
    model.fit(stackedX, inputy)
    return model

# make a prediction with the stacked model
def stacked_prediction(members, model, inputX):
    # create dataset using ensemble
    stackedX = stacked_dataset(members, inputX)
    # make a prediction
    yhat = model.predict(stackedX)
    return yhat

# generate 2d classification dataset
X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)
# split into train and test
n_train = 100
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
print(trainX.shape, testX.shape)
# load all models
n_members = 5
members = load_all_models(n_members)
print('Loaded %d models' % len(members))
# evaluate standalone models on test dataset
for model in members:
    testy_enc = to_categorical(testy)
    _, acc = model.evaluate(testX, testy_enc, verbose=0)
    print('Model Accuracy: %.3f' % acc)
# fit stacked model using the ensemble
model = fit_stacked_model(members, testX, testy)
# evaluate model on test set
yhat = stacked_prediction(members, model, testX)
acc = accuracy_score(testy, yhat)
print('Stacked Test Accuracy: %.3f' % acc)
```
Generally, when the base models are neural networks, the new meta-model in the stacking stage is also a neural network.