How to Update Neural Network Models With More Data

weixin_44842237

Last modified 2023-09-01 21:15:41

Tags: neural networks, artificial intelligence, deep learning

First published 2023-09-01 19:48:38

Original article: https://machinelearningmastery.com/update-neural-network-models-with-more-data/

This article is a translation.

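Deep learning neural network models used for predictive modeling may need to be updated.

This may be because the data has changed since the model was developed and deployed, or because additional labeled data has become available since the model was developed and is expected to improve the model's performance.

When updating neural network models with new data, it is important to experiment with and evaluate a range of different approaches, especially if model updating will be automated, such as on a periodic schedule.

There are many ways to update neural network models, although the two main approaches involve either using the existing model as a starting point and retraining it, or leaving the existing model unchanged and combining its predictions with those of a new model.

In this tutorial, you will discover how to update deep learning neural network models in response to new data.

After completing this tutorial, you will know:

1. Neural network models may need to be updated when the underlying data changes or when new labeled data becomes available.

2. How to update a trained neural network model with just the new data or with a combination of old and new data.

3. How to create an ensemble of the existing model and a new model trained on just the new data or on a combination of old and new data.

Let's get started.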

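Tutorial Overview

This tutorial is divided into three parts; they are:

1. Updating Neural Network Models

2. Retraining Update Strategies

   1. Update Model on New Data Only

   2. Update Model on Old and New Data

3. Ensemble Update Strategies

   1. Ensemble Model With Model on New Data Only

   2. Ensemble Model With Model on Old and New Data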

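Updating Neural Network Models

Selecting and finalizing a deep learning neural network model for a predictive modeling project is just the beginning.

You can then start using the model to make predictions on new data.

One problem you may encounter is that the nature of the prediction problem changes over time.

You may notice this because the effectiveness of the predictions begins to decline over time. This may be because the assumptions made and captured in the model are changing or no longer hold.

Generally, this is referred to as "concept drift": the underlying probability distributions of variables, or the relationships between variables, change over time, which can negatively impact a model built from the data.

For more on concept drift, see the tutorial:

A Gentle Introduction to Concept Drift in Machine Learning

Concept drift may affect your model at different times, depending on the specific prediction problem you are solving and the model you chose to address it.

It can be helpful to monitor the performance of a model over time and use a clear drop in performance as a trigger to change the model, such as by retraining it on new data.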
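A minimal sketch of such a trigger, assuming the true labels of recent predictions become known after the fact. The class name `DriftMonitor` and its `window` and `tolerance` parameters are hypothetical, not part of Keras or any other library; the reference accuracy would come from your own evaluation when the model was deployed.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical helper: tracks rolling accuracy on newly labeled data and
    flags when it falls more than `tolerance` below a reference accuracy."""

    def __init__(self, reference_accuracy, window=100, tolerance=0.05):
        self.reference_accuracy = reference_accuracy  # accuracy measured at deployment time
        self.tolerance = tolerance  # allowed drop before an update is triggered
        self.outcomes = deque(maxlen=window)  # 1 = correct prediction, 0 = wrong

    def record(self, y_true, y_pred):
        # store whether the latest prediction turned out to be correct
        self.outcomes.append(int(y_true == y_pred))

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_update(self):
        # wait until the window is full so a few early errors do not trigger an update
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.reference_accuracy - self.tolerance


# usage: 7 correct and 3 wrong predictions in a window of 10
monitor = DriftMonitor(reference_accuracy=0.90, window=10, tolerance=0.05)
for y_true, y_pred in [(1, 1)] * 7 + [(1, 0)] * 3:
    monitor.record(y_true, y_pred)
print(monitor.rolling_accuracy(), monitor.should_update())  # 0.7 True
```

Requiring a full window before deciding is one design choice for avoiding retraining on noise; a periodic schedule, as described next, is an alternative that needs no monitoring at all.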

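Alternatively, you may know that data in your domain changes frequently enough that the model needs to be changed periodically, such as weekly, monthly, or annually.

Finally, you may have operated your model for a while and accumulated additional data with known outcomes that you wish to use to update the model, in the hope of improving predictive performance.

Importantly, you have a lot of flexibility in how you respond to a change in the problem or to the availability of new data.

For example, you can take the trained neural network model and update its weights using the new data. Or you might leave the existing model untouched and combine its predictions with those of a new model fit on the newly available data.

These approaches represent two general themes for updating neural network models in response to new data; they are:

- Retraining update strategies.

- Ensemble update strategies.

Let's take a closer look at each in turn.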

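Retraining Update Strategies

A benefit of neural network models is that their weights can be updated at any time with continued training.

When responding to changes in the underlying data or to the availability of new data, there are a few different strategies for updating a neural network model, such as:

- Continue training the model on the new data only.

- Continue training the model on the old and new data.

We might also imagine variations on these strategies, such as using a sample of the new data, or a sample of the old and new data, instead of all available data, as well as instance-based weighting of the sampled data.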
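The sampling and weighting variations can be sketched with NumPy. The helper `build_update_set` and its `old_fraction` and `new_weight` parameters are hypothetical choices for illustration, and the random arrays merely stand in for real old and new data. Keras `Model.fit` accepts a `sample_weight` array, which is one way weights like these could be used.

```python
import numpy as np


def build_update_set(X_old, y_old, X_new, y_new, old_fraction=0.5, new_weight=2.0, seed=1):
    """Hypothetical helper: keep a random subset of the old rows, append all
    new rows, and give the new rows a larger per-sample weight."""
    rng = np.random.default_rng(seed)
    n_keep = int(len(X_old) * old_fraction)
    # sample old rows without replacement
    idx = rng.choice(len(X_old), size=n_keep, replace=False)
    X = np.vstack((X_old[idx], X_new))
    y = np.hstack((y_old[idx], y_new))
    # weight 1.0 for retained old rows, new_weight for every new row
    w = np.hstack((np.ones(n_keep), np.full(len(X_new), new_weight)))
    return X, y, w


# placeholder data shaped like the tutorial's 500/500 split with 20 features
rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)
X_new, y_new = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)
X_up, y_up, w_up = build_update_set(X_old, y_old, X_new, y_new)
print(X_up.shape, y_up.shape, w_up.sum())  # (750, 20) (750,) 1250.0
# with a compiled Keras model this could then be used as:
# model.fit(X_up, y_up, sample_weight=w_up, epochs=100, batch_size=32, verbose=0)
```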

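We might also consider freezing the existing model (e.g. so that its weights cannot change during training) and then adding new layers whose weights can change, grafting extensions onto the model to handle any change in the data. This is perhaps a variation of the retraining and ensemble approaches in the next section, and we will set it aside for now.

Nevertheless, these are the two main strategies to consider.

Let's make these approaches concrete with some worked examples.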

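Update Model on New Data Only

We can update the model on the new data only.

One extreme version of this approach is to not use any new data and simply retrain the model on the old data. This is effectively the same as doing nothing in response to the new data. At the other extreme, a new model could be fit on the new data only, discarding the old data and the old model.

- Ignore the new data and do nothing.

- Update the existing model on the new data.

- Fit a new model on the new data, discarding the old model and data.

We will focus on the middle option in this example, but it is worth testing all three approaches on your problem to see which works best.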
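Comparing the three options on a hold-out set drawn from the new period could look like the sketch below. To keep it self-contained and fast, it swaps the tutorial's Keras MLP for a plain NumPy logistic regression trained by gradient descent; the drifting data, the learning rates, and the epoch counts are assumptions chosen only for illustration, and no particular ranking of the strategies should be read into it.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    # append a constant column so the last weight acts as the bias term
    return np.hstack((X, np.ones((len(X), 1))))


def train_logreg(X, y, w=None, lr=0.1, epochs=150):
    """Full-batch gradient descent on the mean logistic loss. Starts from
    zeros, or from a copy of w when an existing model is passed in."""
    w = np.zeros(X.shape[1] + 1) if w is None else w.copy()
    Xb = add_bias(X)
    for _ in range(epochs):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w


def accuracy(w, X, y):
    return float(np.mean((sigmoid(add_bias(X) @ w) >= 0.5) == y))


def compare_strategies(X_old, y_old, X_new, y_new, X_test, y_test):
    w_old = train_logreg(X_old, y_old)  # the "existing" model
    candidates = {
        "ignore new data": w_old,
        "update on new data": train_logreg(X_new, y_new, w=w_old, lr=0.01, epochs=100),
        "refit on new data": train_logreg(X_new, y_new),
    }
    return {name: accuracy(w, X_test, y_test) for name, w in candidates.items()}


# hypothetical drifting problem: the true decision boundary changes between periods
rng = np.random.default_rng(1)
true_old = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
true_new = np.array([-1.0, 1.0, 2.0, -0.5, 1.0])


def sample(true_w, n):
    X = rng.normal(size=(n, 5))
    return X, (X @ true_w > 0).astype(float)


X_old, y_old = sample(true_old, 500)
X_new, y_new = sample(true_new, 300)
X_test, y_test = sample(true_new, 200)  # hold-out drawn from the new concept
results = compare_strategies(X_old, y_old, X_new, y_new, X_test, y_test)
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

The hold-out set is drawn from the new period on purpose: that is the data the updated model will face, and scoring on training rows would flatter whichever strategy saw them.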

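First, we can define a synthetic binary classification dataset and split it in half, using one portion as the "old data" and the other as the "new data".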

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)

We can then define a Multilayer Perceptron (MLP) model and fit it on the old data only.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

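Next, we can imagine that the model is saved and used for some time.

Time passes, and we wish to update it on the new data that has become available.

This involves using a much smaller learning rate than normal so that we do not wash away the weights learned on the old data.

Note: you will need to find a learning rate appropriate for your model and dataset, one that achieves better performance than simply fitting a new model from scratch.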

# update model on new data only with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')

We can then fit the model on the new data only, using the smaller learning rate.

# fit the model on new data (it was already recompiled with the smaller learning rate)
model.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)

Tying this together, the complete example of updating a neural network model on new data only is listed below.

# update neural network with new data only
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
 
# save model...
 
# load model...
 
# update model on new data only with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
model.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)
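Why the smaller learning rate helps preserve what was learned can be seen on a toy problem. This sketch again substitutes a NumPy logistic regression for the Keras MLP (an assumption made for speed and self-containment). It trains on hypothetical "old" data, then continues training on drifted "new" data from the same starting weights with a small and a large learning rate, and measures how far each run moves the weights away from the old solution.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gd_logreg(X, y, w, lr, epochs):
    """Full-batch gradient descent on the mean logistic loss, starting from a copy of w."""
    w = w.copy()
    Xb = np.hstack((X, np.ones((len(X), 1))))  # append a bias column
    for _ in range(epochs):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w


rng = np.random.default_rng(1)
# hypothetical "old" and "new" concepts: the true decision boundary moves between periods
true_old = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
true_new = np.array([-1.0, 1.0, 2.0, -0.5, 1.0])
X_old = rng.normal(size=(500, 5))
y_old = (X_old @ true_old > 0).astype(float)
X_new = rng.normal(size=(500, 5))
y_new = (X_new @ true_new > 0).astype(float)

# the "existing" model: trained from zeros on the old data only
w_old = gd_logreg(X_old, y_old, np.zeros(6), lr=0.1, epochs=150)

# continue training from the old weights on the new data with two learning rates
w_small = gd_logreg(X_new, y_new, w_old, lr=0.001, epochs=100)
w_large = gd_logreg(X_new, y_new, w_old, lr=0.1, epochs=100)

# Euclidean distance each update moved the weights away from the old solution
dist_small = np.linalg.norm(w_small - w_old)
dist_large = np.linalg.norm(w_large - w_old)
print(f"lr=0.001 moved {dist_small:.3f}; lr=0.1 moved {dist_large:.3f}")
```

This is a convex toy model, so it illustrates the trade-off rather than proving anything about MLPs: a small step size keeps the update close to the old weights, which is what you want when the old data is still relevant and a limitation when the drift is severe.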

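Next, let's look at updating the model on both the new and the old data.

Update Model on Old and New Data

We can update the model on a combination of both old and new data.

An extreme version of this approach is to discard the existing model and simply fit a new model on all available data, old and new. A less extreme version is to use the existing model as a starting point and update it on the combined dataset.

Again, it is a good idea to test both strategies and see which works well for your dataset.

We will focus on the less extreme update strategy in this case.

The synthetic dataset is defined and the model is fit on the old data as before.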

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

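New data becomes available, and we wish to update the model on a combination of both old and new data.

First, we must use a much smaller learning rate, in an attempt to use the current weights as a starting point for the search.

Note: you will need to find a learning rate appropriate for your model and dataset, one that achieves better performance than simply fitting a new model from scratch.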

# update model with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')

Next, we can create a composite dataset made up of the old and new data.

from numpy import hstack, vstack
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
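A quick check of what `vstack` and `hstack` produce here, using placeholder arrays shaped like the tutorial's 500/500 split (the zeros and ones are stand-ins, not real data):

```python
import numpy as np

# stand-ins shaped like the tutorial's split: 500 old and 500 new rows, 20 features
X_old, X_new = np.zeros((500, 20)), np.ones((500, 20))
y_old, y_new = np.zeros(500), np.ones(500)

# vstack stacks the 2-D feature matrices row-wise; hstack joins the 1-D label vectors
X_both, y_both = np.vstack((X_old, X_new)), np.hstack((y_old, y_new))
print(X_both.shape, y_both.shape)  # (1000, 20) (1000,)
```

Rows and labels stay aligned, with the old data first and the new data last. Keras `Model.fit` shuffles NumPy training data each epoch by default (`shuffle=True`), so this ordering should not bias the update.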

Finally, we can update the model on this composite dataset.

...
# fit the model on the combined old and new data
model.fit(X_both, y_both, epochs=100, batch_size=32, verbose=0)

Tying this together, the complete example of updating a neural network model on both old and new data is listed below.

# update neural network with both old and new data
from numpy import vstack
from numpy import hstack
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
 
# save model...
 
# load model...
 
# update model with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the model on the combined old and new data
model.fit(X_both, y_both, epochs=100, batch_size=32, verbose=0)

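Next, let's look at how to use ensemble models to respond to new data.

Ensemble Update Strategies

An ensemble is a predictive model composed of multiple other models.

There are many different types of ensemble models, although perhaps the simplest approach is to average the predictions from multiple different models.

For more on ensemble algorithms for deep learning neural networks, see the tutorial:

Ensemble Learning Methods for Deep Learning Neural Networks - MachineLearningMastery.com

We can use an ensemble model as a strategy for responding to changes in the underlying data or to the availability of new data.

Mirroring the approaches in the previous section, we can consider two ensemble approaches for responding to new data; they are:

- Ensemble of the existing model and a new model fit on new data only.

- Ensemble of the existing model and a new model fit on old and new data.

Again, we might consider variations on these approaches, such as using samples of the old and new data, or including more than one existing model or additional models in the ensemble.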
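One of these variations, keeping more than one existing model in the ensemble, can be sketched as a weighted average of predicted probabilities. The helper `weighted_ensemble`, the three probability arrays, and the weights `[1, 1, 2]` (favouring the newest model) are all hypothetical; with equal weights it reduces to the plain mean used in this tutorial's examples.

```python
import numpy as np


def weighted_ensemble(prob_list, weights):
    """Combine per-model probability arrays of shape (n_samples, 1) with a
    weighted average; the weights are normalized so they sum to 1."""
    probs = np.hstack(prob_list)  # shape (n_samples, n_models)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return probs @ w  # shape (n_samples,)


# hypothetical outputs of two older models and one new model on three rows
p_oldest = np.array([[0.2], [0.6], [0.9]])
p_old = np.array([[0.4], [0.5], [0.8]])
p_new = np.array([[0.8], [0.3], [0.7]])

combined = weighted_ensemble([p_oldest, p_old, p_new], weights=[1, 1, 2])
labels = (combined >= 0.5).astype(int)  # threshold the averaged probability
print(combined, labels)
```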

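Nevertheless, these are the two main strategies to consider.

Let's make these approaches concrete with some worked examples.

Ensemble Model With Model on New Data Only

We can create an ensemble of the existing model and a new model fit on only the new data.

The expectation is that the ensemble predictions perform better, or are more stable (lower variance), than using either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.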
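A minimal way to carry out that check, assuming you have predicted probabilities from both models on the same labeled rows. The helper `compare_to_ensemble` and the four toy rows are hypothetical. Ideally the hold-out rows were seen by neither model during training; the examples in this tutorial predict on `X_new`, which the new model was fit on, so scores computed there would be optimistic.

```python
import numpy as np


def compare_to_ensemble(y_true, p_old, p_new, threshold=0.5):
    """Hypothetical helper: accuracy of the old model, the new model and
    their mean ensemble on the same labeled hold-out rows."""
    y_true = np.asarray(y_true).ravel()
    p_old, p_new = np.asarray(p_old).ravel(), np.asarray(p_new).ravel()
    p_ens = (p_old + p_new) / 2.0  # same averaging rule as the tutorial's examples

    def acc(p):
        return float(np.mean((p >= threshold).astype(int) == y_true))

    return {"old": acc(p_old), "new": acc(p_new), "ensemble": acc(p_ens)}


# made-up labels and probabilities for four hold-out rows
scores = compare_to_ensemble(
    y_true=[1, 0, 1, 0],
    p_old=[0.9, 0.6, 0.4, 0.2],
    p_new=[0.3, 0.2, 0.7, 0.45],
)
print(scores)
```

On real data the ensemble will not always win; the point of the check is to find out before switching.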

First, we can prepare the dataset and fit the old model, as in the previous sections.

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

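Some time passes and new data becomes available.

We can then fit a new model on the new data, naturally discovering a model and configuration that works well, or best, on the new dataset only.

In this case, we will simply use the same model architecture and configuration as the old model.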

...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')

We can then fit this new model on the new data only.

...
# fit the new model on the new data only
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)

Now that we have two models, we can make predictions with each and calculate the average of their predictions as the "ensemble prediction".

...
from numpy import hstack, mean
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)
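To make the shapes in this step concrete: for the single sigmoid output unit used here, Keras `predict` returns an array of shape `(n_samples, 1)`, so `hstack` gives one column per model and `mean(..., axis=-1)` averages across models row by row. The arrays below are made-up stand-ins for the two models' outputs, and the 0.5 threshold for turning averaged probabilities into class labels is an assumption (a common default for binary tasks).

```python
import numpy as np

# made-up stand-ins for old_model.predict(X) and new_model.predict(X)
yhat1 = np.array([[0.9], [0.2], [0.6]])
yhat2 = np.array([[0.7], [0.4], [0.3]])

combined = np.hstack((yhat1, yhat2))  # shape (3, 2): one column per model
yhat = np.mean(combined, axis=-1)     # shape (3,): averaged probability per row
labels = (yhat >= 0.5).astype(int)    # threshold to get class labels
print(combined.shape, yhat, labels)
```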

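Tying this together, the complete example of an ensemble of the existing model and a new model fit on new data only is listed below.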

# ensemble old neural network with new model fit on new data only
from numpy import hstack
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
 
# save model...
 
# load model...
 
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the new model on the new data only
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)
 
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)

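Ensemble Model With Model on Old and New Data

We can create an ensemble of the existing model and a new model fit on both the old and the new data.

The expectation is that the ensemble predictions perform better, or are more stable (lower variance), than using either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.

First, we can prepare the dataset and fit the old model, as in the previous sections.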

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

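Some time passes and new data becomes available.

We can then fit a new model on a composite of the old and new data, naturally discovering a model and configuration that works well, or best, on the new dataset.

In this case, we will simply use the same model architecture and configuration as the old model.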

...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')

We can create a composite dataset from the old and new data, then fit the new model on this dataset.

...
from numpy import hstack, vstack
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the new model on the combined old and new data
new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)

Finally, we can use both models together to make ensemble predictions.

...
from numpy import hstack, mean
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)

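Tying this together, the complete example of an ensemble of the existing model and a new model fit on the old and new data is listed below.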

# ensemble old neural network with new model fit on old and new data
from numpy import hstack
from numpy import vstack
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
 
# save model...
 
# load model...
 
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the new model on the combined old and new data
new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)
 
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)

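Summary

In this tutorial, you discovered how to update deep learning neural network models in response to new data.

Specifically, you learned:

- Neural network models may need to be updated when the underlying data changes or when new labeled data becomes available.

- How to update a trained neural network model with just the new data or with a combination of old and new data.

- How to create an ensemble of the existing model and a new model trained on just the new data or on a combination of old and new data.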
