[Reading Notes 1] [2017] MATLAB and Deep Learning: Facing Overfitting (1)


梅花香——苦寒来


We will revisit regularization in more detail in the “Cost Function and Learning Rule” section of Chapter 3.

In the earlier data-grouping example, we could tell that the grouping model was overfitted because the training data was simple and the model was easy to visualize. In most situations, however, this is not the case, because the data has many more dimensions. For such data, we cannot draw the model and evaluate the effects of overfitting by eye. We therefore need another method to determine whether a trained model is overfitted. This is where validation comes into play.

Validation is a process that reserves part of the training data and uses it to monitor the model's performance. The reserved portion, the validation set, is not used in the training process. Because the model's error on the data it was trained on cannot reveal overfitting, we use this reserved part of the training data to check whether the model is overfitted: if the trained model performs poorly on the reserved data, we can say that it is overfitted. In that case, we modify the model to prevent overfitting.
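To see why the training error alone cannot reveal overfitting, here is a minimal Python sketch under toy assumptions: the inputs are the integers 0 to 9, each label is the parity of its input, and two hypothetical "models" are compared. `fit_memorizer` stores every training pair verbatim (an extreme case of overfitting), while `parity_rule` is a simple rule. Both score perfectly on the training data; only the reserved validation data tells them apart. None of these names come from the book.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[int, int]     # (input, label)
Model = Callable[[int], int]  # maps an input to a predicted label


def fit_memorizer(train: List[Example]) -> Model:
    """Hypothetical, deliberately overfitted model: it memorizes the training pairs."""
    table: Dict[int, int] = dict(train)
    # Seen inputs return their stored label; unseen inputs fall back to 0.
    return lambda x: table.get(x, 0)


def parity_rule(x: int) -> int:
    """Hypothetical simple model: predicts the parity of the input."""
    return x % 2


def error_rate(model: Model, data: List[Example]) -> float:
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(1 for x, y in data if model(x) != y)
    return wrong / len(data)


data = [(x, x % 2) for x in range(10)]
train, val = data[:8], data[8:]  # 8:2 split; the validation pairs are never trained on

memorizer = fit_memorizer(train)
print(error_rate(memorizer, train), error_rate(memorizer, val))      # 0.0 0.5
print(error_rate(parity_rule, train), error_rate(parity_rule, val))  # 0.0 0.0
```

Both models have zero training error, so the training error cannot distinguish them; the memorizer's validation error of 0.5 is no better than guessing, which is the signal of overfitting.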

Figure 1-10 illustrates how the training data is divided for the validation process.

[Figure 1-10. Dividing the training data for the validation process]
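The division in Figure 1-10 can be sketched in Python as a shuffled holdout split. This is a minimal illustration, assuming the examples fit in memory and that a seeded shuffle is acceptable; the function name `holdout_split`, its `seed` parameter, and the default 0.8 training ratio (an 8:2 split) are choices made here, not taken from the book.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    data: Sequence[T], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `data` and cut it into (training set, validation set)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    items = list(data)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_ratio)
    # The two parts are disjoint, so validation examples never reach training.
    return items[:cut], items[cut:]


train_set, val_set = holdout_split(range(100))
print(len(train_set), len(val_set))  # 80 20
```

Shuffling before cutting guards against data that arrives sorted (for example, by class), which would otherwise leave the validation set unrepresentative.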

When validation is involved, the training process of machine learning proceeds through the following steps:

1. Divide the training data into two groups: one for training and the other for validation. As a rule of thumb, the ratio of the training set to the validation set is 8:2.
2. Train the model with the training set.
3. Evaluate the performance of the model using the validation set.
   a. If the model yields satisfactory performance, finish the training.
   b. If the performance is not satisfactory, modify the model and repeat the process from Step 2.
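The three steps can be written as a small generic loop. This is a sketch under stated assumptions: the data is already shuffled, "satisfactory performance" is modeled as a hypothetical `target_error` threshold on the validation error, and "modify the model" is modeled as moving to the next entry in a hypothetical list of `candidates` (for example, progressively simpler or more regularized configurations). `train_fn` and `error_fn` are placeholders for whatever training and error routines you actually use.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")  # one training example
C = TypeVar("C")  # a model configuration (hypothetical)
M = TypeVar("M")  # a trained model


def train_with_validation(
    data: Sequence[T],
    candidates: Sequence[C],
    train_fn: Callable[[C, List[T]], M],
    error_fn: Callable[[M, List[T]], float],
    target_error: float,
    train_ratio: float = 0.8,
) -> Optional[Tuple[C, M, float]]:
    # Step 1: split once into training and validation sets (data assumed pre-shuffled).
    cut = round(len(data) * train_ratio)
    train_set, val_set = list(data[:cut]), list(data[cut:])
    for config in candidates:
        # Step 2: train only on the training set.
        model = train_fn(config, train_set)
        # Step 3: evaluate on the held-out validation set.
        val_error = error_fn(model, val_set)
        if val_error <= target_error:
            # 3a: satisfactory, so finish training.
            return config, model, val_error
        # 3b: not satisfactory; try the next (modified) configuration from Step 2.
    return None  # no candidate reached the target


# Hypothetical usage with stand-in functions: each "model" is just its config,
# and the validation errors are made-up numbers for illustration only.
fake_errors = {"large": 0.9, "medium": 0.4, "small": 0.1}
result = train_with_validation(
    list(range(10)),
    ["large", "medium", "small"],
    train_fn=lambda config, train_set: config,
    error_fn=lambda model, val_set: fake_errors[model],
    target_error=0.5,
)
print(result)  # ('medium', 'medium', 0.4)
```

Returning `None` when every candidate fails makes the "keep modifying" outcome explicit instead of silently returning the last model.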

Cross-validation is a slight variation of the validation process. It still divides the training data into groups for training and validation, but it keeps changing the datasets: instead of retaining the initially divided sets, cross-validation repeats the division of the data. The reason is that, when the validation set is fixed, the model can become overfitted even to the validation set.
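The text does not name a specific scheme, but one common way to keep changing the datasets is k-fold cross-validation: the data is cut into k folds, each fold serves once as the validation set while the remaining folds are used for training, and the k validation errors are averaged. The Python sketch below assumes pre-shuffled data; `k_fold_splits`, `cross_validate`, `mean_model`, and `mean_abs_error` are hypothetical names, not the book's API.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # one example
M = TypeVar("M")  # a trained model


def k_fold_splits(data: Sequence[T], k: int) -> List[Tuple[List[T], List[T]]]:
    """Return k (training set, validation set) pairs; each example is validated exactly once."""
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and len(data)")
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in sizes:
        stop = start + size
        val = list(data[start:stop])
        train = list(data[:start]) + list(data[stop:])
        splits.append((train, val))
        start = stop
    return splits


def cross_validate(
    data: Sequence[T],
    k: int,
    train_fn: Callable[[List[T]], M],
    error_fn: Callable[[M, List[T]], float],
) -> float:
    """Train a fresh model on each fold's training set and average the validation errors."""
    errors = [error_fn(train_fn(train), val) for train, val in k_fold_splits(data, k)]
    return sum(errors) / len(errors)


# Hypothetical usage: the "model" is the mean of its training data,
# scored by mean absolute error on the held-out fold.
def mean_model(train: List[float]) -> float:
    return sum(train) / len(train)


def mean_abs_error(model: float, val: List[float]) -> float:
    return sum(abs(y - model) for y in val) / len(val)


print(cross_validate([float(x) for x in range(10)], 5, mean_model, mean_abs_error))  # 3.1
```

Because every example is used for validation exactly once, no single fixed validation set can be overfitted by repeated model adjustments.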

Translated from Phil Kim's book "Matlab Deep Learning".
