Cross-Validation and Hyperparameter Tuning: How to Optimise your Machine Learning Model

This article examines the shortcomings of using only a single validation set, and shows how Cross-Validation and hyperparameter tuning can be used to optimise the performance of Machine Learning models such as Random Forest and Extreme Gradient Boosting and improve their predictions.

In the first two parts of this article I obtained and preprocessed Fitbit sleep data, split the data into training, validation and test set, trained three different Machine Learning models and compared their performance.

In part 2, we saw that using the default hyperparameters for Random Forest and Extreme Gradient Boosting and evaluating model performance on the validation set led to Multiple Linear Regression performing best and Random Forest as well as Gradient Boosting Regressor performing slightly worse.

In this part of the article I will discuss shortcomings of using only one validation set, how we address those shortcomings and how we can tune model hyperparameters to boost performance. Let’s dive in.

Cross-Validation

Shortcomings of simple training, validation and test split

In part 2 of this article we split the data into training, validation and test set, trained our models on the training set and evaluated them on the validation set. We have not touched the test set yet as it is intended as a hold-out set that represents never before seen data that will be used to evaluate how well the Machine Learning models generalise once we feel like they are ready for that final test.
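The three-way split described above can be sketched with two chained calls to scikit-learn's `train_test_split`. The data here is a synthetic placeholder, not the article's Fitbit dataset, and the 60/20/20 proportions are illustrative assumptions:

```python
# A minimal sketch of a train/validation/test split, assuming a 60/20/20
# ratio on placeholder data (the article's actual Fitbit data is not used here).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)  # 100 samples, 2 placeholder features
y = np.arange(100)

# First hold out 20% as the final test set ...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# ... then split the remaining 80% into training and validation sets
# (0.25 of the remainder equals 20% of the original data).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

The test set is then set aside and only the training and validation sets are used while developing the models.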

Because we only split the data into one set of training data and one set of validation data, the performance metrics of our models are highly reliant on those two sets. The models are only trained and evaluated once, so their measured performance depends on that single evaluation, and they may perform very differently when trained and evaluated on different subsets of the same data, simply because of how those subsets happen to be picked.

What if we could perform this split into training and validation sets multiple times, each time on different subsets of the data, train and evaluate our models each time, and look at the average performance across those evaluations? That is exactly the idea behind K-fold Cross-Validation.

K-fold Cross-Validation

In K-fold Cross-Validation (CV) we still start off by separating a test/hold-out set from the remaining data in the data set to use for the final evaluation of our models. The data that is remaining, i.e. everything apart from the test set, is split into K number of folds (subsets). The Cross-Validation then iterates through the folds and at each iteration uses one of the K folds as the validation set while using all remaining folds as the training set. This process is repeated until every fold has been used as a validation set. Here is what this process looks like for a 5-fold Cross-Validation:

[Figure: 5-fold Cross-Validation — in each of the five iterations, a different fold serves as the validation set while the remaining four folds form the training set]
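The fold iteration in the figure can be sketched with scikit-learn's `KFold`. The data below is a small synthetic placeholder, not the article's Fitbit features:

```python
# A minimal sketch of K-fold iteration, assuming placeholder data
# (10 samples, 2 features) rather than the article's Fitbit dataset.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    # In each iteration one fold is the validation set and the
    # remaining four folds together form the training set.
    print(f"Fold {fold}: train={len(train_idx)} samples, "
          f"validation={len(val_idx)} samples")
```

Every sample ends up in the validation set exactly once across the five iterations.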

By training and testing the model K times on different subsets of the same training data, we get a more accurate representation of how well our model might perform on data it has not seen before. In a K-fold CV we score the model after every iteration and compute the average of all scores to get a better representation of how the model performs compared to relying on a single validation set.
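Scoring a model once per fold and averaging the results is exactly what scikit-learn's `cross_val_score` helper does. The model choice and the synthetic regression data below are placeholder assumptions, not the article's actual setup:

```python
# A sketch of 5-fold Cross-Validation scoring, assuming a Random Forest
# on synthetic regression data (stand-ins for the article's Fitbit models).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

model = RandomForestRegressor(random_state=42)
# cross_val_score trains and evaluates the model once per fold;
# the mean over all folds is a more stable performance estimate
# than a single train/validation split.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("R^2 per fold:", np.round(scores, 3))
print("Mean R^2:", round(scores.mean(), 3))
```

The spread of the per-fold scores also hints at how sensitive the model is to which subset it is trained on.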
