by Shruti Tanwar


How to get a grip on Cross Validations

Lately, I’ve had the chance to be involved in building a product that aims at accelerating ML/AI (Machine Learning / Artificial Intelligence) adoption for businesses. In the process of developing such an exciting product, I learned a thing or two along the way.

And although ML/AI is too big of an umbrella to be covered in a single article, I’m taking this chance to shine a light on one of the concepts that will help you build a resilient predictive model: a model capable of performing reliably in the real world and behaving ‘fairly’ on unseen data.

You can never be 100% sure about your machine learning model’s behavior. There is always room for improvement, progress, or a certain tweak. Merely fitting the model to your training data and expecting it to perform accurately in the real world would be a poor choice. Certain factors that can guarantee, or at least assure you of, reasonable performance need to be considered before deploying the model to production.

You need to make sure that your model has an understanding of different patterns in your data — is not under-fit or over-fit — and the bias and variance for the model are on the lower end.

“Cross-Validation” is the technique that helps you validate your model’s performance. It’s a statistical method used to estimate the skill of machine learning models. Wikipedia defines it as follows.

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.

In extremely simple words, the practical implementation of the above jargon would be as follows:

While training a model, some of the data is removed before training begins. Upon completion of training, the data that was removed is used to test the performance of the learned model and tweak the parameters to improve the final performance of the model.

This is the fundamental idea for the whole spectrum of evaluation methods called cross-validation.

Before discussing the validation techniques though, let us take a quick look at two terms used above. Over-fit and under-fit. What exactly is under-fitting and over-fitting of models and how does it affect the performance of a model with real-world data?

We can understand it easily through the following graph.

A model is said to be underfitting (High Bias) when it performs poorly on the training data. As we can see in the graph on the left, the line doesn’t cover most of the data points on the graph meaning it has been unable to capture the relationship between the input (say X), and the output to be predicted (say Y).

An overfitting model (High Variance), on the other hand, performs well on the training data but does not perform well on the evaluation data. In such a case, the model is memorizing the data it has seen instead of learning, and is unable to generalize to unseen data.

The graph on the right represents the case of over-fitting. We see that the predicted line is covering all the data points in the graph. Though it might seem like this should make the model work even better, sadly, that’s far from the practical truth. The predicted line covering all points which also includes noise and outliers produces poor results due to its complexity.

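The contrast can be made concrete with a tiny plain-Python sketch. The noisy data, the "memorizer" model, and the constant-mean model below are all made up for illustration: the memorizer gets zero training error but fails badly on unseen inputs (over-fitting), while the constant predictor fits poorly everywhere (under-fitting).

```python
import random

rng = random.Random(1)
# Noisy data: the true relationship is y = x, plus Gaussian noise.
data = [(x, x + rng.gauss(0, 3)) for x in range(30)]
train, test = data[:20], data[20:]

# "Over-fit" model: memorizes every training point (zero training error)
# but falls back to a default for inputs it has never seen.
memory = dict(train)
overfit = lambda x: memory.get(x, 0.0)

# "Under-fit" model: ignores the input entirely and predicts one constant.
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

def mae(model, points):
    """Mean absolute error of a model over a list of (x, y) points."""
    return sum(abs(y - model(x)) for x, y in points) / len(points)

print("over-fit : train", round(mae(overfit, train), 2),
      "test", round(mae(overfit, test), 2))
print("under-fit: train", round(mae(underfit, train), 2),
      "test", round(mae(underfit, test), 2))
```

The telltale signature shows up in the numbers: the memorizer's training error is exactly zero while its test error is large, and the constant model's error is high on both sets.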
Let’s move on to the various types of cross-validation techniques out there.

Holdout Method

The simplest type of cross-validation. Here, the data set is separated into two sets, called the training set and the testing set. The model is allowed to fit only on the training set. Then the predictions are made for the data in the testing set (which the model has never seen before). The errors it makes are aggregated to give the mean absolute test set error, which is used to evaluate the model.

This type of evaluation depends, to an extent, on which data points end up in the training set and which end up in the test set, so the evaluation may vary depending on how the division is made.

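A minimal sketch of the holdout method in plain Python. The toy data and the mean-predictor "model" are made up for illustration; in practice you would plug in a real learner (scikit-learn's `train_test_split` does the splitting for you):

```python
import random

def holdout_split(data, test_fraction=0.2, seed=42):
    """Shuffle the data, then hold out a fraction of it for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

# Toy (input, target) data and a toy "model": the mean training target.
data = [(x, 2 * x) for x in range(20)]
train, test = holdout_split(data)
mean_pred = sum(y for _, y in train) / len(train)
mae = sum(abs(y - mean_pred) for _, y in test) / len(test)
print(f"mean absolute test set error: {mae:.2f}")
```

Note that the model only ever sees `train`; the error is aggregated over `test`, which it has never seen.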
K-fold cross-validation

One of the most popular validation techniques is K-fold cross-validation, thanks to its simplicity: it generally produces a less biased and less optimistic estimate of the model skill than other methods, such as a simple train/test split.

Here, the data set is divided into k subsets, and the holdout method is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets constitute the training set. Then the average error is computed across all k trials.

The general procedure is as follows:

  1. Shuffle the dataset randomly and split it into k groups.
  2. Take one group as the holdout or test data set and the remaining groups as the training data set.
  3. Fit a model on the training set and evaluate it on the test set.
  4. Retain the evaluation score and discard the model.
  5. Summarize the skill of the model using the sample of model evaluation scores.
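The five steps above can be sketched in plain Python as follows. The toy mean-predictor model and the data are made up for illustration; in practice, scikit-learn's `KFold` and `cross_val_score` handle this loop for you:

```python
import random

def k_fold_scores(data, k, fit, evaluate, seed=0):
    """Shuffle, split into k groups, and collect one score per fold."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)                        # step 1: shuffle randomly
    folds = [shuffled[i::k] for i in range(k)]   # step 1: k near-equal groups
    scores = []
    for i in range(k):
        test = folds[i]                          # step 2: one group held out
        train = [p for j in range(k) if j != i for p in folds[j]]
        model = fit(train)                       # step 3: fit on the rest
        scores.append(evaluate(model, test))     # step 4: retain the score
    return scores                                # step 5: summarize the sample

# Toy "model": the mean of the training targets, scored by mean absolute error.
fit = lambda train: sum(y for _, y in train) / len(train)
evaluate = lambda m, test: sum(abs(y - m) for _, y in test) / len(test)
data = [(x, 2 * x) for x in range(20)]
scores = k_fold_scores(data, k=5, fit=fit, evaluate=evaluate)
print(f"per-fold errors: {[round(s, 2) for s in scores]}")
print(f"average error:   {sum(scores) / len(scores):.2f}")
```

Each of the k fitted models is thrown away; only its score survives, and the final estimate is the average across the k trials.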

The edge this method has over others is that it matters little how the data gets divided. Every data point appears in the test set exactly once and in the training set exactly k-1 times. As k increases, the variance of the resulting estimate falls.

One disadvantage of this method can be the computation required during the training. The training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation.

Leave-one-out cross-validation

Leave-one-out is sort of like a cousin to K-fold cross-validation where k becomes equal to n, the total number of data points in the set. It's basically a logical extreme version of K-fold validation. How it works practically is by leaving out exactly one data point in each iteration and using that data point to make the prediction.

The function approximator is trained n times, each time on all the data except a single point, and a prediction is made for that point. As before, the average error is computed and used to evaluate the model.

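A plain-Python sketch, again with a made-up mean-predictor model: leave-one-out is just the k-fold loop with exactly one point per fold.

```python
def leave_one_out_scores(data, fit, evaluate):
    """K-fold with k = n: train n times, each time leaving out one point."""
    scores = []
    for i in range(len(data)):
        held_out = data[i]                 # the single point left out
        train = data[:i] + data[i + 1:]    # all the data except that point
        model = fit(train)
        scores.append(evaluate(model, held_out))
    return scores

# Toy "model": the mean of the training targets; score = absolute error.
fit = lambda train: sum(y for _, y in train) / len(train)
evaluate = lambda m, point: abs(point[1] - m)
data = [(x, 2 * x) for x in range(10)]
scores = leave_one_out_scores(data, fit, evaluate)
print(f"average leave-one-out error: {sum(scores) / len(scores):.2f}")
```

This makes the computational cost visible: the model is refit once per data point, so for large n it is far more expensive than k-fold with a small k.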
And that’s a wrap. Hope you enjoyed reading it as much as I enjoyed creating it. ❤️ Let me know your thoughts, comments, or advice in the comments below. And while you’re at it, why don’t you go and check out what I build with my team at skyl.ai, and strike up a conversation with me or share your feedback. Cheers.

Translated from: https://www.freecodecamp.org/news/how-to-get-a-grip-on-cross-validations-bb0ba779e21c/
