k-Fold Cross-Validation and Cross-Validation Sets
Cross-validation, also referred to as out-of-sample testing, is an essential element of a data science project. It is a resampling procedure used to evaluate machine learning models and assess how a model will perform on an independent test dataset.
This article covers 8 different cross-validation techniques, each with its own pros and cons:
Leave-p-out cross-validation
Leave-one-out cross-validation
Holdout cross-validation
Repeated random subsampling validation
k-fold cross-validation
Stratified k-fold cross-validation
Time series cross-validation
Nested cross-validation
Before turning to the techniques themselves, let us look at why cross-validation should be used in a data science project.
Why Is Cross-Validation Important?
We often randomly split the dataset into training data and test data to develop a machine learning model. The training data is used to train the ML model, and the same model is then tested on the independent test data to evaluate its performance.
When the random state of the split changes, the accuracy of the model changes too, so a single split does not give one fixed accuracy figure for the model. The test data should be kept independent of the training data so that no data leakage occurs. The model's performance still needs to be evaluated while it is being developed on the training data, and this is where cross-validation data comes into the picture.
The data needs to be split into:
Training data: used for model development
Validation data: used for validating the performance of the same model
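The effect of the random state on the measured accuracy can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy dataset, the helper names (`make_toy_data`, `split_train_validation`, `fit_threshold`, `accuracy`) and the one-feature threshold "model" are hypothetical stand-ins, not the article's own code, and the independent test set is assumed to have been set aside already.

```python
import random

def make_toy_data(n=200, seed=42):
    """Hypothetical toy dataset: one noisy feature and a binary label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append((rng.gauss(center, 1.5), label))
    return data

def split_train_validation(data, validation_fraction=0.25, random_state=0):
    """Shuffle with the given random state, then cut into training and validation parts."""
    rng = random.Random(random_state)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def fit_threshold(train):
    """Toy 'model': the midpoint between the two class means of the feature."""
    xs0 = [x for x, label in train if label == 0]
    xs1 = [x for x, label in train if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'x > threshold' agrees with the label."""
    correct = sum((x > threshold) == (label == 1) for x, label in rows)
    return correct / len(rows)

data = make_toy_data()
accuracies = []
for random_state in range(5):
    train, validation = split_train_validation(data, random_state=random_state)
    threshold = fit_threshold(train)
    accuracies.append(accuracy(threshold, validation))
    print(f"random_state={random_state}: validation accuracy = {accuracies[-1]:.2f}")
```

Only the shuffle changes between iterations, so any spread in the printed values comes from the split itself rather than from the data or the model.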
![Image for post](https://img-blog.csdnimg.cn/img_convert/0b90b0d78ed0181c0748aaaea37f840e.png)
In simple terms, cross-validation allows us to make better use of our data. The sections below cover how each cross-validation technique works and how it can be implemented.
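One way to read "better use of our data" is that the validation role can rotate, so every observation is used for training in some rounds and for validation in another. The sketch below shows only that index bookkeeping; the function name `rotating_splits` and the interleaved assignment of indices to parts are assumptions chosen for brevity, not a scheme prescribed by the article.

```python
def rotating_splits(n_samples, n_parts):
    """Yield (train_indices, validation_indices), rotating which part validates."""
    indices = list(range(n_samples))
    for part in range(n_parts):
        # Take every n_parts-th index, starting at offset `part`, as validation.
        validation = indices[part::n_parts]
        validation_set = set(validation)
        train = [i for i in indices if i not in validation_set]
        yield train, validation

validated = []
for train, validation in rotating_splits(n_samples=10, n_parts=5):
    validated.extend(validation)
    print("train:", train, "validation:", validation)
```

Across the five rounds each of the ten indices lands in the validation part exactly once and in the training part the other four times.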
1. Leave-p-out cross-validation:
Leave-p-out cross-validation (LpOCV) is an exhaustive cross-validation technique that uses p observations as validation data and the remaining data to train the model. This is repeated in all ways to cut the original sample on a val