k-Fold Cross-Validation and Validation Sets: Understanding 8 Types of Cross-Validation


Cross-validation, also referred to as an out-of-sample technique, is an essential element of a data science project. It is a resampling procedure used to evaluate machine learning models and assess how a model will perform on an independent test dataset.

This article covers 8 different cross-validation techniques, each with its own pros and cons, listed below:

  1. Leave p out cross-validation
  2. Leave one out cross-validation
  3. Holdout cross-validation
  4. Repeated random subsampling validation
  5. k-fold cross-validation
  6. Stratified k-fold cross-validation
  7. Time Series cross-validation
  8. Nested cross-validation

Before turning to the techniques themselves, let us look at why cross-validation should be used in a data science project.

Why Is Cross-Validation Important?

We often randomly split the dataset into training data and test data to develop a machine learning model. The training data is used to train the ML model, and the same model is then tested on the independent test data to evaluate its performance.

The accuracy of the model changes with the random state of the split, so no single split gives a stable accuracy estimate. The testing data should be kept independent of the training data so that no data leakage occurs. Yet model performance still needs to be evaluated while the model is being developed on the training data. This is where validation data, and with it cross-validation, comes into the picture.
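To make the effect of the split's random state concrete, here is a minimal sketch using only the Python standard library. The toy dataset, the nearest-centroid classifier, and the helper names (`make_data`, `split`, `nearest_centroid_accuracy`) are all assumptions made for illustration; they are not from the original article.

```python
import random
from statistics import mean

def make_data(n=200, seed=0):
    """Hypothetical toy dataset: one feature, two overlapping classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 0 is centred at 0.0, class 1 at 1.5; both have unit spread.
        data.append((rng.gauss(1.5 * label, 1.0), label))
    return data

def split(data, test_fraction, random_state):
    """Shuffle a copy of the data with the given seed and cut it in two."""
    rows = data[:]
    random.Random(random_state).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (train, test)

def nearest_centroid_accuracy(train, test):
    """Fit one mean per class on the training rows, then score the test rows."""
    centroids = {c: mean(x for x, y in train if y == c) for c in (0, 1)}
    correct = sum(
        1 for x, y in test
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return correct / len(test)

data = make_data()
accuracies = {}
for random_state in range(5):
    train, test = split(data, test_fraction=0.25, random_state=random_state)
    accuracies[random_state] = nearest_centroid_accuracy(train, test)

# Same data, same model; only the seed of the split differs.
for random_state, acc in accuracies.items():
    print(f"random_state={random_state}: accuracy={acc:.3f}")
```

Because the data and the model are held fixed, any spread in the printed accuracies comes from the choice of split alone.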

The data needs to be split into:

  • Training data: used for model development
  • Validation data: used to validate the performance of that same model

(Image by Author), Validation split
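The split described above can be sketched in a few lines of standard-library Python. The function name `train_validation_test_split` and the 60/20/20 proportions are assumptions chosen for illustration, not values given in the article.

```python
import random

def train_validation_test_split(rows, validation_fraction=0.2,
                                test_fraction=0.2, random_state=0):
    """Shuffle the rows once, then cut them into three disjoint parts."""
    rows = list(rows)
    random.Random(random_state).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    n_val = int(len(rows) * validation_fraction)
    test = rows[:n_test]                      # held back until the very end
    validation = rows[n_test:n_test + n_val]  # used during model development
    train = rows[n_test + n_val:]             # used to fit the model
    return train, validation, test

train, validation, test = train_validation_test_split(range(100))
print(len(train), len(validation), len(test))
```

The test portion is set aside first so that nothing done during model development, including validation, can leak into it.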

In simple terms, cross-validation allows us to make better use of our data. The sections that follow describe the working and implementation of the cross-validation techniques listed above.

1. Leave p-out cross-validation:

Leave p-out cross-validation (LpOCV) is an exhaustive cross-validation technique that uses p observations as validation data and the remaining data to train the model. This is repeated for every possible way of splitting the original sample into a validation set of p observations and a training set.
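The exhaustive enumeration behind leave p-out can be sketched with `itertools.combinations` from the standard library. The helper name `leave_p_out_splits` is hypothetical, and the sketch works on row indices only rather than on a real dataset or model.

```python
from itertools import combinations
from math import comb

def leave_p_out_splits(n_samples, p):
    """Yield (train_indices, validation_indices) for every size-p validation set."""
    indices = range(n_samples)
    for validation in combinations(indices, p):
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        yield train, list(validation)

splits = list(leave_p_out_splits(n_samples=5, p=2))
for train, validation in splits:
    print("train:", train, "validation:", validation)

# The number of splits is "n choose p", which grows quickly with n.
print(len(splits), comb(5, 2))
```

Even for 5 samples and p = 2 there are 10 splits, each of which would require training a model, which is why the exhaustive approach becomes expensive on larger datasets.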
