Cross-Validation

Cross-validation (compiled from Wikipedia)

At a conference…

"Are you doing open-set testing or closed-set testing?!"

"Uh…"

Come on, let's go through all the cross-validation methods!

Cross-validation

Two types of cross-validation can be distinguished, exhaustive and non-exhaustive cross-validation.

(Cross-validation falls into two types: exhaustive and non-exhaustive.)

Exhaustive cross-validation

Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set.

Leave-p-out cross-validation

Leave-p-out cross-validation (LpO CV) involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample into a validation set of p observations and a training set.

(With a dataset of n samples, hold out p for testing and train on the remaining n − p. Since every possible choice of the p held-out samples must be exhausted, the computational cost is high.)
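As a minimal sketch of the enumeration above (pure Python; `leave_p_out` is an illustrative name, not a library function), every split can be generated with `itertools.combinations`:

```python
from itertools import combinations

def leave_p_out(indices, p):
    """Yield every (train, validation) split in which exactly p
    of the given indices are held out for validation."""
    index_set = set(indices)
    for val in combinations(indices, p):
        yield sorted(index_set - set(val)), list(val)

# With n = 4 and p = 2 there are C(4, 2) = 6 splits -- the number
# of splits grows combinatorially in n and p, hence the cost.
splits = list(leave_p_out(range(4), 2))
```

In practice this combinatorial blow-up is why LpO CV is rarely used beyond small p.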

Leave-one-out cross-validation (LOOCV)

Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1. The process looks similar to jackknife; however, with cross-validation you compute a statistic on the left-out sample(s), while with jackknifing you compute a statistic from the kept samples only.

(With a dataset of n samples, hold out 1 for testing and train on the remaining n − 1. This requires n rounds of training and testing in total.)
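A minimal LOOCV sketch (pure Python; `leave_one_out` is an illustrative name): each of the n sample indices is held out exactly once.

```python
def leave_one_out(n):
    """Yield the n (train, validation) splits in which each
    sample index is held out exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

# n samples -> n rounds of training and testing
splits = list(leave_one_out(5))
```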

Non-exhaustive cross-validation

Non-exhaustive cross validation methods do not compute all ways of splitting the original sample. Those methods are approximations of leave-p-out cross-validation.

k-fold cross-validation

In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data.

(Randomly partition the dataset into k folds, not necessarily of exactly equal size; 1 fold is used for testing and the remaining k − 1 folds for training. This requires k rounds of training and testing in total.)

(When the folds are not equal in size, the final test accuracy cannot be computed by simply averaging the k per-fold accuracies. For a classification task, first sum the number of correctly classified samples over the k test rounds, then divide by the total number of samples.)

(Repeated k-fold cross-validation is common, e.g. 10 repetitions of 10-fold cross-validation: the dataset is randomly partitioned into 10 folds 10 times, for 100 rounds of training and testing in total. The choice of 10 folds is not absolute; it has simply been found most suitable in some statistical tests.)
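The notes above can be sketched as follows (pure Python; `k_fold_indices` and `pooled_accuracy` are illustrative names). When n is not divisible by k the folds differ in size, so the final accuracy pools correct counts across folds rather than averaging per-fold accuracies:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition the indices 0..n-1 into k folds;
    fold sizes differ by at most 1 when k does not divide n."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def pooled_accuracy(correct_per_fold, n):
    """Sum the correctly classified samples over all k test
    rounds, then divide by the total sample count."""
    return sum(correct_per_fold) / n

# 10 samples in 3 folds -> fold sizes 4, 3, 3 (unequal),
# so per-fold accuracies carry different weights.
folds = k_fold_indices(10, 3)
```

Averaging per-fold accuracies would weight the smaller folds' samples more heavily; pooling the correct counts weights every sample equally.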

Holdout method (simple cross-validation)

In the holdout method, we randomly assign data points to two sets d0 and d1, usually called the training set and the test set, respectively. The size of each of the sets is arbitrary although typically the test set is smaller than the training set. We then train on d0 and test on d1.

(Split the dataset into a training set and a test set; only 1 round of training and testing is needed.)
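A minimal holdout sketch (pure Python; `holdout_split` and the 20% test fraction are illustrative choices):

```python
import random

def holdout_split(data, test_fraction=0.2, seed=0):
    """Randomly assign the data points to a training set d0 and a
    (typically smaller) test set d1; train on d0, test on d1."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(data) * (1 - test_fraction))
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

# Single random 80/20 split -> one round of training and testing.
d0, d1 = holdout_split(list(range(10)))
```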

Repeated random sub-sampling validation

This method, also known as Monte Carlo cross-validation, creates multiple random splits of the dataset into training and validation data.

(I haven't dug into this one; interested readers may want to look it up.)
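For the curious, a minimal sketch of the idea (pure Python; `monte_carlo_splits` is an illustrative name): repeat a random holdout split several times with fresh partitions, so unlike k-fold the test sets may overlap across repeats.

```python
import random

def monte_carlo_splits(n, test_fraction, repeats, seed=0):
    """Yield `repeats` independent random (train, test) splits of
    the indices 0..n-1; test sets may overlap between repeats."""
    rng = random.Random(seed)
    cut = int(n * (1 - test_fraction))
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[:cut], idx[cut:]

# 5 independent 70/30 splits of 10 samples
splits = list(monte_carlo_splits(10, 0.3, 5))
```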

That's all.