ISLR Chapter 5: Resampling Methods


Half0pen


Article link: https://blog.csdn.net/Half_open/article/details/61196501


5 Resampling Methods

In this chapter, we discuss two of the most commonly used resampling methods, cross-validation and the bootstrap.

The process of evaluating a model’s performance is known as model assessment, whereas the process of selecting the proper level of flexibility for a model is known as model selection.

5.1 Cross-Validation

In this section, we instead consider a class of methods that estimate the
test error rate by holding out a subset of the training observations from the
fitting process, and then applying the statistical learning method to those
held out observations.

5.1.1 The Validation Set Approach

The observations are randomly divided into two parts, a training set and a validation set.

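[Figure: validation-set MSE versus polynomial degree (left), and the same curve for different choices of validation set (right)]

The left panel shows that the mean squared error (MSE) drops sharply when the polynomial degree goes from 1 to 2; raising the degree further brings little additional reduction, and the MSE even rises in places.
The right panel shows that the choice of validation set has a large effect on the MSE.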

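Advantages: conceptually simple and easy to implement.
Disadvantages:

The estimated test error rate can vary widely depending on which observations end up in the validation set.
Only part of the data is used to fit the model, so the validation error may overestimate the test error rate.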
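
The first drawback can be seen in a small experiment. This is a minimal sketch, not the book's code: the data are synthetic (an assumption), the model is a simple linear regression rather than a polynomial, and `fit_line` and `validation_mse` are hypothetical helper names.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def validation_mse(xs, ys, seed):
    """Split the data in half at random, fit on one half, score on the other."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, valid = idx[:half], idx[half:]
    b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in valid) / len(valid)


# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise.
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + data_rng.gauss(0, 2) for x in xs]

# Same data and same model, scored on ten different random splits.
estimates = [validation_mse(xs, ys, seed) for seed in range(10)]
print("lowest estimate: ", min(estimates))
print("highest estimate:", max(estimates))
```

The gap between the lowest and highest estimate comes only from how the split was drawn, since the data and the model are identical in every run.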

5.1.2 Leave-One-Out Cross-Validation (LOOCV)

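Given n observations, a single observation is held out as the validation set and the remaining n-1 are used as the training set.
Repeating this for every observation means the model is fit n times, and the LOOCV estimate is the average of the n resulting validation errors.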


LOOCV overcomes the drawbacks of the validation set approach, but it is computationally expensive because the model has to be fit n times.

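When the model is fit by least squares (linear or polynomial regression), LOOCV takes no longer than fitting a single model. The estimate can be computed from one fit on the full data: $CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i-\hat{y}_i}{1-h_i}\right)^2$, where $\hat{y}_i$ is the $i$-th fitted value from that fit and $h_i$ is the leverage of observation $i$. The shortcut does not hold for models in general.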
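
The claim can be checked numerically. The sketch below assumes simple linear regression on synthetic data; `fit_line`, `loocv_brute_force` and `loocv_shortcut` are hypothetical helper names. It relies on the standard least-squares identity that each leave-one-out residual equals the full-fit residual divided by $1-h_i$, with leverage $h_i = \frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum_j (x_j-\bar{x})^2}$ for a single predictor.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def loocv_brute_force(xs, ys):
    """Refit the model n times, each time leaving one observation out."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total / n


def loocv_shortcut(xs, ys):
    """Single fit on all data; rescale each residual by 1 - leverage."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1 / n + (x - x_bar) ** 2 / sxx  # leverage of this observation
        total += ((y - (b0 + b1 * x)) / (1 - h)) ** 2
    return total / n


# Synthetic data (an assumption for illustration).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [1 + 2 * x + rng.gauss(0, 2) for x in xs]

brute = loocv_brute_force(xs, ys)
fast = loocv_shortcut(xs, ys)
print(brute, fast)  # the two values agree up to floating-point rounding
```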

5.1.3 k-Fold Cross-Validation

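The observations are randomly divided into k groups (folds) of roughly equal size. Each fold in turn serves as the validation set while the model is fit on the remaining k-1 folds, and the k validation errors are averaged. LOOCV is the special case k=n.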

LOOCV has higher variance, but lower bias, than k-fold CV with k < n.
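
The k-fold procedure can be sketched as follows. This is an illustrative version under stated assumptions: synthetic data, a simple linear regression as the model, and hypothetical helper names `fit_line` and `k_fold_mse`.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def k_fold_mse(xs, ys, k, seed=0):
    """Average validation MSE over k randomly assigned folds."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k folds of nearly equal size
    fold_mses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k


# Synthetic data (an assumption for illustration).
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(60)]
ys = [1 + 2 * x + data_rng.gauss(0, 2) for x in xs]

cv5 = k_fold_mse(xs, ys, k=5)
cv10 = k_fold_mse(xs, ys, k=10)
cv_loo = k_fold_mse(xs, ys, k=len(xs))  # k = n reproduces LOOCV
print(cv5, cv10, cv_loo)
```

With k equal to n every fold holds exactly one observation, so the random shuffle no longer matters and the result is the LOOCV estimate.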

5.2 The Bootstrap

The bootstrap is a widely applicable and extremely powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method.

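My understanding is that the bootstrap amounts to random sampling with replacement. For example, drawing five samples from 1, 2, 3 could give 1 2 2 3 3, and drawing three could give 1 2 2. (A bootstrap data set is normally the same size as the original data set.)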
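
Sampling with replacement, and how it quantifies uncertainty, can be sketched in a few lines. This is a minimal illustration under assumptions: the data are synthetic, the statistic is the sample mean (chosen because its standard error has a known formula to compare against), and `bootstrap_se` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling `data`."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw n values with replacement: some repeat, some are left out.
        resample = rng.choices(data, k=n)
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Synthetic sample (an assumption for illustration).
data_rng = random.Random(1)
sample = [data_rng.gauss(10, 3) for _ in range(200)]

se_boot = bootstrap_se(sample, statistics.mean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
print("bootstrap SE of the mean:", se_boot)
print("formula   SE of the mean:", se_formula)
```

For the mean the two numbers should be close. The appeal of the bootstrap is that the same function works for statistics that have no simple standard-error formula.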

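The book gives an example here: we invest in two assets with returns X and Y, putting a fraction $\alpha$ of the money in X and $1-\alpha$ in Y, and we want to find the $\alpha$ that minimizes $Var(\alpha X+(1-\alpha )Y)$. This value of $\alpha$ is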