Applying the Bootstrap: the Boston Housing Dataset as an Example

Article link: https://blog.csdn.net/2301_76574743/article/details/143478611

Chapter 5: Exercise 9

1. Load the dataset

library(MASS)
summary(Boston)

##       crim             zn            indus            chas       
##  Min.   : 0.01   Min.   :  0.0   Min.   : 0.46   Min.   :0.0000  
##  1st Qu.: 0.08   1st Qu.:  0.0   1st Qu.: 5.19   1st Qu.:0.0000  
##  Median : 0.26   Median :  0.0   Median : 9.69   Median :0.0000  
##  Mean   : 3.61   Mean   : 11.4   Mean   :11.14   Mean   :0.0692  
##  3rd Qu.: 3.68   3rd Qu.: 12.5   3rd Qu.:18.10   3rd Qu.:0.0000  
##  Max.   :88.98   Max.   :100.0   Max.   :27.74   Max.   :1.0000  
##       nox              rm            age             dis       
##  Min.   :0.385   Min.   :3.56   Min.   :  2.9   Min.   : 1.13  
##  1st Qu.:0.449   1st Qu.:5.89   1st Qu.: 45.0   1st Qu.: 2.10  
##  Median :0.538   Median :6.21   Median : 77.5   Median : 3.21  
##  Mean   :0.555   Mean   :6.29   Mean   : 68.6   Mean   : 3.79  
##  3rd Qu.:0.624   3rd Qu.:6.62   3rd Qu.: 94.1   3rd Qu.: 5.19  
##  Max.   :0.871   Max.   :8.78   Max.   :100.0   Max.   :12.13  
##       rad             tax         ptratio         black      
##  Min.   : 1.00   Min.   :187   Min.   :12.6   Min.   :  0.3  
##  1st Qu.: 4.00   1st Qu.:279   1st Qu.:17.4   1st Qu.:375.4  
##  Median : 5.00   Median :330   Median :19.1   Median :391.4  
##  Mean   : 9.55   Mean   :408   Mean   :18.5   Mean   :356.7  
##  3rd Qu.:24.00   3rd Qu.:666   3rd Qu.:20.2   3rd Qu.:396.2  
##  Max.   :24.00   Max.   :711   Max.   :22.0   Max.   :396.9  
##      lstat            medv     
##  Min.   : 1.73   Min.   : 5.0  
##  1st Qu.: 6.95   1st Qu.:17.0  
##  Median :11.36   Median :21.2  
##  Mean   :12.65   Mean   :22.5  
##  3rd Qu.:16.95   3rd Qu.:25.0  
##  Max.   :37.97   Max.   :50.0

2. Set the random seed so that the output is reproducible

set.seed(1)
attach(Boston)

3. Solutions

a. Estimate the population mean of medv, $\hat{\mu}$

medv.mean = mean(medv)
medv.mean

## [1] 22.53281

$\hat{\mu}=22.53281$

b. Estimate $SE_{\hat{\mu}}$ and interpret the result

medv.err = sd(medv)/sqrt(length(medv))
medv.err

## [1] 0.4088611

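$\widehat{SE}_{\hat{\mu}}=0.4088611$. The standard error of the sample mean is the sample standard deviation divided by the square root of the number of observations, $s/\sqrt{n}$ with $n=506$. It indicates that the sample mean would typically deviate from the population mean by about 0.41 (medv is measured in units of $1000).

c. Estimate $SE_{\hat{\mu}}$ using the bootstrap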

boot.fn = function(data, index) return(mean(data[index]))
library(boot)
bstrap = boot(medv, boot.fn, 1000)
bstrap

## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = medv, statistic = boot.fn, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original      bias    std. error
## t1* 22.53281 -0.02520692   0.4049032

The bootstrap standard error (0.4049) is almost identical to the formula-based estimate from (b) (0.4089); the two differ by only about 0.004.
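To make explicit what boot() does here, the resampling can be written by hand. This is a minimal sketch, not the boot package's implementation; the names `B`, `boot.means`, `se.boot` and `se.formula` are introduced only for illustration, and because the random draws differ from those made inside boot(), the numbers will be close to, but not identical with, the output above.

```r
library(MASS)  # provides the Boston data frame

set.seed(1)
medv.vec <- Boston$medv
n <- length(medv.vec)
B <- 1000  # number of bootstrap resamples (illustrative, same R as above)

# Each replicate draws n observations with replacement and records their mean
boot.means <- replicate(B, mean(sample(medv.vec, size = n, replace = TRUE)))

# The bootstrap standard error is the standard deviation of the replicates
se.boot <- sd(boot.means)

# Formula-based standard error from (b), for comparison
se.formula <- sd(medv.vec) / sqrt(n)

c(bootstrap = se.boot, formula = se.formula)
```

Both values should land near 0.41, which is why the bootstrap and the formula agree so closely for the mean.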

d. Give a 95% confidence interval for the mean of medv, and compare the bootstrap interval with the one from t.test(Boston$medv).

Method 1: t.test(Boston$medv)

t.test(Boston$medv)

## 
## 	One Sample t-test
## 
## data:  medv
## t = 55.111, df = 505, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  21.72953  23.33608
## sample estimates:
## mean of x 
##    22.53281

Method 2: bootstrap

c(bstrap$t0 - 2 * 0.4119, bstrap$t0 + 2 * 0.4119)

## [1]  21.70901   23.35661

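t.test: 21.73 to 23.34;
bootstrap: 21.71 to 23.36;
The endpoints of the two intervals differ by only about 0.02. Note that the bootstrap interval above uses a hard-coded standard error of 0.4119, whereas the bootstrap run in (c) reports 0.4049; with 0.4049 the interval is roughly 21.72 to 23.34, which is even closer to the t.test interval.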
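Rather than typing the standard error in by hand, the interval can be built directly from the bootstrap replicates. The following is a sketch under the assumption that the same statistic and R = 1000 are used; `boot.mean`, `bs`, `se.boot` and `ci.2se` are illustrative names, not objects from the code above. boot.ci() from the boot package is shown as an alternative that returns normal-approximation and percentile intervals.

```r
library(MASS)
library(boot)

set.seed(1)

# Statistic for boot(): mean of the resampled observations
boot.mean <- function(data, index) mean(data[index])
bs <- boot(Boston$medv, boot.mean, R = 1000)

# Standard error taken from the replicates instead of a typed-in constant
se.boot <- sd(bs$t[, 1])

# Approximate 95% interval: estimate plus or minus two standard errors
ci.2se <- c(bs$t0 - 2 * se.boot, bs$t0 + 2 * se.boot)
ci.2se

# Alternative intervals computed by the boot package itself
boot.ci(bs, type = c("norm", "perc"))
```

Computing the standard error from `bs$t` keeps the interval consistent with whatever bootstrap run was actually performed.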

e. Estimate the population median of medv, $\hat{\mu}_{med}$

medv.med = median(medv)
medv.med

## [1] 21.2

$\hat{\mu}_{med}=21.2$

f. Estimate $SE_{\hat{\mu}_{med}}$ using the bootstrap

boot.fn = function(data, index) return(median(data[index]))
boot(medv, boot.fn, 1000)

## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = medv, statistic = boot.fn, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original    bias    std. error
## t1*     21.2 -0.02395    0.3820469

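The estimated median is 21.2 with a bootstrap standard error of 0.382, slightly smaller than the standard error of the mean (about 0.405 from the bootstrap in (c)).

g. Estimate the 10th percentile of medv across the Boston suburbs, $\hat{\mu}_{0.1}$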

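medv.tenth = quantile(medv, c(0.1))
medv.tenth
# Note: sd/sqrt(n) is the standard error of the sample MEAN, not of a quantile.
# It is computed here only as a reference value; the standard error of the
# 10% quantile is estimated with the bootstrap in (h).
medv.mean.err = sd(medv) / sqrt(length(medv))
medv.mean.err

##   10% 
## 12.75
## [1] 0.4088611

$\hat{\mu}_{0.1}=12.75$. The value 0.409 is the standard error of the sample mean from (b); the formula $s/\sqrt{n}$ does not apply to a quantile, so it should not be read as $\widehat{SE}_{\hat{\mu}_{0.1}}$.

h. Estimate $SE_{\hat{\mu}_{0.1}}$ using the bootstrap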

boot.fn = function(data, index) return(quantile(data[index], c(0.1)))
boot(medv, boot.fn, 1000)

## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = medv, statistic = boot.fn, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original   bias   std. error
## t1*    12.75 0.0311    0.5063093

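$\hat{\mu}_{0.1}=12.75$ with a bootstrap standard error of 0.506. This is small relative to the estimate itself, but somewhat larger than the value 0.409 computed in (g), which is the standard error of the mean, and larger than the bootstrap standard error of the median (0.382).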
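Parts (f) and (h) differ only in which quantile is resampled, so the same pattern can be wrapped in a small helper. This is an illustrative sketch: `make.quantile.fn` and `se.by.quantile` are hypothetical names, and the 90% quantile is included only as an extra example that is not part of the exercise.

```r
library(MASS)
library(boot)

# Hypothetical helper: returns a boot() statistic for the p-th quantile
make.quantile.fn <- function(p) {
  function(data, index) quantile(data[index], probs = p, names = FALSE)
}

set.seed(1)
probs <- c(0.1, 0.5, 0.9)  # 0.9 is an added illustration, not in the exercise

# Bootstrap standard error for each quantile, using R = 1000 resamples
se.by.quantile <- sapply(probs, function(p) {
  bs <- boot(Boston$medv, make.quantile.fn(p), R = 1000)
  sd(bs$t[, 1])
})
names(se.by.quantile) <- paste0(probs * 100, "%")
se.by.quantile
```

Since each call to boot() consumes new random numbers, the values for the 10% and 50% quantiles will be close to, but not necessarily equal to, those printed in (h) and (f).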