Introduction to Resampling

Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter. Data resampling, by contrast, refers to methods for economically using a collected dataset to improve the estimate of the population parameter and to help quantify the uncertainty of that estimate. Both data sampling and data resampling are required in a predictive modeling problem.

  • Sampling is an active process of gathering observations with the intent of estimating a population parameter.
  • Resampling is a methodology of economically using a data sample to improve the accuracy and quantify the uncertainty of an estimate of a population parameter.
  • Resampling methods, in fact, make use of a nested sampling step: they repeatedly draw samples from the data sample itself.

1.1 Statistical Sampling

Observations made in a domain represent samples of some broader, idealized, and unknown population of all possible observations that could be made in the domain.

Sampling consists of selecting some part of the population to observe so that one may estimate something about the whole population.

1.1.1 How to Sample

Some aspects to consider prior to collecting a data sample include:

  • Sample Goal
  • Population
  • Selection Criteria
  • Sample Size

Statistical sampling is a large field of study, but in applied machine learning there are three types of sampling that you are likely to use: simple random sampling, systematic sampling, and stratified sampling.

  • Simple Random Sampling: Samples are drawn with a uniform probability from the domain.
  • Systematic Sampling: Samples are drawn using a pre-specified pattern, such as at regular intervals.
  • Stratified Sampling: Samples are drawn within pre-specified categories (strata).
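The three schemes above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names, the toy `population` of `(id, category)` pairs, and the choice to draw the same fraction from every stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(items, n, seed=0):
    """Draw n items uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(items, n)

def systematic_sample(items, step, start=0):
    """Take every step-th item, beginning at index start."""
    return items[start::step]

def stratified_sample(items, key, fraction, seed=0):
    """Draw the same fraction from each category defined by key(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 100 records, 25 in category "a" and 75 in "b".
population = [(i, "a" if i % 4 == 0 else "b") for i in range(100)]

srs = simple_random_sample(population, 10)
sys_s = systematic_sample(population, step=10)
strat = stratified_sample(population, key=lambda row: row[1], fraction=0.2)
```

Note that the stratified sample preserves the 1:3 ratio between the two categories, which a small simple random sample is not guaranteed to do.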

1.1.2 Sampling Errors

The two main types of error introduced when sampling are selection bias and sampling error.

Selection Bias: Caused when the method of drawing observations skews the sample in some way.

Sampling Error: Caused by the random nature of drawing observations, which skews the sample in some way.
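The difference between the two errors can be seen in a small simulation. The sketch below assumes a made-up population (the integers 0 to 999) and a deliberately flawed selection rule (always taking the first 100 records); both are illustrative choices, not part of the original discussion.

```python
import random
import statistics

population = list(range(1000))           # hypothetical population
true_mean = statistics.mean(population)  # 499.5

# Sampling error: each unbiased random sample gives a slightly different
# estimate, but the estimates scatter around the true value.
rng = random.Random(0)
random_means = [statistics.mean(rng.sample(population, 50)) for _ in range(200)]
average_of_random_means = statistics.mean(random_means)

# Selection bias: always taking the first 100 records skews every sample
# in the same direction, and repeating the draw does not help.
biased_mean = statistics.mean(population[:100])
```

Sampling error shrinks as the sample size grows, whereas selection bias can only be removed by fixing the way observations are drawn.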

1.1.3 Statistical Resampling

Statistical resampling methods are procedures that describe how to economically use available data to estimate a population parameter. Resampling methods are very easy to use, requiring little mathematical knowledge. They are easy to understand and implement compared to specialized statistical methods, which may require deep technical skill to select and interpret.

Two commonly used resampling methods that you may encounter are k-fold cross-validation and the bootstrap.

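  • Bootstrap. Samples are drawn from the dataset with replacement (allowing the same observation to appear more than once in the data sample), where those instances not drawn into the data sample may be used for the test set.
  • k-fold Cross-Validation. A dataset is partitioned into k groups, where each group is given the opportunity of being used as a held-out test set, leaving the remaining groups as the training set.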
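As a rough sketch of how the two methods split a dataset, the following standard-library Python shows a bootstrap draw with its left-over ("out-of-bag") test set and a partition into k groups. The helper names `bootstrap_split` and `k_fold_groups` are hypothetical, and shuffling before partitioning is an assumption of this example.

```python
import random

def bootstrap_split(data, seed=0):
    """Draw len(data) items with replacement; undrawn items form the test set."""
    rng = random.Random(seed)
    n = len(data)
    drawn = [rng.randrange(n) for _ in range(n)]
    drawn_set = set(drawn)
    train = [data[i] for i in drawn]
    test = [data[i] for i in range(n) if i not in drawn_set]
    return train, test

def k_fold_groups(data, k, seed=0):
    """Shuffle the data and partition it into k groups of near-equal size."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    return [[data[i] for i in indices[f::k]] for f in range(k)]

data = list(range(20))
train, test = bootstrap_split(data)
folds = k_fold_groups(data, k=5)
```

The bootstrap training sample has the same size as the dataset but contains repeats, while the k groups are disjoint and together cover every observation exactly once.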

The k-fold cross-validation method specifically lends itself to use in the evaluation of predictive models that are repeatedly trained on one subset of the data and evaluated on a second held-out subset of the data.

Generally, resampling techniques for estimating model performance operate similarly: a subset of samples are used to fit a model and the remaining samples are used to estimate the efficacy of the model. This process is repeated multiple times and the results are aggregated and summarized. The differences in techniques usually center around the method in which subsamples are chosen.
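The fit, evaluate, and aggregate loop described above might look like the following for k-fold cross-validation. This is a sketch under simplifying assumptions: the "model" is just the mean of the training targets, the score is mean absolute error, and the data is synthetic. In practice a real model and metric would take their place.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k held-out groups."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validate(targets, k=5, seed=0):
    """Return one score per fold for a trivial mean-predicting model."""
    scores = []
    for held_out in k_fold_indices(len(targets), k, seed):
        held = set(held_out)
        train = [targets[i] for i in range(len(targets)) if i not in held]
        test = [targets[i] for i in held_out]
        prediction = statistics.mean(train)  # "fit" on the training subset
        errors = [abs(y - prediction) for y in test]
        scores.append(statistics.mean(errors))  # evaluate on the held-out subset
    return scores

# Synthetic targets, assumed for illustration only.
rng = random.Random(1)
targets = [rng.gauss(10.0, 2.0) for _ in range(100)]

scores = cross_validate(targets, k=5)
# Aggregate and summarize the per-fold results.
summary = (statistics.mean(scores), statistics.stdev(scores))
```

Swapping `k_fold_indices` for a different splitting function changes the resampling technique while leaving the rest of the loop untouched.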

The bootstrap method can be used for the same purpose, but is a more general and simpler method intended for estimating a population parameter.
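To illustrate the more general use, here is a minimal sketch of a bootstrap estimate of a population mean with a 95% percentile interval. The synthetic `sample`, the number of resamples, and the simple index-based percentile rule are all assumptions of this example.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

# Synthetic data sample, assumed for illustration only.
rng = random.Random(42)
sample = [rng.gauss(50.0, 5.0) for _ in range(200)]

means = sorted(bootstrap_means(sample))
estimate = statistics.mean(means)            # bootstrap estimate of the mean
lower = means[int(0.025 * len(means))]       # approximate 2.5th percentile
upper = means[int(0.975 * len(means)) - 1]   # approximate 97.5th percentile
```

The spread between `lower` and `upper` quantifies the uncertainty of the estimate without any distributional formula, which is the main appeal of the method.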
