How large a training set is needed?

Is there a common method used to determine how many training samples are required to train a classifier (an LDA in this case) so that it reaches a minimum threshold of generalization accuracy?

I am asking because I would like to minimize the calibration time usually required in a brain-computer interface.

The search term you are looking for is "learning curve", which gives the (average) model performance as a function of the training sample size.

Learning curves depend on a lot of things, e.g.

  • classification method
  • complexity of the classifier
  • how well the classes are separated.

(I think for two-class LDA you may be able to derive some theoretical power calculations, but the crucial question is always whether your data actually meet the "equal-covariance multivariate normal" assumption. I'd go for some simulation based on the LDA assumptions as well as resampling of your already existing data; sketches of both follow below.)
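For illustration, here is a minimal sketch in Python of the simulation route: two classes are drawn from multivariate normals with a shared covariance (exactly the LDA assumption), an LDA is trained on increasingly many cases, and its accuracy is measured on a large independent test set. The class means, covariance, dimensionality, sample sizes and the use of scikit-learn are all assumptions made for the example, not part of any particular study:

    # Simulated learning curve for two-class LDA under its own assumptions:
    # equal-covariance multivariate normal classes. All parameters below
    # (means, covariance, sample sizes) are made up for illustration.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)

    def simulate(n_per_class, mean0, mean1, cov):
        """Draw n_per_class cases per class from the assumed model."""
        X = np.vstack([rng.multivariate_normal(mean0, cov, n_per_class),
                       rng.multivariate_normal(mean1, cov, n_per_class)])
        y = np.repeat([0, 1], n_per_class)
        return X, y

    mean0, mean1 = np.zeros(4), np.full(4, 1.0)    # assumed class means
    cov = np.eye(4)                                # assumed common covariance

    # large independent test set so the test error itself is nearly noise-free
    X_test, y_test = simulate(5000, mean0, mean1, cov)

    for n in [5, 10, 20, 50, 100, 200]:
        accs = [LinearDiscriminantAnalysis()
                .fit(*simulate(n, mean0, mean1, cov))
                .score(X_test, y_test)
                for _ in range(100)]               # average over training sets
        print(f"n per class = {n:3d}: mean accuracy = {np.mean(accs):.3f}")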

There are two aspects of the performance of a classifier trained on a finite sample size n (as usual):

  • bias, i.e. on average a classifier trained on n training samples is worse than the classifier trained on n = ∞ training cases (this is usually what is meant by the learning curve), and
  • variance: a given training set of n cases may lead to quite different model performance.
    Even with few cases, you may be lucky and get good results. Or you may have bad luck and get a really bad classifier.
    As usual, this variance decreases with increasing training sample size n (the resampling sketch below illustrates both aspects).
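Both aspects can be read off the same resampling experiment: for each candidate training size n, draw many random subsets of your existing calibration data, refit the classifier, and look at both the average and the spread of the test performance. The sketch below uses a synthetic data set from scikit-learn's make_classification purely as a stand-in for your recorded data; the subset sizes and repetition counts are arbitrary choices:

    # Learning curve with variance from resampling an existing data set.
    # X, y stand in for already recorded calibration data.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                               random_state=1)    # placeholder for real data
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=200, stratify=y, random_state=1)

    for n in [20, 50, 100, 200, 400]:
        accs = []
        for _ in range(200):
            idx = rng.choice(len(X_pool), size=n, replace=False)  # random subset
            lda = LinearDiscriminantAnalysis().fit(X_pool[idx], y_pool[idx])
            accs.append(lda.score(X_test, y_test))
        # mean ~ learning curve (bias aspect), std ~ training-set-to-training-set variance
        print(f"n = {n:3d}: accuracy {np.mean(accs):.3f} +/- {np.std(accs):.3f}")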

Another aspect that you may need to take into account is that it is usually not enough to train a good classifier; you also need to show that the classifier is good (or good enough). So you should also plan the sample size needed for validation with a given precision. If you need to report these results as a fraction of successes among so many test cases (e.g. producer's or consumer's accuracy / precision / sensitivity / positive predictive value), and the underlying classification task is rather easy, this can require more independent cases than training a good model does.
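To get a feeling for the numbers on the validation side, the required number of independent test cases for a given precision can be sketched with a normal-approximation confidence interval for a proportion. The expected sensitivity of 0.9 and the target half-width of 0.05 below are assumed example values, not recommendations:

    # How many independent test cases are needed so that the confidence
    # interval for a proportion (e.g. sensitivity) has a given half-width?
    from scipy.stats import norm

    def test_cases_needed(p_expected, half_width, confidence=0.95):
        """Normal-approximation sample size for estimating a proportion."""
        z = norm.ppf(1 - (1 - confidence) / 2)
        return (z / half_width) ** 2 * p_expected * (1 - p_expected)

    # e.g. expecting ~90 % sensitivity, wanting it known to within +/- 5 %:
    print(round(test_cases_needed(0.9, 0.05)))   # ~138 cases of that class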

As a rule of thumb, for training, the sample size is usually discussed in relation to model complexity (number of cases : number of variates), whereas absolute bounds on the test sample size can be given for a required precision of the performance measurement.

Here's a paper where we explain these things in more detail and also discuss how to construct learning curves:
Beleites, C., Neugebauer, U., Bocklitz, T., Krafft, C. and Popp, J.: Sample size planning for classification models. Anal Chim Acta, 2013, 760, 25-33.
DOI: 10.1016/j.aca.2012.11.007
Accepted manuscript on arXiv: 1211.1323

The "teaser" figure in that paper shows an easy classification problem (we actually have one easy distinction like this in our classification problem, but other classes are far more difficult to distinguish).

We did not try to extrapolate to larger training sample sizes to determine how many more training cases are needed, because the test sample sizes are our bottleneck, and larger training sample sizes would let us construct more complex models, so extrapolation is questionable. For the kind of data sets I have, I'd approach this iteratively: measure a bunch of new cases, show how much things improved, measure more cases, and so on.

This may be different for you, but the paper contains literature references to papers that use extrapolation to higher sample sizes in order to estimate the required number of samples.

Asking about training sample size implies you are going to hold back data for model validation. This is an unstable process requiring a huge sample size. Strong internal validation with the bootstrap is often preferred. If you choose that path, you only need to compute one sample size. As @cbeleites so nicely stated, this is often an "events per candidate variable" assessment, but you need a minimum of 96 observations to accurately predict the probability of a binary outcome even if there are no features to be examined [this is to achieve a 0.95 confidence margin of error of 0.1 in estimating the actual marginal probability that Y = 1].
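The 96-observation figure can be reproduced with the usual worst-case sample size calculation for a proportion (the binomial variance p(1 - p) is largest at p = 0.5); here is that arithmetic in Python:

    # Worst-case sample size for estimating a marginal probability with
    # 0.95 confidence and a margin of error of 0.1.
    from scipy.stats import norm

    z = norm.ppf(0.975)                  # ~1.96
    n = (z / 0.1) ** 2 * 0.5 * 0.5       # p(1 - p) is maximal at p = 0.5
    print(round(n))                      # 96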

It is important to consider proper scoring rules for accuracy assessment (e.g., Brier score and log likelihood/deviance). Also make sure you really want to classify observations as opposed to estimating membership probability. The latter is almost always more useful as it allows a gray zone.
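If you work with predicted class-membership probabilities, proper scoring rules are straightforward to compute; the snippet below uses scikit-learn's brier_score_loss and log_loss on a small made-up set of validation labels and predicted probabilities:

    # Proper scoring rules on predicted membership probabilities.
    # y_true / y_prob are placeholder validation labels and predictions.
    import numpy as np
    from sklearn.metrics import brier_score_loss, log_loss

    y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
    y_prob = np.array([0.1, 0.4, 0.8, 0.65, 0.9, 0.3, 0.55, 0.2])

    print("Brier score:", brier_score_loss(y_true, y_prob))   # lower is better
    print("log loss   :", log_loss(y_true, y_prob))           # lower is better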
