train, validation and test
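The data is usually split into train and test; a validation part is then split off from train, and this second step is repeated n times (typically n = 10).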
Generally splits are done like this:
a) Train
b) Test
Generally, the train data is then split into n parts: n−1 of them are used for training and the remaining one is used for validation. This process is repeated until each of the n parts has served as the validation set exactly once.
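The n-part rotation described above can be sketched in a few lines. This is only an illustration using the standard library: the helper name `k_fold_splits`, the shuffling seed, and the sample counts are assumptions made for the example, not something from these notes. It is meant to be applied to the train data only, after the test set has already been set aside.

```python
import random

def k_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set once.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices out round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_idx = folds[k]
        # The other n_folds - 1 parts together form the training indices.
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx

# Example: 25 training samples, 5 folds (numbers chosen arbitrarily).
splits = list(k_fold_splits(n_samples=25, n_folds=5))
for train_idx, val_idx in splits:
    print(len(train_idx), "train /", len(val_idx), "validation")
```

Averaging the validation score over all n rounds gives a single estimate that never touches the test set.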
out of sample and in sample
| | out of sample data | in sample data |
|---|---|---|
| train | no | yes |
| validation | no | yes |
| test | yes | no |
in sample testing <-> purpose: high train accuracy
out of sample testing <-> purpose: high test accuracy
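Testing from the supervised learning point of view:
- Regression: the data is split into a training set and a test set. (X, y) pairs are fed to the learner; once the function or neural network has been obtained, testing means input (X), output (y). Because a regularization term is usually added, training accuracy will not be 100%, but the error term is minimized, and computing that error term is an in sample test. Using the fitted model to predict y values that are not in the training set is an out of sample test; some of the X values may already have appeared in the training set.
- Classification: (X, label) pairs are fed to the learner, and testing means input (X), output (label). Computing the error term is an in sample test; using the fitted model to predict the labels of the test set is an out of sample test.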
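For the regression case, the difference between the two kinds of testing can be shown on a toy problem. The sketch below is an assumption-laden illustration: the synthetic data (y = 2x + 1 plus noise), the 40/10 split, and the helper names `fit_line` and `mse` are all made up for the example, and it uses plain least squares without the regularization term mentioned above, to keep it short.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (no regularization, for brevity)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def mse(a, b, xs, ys):
    """Mean squared error of the line a*x + b on the given points."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data, assumed for illustration: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

# First 40 points are the training set, last 10 are the held-out test set.
x_train, y_train = xs[:40], ys[:40]
x_test, y_test = xs[40:], ys[40:]

a, b = fit_line(x_train, y_train)

# In sample test: error on the same points the model was fitted to.
in_sample_mse = mse(a, b, x_train, y_train)
# Out of sample test: error on points the model has never seen.
out_of_sample_mse = mse(a, b, x_test, y_test)

print("in sample MSE:", in_sample_mse)
print("out of sample MSE:", out_of_sample_mse)
```

The classification case follows the same pattern, with the squared error replaced by a misclassification rate on the training labels and on the test labels.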