DataMining
DataMining2013
5.2 PREDICTING PERFORMANCE
Interval estimate: (1) confidence level, also called the confidence coefficient; (2) confidence interval.
Original · 2013-11-23 11:50:25 · 492 views · 0 comments
5.3 CROSS-VALIDATION
Question: the sample used for training (or testing) might not be representative. If, by bad luck, all examples with a certain class were omitted from the training set, you could hardly ex…
Original · 2013-11-23 11:26:42 · 628 views · 0 comments
5.4 OTHER ESTIMATES
(1) Leave-One-Out Cross-Validation: each instance in turn is left out, and the learning scheme is trained on all the remaining instances. It is judged by its correctness on the remaining…
Translation · 2013-11-23 16:10:55 · 549 views · 0 comments
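The leave-one-out procedure in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-nearest-neighbor classifier, the function names, and the toy data are hypothetical and do not come from the original post.

```python
def nearest_neighbor_predict(train, x):
    """Hypothetical learner: return the label of the closest training point (1-NN)."""
    closest = min(train, key=lambda item: abs(item[0] - x))
    return closest[1]


def leave_one_out_accuracy(data, predict):
    """Hold out each instance in turn, train on the rest, score the held-out one."""
    correct = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every instance except the i-th
        if predict(rest, x) == label:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: (feature value, class label) pairs.
toy_data = [
    (1.0, "a"), (1.2, "a"), (0.8, "a"),
    (2.0, "b"), (3.0, "b"), (3.3, "b"), (2.9, "b"),
]
loo_accuracy = leave_one_out_accuracy(toy_data, nearest_neighbor_predict)
print(loo_accuracy)
```

With n instances the learner is trained n times, so the procedure suits small datasets best.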
5.1 TRAINING AND TESTING
People often talk about three datasets. The training data: the training data is used by one or more learning schemes to come up with classifiers. The validation d…
Translation · 2013-11-23 11:48:22 · 639 views · 0 comments
5.6 PREDICTING PROBABILITIES
Two popular criteria used to evaluate probabilistic prediction: (1) Quadratic [kwɒ'drætɪk] loss function: $\sum_j (p_j - a_j)^2$. Suppose for a single instance there are k possible…
Translation · 2013-11-25 22:36:10 · 442 views · 0 comments
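As a small worked example of the quadratic loss named above, the sketch below assumes the usual convention that the actual outcome is encoded as a vector a with a 1 for the class that occurred and 0 elsewhere. The excerpt is cut off before it states this, so treat the encoding, the function name, and the sample probabilities as assumptions.

```python
def quadratic_loss(predicted, actual_index):
    """Sum of squared differences between predicted probabilities and the 0/1 outcome vector."""
    # Assumed encoding: a_j is 1 for the class that actually occurred, 0 otherwise.
    actual = [1.0 if j == actual_index else 0.0 for j in range(len(predicted))]
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Hypothetical prediction over k = 3 classes; class 0 is the one that occurred.
loss = quadratic_loss([0.5, 0.25, 0.25], 0)
print(loss)
```

A prediction that puts all probability on the class that occurred gives a loss of 0, and the loss grows as probability is moved to the other classes.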
5.5 COMPARING DATA MINING SCHEMES
We often need to compare two different learning schemes on the same problem to see which is the better one to use. It seems simple: Estimate the error using cross-validation (or any other suitable est…
Translation · 2013-11-25 19:50:25 · 446 views · 0 comments
5.7 COUNTING THE COST
1. Kappa statistic: this metric assesses how far a classifier's results differ from random classification. (The Kappa statistic is used to measure the agreement between predicted and observed categorizations of a dataset, while correcting for a…
Repost · 2013-11-27 19:03:54 · 495 views · 0 comments
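The Kappa statistic described above can be computed from a confusion matrix. The sketch below assumes the standard Cohen's kappa formula, (observed agreement minus chance agreement) divided by (1 minus chance agreement). The excerpt is truncated before it gives any formula, so the formula choice, the function name, and the example matrix are assumptions for illustration only.

```python
def kappa_statistic(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of matching row and column totals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix with 100 instances.
example_confusion = [[40, 10], [10, 40]]
kappa = kappa_statistic(example_confusion)
print(kappa)
```

A value of 0 means agreement no better than chance and 1 means perfect agreement. The function is undefined when chance agreement is exactly 1, for example when every instance falls in a single class.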