DataMining2013 - CSDN Blog

Repost: Some URLs

http://www.itisbi.net/?cat=3

2014-01-14 11:07:35

Repost: 5.7 COUNTING THE COST

1. Kappa statistic: this metric measures how far a classifier's results differ from those of random classification. The Kappa statistic is used to measure the agreement between predicted and observed categorizations of a dataset, while correcting for a…

2013-11-27 19:03:54
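The Kappa excerpt above describes the statistic only in words. Below is a minimal sketch, assuming the usual Cohen's kappa formula κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the class frequencies. The function name `kappa_statistic` is hypothetical and is not taken from the original post.

```python
from collections import Counter


def kappa_statistic(actual, predicted):
    """Cohen's kappa (hypothetical helper): agreement between actual and
    predicted labels, corrected for the agreement expected by chance."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(actual)
    # Observed agreement: fraction of instances where prediction == actual.
    p_observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: sum over classes of P(actual = c) * P(predicted = c).
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_chance = sum(actual_counts[c] * predicted_counts[c]
                   for c in actual_counts) / (n * n)
    if p_chance == 1:
        # Degenerate case: both lists contain a single, identical class.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean worse than chance.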

Translation: 5.6 PREDICTING PROBABILITIES

Two popular criteria are used to evaluate probabilistic predictions: (1) Quadratic [kwɒ'drætɪk] loss function: $\sum_j (p_j - a_j)^2$. Suppose for a single instance there are k possible…

2013-11-25 22:36:10
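The quadratic loss above can be computed for one test instance as follows. This is a sketch assuming the standard reading of the symbols: p_j is the predicted probability of class j and a_j is 1 for the instance's actual class and 0 for every other class. The name `quadratic_loss` is hypothetical; over a whole test set, the per-instance losses would typically be summed or averaged.

```python
def quadratic_loss(probs, actual_class):
    """Quadratic loss for one instance (hypothetical helper):
    sum_j (p_j - a_j)^2, with a_j = 1 for the actual class, else 0."""
    if not 0 <= actual_class < len(probs):
        raise ValueError("actual_class must index into probs")
    return sum((p - (1.0 if j == actual_class else 0.0)) ** 2
               for j, p in enumerate(probs))
```

A perfectly confident correct prediction scores 0, and a fully confident wrong prediction scores 2, the maximum.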

Translation: 5.5 COMPARING DATA MINING SCHEMES

We often need to compare two different learning schemes on the same problem to see which is the better one to use. It seems simple: Estimate the error using cross-validation (or any other suitable est

2013-11-25 19:50:25
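One common way to compare two schemes is to run both on the same cross-validation folds and apply a paired t-test to the per-fold error estimates. The sketch below assumes that approach (the excerpt is truncated before naming a test) and only computes the t statistic; the result would be compared against Student's t distribution with k − 1 degrees of freedom. The helper name is hypothetical. Note that this plain version treats folds as independent, which cross-validation does not strictly satisfy, so treat it as a rough guide.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic (hypothetical helper) for per-fold error
    estimates of two schemes measured on the SAME folds."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 folds")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    k = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (k - 1 denominator)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return mean(diffs) / (sd / math.sqrt(k))
```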

Translation: 5.4 OTHER ESTIMATES

(1) Leave-one-out cross-validation: each instance in turn is left out, and the learning scheme is trained on all the remaining instances. It is judged by its correctness on the remaining…

2013-11-23 16:10:55
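The leave-one-out procedure above can be sketched as a loop that holds out each instance once, trains on the rest, and counts a 0/1 error on the held-out instance. The `train` callback interface and the toy 1-nearest-neighbour learner `train_1nn` are hypothetical, chosen only to make the example runnable.

```python
def leave_one_out_error(instances, labels, train):
    """Leave-one-out error rate. `train(xs, ys)` (hypothetical interface)
    must return a function mapping one instance to a predicted label."""
    n = len(instances)
    if n < 2:
        raise ValueError("need at least two instances")
    errors = 0
    for i in range(n):
        # Train on every instance except the i-th one...
        xs = instances[:i] + instances[i + 1:]
        ys = labels[:i] + labels[i + 1:]
        model = train(xs, ys)
        # ...and test on the single held-out instance (error is 0 or 1).
        if model(instances[i]) != labels[i]:
            errors += 1
    return errors / n


def train_1nn(xs, ys):
    """Hypothetical toy learner: 1-nearest neighbour on numeric values."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
        return ys[nearest]
    return predict
```

For example, `leave_one_out_error([1, 2, 3, 10, 11, 12], list("aaabbb"), train_1nn)` trains six models and returns 0.0, since every held-out point has a same-class neighbour.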

Original: 5.2 PREDICTING PERFORMANCE

Interval estimate: (1) confidence level = confidence coefficient; (2) confidence interval.

2013-11-23 11:50:25
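To make the interval-estimate terms concrete, here is a sketch assuming the Wilson score interval for a true success rate p, given an observed success rate f on N independent test instances: p = (f + z²/2N ± z·√(f/N − f²/N + z²/4N²)) / (1 + z²/N). Here z is chosen from the confidence level c so that a standard normal variable falls in [−z, z] with probability c. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_for_confidence(c):
    """z such that a standard normal variable lies in [-z, z] with probability c."""
    return NormalDist().inv_cdf((1 + c) / 2)


def success_rate_interval(f, n, z):
    """Wilson score interval (hypothetical helper) for the true success
    rate, given observed success rate f on n independent test instances."""
    if n <= 0 or not 0 <= f <= 1:
        raise ValueError("need n > 0 and 0 <= f <= 1")
    centre = f + z * z / (2 * n)
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom
```

For example, f = 0.75 on N = 1000 instances with z = 1.28 (roughly 80% confidence) gives an interval of about (0.732, 0.767).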

Translation: 5.1 TRAINING AND TESTING

People often talk about three datasets. The training data: the training data is used by one or more learning schemes to build classifiers. The validation d…

2013-11-23 11:48:22
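The three datasets can be produced by shuffling once and cutting the data into disjoint pieces. In the usual reading, the validation set is used to tune or choose between classifiers and the test set is kept aside for the final error estimate. This is a sketch only; the function name and the 60/20/20 default proportions are arbitrary assumptions.

```python
import random


def three_way_split(data, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle, then split into disjoint training, validation and test sets.
    The default proportions are an arbitrary assumption."""
    if train_frac < 0 or valid_frac < 0 or train_frac + valid_frac > 1:
        raise ValueError("invalid fractions")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n_train = int(len(items) * train_frac)
    n_valid = int(len(items) * valid_frac)
    train = items[:n_train]                        # builds the classifiers
    valid = items[n_train:n_train + n_valid]       # tunes / selects among them
    test = items[n_train + n_valid:]               # final error estimate only
    return train, valid, test
```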

Original: 5.3 CROSS-VALIDATION

Question: The sample used for training (or testing) might not be representative. If, by bad luck, all examples with a certain class were omitted from the training set, you could hardly ex…

2013-11-23 11:26:42
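A standard remedy for the unrepresentative-sample problem is stratification: each class is spread as evenly as possible over the cross-validation folds, so no fold is missing a class by bad luck. The sketch below assigns instance indices to k folds by dealing each class round-robin. The function name and the round-robin strategy are assumptions, not the post's own code.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign instance indices to k folds so that every class is spread
    as evenly as possible across the folds (hypothetical helper)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class round-robin, continuing where the previous class
        # stopped, so overall fold sizes also stay balanced.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds
```

In round i of k-fold cross-validation, fold i serves as the test set and the remaining folds together form the training set.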

Why demand perfection? Satisfaction is enough.