5.7 COUNTING THE COST

DataMining2013

Category: DataMining. Tags: machine learning

1.Kappa statistic

The Kappa statistic measures how far a classifier's results differ from those of random classification. (The Kappa statistic is used to measure the agreement between predicted and observed categorizations of a dataset, while correcting for agreement that occurs by chance.)

Source: http://wenku.baidu.com/view/f1061c165f0e7cd18425361d.html

http://blog.sina.com.cn/emilysasworld
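
To make the chance correction concrete, here is a minimal sketch of the usual (Cohen's) form of kappa computed from a confusion matrix. The function name `kappa`, the rows-are-actual convention, and the numbers in the example matrix are assumptions for illustration, not part of the notes above.

```python
def kappa(confusion):
    """Kappa from a square confusion matrix.

    Assumed layout: confusion[i][j] = number of instances whose actual
    class is i and whose predicted class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Chance agreement: what a random predictor with the same
    # per-class prediction totals would get right on average.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)

    # Undefined (division by zero) when chance agreement is already 1.
    return (observed - expected) / (1 - expected)


# Hypothetical three-class confusion matrix, 200 instances in total.
example = [
    [88, 10, 2],
    [14, 40, 6],
    [18, 10, 12],
]
print(round(kappa(example), 3))
```

In this example the observed agreement is 140/200 = 0.70 and the chance agreement is 82/200 = 0.41, so kappa is 0.29/0.59, roughly 0.49. A value of 0 would mean the classifier does no better than chance, and 1 would mean perfect agreement.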

2.Cost-Sensitive Classification && Cost-Sensitive Learning (testing cost and classification cost)

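Cost-Sensitive Classification: Given a cost matrix, you can calculate the cost of a particular learned model on a given test set just by summing the relevant elements of the cost matrix for the model's prediction for each test instance. (Given a cost matrix and an already-trained classifier, use the classifier to predict the classes of the test set, then use the cost matrix to obtain the loss for each test instance. The cost matrix is considered only when test instances are classified, not while the classifier is trained.) A good paper on this topic: http://www.docin.com/p-379037057.html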

Example: Bayes decision based on minimum risk.
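
Both ideas in this part can be sketched in a few lines: summing cost-matrix entries over a test set, and choosing the class with the lowest expected cost (the minimum-risk decision). The function names, the `cost[actual][predicted]` layout, and all numbers below are assumptions for illustration only.

```python
def total_cost(cost, actual, predicted):
    """Sum cost[a][p] over every (actual, predicted) pair in the test set."""
    return sum(cost[a][p] for a, p in zip(actual, predicted))


def min_expected_cost_class(cost, probs):
    """Return the class index whose prediction has the lowest expected cost.

    probs[i] is the classifier's estimated probability that the
    instance belongs to class i.
    """
    k = len(cost)
    expected = [sum(probs[i] * cost[i][j] for i in range(k)) for j in range(k)]
    return min(range(k), key=lambda j: expected[j])


# Hypothetical two-class problem: class 0 = yes, class 1 = no.
# Predicting yes for an actual no (a false positive) costs 10;
# predicting no for an actual yes (a false negative) costs 1.
cost = [
    [0, 1],
    [10, 0],
]

actual = [0, 0, 1, 1]
predicted = [0, 1, 0, 1]
print(total_cost(cost, actual, predicted))       # 0 + 1 + 10 + 0

# yes is the more probable class, but its expected cost is higher.
print(min_expected_cost_class(cost, [0.8, 0.2]))
```

With probabilities 0.8 for yes and 0.2 for no, predicting yes has expected cost 0.2 × 10 = 2 while predicting no has expected cost 0.8 × 1 = 0.8, so the minimum-risk decision is no even though yes is more likely. The trained model itself is unchanged; only the prediction step consults the cost matrix.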

Cost-Sensitive Learning: Take the cost matrix into account during the training process and ignore costs at prediction time. (The cost matrix is considered only while the classifier is trained, not when test instances are classified.)

Varying the proportion of instances in the training set is a general technique for building cost-sensitive classifiers. Suppose you artificially increase the number of no instances by a factor of 10 and use the resulting dataset for training. If the learning scheme is striving to minimize the number of errors, it will come up with a decision structure that is biased toward avoiding errors on the no instances, because such errors are effectively penalized tenfold. If data with the original proportion of no instances is used for testing, fewer errors will be made on these than on yes instances (that is, there will be fewer false positives than false negatives) because false positives have been weighted 10 times more heavily than false negatives.
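
The resampling step described here can be sketched as plain replication of one class before training. The helper name `replicate_class`, the `(features, label)` tuple layout, and the tiny dataset are assumptions for illustration; a learner that accepts instance weights could be given a weight of 10 instead of ten copies.

```python
def replicate_class(instances, label, factor):
    """Return a new training list in which every instance of `label`
    appears `factor` times and all other instances appear once."""
    resampled = []
    for features, y in instances:
        copies = factor if y == label else 1
        resampled.extend([(features, y)] * copies)
    return resampled


# Hypothetical training data: (features, class label).
train = [
    ((1.0,), "yes"),
    ((2.0,), "no"),
    ((3.0,), "yes"),
]

# Make errors on "no" instances count ten times as much during training.
weighted_train = replicate_class(train, "no", 10)
print(len(weighted_train))
```

Only the training set is altered. The test set keeps its original class proportions, which is why the resulting model shows fewer false positives than false negatives there.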

3.Lift chart && ROC curve && Recall-precision curve

(1) TP Rate = 100 × TP / (TP + FN)

(2) FP Rate = 100 × FP / (FP + TN)

(3) Recall = number of documents retrieved that are relevant / total number of documents that are relevant

(4) Precision = number of documents retrieved that are relevant / total number of documents that are retrieved
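
As a quick check of the four definitions, the sketch below computes them from the counts of a two-class confusion matrix. It assumes that "relevant" corresponds to actual positives and "retrieved" to predicted positives; the function name `rates` and the counts are hypothetical.

```python
def rates(tp, fp, tn, fn):
    """Compute TP rate, FP rate (both as percentages), recall and
    precision (both as fractions) from confusion-matrix counts."""
    return {
        "tp_rate": 100 * tp / (tp + fn),
        "fp_rate": 100 * fp / (fp + tn),
        # retrieved and relevant = tp; all relevant = tp + fn
        "recall": tp / (tp + fn),
        # retrieved and relevant = tp; all retrieved = tp + fp
        "precision": tp / (tp + fp),
    }


# Hypothetical counts for illustration.
r = rates(tp=40, fp=10, tn=40, fn=10)
print(r)
```

Under this mapping, recall is the TP rate expressed as a fraction rather than a percentage, while precision depends on FP and has no counterpart among the two ROC quantities.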
