Machine learning system design - Error metrics for skewed classes

Abstract: This article is the transcript of Lesson 95, "Error Metrics for Skewed Classes", from Chapter 12, "Machine Learning System Design", of Andrew Ng's Machine Learning course. I wrote it down while working through the videos and lightly edited it to make it more concise and easier to read, for future reference, and am sharing it here. If there are any mistakes, corrections are very welcome, and I sincerely thank you in advance. I hope it is helpful for your studies.

In the previous video, I talked about error analysis and the importance of having an error metric, that is, a single real-number evaluation metric for your learning algorithm that tells you how well it's doing. In the context of evaluation and error metrics, there is one important case where it's particularly tricky to come up with an appropriate error metric, or evaluation metric, for your learning algorithm. That is the case of what's called skewed classes. Let me tell you what that means.

Consider the problem of cancer classification, where we have features of medical patients and we want to decide whether or not they have cancer. So this is like the malignant versus benign tumor classification example we had earlier. Let's say y=1 if a patient has cancer and y=0 if they do not. We have trained a logistic regression classifier, and let's say we test it on a test set and find that we get 1% error. So we're making 99% correct diagnoses. That seems like a really impressive result. But now, let's say we find out that only 0.5% of the patients in our training and test sets actually have cancer. In this case, the 1% error no longer looks so impressive. In particular, here's a piece of non-learning code that takes the input features x and ignores them. It just sets y=0 and always predicts that nobody has cancer, and this algorithm would actually get 0.5% error. So this is even better than the 1% error we were getting just now.

This setting, where the ratio of positive to negative examples is very close to one of two extremes (in this case, the number of positive examples is much smaller than the number of negative examples because y=1 occurs so rarely), is what we call the case of skewed classes. We just have a lot more examples from one class than from the other class, and by just predicting y=0 all the time, or maybe predicting y=1 all the time, an algorithm can do very well.

So the problem with using classification error or classification accuracy as our evaluation metric is the following. Let's say you have one learning algorithm that's getting 99.2% accuracy, that is, 0.8% error. Let's say you make a change to your algorithm and you now get 99.5% accuracy, that is, 0.5% error. Is this an improvement to the algorithm or not? One of the nice things about having a single real-number evaluation metric is that it helps us quickly decide whether we just made a good change to the algorithm or not. By going from 99.2% accuracy to 99.5% accuracy, did we just do something useful, or did we just replace our code with something that predicts y=0 more often? So if you have very skewed classes, it becomes much harder to use just classification accuracy, because you can get very high classification accuracies or very low errors, and it's not always clear if doing so is really improving the quality of your classifier: predicting y=0 all the time doesn't seem like a particularly good classifier, yet just predicting y=0 more often can bring your error down to as low as 0.5%. When we're faced with such skewed classes, we want to come up with a different error metric.
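To make this concrete, here is a minimal sketch (not from the lecture; the data and the "trained" classifier are simulated) that reproduces the numbers above: a classifier with roughly 1% error is beaten, on accuracy alone, by code that ignores the features and always predicts y=0, when only about 0.5% of examples are positive.

```python
# Sketch: why accuracy misleads on skewed classes.
# The 0.5% cancer rate and the 1% classifier error are the
# illustrative numbers from the transcript.
import numpy as np

rng = np.random.default_rng(0)

m = 100_000                                   # number of test examples
y = (rng.random(m) < 0.005).astype(int)       # ~0.5% of patients have cancer (y = 1)

# A hypothetical trained classifier that is wrong on ~1% of examples.
pred_trained = y.copy()
flip = rng.random(m) < 0.01
pred_trained[flip] = 1 - pred_trained[flip]

# "Non-learning" baseline: ignore the features and always predict y = 0.
pred_always_zero = np.zeros(m, dtype=int)

print("trained classifier accuracy :", (pred_trained == y).mean())      # ~0.99
print("always-predict-0 accuracy   :", (pred_always_zero == y).mean())  # ~0.995
```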

One such evaluation metric is what's called precision/recall (查准率/召回率). Let me explain what that is. Let's say you're evaluating a classifier on a test set. For each example in the test set, the actual class of that example is going to be either 1 or 0, if it is a binary classification problem. And our learning algorithm will predict a value for the class of each example in the test set, and that predicted value will also be either 1 or 0. So let me draw a 2x2 table as follows, based on these two entries: what was the actual class and what was the predicted class. If we have an example where the actual class is 1 and the predicted class is 1, then that's called a true positive, meaning our algorithm predicted that it's positive and in reality the example is positive. If our learning algorithm predicted that something is negative, class 0, and the actual class is also 0, then that's what's called a true negative. To fill in the other two boxes: if our learning algorithm predicts that the class is 1 but the actual class is 0, then that's called a false positive. That means our algorithm predicted that the patient has cancer, but in reality the patient does not. Finally, the last box, predicted class 0 and actual class 1, is called a false negative. So we have this little 2x2 table based on what was the actual class and what was the predicted class.

So here's a different way of evaluating the performance of our algorithm. We're going to compute two numbers. The first is called precision. And what that says is: of all the patients where we've predicted that they have cancer, what fraction of them actually have cancer? The precision of a classifier is the number of true positives divided by the number that we predicted as positive:

$$\text{Precision} = \frac{\text{True positives}}{\text{Number predicted as positive}} = \frac{\text{True positives}}{\text{True positives} + \text{False positives}}$$

High precision would be good. It means that of all the patients we went to and said, "We are very sorry, we think you have cancer", most of them actually do have cancer, so we made accurate predictions on them.

The second number we're going to compute is called recall, and what recall says is: of all the patients, say in the test set or the cross-validation set, that actually have cancer, what fraction of them do we correctly detect as having cancer? So of all the patients that have cancer, how many did we actually go to and correctly tell that we think they need treatment? Recall is defined as the number of true positives divided by the number of actual positives:

$$\text{Recall} = \frac{\text{True positives}}{\text{Number of actual positives}} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}}$$

The number of actual positives is the number of all the people that do have cancer; recall asks what fraction of them we correctly flag and send to treatment. Having a high recall would be a good thing. So by computing precision and recall, we will usually get a better sense of how well our classifier is doing. In particular, if we have a learning algorithm that predicts y=0 all the time, that is, it predicts no one has cancer, then this classifier will have a recall equal to 0, because there won't be any true positives. So that's a quick way to recognize that a classifier that predicts y=0 all the time just isn't a very good classifier.
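As an illustration, here is a small sketch (the helper precision_recall is my own, not from the course) that computes precision and recall directly from the definitions above, counting true positives, false positives, and false negatives under the convention that y=1 is the positive class.

```python
# Sketch: precision and recall computed from the 2x2 table entries.
import numpy as np

def precision_recall(y_true, y_pred):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall    = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

# The always-predict-0 "classifier" from before: recall is 0, exposing it immediately.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
print(precision_recall(y_true, [0] * 8))                    # (0.0, 0.0)
print(precision_recall(y_true, [1, 0, 0, 0, 0, 0, 0, 1]))   # (0.5, 0.5)
```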
More generally, even in settings where we have very skewed classes, it's not possible for an algorithm to sort of "cheat" and somehow get both a very high precision and a very high recall by doing something simple like predicting y=0 all the time or y=1 all the time. So we can be much more confident that a classifier with high precision and high recall actually is a good classifier, and this gives us a more useful evaluation metric, a more direct way to understand whether our algorithm is doing well. One final note on the definition of precision and recall: usually we use the convention that y=1 denotes the presence of the rarer class. So if we are trying to detect some rare condition such as cancer, precision and recall are defined setting y=1, rather than y=0, to be the presence of that rare class that we're trying to detect.
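Continuing the sketch above, the other trivial "cheat", predicting y=1 for every patient, gets perfect recall but a precision equal to the base rate of the rare class (about 0.5% in the cancer example), so neither trivial predictor looks good on both metrics at once.

```python
# Reusing the precision_recall helper from the previous sketch.
y_true = [1] + [0] * 199          # 1 positive out of 200 patients (0.5% base rate)
y_pred_all_ones = [1] * 200       # "cheating" by predicting y = 1 for everyone
print(precision_recall(y_true, y_pred_all_ones))   # (0.005, 1.0)
```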

And by using precision and recall, what happens is that even if we have very skewed classes, it's not possible for an algorithm to "cheat" and predict y=1 all the time, or y=0 all the time, and get both high precision and high recall. In particular, if a classifier is getting high precision and high recall, then we can actually be confident that the algorithm is doing well, even if we have very skewed classes. So for the problem of skewed classes, precision and recall give us more direct insight into how the learning algorithm is doing, and this is often a much better way to evaluate our learning algorithms than looking at classification error or classification accuracy when the classes are very skewed.

<end>
