Advice for applying machine learning - Evaluating a hypothesis

Abstract: This article is the transcript of Lecture 84, "Evaluating a Hypothesis", from Chapter 11, "Advice for Applying Machine Learning", of Andrew Ng's Machine Learning course. I took these notes while watching the video and edited them to be more concise and easier to read, for my own future reference, and I am sharing them here. If you find any errors, corrections are welcome and sincerely appreciated. I also hope this is helpful for your studies.
————————————————
In this video, I would like to talk about how to evaluate a hypothesis that has been learned by your algorithm. In later videos, we will build on this to talk about how to prevent the problems of overfitting and underfitting as well.

When we fit the parameters of our learning algorithm, we think about choosing the parameters to minimize the training error. One might think that getting a really low value of training error is a good thing, but we have already seen that just because a hypothesis has low training error, that doesn't mean it is necessarily a good hypothesis. We've already seen an example of how a hypothesis can overfit, and therefore fail to generalize to new examples not in the training set. So how do you tell if a hypothesis might be overfitting? In that simple example, we could plot the hypothesis h_{\theta }(x) and see what was going on. But in general, for problems with more than one feature, or with a large number of features, it becomes hard or maybe impossible to plot what the hypothesis looks like. So we need some other way to evaluate our hypothesis.

The standard way to evaluate a learned hypothesis is as follows. Suppose we have a data set like this. Here I have just shown 10 training examples, but of course usually we may have dozens or hundreds or maybe thousands of training examples. In order to make sure we can evaluate our hypothesis, what we are going to do is split the data we have into two portions. The first portion is going to be our usual training set, and the second portion is going to be our test set. A pretty typical split of all the data we have into a training set and a test set might be around a 70%/30% split, with more of the data going to the training set and relatively less to the test set. So if we have some data set, we would assign, say, only 70% of the data to be our training set, where "m" is, as usual, our number of training examples, and the remainder of our data would then be assigned to become our test set. Here, I'm going to use the notation m_{test} to denote the number of test examples. In general, this subscript "test" is going to denote examples that come from the test set, so that (x^{(1)}_{test}, y^{(1)}_{test}) is my first test example, which in this example might be this example over here. Finally, one last detail. Here I've drawn this as though the first 70% goes to the training set and the last 30% to the test set. But if there is any sort of ordering to the data, it would be better to send a random 70% of your data to the training set and a random 30% of your data to the test set. So if your data were already randomly sorted, you could just take the first 70% and the last 30%. But if your data were not randomly ordered, it would be better to randomly shuffle, or randomly reorder, the examples before sending the first 70% to the training set and the last 30% to the test set.
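As a small illustration of that random 70%/30% split, here is a minimal Python sketch. It assumes the examples are stored in NumPy arrays X and y (hypothetical names, one row per example) and shuffles the indices first, so the split works even when the original data has some ordering.

```python
import numpy as np

def train_test_split(X, y, train_frac=0.7, seed=0):
    """Randomly shuffle the examples, then split them into a training set and a test set."""
    m = X.shape[0]                      # total number of examples
    rng = np.random.default_rng(seed)   # fixed seed so the split is reproducible
    idx = rng.permutation(m)            # random reordering of the example indices
    m_train = int(train_frac * m)       # size of the training portion (about 70%)
    train_idx, test_idx = idx[:m_train], idx[m_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```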

Here, then, is a fairly typical procedure for how you would train and test a learning algorithm, say linear regression. First, you learn the parameters \theta from the training set, so you minimize the usual training error J(\theta ), where J(\theta ) here is defined using only that 70% of all the data you have, that is, only the training data. Then you compute the test error, which I am going to denote J_{test}(\theta ). What you do is take the parameters \theta that you have learned from the training set, plug them in, and compute your test set error as J_{test}(\theta ) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_{\theta }(x^{(i)}_{test}) - y^{(i)}_{test} \right)^{2}. So this is basically the average squared error as measured on your test set, which is pretty much what you'd expect: we run every test example through the hypothesis with parameters \theta and measure the squared error the hypothesis makes on the m_{test} test examples. Of course, this is the definition of the test set error if we are using linear regression with the squared error metric. How about if we were doing a classification problem and, say, using logistic regression instead?
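For concreteness, here is a minimal Python sketch of that test error computation for linear regression. It assumes theta has already been learned on the training set, that X_test already includes the intercept column, and that the variable names are purely illustrative.

```python
import numpy as np

def linear_test_error(theta, X_test, y_test):
    """J_test(theta) = (1 / (2 * m_test)) * sum of squared errors on the test set."""
    m_test = X_test.shape[0]
    predictions = X_test @ theta                  # h_theta(x) for every test example
    squared_errors = (predictions - y_test) ** 2  # squared error per test example
    return squared_errors.sum() / (2 * m_test)
```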

In that case, the procedure for training and testing, say, logistic regression is pretty similar. First, we learn the parameters \theta from the training data, that first 70% of the data. Then we compute the test error as J_{test}(\theta ) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \left[ y^{(i)}_{test} \log h_{\theta }(x^{(i)}_{test}) + (1 - y^{(i)}_{test}) \log \left( 1 - h_{\theta }(x^{(i)}_{test}) \right) \right]. This is the same objective function we always use for logistic regression, except that now it is defined using our m_{test} test examples. While this definition of the test set error J_{test}(\theta ) is perfectly reasonable, sometimes there is an alternative test set metric that might be easier to interpret, and that's the misclassification error. It's also called the 0/1 misclassification error, with 0/1 denoting that you either get an example right or you get it wrong. Here's what I mean. Let me define the error of a prediction h_{\theta }(x), given the label y, to be equal to 1 if my hypothesis outputs a value greater than or equal to 0.5 and y is equal to 0, or if my hypothesis outputs a value less than 0.5 and y is equal to 1. Both of these cases basically correspond to the hypothesis mislabeling the example, assuming you threshold at 0.5: either the hypothesis thought the example was more likely to be 1 but it was actually 0, or the hypothesis thought it was more likely to be 0 but the label was actually 1. Otherwise, we define this error function to be 0, meaning the hypothesis classified the example correctly. We can then define the test error, using the misclassification error metric, to be \frac{1}{m_{test}} \sum_{i=1}^{m_{test}}err(h_{\theta }(x^{(i)}_{test}),y^{(i)}_{test}). And so that's the definition of the test set error using the misclassification error, or 0/1 misclassification, metric.
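Here is a minimal Python sketch of both test metrics for logistic regression: the usual logistic cost evaluated on the test set, and the 0/1 misclassification error with the 0.5 threshold described above. It assumes a sigmoid hypothesis h_theta(x) = g(theta^T x) and uses illustrative variable names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_test_error(theta, X_test, y_test):
    """J_test(theta): the usual logistic regression cost, evaluated on the test set."""
    m_test = X_test.shape[0]
    h = sigmoid(X_test @ theta)                       # h_theta(x) for every test example
    return -(y_test * np.log(h) + (1 - y_test) * np.log(1 - h)).sum() / m_test

def misclassification_error(theta, X_test, y_test):
    """0/1 misclassification error: fraction of test examples labeled incorrectly,
    thresholding h_theta(x) at 0.5."""
    h = sigmoid(X_test @ theta)
    predictions = (h >= 0.5).astype(int)              # predict 1 if h >= 0.5, else 0
    return np.mean(predictions != y_test)             # average of the 0/1 errors
```

The boolean comparison predictions != y_test plays the role of the err function above: it is 1 exactly when the thresholded prediction and the label disagree, and averaging it gives the fraction of misclassified test examples.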

So, that's the standard technique for evaluating how good a learned hypothesis is. In the next video, we'll adapt these ideas to help us do things like choose what features, such as the degree of polynomial, to use with the learning algorithm, or choose the regularization parameter for the learning algorithm.

<end>
