Logistic Regression - Decision boundary

Abstract: This article is the transcript of lesson 48, "Decision boundary", in Chapter 7 "Logistic Regression" of Andrew Ng's Machine Learning course. I transcribed it while watching the videos and lightly edited it to make it more concise and readable, for future reference. I'm sharing it here in the hope that it helps others with their studies; if you find any errors, corrections are warmly welcomed and sincerely appreciated.

In the last video, we talked about the hypothesis representation for logistic regression.

What I'd like to do now is tell you about something called the decision boundary, and this will give us a better sense of what the logistic regression hypothesis function is computing. To recap, this is what we wrote out last time, where we said that the hypothesis is represented as h_{\theta }(x)=g(\theta ^{T}x), where g is this function called the sigmoid function, g(z)=\frac{1}{1+e^{-z}}, which slowly increases from 0 to 1, asymptoting at 1. What I want to do now is try to understand better when this hypothesis will make predictions that y is equal to 1 versus when it might make predictions that y is equal to 0, and understand better what the hypothesis function looks like, particularly when we have more than one feature. Concretely, this hypothesis is outputting estimates of the probability that y is equal to 1 given x and parameterized by \theta. So if we wanted to predict whether y is equal to 1 or y is equal to 0, here is something we might do. Whenever the hypothesis outputs that the probability of y being 1 is greater than or equal to 0.5, which means that y equals 1 is more likely than y equals 0, let's predict y equals 1. And otherwise, if the estimated probability of y being 1 is less than 0.5, let's predict y equals 0. I chose greater than or equal to 0.5 versus less than 0.5 here. If h_{\theta }(x) is equal to 0.5 exactly, then we could predict positive or negative, but by putting a greater than or equal to here we default to predicting positive if h_{\theta }(x) is 0.5. That's a detail that really doesn't matter that much.

What I want to do is understand better when exactly h_{\theta }(x) will be greater than or equal to 0.5, so that we end up predicting y is equal to 1. If we look at the plot of the sigmoid function, we'll notice that the sigmoid function g(z) is greater than or equal to 0.5 whenever z is greater than or equal to 0. So it's in that half of the figure that g takes on values that are 0.5 and higher; this point here, that's the 0.5. So when z is positive, g(z), the sigmoid function, is greater than or equal to 0.5. Since the hypothesis for logistic regression is h_{\theta }(x)=g(\theta ^{T}x), it is therefore going to be greater than or equal to 0.5 whenever \theta ^{T}x is greater than or equal to 0, because here \theta ^{T}x takes the role of z. So what we've shown is that our hypothesis is going to predict y equals 1 whenever \theta ^{T}x is greater than or equal to 0. Let's now consider the other case, when the hypothesis will predict y is equal to 0. By a similar argument, h_{\theta }(x) is going to be less than 0.5 whenever g(z) is less than 0.5, and the range of values of z that causes g(z) to take on values less than 0.5 is when z is negative. So when g(z) is less than 0.5, our hypothesis will predict that y is equal to 0, and since h_{\theta }(x)=g(\theta ^{T}x), we'll predict y equals 0 whenever \theta ^{T}x is less than 0.

To summarize what we just worked out: if we decide to predict whether y is equal to 1 or y is equal to 0 depending on whether the estimated probability is greater than or equal to 0.5 or less than 0.5, that's the same as saying that we'll predict y equals 1 whenever \theta ^{T}x is greater than or equal to 0, and we'll predict y is equal to 0 whenever \theta ^{T}x is less than 0.
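To make the thresholding rule above concrete, here is a minimal NumPy sketch of the hypothesis and the prediction rule; the helper names `sigmoid` and `predict` are my own choices for illustration, not anything defined in the lecture.

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function g(z) = 1 / (1 + e^(-z)), rising from 0 toward 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. when g(theta^T x) >= 0.5."""
    return 1 if sigmoid(np.dot(theta, x)) >= 0.5 else 0

print(sigmoid(0.0))  # 0.5: the threshold value of g(z), reached at z = 0
```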

Let's use this to better understand how the hypothesis of logistic regression makes those predictions. Now, let's suppose we have a training set like that shown on the slide, and suppose our hypothesis is h_{\theta }(x)=g(\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}). We haven't talked yet about how to fit the parameters of this model; we'll talk about that in the next video. But suppose that, with that procedure yet to be specified, we end up choosing the following values for the parameters. Let's say we choose \theta _{0}=-3, \theta _{1}=1, \theta _{2}=1, so my parameter vector is going to be \theta =\begin{bmatrix} -3\\ 1\\ 1 \end{bmatrix}. Given this choice of hypothesis parameters, let's try to figure out where the hypothesis will end up predicting y=1 and where it will end up predicting y=0. Using the formulas that we worked out on the previous slide, we know that y=1 is more likely, that is, the probability that y=1 is greater than or equal to 0.5, whenever \theta ^{T}x is greater than or equal to 0. And for the value of the parameters that we just chose, \theta ^{T}x is -3+x_{1}+x_{2}. So, for any example with features x1 and x2 that satisfies -3+x_{1}+x_{2}\geqslant 0, our hypothesis will think that y equals 1 is more likely, or will predict that y is equal to 1. We can also take the -3, bring it to the right, and rewrite this as x_{1}+x_{2}\geqslant 3. So, equivalently, we found that this hypothesis will predict y=1 whenever x1+x2 is greater than or equal to 3. Let's see what that means on the figure. If I write down the equation x_{1}+x_{2}=3, this defines the equation of a straight line, and if I draw what that straight line looks like, it gives me the following line, which passes through 3 on the x1 axis and 3 on the x2 axis. So the part of the input space, the part of the x1, x2 plane, that corresponds to x1+x2 being greater than or equal to 3 is this right half plane, that is, everything to the upper right of this magenta line that I just drew. And so, the region where our hypothesis will predict y=1 is this huge region, this half plane over to the upper right. Let me just write that down; I'm going to call this the y=1 region. In contrast, the region where x1+x2 is less than 3 is where we'll predict y=0, and that corresponds to this region. It's really a half plane as well, and that region on the left is the region where our hypothesis predicts y=0. I want to give this magenta line that I drew a name: this line is called the decision boundary. Concretely, this straight line, x1+x2=3, corresponds to the set of points where h_{\theta }(x) is equal to 0.5 exactly. And the decision boundary, that is, this straight line, is the line that separates the region where the hypothesis predicts y=1 from the region where the hypothesis predicts y=0. And just to be clear, the decision boundary is a property of the hypothesis, including the parameters \theta _{0}, \theta _{1} and \theta _{2}. In the figure I drew a training set, I drew a data set, in order to help the visualization. But even if we take away the data set, this decision boundary, and the regions where we predict y=1 versus y=0, are a property of the hypothesis and of its parameters, not a property of the data set.
Later on, of course, we'll talk about how to fit the parameters and there we'll end up using the training set, or using our data, to determine the value of the parameters. But once we have particular values for the parameters \theta _{0}, \theta _{1} and \theta _{2}, then that completely defines the decision boundary and we don't actually need to plot a training set in order to plot the decision boundary.
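As a quick sketch of this particular example, assuming the parameter values chosen above (the `predict` helper here is my own illustration), the prediction rule reduces to checking which side of the line x1 + x2 = 3 a point falls on:

```python
import numpy as np

# Parameters chosen in the example above: theta_0 = -3, theta_1 = 1, theta_2 = 1.
theta = np.array([-3.0, 1.0, 1.0])

def predict(theta, x1, x2):
    # theta^T x with x = [1, x1, x2] is -3 + x1 + x2;
    # predict y = 1 when this is >= 0, i.e. when x1 + x2 >= 3.
    z = np.dot(theta, np.array([1.0, x1, x2]))
    return 1 if z >= 0 else 0

print(predict(theta, 4.0, 2.0))  # 4 + 2 >= 3, upper-right half plane: predicts 1
print(predict(theta, 1.0, 1.0))  # 1 + 1 < 3, lower-left half plane: predicts 0
```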

Let's now look at a more complex example where, as usual, I have crosses to denote my positive examples and o's to denote my negative examples. Given a training set like this, how can I get logistic regression to fit this sort of data? Earlier, when we were talking about polynomial regression or linear regression, we talked about how we can add extra higher order polynomial terms to the features, and we can do the same for logistic regression. Concretely, let's say my hypothesis is h_{\theta }(x)=g(\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\theta _{3}x_{1}^{2}+\theta _{4}x_{2}^{2}), where I've added two extra features, x_{1}^{2} and x_{2}^{2}, so that I now have 5 parameters, \theta _{0} through \theta _{4}. As before, we'll defer to the next video our discussion of how to automatically choose values for the parameters \theta _{0} through \theta _{4}. But let's say that, with that procedure yet to be specified, I end up choosing \theta _{0}=-1, \theta _{1}=0, \theta _{2}=0, \theta _{3}=1, and \theta _{4}=1. What this means is that with this particular choice of parameters, my parameter vector is \theta =\begin{bmatrix} -1\\ 0\\ 0\\ 1\\ 1 \end{bmatrix}. Following our earlier discussion, this means that my hypothesis will predict y=1 whenever -1+x_{1}^{2}+x_{2}^{2}\geqslant 0, which is whenever \theta ^{T}x\geqslant 0. And if I take the -1 and just bring it to the right, I'm saying that my hypothesis will predict y=1 whenever x_{1}^{2}+x_{2}^{2}\geqslant 1. So, what does the decision boundary look like? Well, if you were to plot the curve x_{1}^{2}+x_{2}^{2}=1, some of you will recognize that as the equation of a circle of radius 1 centered at the origin. So, that is my decision boundary. Everything outside the circle I'm going to predict as y=1, so out here is my y=1 region, and inside the circle is where I'll predict y=0. So, by adding these more complex polynomial terms to my features, I can get more complex decision boundaries that don't just try to separate the positive and negative examples with a straight line; in this example I can get a decision boundary that is a circle. Once again, the decision boundary is a property not of the training set, but of the hypothesis and of the parameters. So long as we're given the parameter vector \theta, that defines the decision boundary, which here is the circle. The training set is not what we use to define the decision boundary; the training set may be used to fit the parameters \theta, and we'll talk about how to do that later, but once you have the parameters \theta, that is what defines the decision boundary. Let me put back the training set just for visualization.
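Again as a sketch for this circular example, assuming the parameter values above (the `predict` helper is my own illustration), the rule is the same thresholding of \theta ^{T}x, now over the feature vector [1, x1, x2, x1^2, x2^2]:

```python
import numpy as np

# Parameters chosen in the example above, over features [1, x1, x2, x1^2, x2^2].
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

def predict(theta, x1, x2):
    # theta^T x = -1 + x1^2 + x2^2; predict y = 1 when x1^2 + x2^2 >= 1,
    # i.e. on or outside the unit circle centered at the origin.
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return 1 if np.dot(theta, features) >= 0 else 0

print(predict(theta, 1.5, 0.0))  # outside the circle: predicts 1
print(predict(theta, 0.3, 0.3))  # inside the circle: predicts 0
```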

And finally, let's look at a more complex example. Can we come up with even more complex decision boundaries than this? If I have even higher order polynomial terms, so things like h_{\theta }(x)=g(\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\theta _{3}x_{1}^{2}+\theta _{4}x_{1}^{2}x_{2}+\theta _{5}x_{1}^{2}x_{2}^{2}+\theta _{6}x_{1}^{3}x_{2}+...), then it's possible to show that you can get even more complex decision boundaries, and logistic regression can be used to find decision boundaries that may, for example, be an ellipse like that, or, with a different setting of the parameters, a different decision boundary which may even look like some funny shape like that. Or, for even more complex examples, you can also get decision boundaries that look like more complex shapes like that, where everything in here you predict y=1, and everything outside you predict y=0. So with these higher order polynomial features, you can get very complex decision boundaries. With these visualizations, I hope that gives you a sense of the range of hypothesis functions we can represent using the representation that we have for logistic regression.
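As a rough sketch of how such higher order polynomial features might be generated; the helper `poly_features` and its particular ordering of terms are my own illustration, not something defined in the course:

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(x1, x2, degree):
    """Map (x1, x2) to [1, x1, x2, x1^2, x1*x2, x2^2, ...] up to the given degree."""
    feats = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement((x1, x2), d):
            feats.append(float(np.prod(combo)))
    return np.array(feats)

# With a parameter vector theta of matching length, the prediction rule is unchanged:
# predict y = 1 exactly when np.dot(theta, poly_features(x1, x2, degree)) >= 0.
print(poly_features(2.0, 3.0, 2))  # [1. 2. 3. 4. 6. 9.]
```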

Now that we know what h_{\theta }(x) can represent, what I'd like to do next, in the following video, is talk about how to automatically choose the parameters \theta, so that given a training set we can automatically fit the parameters to our data.

<end>
