Regularization - Regularized logistic regression

Abstract: This article is the original video transcript of Lecture 58, "Regularized logistic regression," from Chapter 8, "Regularization," of Andrew Ng's Machine Learning course. I recorded it while studying the videos and revised it to make it more concise and easier to read, for later reference, and I'm sharing it here. If there are any errors, corrections are welcome and sincerely appreciated. I hope it is also helpful for your own studies.

For logistic regression, we previously talked about two types of optimization algorithms. We talked about how to use gradient descent to optimize the cost function J(\theta ). And we also talked about advanced optimization methods: ones that require that you provide a way to compute the cost function J(\theta ) and a way to compute the derivatives. In this video, we’ll show how you can adapt both of those techniques, both gradient descent and the more advanced optimization methods, in order to have them work for regularized logistic regression. So, here is the idea.

We saw earlier that logistic regression can also be prone to overfitting. If you fit it with very high-order polynomial features like this, where g is the sigmoid function, you may end up with a hypothesis whose decision boundary is an overly complex, contorted function that really isn’t such a great hypothesis for this training set. And more generally, if you have logistic regression with a lot of features, not necessarily polynomial ones, then just with a lot of features you can end up with overfitting. This was our cost function for logistic regression. If we want to modify it to use regularization, all we need to do is add to it the following term: \frac{\lambda }{2m}\sum_{j=1}^{n}\theta _{j}^{2}. This has the effect of penalizing the parameters \theta _{1}, \theta _{2}, \theta _{3} up to \theta _{n} for being too large. And if you do this, then even though you’re fitting a very high-order polynomial with a lot of parameters, so long as you apply regularization and keep the parameters small, you’re more likely to get a decision boundary that looks more like this, one that looks more reasonable for separating out the positive and negative examples. So when using regularization, even when you have a lot of features, the regularization can help take care of the overfitting problem. How do we actually implement this?
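For reference, and consistent with the lecture’s earlier definitions (h_{\theta } being the sigmoid hypothesis), the full regularized cost function being described here is:

J(\theta )=-\frac{1}{m}\sum_{i=1}^{m}\left [ y^{(i)}\log h_{\theta }(x^{(i)})+(1-y^{(i)})\log (1-h_{\theta }(x^{(i)})) \right ]+\frac{\lambda }{2m}\sum_{j=1}^{n}\theta _{j}^{2}

Note that the regularization sum starts at j=1, so \theta _{0} is not penalized.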

Well, for the original gradient descent algorithm, this was the update we had: we repeatedly perform the following update to \theta _{j}. This slide looks a lot like the previous one for linear regression, but what I’m going to do is write the update for \theta _{0} separately. So the first line is the update for \theta _{0}, and the second line is the update for \theta _{1} up to \theta _{n}, because I’m going to treat \theta _{0} separately. In order to modify this algorithm to use a regularized cost function, all I need to do, pretty similar to what we did for linear regression, is modify this second update rule as follows. Once again, this cosmetically looks identical to what we had for linear regression, but of course it is not the same algorithm, because now the hypothesis is defined using the sigmoid function. So this is not the same algorithm as regularized linear regression, because the hypothesis is different, even though the update I wrote down looks cosmetically the same as what we had earlier; what we’re working out is gradient descent for regularized logistic regression. And just to wrap up this discussion: the term in square brackets is of course the new partial derivative with respect to \theta _{j} of the new cost function J(\theta ), where J(\theta ) is the cost function we defined on the previous slide that uses regularization. So, that’s gradient descent for regularized logistic regression.
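Written out explicitly in the same notation (this is the standard form the slide refers to), the update rules are:

Repeat until convergence {
\theta _{0}:=\theta _{0}-\alpha \frac{1}{m}\sum_{i=1}^{m}(h_{\theta }(x^{(i)})-y^{(i)})x_{0}^{(i)}
\theta _{j}:=\theta _{j}-\alpha \left [ \frac{1}{m}\sum_{i=1}^{m}(h_{\theta }(x^{(i)})-y^{(i)})x_{j}^{(i)}+\frac{\lambda }{m}\theta _{j} \right ] \qquad (j=1,2,\ldots ,n)
}

where h_{\theta }(x)=\frac{1}{1+e^{-\theta ^{T}x}}, which is exactly what makes this different from regularized linear regression despite the identical-looking update.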

Let’s talk about how to get regularized logistic regression to work using the more advanced optimization methods. Just to remind you, for those methods what we needed to do was define a function called costFunction that takes as input the parameter vector \theta. Once again, in the equations we’ve been writing here we used 0-indexed vectors, so we had \theta _{0} up to \theta _{n}. But because Octave indexes vectors starting from 1, \theta _{0} is written in Octave as theta(1), \theta _{1} is written in Octave as theta(2), and so on, up to theta(n+1). What we needed to do was provide this costFunction and pass it to fminunc, as in fminunc(@costFunction, …). fminunc stands for function minimization unconstrained, and it is what takes the costFunction and minimizes it for us. The two main things that costFunction needs to return are, first, jVal, and for that we need to write code to compute the cost function J(\theta ). Now that we’re using regularized logistic regression, the cost function J(\theta ) changes; in particular, it needs to include the additional regularization term at the end as well, so when you compute J(\theta ), be sure to include that term. The other thing costFunction needs to return is the gradient: gradient(1) needs to be set to the partial derivative of J(\theta ) with respect to \theta _{0}, gradient(2) needs to be set to the partial derivative with respect to \theta _{1}, and so on. Once again, the index is off by one because of the 1-based indexing that Octave uses. Looking at these terms: this term \frac{\partial }{\partial \theta _{0}}J(\theta ), which we actually worked out on the previous slide, doesn’t change, because the derivative with respect to \theta _{0} is the same as in the version without regularization. The other terms do change. In particular, the derivative with respect to \theta _{1}, which we also worked out on the previous slide, is equal to the original term plus \frac{\lambda }{m}\theta _{1}. Just to make sure this parses correctly, we add parentheses here so the summation doesn’t extend over the new term. And similarly, the other terms look like this, with the additional term from the previous slide that corresponds to the gradient of the regularization objective. So if you implement this costFunction and pass it into fminunc or one of those other advanced optimization techniques, it will minimize the new regularized cost function J(\theta ), and the parameters you get out will be the ones that correspond to logistic regression with regularization. So now you know how to implement regularized logistic regression.
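As a concrete illustration, here is a minimal Octave sketch of such a costFunction, assuming a design matrix X whose first column is all ones, a label vector y, and a regularization parameter lambda (these variable names are mine, not from the lecture); it returns jVal and gradient in the form fminunc expects:

```octave
function [jVal, gradient] = costFunction(theta, X, y, lambda)
  % theta(1) corresponds to theta_0 in the lecture's notation,
  % theta(2) to theta_1, and so on, because Octave indexes from 1.
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));              % sigmoid hypothesis h_theta(x)

  % Regularized cost: the penalty sum skips theta_0, i.e. theta(1).
  jVal = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
         + (lambda / (2*m)) * sum(theta(2:end) .^ 2);

  % Unregularized gradient for every component...
  gradient = (1/m) * (X' * (h - y));
  % ...then add (lambda/m)*theta_j for j >= 1, leaving theta_0's entry unchanged.
  gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end
```

Since this sketch takes extra arguments beyond \theta, one way to hand it to fminunc is through an anonymous function, e.g. fminunc(@(t) costFunction(t, X, y, lambda), initialTheta, optimset('GradObj', 'on', 'MaxIter', 400)).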

When I walk around Silicon Valley (I live here in Silicon Valley), there are a lot of engineers who are frankly making a ton of money for their companies using machine learning algorithms. And I know we’ve only been studying this stuff for a little while, but if you understand linear regression, logistic regression, the advanced optimization algorithms, and regularization, then frankly you probably know quite a lot more machine learning right now than many of those Silicon Valley engineers, even though they’re having very successful careers, making tons of money for their companies and building great products with machine learning algorithms. So congratulations: you’ve actually come a long way, and you already know enough to apply this stuff and get it to work on many problems. But of course, there’s still a lot more that we want to teach you, and in the next set of videos we’ll start to talk about a very powerful class of non-linear classifiers. Whereas with linear regression and logistic regression you can form polynomial terms, it turns out there are much more powerful non-linear classifiers that go well beyond polynomial regression. In the next set of videos after this one, I’ll start telling you about them, so that you have even more powerful learning algorithms to apply to different problems.

<end>

