【Machine Learning】【Andrew Ng】- notes(Week 3: Logistic Regression Model)

Cost Function

We cannot use the same cost function that we use for linear regression because the Logistic Function will cause the output to be wavy, causing many local optima. In other words, it will not be a convex function.
Instead, our cost function for logistic regression looks like:
J(θ) = (1/m) Σ_{i=1}^{m} Cost(hθ(x^(i)), y^(i))
Cost(hθ(x), y) = −log(hθ(x))        if y = 1
Cost(hθ(x), y) = −log(1 − hθ(x))    if y = 0
When y = 1, we get the following plot for J(θ) vs. hθ(x):
[Plot of −log(hθ(x)): the cost is 0 when hθ(x) = 1 and grows to infinity as hθ(x) approaches 0.]
Similarly, when y = 0, we get the following plot for J(θ) vs. hθ(x):
[Plot of −log(1 − hθ(x)): the cost is 0 when hθ(x) = 0 and grows to infinity as hθ(x) approaches 1.]
Cost(hθ(x), y) = 0 if hθ(x) = y
Cost(hθ(x), y) → ∞ if y = 0 and hθ(x) → 1
Cost(hθ(x), y) → ∞ if y = 1 and hθ(x) → 0
If our correct answer ‘y’ is 0, then the cost function will be 0 if our hypothesis function also outputs 0. If our hypothesis approaches 1, then the cost function will approach infinity.
If our correct answer ‘y’ is 1, then the cost function will be 0 if our hypothesis function outputs 1. If our hypothesis approaches 0, then the cost function will approach infinity.
Note that writing the cost function in this way guarantees that J(θ) is convex for logistic regression.
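To see this behavior numerically, here is a small Octave check; the values in h are arbitrary illustrative hypothesis outputs, not course code:

h = [0.001 0.5 0.999];     % example hypothesis outputs h_theta(x) (illustrative values)
cost_y1 = -log(h)          % cost when y = 1: large, moderate, near zero
cost_y0 = -log(1 - h)      % cost when y = 0: near zero, moderate, large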

Simplified Cost Function and Gradient Descent

We can compress our cost function’s two conditional cases into one case:
Cost(hθ(x), y) = −y log(hθ(x)) − (1 − y) log(1 − hθ(x))
Notice that when y is equal to 1, the second term, (1 − y) log(1 − hθ(x)), will be zero and will not affect the result. If y is equal to 0, the first term, −y log(hθ(x)), will be zero and will not affect the result.
We can fully write out our entire cost function as follows:
J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log(hθ(x^(i))) + (1 − y^(i)) log(1 − hθ(x^(i))) ]
A vectorized implementation is:
h = g(Xθ)
J(θ) = (1/m) · (−y^T log(h) − (1 − y)^T log(1 − h))
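As a rough sketch of how this vectorized cost might be computed in Octave (the function name logisticCost and the inline sigmoid are illustrative, not part of the course assignments):

function J = logisticCost(theta, X, y)
  m = length(y);                          % number of training examples
  g = @(z) 1 ./ (1 + exp(-z));            % sigmoid, applied element-wise
  h = g(X * theta);                       % h = g(X*theta), an m x 1 vector
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
end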
Gradient Descent
Remember that the general form of gradient descent is:
Repeat {
    θj := θj − α (∂/∂θj) J(θ)
}
We can work out the derivative part using calculus to get:
Repeat {
    θj := θj − (α/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x_j^(i)
}
Notice that this algorithm is identical to the one we used in linear regression. We still have to simultaneously update all values in theta.
A vectorized implementation is:
θ := θ − (α/m) X^T (g(Xθ) − y⃗)
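A minimal Octave sketch of this vectorized update, repeated for a fixed number of iterations (the function name, alpha, and num_iters are illustrative choices):

function theta = gradientDescentLogistic(X, y, theta, alpha, num_iters)
  m = length(y);
  g = @(z) 1 ./ (1 + exp(-z));                              % sigmoid
  for iter = 1:num_iters
    theta = theta - (alpha / m) * X' * (g(X * theta) - y);  % simultaneous update of all theta_j
  end
end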

Advanced Optimization

“Conjugate gradient”, “BFGS”, and “L-BFGS” are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent. We suggest that you should not write these more sophisticated algorithms yourself (unless you are an expert in numerical computing) but use the libraries instead, as they’re already tested and highly optimized. Octave provides them.
We first need to provide a function that evaluates the following two functions for a given input value θ:
J(θ)
∂/∂θj J(θ)   (for j = 0, 1, …, n)
We can write a single function that returns both of these:

function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end
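
One possible way to fill in this template for logistic regression is sketched below. It assumes X and y are passed in as extra arguments (so the fminunc call would need an anonymous-function wrapper, shown later), and the inline sigmoid is illustrative:

function [jVal, gradient] = costFunction(theta, X, y)
  m = length(y);
  g = @(z) 1 ./ (1 + exp(-z));                               % sigmoid
  h = g(X * theta);                                          % hypothesis for all m examples
  jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % J(theta)
  gradient = (1 / m) * X' * (h - y);                         % partial derivatives of J(theta)
end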

Then we can use Octave's "fminunc()" optimization algorithm along with the "optimset()" function, which creates an object containing the options we want to send to "fminunc()". (Note: the value for MaxIter should be an integer, not a character string; this is an erratum in the video at 7:30.)

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);

We pass to "fminunc()" our cost function, our initial vector of theta values, and the "options" object that we created beforehand.
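
If costFunction takes extra arguments, as in the sketch above, one option is to wrap it in an anonymous function so that fminunc still optimizes over theta alone:

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
[optTheta, functionVal, exitFlag] = fminunc(@(t) costFunction(t, X, y), initialTheta, options);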
