Machine Learning Week Three

Classification

The logistic function (also called the sigmoid function) has the form

g(z) = \frac{1}{1 + e^{-z}}

which maps any real number to the (0, 1) interval.
The new form of the hypothesis h_\theta(x) is

h_\theta(x) = g(\theta^T x)

and

z = \theta^T x

Now h_\theta(x) gives the probability that our output is 1:

h_\theta(x) = P(y = 1 \mid x; \theta) = 1 - P(y = 0 \mid x; \theta)

h_\theta(x) \ge 0.5 \rightarrow y = 1
h_\theta(x) < 0.5 \rightarrow y = 0
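
As a minimal sketch in MATLAB/Octave (x, theta, and the variable names here are illustrative assumptions, not code from the course), the sigmoid and the 0.5 decision rule look like:

g = @(z) 1 ./ (1 + exp(-z));   % sigmoid, applied element-wise
h = g(theta' * x);             % estimated probability that y = 1
y_pred = (h >= 0.5);           % decision rule: predict 1 iff h >= 0.5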

For example,

z = \theta_0 + \theta_1 x_1^2 + \theta_2 x_2^2

where

\vec{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}

Recall that

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})

Now, in classification problems, we use the cost function

\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x)) \quad \text{if } y = 1
\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x)) \quad \text{if } y = 0

We can compress the two cases into a single expression:

\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))

Our full cost function is as follows:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]

A vectorized implementation is

h = g(X\theta)

J(\theta) = \frac{1}{m} \left( -y^T \log(h) - (1 - y)^T \log(1 - h) \right)

The gradient descent algorithm is:

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

which is coincidentally identical in form to linear regression,
and the vectorized implementation is:

\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - y \right)
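
As a minimal sketch of this vectorized update (alpha, num_iters, X, and y are assumed to be defined; this is an illustration, not the course's exact code):

g = @(z) 1 ./ (1 + exp(-z));                      % sigmoid
m = length(y);                                    % number of training examples
for iter = 1:num_iters
    h = g(X * theta);                             % m x 1 vector of hypotheses
    theta = theta - (alpha / m) * (X' * (h - y)); % simultaneous update of all theta_j
end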

Advanced Optimization

We can use MATLAB's optimization library to apply more advanced algorithms than gradient descent. First, write a function that returns both the cost and the gradient:

function [jVal, gradient] = costFunction(theta)
    jVal = [...code to compute J(theta)...];
    gradient = [...code to compute the gradient of J(theta)...];
end

Then we use fminunc():

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction,
    initialTheta, options);
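
As a concrete sketch of the pieces above (assuming the design matrix X and label vector y exist in the calling workspace; passing them through an anonymous function is one common pattern):

function [jVal, gradient] = costFunction(theta, X, y)
    % Unregularized logistic-regression cost and gradient, vectorized.
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));                        % sigmoid hypothesis
    jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)); % J(theta)
    gradient = (1 / m) * (X' * (h - y));                     % partial derivatives
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
[optTheta, functionVal, exitFlag] = fminunc(@(t) costFunction(t, X, y), ...
    initialTheta, options);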

Multiclass Classification: One-vs-all

We divide our problem into n + 1 binary classification problems; in each one, we predict the probability that y is a member of one of our classes.

y \in \{0, 1, \dots, n\}

h_\theta^{(0)}(x) = P(y = 0 \mid x; \theta)
\vdots
h_\theta^{(n)}(x) = P(y = n \mid x; \theta)

\mathrm{prediction} = \max_i \left( h_\theta^{(i)}(x) \right)
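
As a minimal sketch (all_theta is an assumed matrix whose k-th row holds the trained parameters of the k-th classifier):

g = @(z) 1 ./ (1 + exp(-z));           % sigmoid
probs = g(X * all_theta');             % probs(i, k) = probability example i belongs to class k
[~, predictions] = max(probs, [], 2);  % most confident classifier per example
% note: max returns 1-based indices; shift them if classes are labeled 0..n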

Overfitting

Underfitting occurs when the form of our hypothesis function h maps poorly to the trend of the data. It is usually caused by a function that uses too few features.

Overfitting occurs when a hypothesis function fits the available data well but does not generalize to new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.

Regularization

For regularized linear regression, we minimize

\min_\theta J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]

and for regularized logistic regression

\min_\theta J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

Gradient descent (for j \ge 1; the bias term \theta_0 is conventionally left unregularized):

\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]
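
As a minimal sketch of one regularized update step (alpha and lambda are assumed to be set; the first column of X is the bias column of ones, so theta(1) corresponds to theta_0):

g = @(z) 1 ./ (1 + exp(-z));                              % sigmoid
m = length(y);
h = g(X * theta);
grad = (1 / m) * (X' * (h - y));                          % unregularized gradient
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % regularize j >= 1 only
theta = theta - alpha * grad;                             % gradient descent step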
