Machine Learning Week 3

Linear Regression can’t be used for Classification Problems

Linear Regression Doesn’t Work Well for Classification Problems

A single unusual point (an outlier) can shift the whole fitted line and thus cause errors in the classification.

Logistic Regression Model

Sigmoid Function or Logistic Function

$h_\theta(x) = g(\theta^T x)$

$g(z) = \frac{1}{1 + e^{-z}}$
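As a quick numerical check of the logistic function, here is a Python sketch (the weekly exercise implements the same thing in MATLAB below):

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # 0.5: exactly at the decision threshold
print(sigmoid(10.0))   # ~0.99995: large positive z, close to 1
print(sigmoid(-10.0))  # ~0.00005: large negative z, close to 0
```

Note the symmetry $g(-z) = 1 - g(z)$, which is what makes the two class probabilities sum to 1.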

Decision Boundary

The prediction rule is: if $h_\theta(x) \geq 0.5$, then $y = 1$; if $h_\theta(x) < 0.5$, then $y = 0$. This is equivalent to: if $\theta^T x \geq 0$, then $y = 1$; if $\theta^T x < 0$, then $y = 0$.

Example: with $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$ and $\theta = (-3, 1, 1)^T$, the model predicts $y = 1$ whenever $-3 + x_1 + x_2 \geq 0$, so the decision boundary is the line $x_1 + x_2 = 3$.

Cost Function and Gradient Descent for Logistic Regression

Cost Function for Logistic Regression

Summed over all training examples, the cost is:

$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right)$

for linear regression:

$\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

for logistic regression:

$\mathrm{Cost}\left(h_\theta(x), y\right) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$
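A small Python sketch (with illustrative values) shows the intuition behind this cost: confident correct predictions are penalized lightly, confident wrong ones heavily:

```python
import math

def cost(h, y):
    # Cost(h_theta(x), y) = -y*log(h) - (1 - y)*log(1 - h)
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

print(cost(0.9, 1))  # ~0.105: confident and correct -> small penalty
print(cost(0.9, 0))  # ~2.303: confident but wrong -> large penalty
print(cost(0.5, 1))  # ~0.693: unsure -> moderate penalty
```

As $h \to 0$ with $y = 1$ (or $h \to 1$ with $y = 0$), the cost grows without bound, which is exactly the behavior the squared-error cost lacks for classification.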

The gradient descent update for logistic regression has the same form as for linear regression (only the hypothesis $h_\theta$ differs):

$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
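The update rule above can be sketched as a batch gradient-descent loop; this is an illustrative Python version on toy data, not the course code:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=2000):
    """Batch gradient descent for logistic regression (no regularization).
    X: list of feature rows, each starting with the bias term 1; y: labels in {0, 1}."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(num_iters):
        h = [sigmoid(sum(t * xj for t, xj in zip(theta, x))) for x in X]
        grad = [sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
    return theta

# Toy 1-D data, separable around x = 2
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
theta = gradient_descent(X, y)
preds = [1 if sigmoid(sum(t * xj for t, xj in zip(theta, x))) >= 0.5 else 0 for x in X]
print(preds)  # [0, 0, 1, 1]
```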

Several other optimization algorithms can be used instead of gradient descent, e.g. Conjugate Gradient, BFGS, and L-BFGS; they need no manually chosen learning rate and are often faster, but are more complex.

Multiclass Classification

One Vs All
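One-vs-all trains one binary logistic classifier per class and predicts the class whose classifier is most confident. A minimal Python sketch on made-up toy clusters (all names and data here are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary(X, y, alpha=0.1, num_iters=5000):
    # One binary logistic-regression classifier, trained with batch gradient descent.
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(num_iters):
        h = [sigmoid(sum(t * xj for t, xj in zip(theta, x))) for x in X]
        grad = [sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def one_vs_all(X, y, num_labels):
    # Train one classifier per class: class k against all the others.
    return [train_binary(X, [1.0 if label == k else 0.0 for label in y])
            for k in range(num_labels)]

def predict(all_theta, x):
    # Choose the class whose classifier outputs the highest probability.
    scores = [sigmoid(sum(t * xj for t, xj in zip(theta, x))) for theta in all_theta]
    return scores.index(max(scores))

# Three well-separated 2-D clusters; each row starts with the bias term 1.
X = [[1, 0.0, 0.0], [1, 0.5, 0.5],   # class 0
     [1, 4.0, 0.0], [1, 4.5, 0.5],   # class 1
     [1, 2.0, 4.0], [1, 2.5, 4.5]]   # class 2
y = [0, 0, 1, 1, 2, 2]
all_theta = one_vs_all(X, y, 3)
preds = [predict(all_theta, x) for x in X]
```

With $K$ classes this trains $K$ classifiers; at prediction time only the argmax over the $K$ scores is needed.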

Problem of Overfitting

Underfitting - Just right - Overfitting

How to solve overfitting - Regularisation

Intuition of Regularisation - make the parameters $\theta_j$ small

Shrink all the parameters. The penalty sum starts from $\theta_1$, not from $\theta_0$:

$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$

Regularisation Parameters

But the regularisation parameter $\lambda$ must not be too large; otherwise all the $\theta_j$ become too small (almost 0) and the model underfits.

Cost Function and Gradient Descent With Regularization

Cost Function:

$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$

Gradient Descent:

$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$

$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \quad (j = 1, 2, 3, \ldots, n)$
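These two update rules can be sketched in Python; the data and function name below are illustrative:

```python
def gradient_descent_reg(X, y, alpha=0.1, lam=10.0, num_iters=10000):
    """Regularized gradient descent for linear regression.
    theta[0] (the bias) gets no lambda/m term, matching the updates above."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(num_iters):
        err = [sum(t * xj for t, xj in zip(theta, x)) - yi for x, yi in zip(X, y)]
        grad = [sum(err[i] * X[i][j] for i in range(m)) / m for j in range(n)]
        for j in range(1, n):          # penalize theta_1 .. theta_n only
            grad[j] += (lam / m) * theta[j]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# y = 2x exactly; the bias column of ones comes first.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.0, 2.0, 4.0, 6.0]
theta_unreg = gradient_descent_reg(X, y, lam=0.0)   # ~[0, 2]: recovers the true line
theta_reg = gradient_descent_reg(X, y, lam=10.0)    # slope shrunk toward 0
```

Comparing the two runs shows the effect of $\lambda$: with $\lambda = 0$ the exact slope 2 is recovered, while $\lambda = 10$ pulls the slope well below 2.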

Normal Equation

The θ corresponding to global minimum:

$\theta = \left(X^TX + \lambda\begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}\right)^{-1}X^Ty$

Adding the $\lambda$ term also makes the matrix invertible even when $X^TX$ itself is non-invertible.
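For the two-parameter case (bias plus one feature) the regularized normal equation can be written out directly; a Python sketch with illustrative data:

```python
def normal_equation_reg(X, y, lam):
    # Solve (X'X + lambda*L) theta = X'y for the two-parameter case,
    # where L = diag(0, 1) so that theta_0 is not penalized.
    a00 = sum(x[0] * x[0] for x in X)
    a01 = sum(x[0] * x[1] for x in X)
    a11 = sum(x[1] * x[1] for x in X) + lam
    b0 = sum(x[0] * yi for x, yi in zip(X, y))
    b1 = sum(x[1] * yi for x, yi in zip(X, y))
    det = a00 * a11 - a01 * a01      # positive once lam > 0 (or X'X is invertible)
    return [(a11 * b0 - a01 * b1) / det, (a00 * b1 - a01 * b0) / det]

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.0, 2.0, 4.0, 6.0]
print(normal_equation_reg(X, y, 0.0))    # [0.0, 2.0]: exact fit
print(normal_equation_reg(X, y, 10.0))   # [2.0, 0.666...]: slope shrunk
```

This is the closed-form counterpart of the regularized gradient-descent updates: no learning rate or iteration count, at the cost of a matrix solve.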

Regularized Logistic Regression

Cost Function:

$J(\theta) = -\left[\frac{1}{m}\sum_{i=1}^{m}y^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

Repeat the gradient descent updates:

$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$

$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$

$(j = 1, 2, 3, \ldots, n), \quad h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}$

MATLAB syntax:

[theta, jVal] = fminunc(@costFunction, initialTheta, options);

function [jVal, gradient] = costFunction(theta)

Weekly MATLAB Exercise


%sigmoid:

function g = sigmoid(z)
g = zeros(size(z));

g = 1 ./ (1 + exp(-z));   % elementwise, so z may be a vector or matrix

end

%plotData:

function plotData(X, y)
figure; hold on;

pos=find(y==1);neg=find(y==0);
plot(X(pos,1),X(pos,2),'k+','LineWidth',2,'MarkerSize',7);
plot(X(neg,1),X(neg,2),'ko','MarkerFaceColor','y','MarkerSize',7);

hold off;
end

%costFunction(without regularisation):

function [J, grad] = costFunction(theta, X, y)
m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));

J=1/m*(-log(sigmoid(X*theta))'*y-log(1-sigmoid(X*theta))'*(1-y));
grad=1/m*((sigmoid(X*theta)-y)'*X)';

end

%predict:

function p = predict(theta, X)
m = size(X, 1); % Number of training examples
p = zeros(m, 1);

p = round(sigmoid(X * theta));   % equivalent to h_theta(x) >= 0.5

end

%costFunction(with regularisation):

function [J, grad] = costFunctionReg(theta, X, y, lambda)
m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));

theta2 = theta(2:end);                 % exclude theta(1), the bias term
J = 1/m * (-y' * log(sigmoid(X*theta)) - (1 - y)' * log(1 - sigmoid(X*theta))) ...
    + lambda/(2*m) * (theta2' * theta2);
grad = (1/m * (sigmoid(X*theta) - y)' * X)';
reg = lambda/m * theta;
reg(1) = 0;                            % do not regularize the bias term
grad = grad + reg;

end

%use the fminunc function:

initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;
% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);
% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);


