This assignment asks for two implementations: logistic regression and regularized logistic regression.
Logistic Regression
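The logistic regression model here predicts whether a student will be admitted to a university from the student's exam scores.
The training data gives each student's scores on two exams together with the admission decision, where 1 means admitted and 0 means not admitted.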
Visualizing the data
Visualization is done by completing plotData.m; the course slides already give the solution.
plotData.m:
function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
% PLOTDATA(x,y) plots the data points with + for the positive examples
% and o for the negative examples. X is assumed to be a Mx2 matrix.
% Create New Figure
figure; hold on;
% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
% 2D plot, using the option 'k+' for the positive
% examples and 'ko' for the negative examples.
%
pos = find(y==1); neg= find(y==0);
plot (X(pos,1),X(pos,2),'k+','LineWidth',2,...
'MarkerSize',7);
plot (X(neg,1),X(neg,2),'ko','MarkerFaceColor','y',...
'MarkerSize',7);
% =========================================================================
hold off;
end
sigmoid function
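Because this is a classification problem, the value produced by the linear combination of the features has to be constrained to lie between 0 and 1, so we need a function that does the squashing.
The function used here is the sigmoid:
$$y = \frac{1}{1+e^{-z}}$$
The graph of $y = \frac{1}{1+e^{-z}}$ is the familiar S-shaped curve.
As the graph shows, y = 0.5 when z = 0, so 0.5 can serve as the decision threshold: if the value is greater than or equal to 0.5 we treat it as 1, and if it is less than 0.5 we treat it as 0.
Back to the exercise, we define
$$g_\theta(x) = \theta^T x$$
and then substitute $g_\theta(x)$ into the sigmoid function to obtain
$$h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$$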
sigmoid.m
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
% g = SIGMOID(z) computes the sigmoid of z.
% You need to return the following variables correctly
g = zeros(size(z));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
% vector or scalar).
g = exp(-z);
g = 1./(1+g);
% =============================================================
end
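For readers who want to check the function outside Octave, here is a minimal Python sketch of the same sigmoid. The names `sigmoid` and `sigmoid_all` are my own, and the branch on the sign of z is an extra safeguard against overflow that the Octave version above does not need to spell out.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)) for a scalar z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For very negative z, exp(-z) would overflow, so use the
    # algebraically equivalent form e^z / (1 + e^z) instead.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_all(values):
    """Apply sigmoid element-wise, like the vectorised Octave code."""
    return [sigmoid(v) for v in values]
```

Calling `sigmoid(0)` gives exactly 0.5, which is the threshold used for classification.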
Cost function and gradient
In this part we complete costFunction.m so that it returns both the cost and the gradient. The cost $J(\theta)$ is computed as follows:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$
The gradient is computed as follows:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\,x_j^{(i)}$$
costFunction.m
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
% J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
% parameter for logistic regression and the gradient of the cost
% w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
g = X*theta;
h = sigmoid(g);
J = -1/m*(y'*log(h)+(1-y')*log(1-h));
grad = 1/m*(X'*(h-y));
% =============================================================
end
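As a sanity check on the vectorised Octave code, the cost and gradient can also be written out with explicit sums. The Python sketch below does that on a hypothetical four-point data set (the numbers are made up for illustration; the first column of X is the intercept term).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_function(theta, X, y):
    """Return (J, grad) for unregularized logistic regression."""
    m = len(y)
    # h[i] = sigmoid(theta . x_i)
    h = [sigmoid(sum(t * x for t, x in zip(theta, row))) for row in X]
    J = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
             for hi, yi in zip(h, y)) / m
    # grad[j] = (1/m) * sum_i (h_i - y_i) * x_ij
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]
    return J, grad

# Hypothetical toy data: intercept column plus one feature.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
J, grad = cost_function([0.0, 0.0], X, y)
```

With theta set to all zeros every prediction is 0.5, so the cost equals log(2) regardless of the data, which is a handy first check for any implementation.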
Learning parameters using fminunc
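fminunc is Octave's function for unconstrained minimization. It takes a variable, options, that holds its configuration. In the code below, 'GradObj', 'on' turns the gradient-objective option on, which means we have to supply the gradient ourselves; 'MaxIter', 400 caps the number of iterations at 400. initial_theta is our initial guess for θ.
We then call fminunc with three arguments. The first, @(t)(costFunction(t, X, y)), is a handle to an anonymous function that wraps the costFunction defined earlier so that only θ varies. The other two are the initial value of θ and the options structure.
When fminunc is called, it chooses one of several advanced optimization algorithms on its own (you can think of it as gradient descent that picks a suitable learning rate α automatically).
fminunc can return three values of interest: the θ that minimizes the cost function J(θ), the value of the cost at that θ, and an exit flag describing how the search ended (1 means it converged, 0 means the iteration limit was reached first). The code below keeps only the first two, as theta and cost.
Adapted from afunyusong's blog.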
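% Add the intercept column; n is the number of original features
[m, n] = size(X);
X = [ones(m, 1) X];
% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);
% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);
% Tell fminunc that costFunction also returns the gradient; stop after 400 iterations
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = ...
fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);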
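To make the role of the optimizer concrete, here is a Python sketch that minimizes the same cost with plain batch gradient descent. This is a stand-in, not what fminunc does internally: the fixed learning rate `alpha`, the iteration count, and the six-point data set are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_grad(theta, X, y):
    """Logistic regression cost and gradient, written with explicit sums."""
    m = len(y)
    h = [sigmoid(sum(t * x for t, x in zip(theta, row))) for row in X]
    cost = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
                for hi, yi in zip(h, y)) / m
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]
    return cost, grad

def gradient_descent(X, y, theta, alpha=0.1, iterations=5000):
    """Repeatedly step against the gradient with a fixed learning rate."""
    for _ in range(iterations):
        _, grad = cost_and_grad(theta, X, y)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical, non-separable toy data: intercept column plus one feature.
X = [[1.0, x] for x in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0)]
y = [0, 0, 1, 0, 1, 1]
initial_theta = [0.0, 0.0]
initial_cost, _ = cost_and_grad(initial_theta, X, y)
theta = gradient_descent(X, y, initial_theta)
final_cost, _ = cost_and_grad(theta, X, y)
```

The point of fminunc is that it spares us from choosing `alpha` and the number of iterations by hand, which is why the exercise uses it instead of a loop like this.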
Evaluating logistic regression
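plotDecisionBoundary.m draws the learned decision boundary on top of the training data.
Next we complete predict.m, so that the predictions of the trained model on the training data can be compared with the true labels to see how accurate the model is.
Each example is fed into $h_\theta(x)$: if the result is greater than or equal to 0.5 the prediction is 1, and if it is less than 0.5 the prediction is 0.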
predict.m
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
% p = PREDICT(theta, X) computes the predictions for X using a
% threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
m = size(X, 1); % Number of training examples
% You need to return the following variables correctly
p = zeros(m, 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters.
% You should set p to a vector of 0's and 1's
%
g = X*theta;
h = sigmoid(g);
p = round(h);
% =========================================================================
end
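The thresholding and the accuracy computation can be sketched in Python as follows. The parameter vector and the five examples are hypothetical values picked so the result is easy to verify by hand; `accuracy` is my own helper name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, X):
    """Predict 1 where sigmoid(theta . x) >= 0.5, otherwise 0."""
    return [1 if sigmoid(sum(t * x for t, x in zip(theta, row))) >= 0.5 else 0
            for row in X]

def accuracy(p, y):
    """Percentage of predictions that match the true labels."""
    return 100.0 * sum(1 for pi, yi in zip(p, y) if pi == yi) / len(y)

# Hypothetical parameters and data (first column is the intercept).
theta = [-3.0, 1.0]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 0, 1, 1]
p = predict(theta, X)
```

The third example sits exactly on the boundary (theta . x = 0, so the sigmoid is 0.5) and is therefore predicted as 1, matching the "greater than or equal to" rule in the text.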
That concludes the Logistic Regression part.
Regularized logistic regression
Regularization mainly addresses overfitting caused by a model that is too complex. The regularized cost function is:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
Note: the penalty term does not include $\theta_0$, which means theta(1) in the code must not be regularized.
The regularized gradient is:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\,x_j^{(i)} \qquad j=0$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\,x_j^{(i)}+\frac{\lambda}{m}\theta_j \qquad j>0$$
With these formulas in hand, we can start on the exercise.
Visualizing the data
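The task is similar to the one above, but the plot of the data now looks different.
A straight-line model can no longer fit it, so we need to construct more features. The provided mapFeature.m maps the two original features into polynomial terms up to degree 6.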
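A rough Python sketch of this kind of polynomial feature mapping is shown below. It follows the usual ordering (a constant term, then all products of powers of the two features whose exponents add up to 1, 2, ..., degree); the function name `map_feature` and the exact ordering are assumptions on my part rather than a copy of mapFeature.m.

```python
def map_feature(x1, x2, degree=6):
    """Map two features to all polynomial terms x1^a * x2^b with a + b <= degree."""
    out = [1.0]  # constant (intercept) term
    for i in range(1, degree + 1):
        for j in range(i + 1):
            # Terms of total degree i: x1^(i-j) * x2^j
            out.append((x1 ** (i - j)) * (x2 ** j))
    return out

features = map_feature(0.5, -0.5)
```

For degree 6 this turns each two-dimensional example into a 28-dimensional one, which is what lets a linear classifier in the new space draw a curved boundary in the original space, and also why regularization becomes necessary.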
Cost function and gradient
With the formulas above we can complete costFunctionReg.m.
costFunctionReg.m
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
% J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
g = X*theta;
h = sigmoid(g);
J = (-1/m)*(y'*log(h)+(1-y)'*(log(1-h)))+((lambda/(2*m))*(theta'*theta-theta(1)*theta(1)));
grad = 1/m*(X'*(h-y))+(lambda/m*theta);
grad(1) = grad(1) - lambda/m*(theta(1));
% =============================================================
end
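The treatment of the intercept is the part that is easiest to get wrong, so here is a small Python sketch that makes it explicit by skipping index 0 in both the penalty and the gradient. The data and parameter values are hypothetical, and `lam` stands in for lambda, which is a reserved word in Python.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_function_reg(theta, X, y, lam):
    """Regularized logistic regression cost and gradient (theta[0] is not penalized)."""
    m = len(y)
    h = [sigmoid(sum(t * x for t, x in zip(theta, row))) for row in X]
    J = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
             for hi, yi in zip(h, y)) / m
    # Penalty sums over theta[1:], leaving the intercept out.
    J += lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]
    for j in range(1, len(theta)):
        grad[j] += lam / m * theta[j]
    return J, grad

# Hypothetical toy data: intercept column plus one feature.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
J_plain, g_plain = cost_function_reg([0.0, 2.0], X, y, 0.0)
J_reg, g_reg = cost_function_reg([0.0, 2.0], X, y, 4.0)
```

With m = 4, lam = 4 and theta = [0, 2], the penalty adds 4 / 8 * 2^2 = 2 to the cost and 4 / 4 * 2 = 2 to the second gradient component, while the first component is unchanged.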
Passing this function to fminunc then gives the θ that minimizes the regularized cost.
Plotting the decision boundary
The trained model is then used to draw the decision boundary.
That is the whole procedure for regularized logistic regression.