Note: this post is my solution to a programming assignment for the course, written in MATLAB. Course page: https://www.coursera.org/learn/machine-learning/programming/ixFof/logistic-regression
1. Logistic Regression
In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted to a university, based on the applicant's scores on two exams. You have historical data from previous applicants that can serve as a training set: for each training example, you have the applicant's scores on the two exams and the admission decision.
Your task is to build a classification model that estimates an applicant's probability of admission from the scores on those two exams.
Load the data:
% Load Data
% The first two columns contain the exam scores and the third column contains the label.
data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);
1.1 Visualizing the data
To visualize the data, complete the code in plotData.m:
function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% Create New Figure
figure; hold on;

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.
%

% Find indices of positive and negative examples
pos = find(y == 1); neg = find(y == 0);

% Plot examples
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);
% =========================================================================

hold off;
end
Call the function:
% Plot the data with + indicating (y = 1) examples and o indicating (y = 0) examples.
plotData(X, y);
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')
% Specified in plot order
legend('Admitted', 'Not admitted')
The resulting plot (figure omitted): a scatter of Exam 1 vs. Exam 2 scores, with + marking admitted and o marking not-admitted applicants.
1.2 Implementation
1.2.1 Warmup exercise: sigmoid function
The logistic regression hypothesis is

h_\theta(x) = g(\theta^T x)

where g is the sigmoid function:

g(z) = \frac{1}{1 + e^{-z}}

Implement this in sigmoid.m; later code will call it. For large positive values of x, the sigmoid should be close to 1; for large negative values, close to 0. Evaluating sigmoid(0) should give exactly 0.5. The code should also work on vectors and matrices: for a matrix, the function should apply the sigmoid element-wise.
sigmoid.m:
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).

g = 1 ./ (1 + exp(-z));   % element-wise, so it works for any shape of z
% =============================================================
end
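A quick sanity check (a minimal sketch, run at the MATLAB prompt):
sigmoid(0)          % exactly 0.5
sigmoid(100)        % ~1 for large positive inputs
sigmoid(-100)       % ~0 for large negative inputs
sigmoid([-1 0 1])   % applied element-wise to vectors and matrices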
1.2.2 Cost function and gradient
The cost function for logistic regression:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log h_\theta(x^{(i)}) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

and its gradient, a vector of the same length as \theta whose j-th element is:

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
costFunction.m:
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta

h = sigmoid(X * theta);                           % m x 1 vector of hypotheses
J = 1/m * (-y' * log(h) - (1 - y') * log(1 - h));
grad = 1/m * (X' * (h - y));
% =============================================================
end
Calling code:
% Setup the data matrix appropriately
[m, n] = size(X);
% Add intercept term to X
X = [ones(m, 1) X];
% Initialize the fitting parameters
initial_theta = zeros(n + 1, 1);
% Compute and display the initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);
fprintf('Cost at initial theta (zeros): %f\n', cost);
disp('Gradient at initial theta (zeros):'); disp(grad);
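With initial_theta all zeros, h = sigmoid(0) = 0.5 for every example, so the initial cost should be -log(0.5) ≈ 0.6931 regardless of the data. The analytic gradient can also be verified against a finite-difference approximation; a minimal sketch (eps_fd is just an illustrative step size):
% Check the analytic gradient against central finite differences
eps_fd = 1e-6;
numgrad = zeros(size(initial_theta));
for j = 1:numel(initial_theta)
    e = zeros(size(initial_theta));
    e(j) = eps_fd;
    numgrad(j) = (costFunction(initial_theta + e, X, y) - ...
                  costFunction(initial_theta - e, X, y)) / (2 * eps_fd);
end
disp(max(abs(numgrad - grad)));  % should be tiny, e.g. < 1e-9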
1.2.3 Learning parameters using fminunc
Instead of writing your own gradient descent loop, use MATLAB's built-in fminunc function to find the optimal parameters.
Run:
% Set options for fminunc ('SpecifyObjectiveGradient' is the current name
% for the legacy 'GradObj' option)
options = optimoptions(@fminunc, 'Algorithm', 'quasi-newton', ...
    'SpecifyObjectiveGradient', true, 'MaxIterations', 400);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);
% Print cost and theta
fprintf('Cost at theta found by fminunc: %f\n', cost);
disp('theta:');disp(theta);
% Plot Boundary
plotDecisionBoundary(theta, X, y);
% Add some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')
% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;
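For a linear model, plotDecisionBoundary draws the line where \theta^T x = 0. Solving theta(1) + theta(2)*x1 + theta(3)*x2 = 0 for x2 gives the boundary explicitly; a minimal sketch of drawing it by hand (assuming X already contains the intercept column):
plot_x = [min(X(:,2)) - 2, max(X(:,2)) + 2];            % Exam 1 score range
plot_y = -(theta(1) + theta(2) .* plot_x) ./ theta(3);  % solve theta'*x = 0 for x2
plot(plot_x, plot_y, 'b-')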
Result (figure omitted): the training data with the learned linear decision boundary separating admitted from not-admitted students.
1.2.4 Evaluating logistic regression
Use the model to predict whether a particular student will be admitted, and evaluate its accuracy on the training set.
predict.m:
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters.
%               You should set p to a vector of 0's and 1's
%

possible = sigmoid(X * theta);   % predicted probabilities
for i = 1:m
    if possible(i) >= 0.5
        possible(i) = 1;
    else
        possible(i) = 0;
    end
end
p = possible;
% =========================================================================
end
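Since a comparison in MATLAB already returns a logical 0/1 array, the loop above can be collapsed into a single vectorized line (an equivalent sketch):
p = sigmoid(X * theta) >= 0.5;   % logical vector of 0/1 predictions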
Calling code:
% Predict probability for a student with score 45 on exam 1 and score 85 on exam 2
prob = sigmoid([1 45 85] * theta);
fprintf('For a student with scores 45 and 85, we predict an admission probability of %f\n\n', prob);
% Compute accuracy on our training set
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
2. Regularized logistic regression
In this part, you will apply regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it functions correctly. Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests. From these two tests, you would like to determine whether each microchip should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.
2.1 Visualizing the data
Calling code:
% The first two columns contain the X values and the third column
% contains the label (y).
data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);
plotData(X, y);
% Put some labels
hold on;
% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
% Specified in plot order
legend('y = 1', 'y = 0')
hold off;
Result (figure omitted): a scatter of the two microchip test scores, with the y = 1 and y = 0 examples intermixed.
The plot shows that the dataset cannot be separated into positive and negative examples by a straight line. A straightforward application of logistic regression therefore performs poorly on this dataset, since logistic regression can only find a linear decision boundary.
2.2 Feature mapping
To fit the data better, create additional features from each data point: map the two features into all polynomial terms of x1 and x2 up to the sixth power,

\mathrm{mapFeature}(x) = \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_1 x_2 & x_2^2 & x_1^3 & \cdots & x_1 x_2^5 & x_2^6 \end{bmatrix}^T

As a result of this mapping, our vector of two features (the scores on the two QA tests) is transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimensional feature vector will have a more complex decision boundary, which appears nonlinear when drawn in the 2-D plot.
Calling code:
% Add Polynomial Features
% Note that mapFeature also adds a column of ones for us, so the intercept term is handled
X = mapFeature(X(:,1), X(:,2));
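mapFeature.m is provided with the exercise; a sketch of what it computes — every monomial x1^(i-j) * x2^j for i = 1..6 and j = 0..i, plus the leading column of ones (1 + 27 = 28 columns in total):
function out = mapFeature(X1, X2)
%MAPFEATURE Map two input features to all polynomial terms up to degree 6.
degree = 6;
out = ones(size(X1(:,1)));                       % intercept column
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)) .* (X2.^j);  % x1^(i-j) * x2^j
    end
end
end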
While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. In the next parts of the exercise, you will implement regularized logistic regression to fit the data, and see for yourself how regularization helps combat the overfitting problem.
2.3 Cost function and gradient
The regularized cost function:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log h_\theta(x^{(i)}) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

and the regularized gradient (note that the intercept parameter \theta_0 is not regularized):

\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \quad \text{for } j = 0

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad \text{for } j \ge 1
costFunctionReg.m:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

h = sigmoid(X * theta);
thetaFrom2 = theta(2:end, 1);                 % theta(1) is not regularized
J = 1/m * (-y' * log(h) - (1 - y') * log(1 - h)) + (lambda/(2*m)) * sum(thetaFrom2.^2);
grad(1) = 1/m * (X(:,1)' * (h - y));          % first column of X, not the scalar X(1)
for item = 2:length(theta)
    grad(item) = 1/m * (X(:,item)' * (h - y)) + (lambda/m) * theta(item);
end
% =============================================================
end
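Equivalently, the whole gradient can be computed in one vectorized step by zeroing out the first entry of theta in the penalty term; a minimal sketch:
theta_reg = [0; theta(2:end)];                           % exclude theta(1) from the penalty
grad = (1/m) * (X' * (h - y)) + (lambda/m) * theta_reg;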
Calling code:
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1
lambda = 1;
% Compute and display initial cost and gradient for regularized logistic regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);
fprintf('Cost at initial theta (zeros): %f\n', cost);
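As a sanity check: with theta all zeros the penalty term vanishes and h = 0.5 for every example, so the expected initial cost is again -log(0.5) ≈ 0.693.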
Plotting the decision boundary:
By varying the value of lambda, you obtain different decision boundaries. The code:
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);
lambda = 0;
% Set Options ('SpecifyObjectiveGradient' is the current name for 'GradObj')
options = optimoptions(@fminunc, 'Algorithm', 'quasi-newton', ...
    'SpecifyObjectiveGradient', true, 'MaxIterations', 1000);
% Optimize
[theta, J, exit_flag] = fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))
% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
legend('y = 1', 'y = 0', 'Decision boundary')
hold off;
Result (figure omitted): the nonlinear decision boundary plotted over the training data for the chosen value of lambda.
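To see the effect of regularization directly, the same fit can be repeated for several values of lambda (a sketch; 0, 1, and 100 are the values the exercise suggests trying):
% Fit and plot the boundary for several regularization strengths
for lambda = [0 1 100]
    initial_theta = zeros(size(X, 2), 1);
    options = optimoptions(@fminunc, 'Algorithm', 'quasi-newton', ...
        'SpecifyObjectiveGradient', true, 'MaxIterations', 1000);
    theta = fminunc(@(t) costFunctionReg(t, X, y, lambda), initial_theta, options);
    plotDecisionBoundary(theta, X, y);   % plotData opens a new figure internally
    title(sprintf('lambda = %g', lambda));
end
With lambda = 0 the boundary is overly complex and overfits the training set, while a very large value such as lambda = 100 underfits and fails to follow the data.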