ex2 machine learning -2-Regularized logistic regression
1.导入数据
和ex2-1类似的结构,共三列,前两列是自变量,最后是因变量
2.常规操作
clear ; close all; clc
data = load(‘ex2data2.txt’);
X = data(:, [1, 2]); y = data(:, 3);
plotData(X, y);
hold on;
xlabel(‘Microchip Test 1’)
ylabel(‘Microchip Test 2’)
legend(‘y = 1’, ‘y = 0’)
hold off;
这部分代码解释参见ex2 machine learning 整作业解析-1-Logistic Regression
下一句是
X = mapFeature(X(:,1), X(:,2));
在pdf中看到对mapFeature的介绍
3.mapFeature()
查看mapFeature函数内部
function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
% MAPFEATURE(X1, X2) maps the two input features
% to quadratic features used in the regularization exercise.
%
% Returns a new feature array with more features, comprising of
% X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
% Inputs X1, X2 must be the same size
%
degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
for j = 0:i
out(:, end+1) = (X1.^(i-j)).*(X2.^j);
end
end
end
接受参数X1X2:X = mapFeature(X(:,1), X(:,2));
可以看到,是将X的第一列和第二列传进去,也就是两个同等维数的列向量
目标是生成所有6次方x1,x2的组合,可知这样的组合有28个,即1, x1, x2, x1^2 ,x1x2, x2^2 ,…, x1^6 , x2 ^6
可以看到X是一个118行28列的矩阵
怎么得到的呢?
查看y,可以看到y是118行的列向量,所以X是有118行的
那么也就是说,X的每一行,本身的两个数的(2列),现在要变成28列,也就是说,每一行的两个数,即x1,x2,要拓展成一个一行28列。
那么我们可以先建立一个118行28列的1矩阵,然后它的每一行,就等于原矩阵的一行变幻出28行后的行向量。
另一方面,也可以看成,输出的矩阵第一列是1,第二列是原先的x1,第三是x2,第四是x1.*x1(x1的各个元素逐一相乘), x1.*x2,x2.*x2,etc
out = ones(size(X1(:,1)));
size(X1(:,1))得到的是一个向量,即,(x1的行数,x1的列数)
out初始化为一个全为1的列向量,行数与x1相同
这里的 i x1x2的总次方,因为这个feature要从0次方一直到6次方;j是x2的指数。
out(:, end+1) = (X1.^(i-j)).*(X2.^j);
end+1,就是在最后增加一列。
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1
lambda = 1;
theta的初值为维数为X的列数的列向量
4.costFunctionReg()
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);
cost是一个数,即,这个theta值的cost
grad是一个列向量,维数是theta的维数
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
% J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
h = sigmoid(X * theta);
J = -1 * y' *log(h) - (1 - y)' * log(1 - h) + sum(theta(2:end).^2) / 2 * lambda;
J = J / m;
grad = (1 / m * (h - y)' * X)' + lambda / m * theta;
grad(1)=grad(1)-lambda/m*theta(1);
% =============================================================
end
参见解析ex2 machine learning 整作业解析-1-Logistic Regression中对costFunction的解释
J
值得注意的是,加入的theta各个元素的平方,不包含theta0
在ex2-1中的J = -1 * y’ *log(h) - (1 - y)’ * log(1 - h); J = J / m;
在这里,J要增多一项
(theta(2:end):取出theta第二行到最后 一行所有的元素
theta(2:end).^2:单个元素逐一平方
接着求和,etc…
grad
与没有regularize的相比
多了一项λ/mθj, 但是不包含θ0
也就是gradient1是不包含regularization的这一项的
那么可以先统一地加上这一项(方便计算)
grad = (1 / m * (h - y)’ * X)’ + lambda / m * theta;
最后再针对grad(1)减去那一项
grad(1)=grad(1)-lambda/mtheta(1);
5.测试不同的lambda值
fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros) - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n');
fprintf('\nProgram paused. Press enter to continue.\n');
pause;
% Compute and display cost and gradient
% with all-ones theta and lambda = 10
test_theta = ones(size(X,2),1);
[cost, grad] = costFunctionReg(test_theta, X, y, 10);
fprintf('\nCost at test theta (with lambda = 10): %f\n', cost);
fprintf('Expected cost (approx): 3.16\n');
fprintf('Gradient at test theta - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.3460\n 0.1614\n 0.1948\n 0.2269\n 0.0922\n');
fprintf('\nProgram paused. Press enter to continue.\n');
pause;
可以看到第一次迭代的时候,cost更大,gradients也更大,因为lambda大了嘛
6.优化
initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;
% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);
% Optimize
[theta, J, exit_flag] = ...
fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
7.plotDecisionBoundary(theta, X, y);
function plotDecisionBoundary(theta, X, y)
%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
%the decision boundary defined by theta
% PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the
% positive examples and o for the negative examples. X is assumed to be
% a either
% 1) Mx3 matrix, where the first column is an all-ones column for the
% intercept.
% 2) MxN, N>3 matrix, where the first column is all-ones
% Plot Data
plotData(X(:,2:3), y);
hold on
if size(X, 2) <= 3
% Only need 2 points to define a line, so choose two endpoints
plot_x = [min(X(:,2))-2, max(X(:,2))+2];
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));
% Plot, and adjust axes for better viewing
plot(plot_x, plot_y)
% Legend, specific for the exercise
legend('Admitted', 'Not admitted', 'Decision Boundary')
axis([30, 100, 30, 100])
else
% Here is the grid range
u = linspace(-1, 1.5, 50);
v = linspace(-1, 1.5, 50);
z = zeros(length(u), length(v));
% Evaluate z = theta*x over the grid
for i = 1:length(u)
for j = 1:length(v)
z(i,j) = mapFeature(u(i), v(j))*theta;
end
end
z = z'; % important to transpose z before calling contour
% Plot z = 0
% Notice you need to specify the range [0, 0]
contour(u, v, z, [0, 0], 'LineWidth', 2)
end
hold off
end
从size(X)可以看到,这里X的列数要大于3,所以执行else
也就是说,ex2-1用的是两点确定一条直线的方法,这里得到的函数不再是直线,所以要取很多个点,一个一个画出来,看起来就是一条平滑的曲线
for i = 1:length(u)
for j = 1:length(v)
z(i,j) = mapFeature(u(i), v(j))*theta;
end
end
u(i)对应x轴坐标,v(j)对应y轴坐标。mapFreature得到一个1 * 28的矩阵(因为传入的x1,x2只是标量而已),theta是28*1的列向量。所以得到一个数。
可以看到z是一个50 * 50的矩阵
z = z’
最后绘制等高线:contour(u, v, z, [0, 0], ‘LineWidth’, 2)
参见matlab中contour 函数的用法(绘制等高线)