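Coursera Machine Learning Assignment Analysis, Part 3 (ex 1-3)

2.2.4 Gradient Descent

With the cost function from the previous part verified, we can move on to the core of the exercise: gradient descent.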

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
     
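    % Predictions for all training examples with the current theta (m x 1)
    H_theta = X * theta;
    % Compute both updates from the same H_theta before assigning,
    % so that theta(1) and theta(2) are updated simultaneously
    temp1 = theta(1) - alpha * (1/m) * sum((H_theta - y) .* X(:,1));
    temp2 = theta(2) - alpha * (1/m) * sum((H_theta - y) .* X(:,2));
    theta(1) = temp1;
    theta(2) = temp2;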
         
    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end

end

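Here I updated the two components of theta separately. Can both be updated at once with a matrix expression? They can, and the matrix form looks like this:

     theta = theta - (alpha / m) * X' * (H_theta - y);

One line does it. The only thing to get right is the dimensions: X' is 2 x m and (H_theta - y) is m x 1, so the matrix product performs the summation over the training examples for us and returns a 2 x 1 vector.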
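As a quick sanity check, the two update styles can be compared on a small data set. The sketch below uses made-up values for X, y, theta and alpha (they are illustrative assumptions, not the assignment's data) and only verifies that one vectorized step matches one component-wise step.

```octave
% Illustrative data (an assumption, not the ex1 data set)
X = [ones(5, 1), (1:5)'];      % m x 2 design matrix with intercept column
y = [2; 4; 5; 4; 6];           % m x 1 targets
theta = [0; 0];                % starting parameters
alpha = 0.01;                  % learning rate
m = length(y);

H_theta = X * theta;           % predictions, m x 1

% Component-wise update, as in the loop body above
theta_loop = zeros(2, 1);
theta_loop(1) = theta(1) - alpha * (1/m) * sum((H_theta - y) .* X(:, 1));
theta_loop(2) = theta(2) - alpha * (1/m) * sum((H_theta - y) .* X(:, 2));

% Vectorized update: X' is 2 x m and the residual is m x 1, so the
% product is 2 x 1 and each row is a sum over the training examples
theta_vec = theta - (alpha / m) * X' * (H_theta - y);

max_diff = max(abs(theta_loop - theta_vec));
fprintf('largest difference between the two updates: %g\n', max_diff);
```

The vectorized form also keeps working unchanged if X gains more feature columns, which the component-wise version does not.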

fprintf('\nRunning Gradient Descent ...\n')
% run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);

Back in ex1.m, the call above returns the final value of theta.

% print theta to screen
fprintf('Theta found by gradient descent:\n');
fprintf('%f\n', theta);
fprintf('Expected theta values (approx)\n');
fprintf(' -3.6303\n  1.1664\n\n');

% Plot the linear fit
hold on; % keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure

The script then prints the result and draws the plot:

Running Gradient Descent ...
Theta found by gradient descent:
-3.630291
1.166362
Expected theta values (approx)
 -3.6303
  1.1664

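The plot now shows the fitted line in blue. Since the model is linear in the input, it is a straight line.

With the fitted line in hand, we can of course use it to make predictions: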

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] *theta;
fprintf('For population = 35,000, we predict a profit of %f\n',...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n',...
    predict2*10000);

fprintf('Program paused. Press enter to continue.\n');
pause;

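One piece of wrap-up work remains: seeing how the cost function actually decreases over the course of gradient descent.

2.4 Visualizing the Cost Function

First, the code: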
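One direct way to watch the cost fall is to inspect the J_history vector that gradientDescent returns. The following is a minimal sketch, assuming synthetic data and an inline cost function standing in for computeCost; the names cost and the sample values are illustrative, not part of the assignment.

```octave
% Synthetic data (an assumption for illustration only)
X = [ones(5, 1), (1:5)'];
y = [2; 4; 5; 4; 6];
m = length(y);
alpha = 0.01;
num_iters = 200;

% Stand-in for computeCost: J = 1/(2m) * sum of squared residuals
cost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * m);

theta = zeros(2, 1);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    % Vectorized gradient step, then record the cost after the step
    theta = theta - (alpha / m) * X' * (X * theta - y);
    J_history(iter) = cost(X, y, theta);
end

% With a suitably small alpha the recorded cost should never increase
fprintf('J after iteration   1: %f\n', J_history(1));
fprintf('J after iteration  50: %f\n', J_history(50));
fprintf('J after iteration 200: %f\n', J_history(end));
```

Passing the same vector to plot(1:num_iters, J_history) draws the decrease as a curve. If the values grow instead of shrinking, the learning rate is likely too large.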

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100); % range of theta0 values covered by the plot
theta1_vals = linspace(-1, 4, 100);   % range of theta1 values covered by the plot

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals: compute the cost at every grid point
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
	  t = [theta0_vals(i); theta1_vals(j)];
	  J_vals(i,j) = computeCost(X, y, t);
    end
end


% Author's note: why transpose J_vals before surf? Discussed after this listing.
% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals) 
xlabel('\theta_0'); ylabel('\theta_1');


This raises a question: why do we have to transpose J_vals before calling surf?

>> help surf
'surf' is a function from the file C:\Octave\OCTAVE~1.2\share\octave\4.2.2\m\plot\draw\surf.m

 -- surf (X, Y, Z)
 -- surf (Z)
 -- surf (..., C)
 -- surf (..., PROP, VAL, ...)
 -- surf (HAX, ...)
 -- H = surf (...)
     Plot a 3-D surface mesh.

     The surface mesh is plotted using shaded rectangles.  The vertices
     of the rectangles [X, Y] are typically the output of 'meshgrid'.
     over a 2-D rectangular region in the x-y plane.  Z determines the
     height above the plane of each vertex.  If only a single Z matrix
     is given, then it is plotted over the meshgrid 'X = 1:columns (Z),
     Y = 1:rows (Z)'.  Thus, columns of Z correspond to different X
     values and rows of Z correspond to different Y values.

     The color of the surface is computed by linearly scaling the Z
     values to fit the range of the current colormap.  Use 'caxis'
     and/or change the colormap to control the appearance.

     Optionally, the color of the surface can be specified independently
     of Z by supplying a color matrix, C.

     Any property/value pairs are passed directly to the underlying
     surface object.

     If the first argument HAX is an axes handle, then plot into this
     axes, rather than the current axes returned by 'gca'.

     The optional return value H is a graphics handle to the created
     surface object.

     Note: The exact appearance of the surface can be controlled with
     the 'shading' command or by using 'set' to control surface object
     properties.

     See also: ezsurf, surfc, surfl, surfnorm, trisurf, contour, mesh,
     surface, meshgrid, hidden, shading, colormap, caxis.
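In other words, surf(X, Y, Z) expects the columns of Z to correspond to the X values and the rows of Z to correspond to the Y values. Our loop fills J_vals(i,j) with i indexing theta0, so theta0 runs along the rows. Since theta0_vals is passed as the X argument, J_vals has to be transposed for the axes to line up.


Besides the surface plot of J, we also draw a contour plot and mark on it the theta that gradient descent returned.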
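The orientation issue is easy to see on a tiny grid. The sketch below uses hypothetical names (a_vals, b_vals, f) in place of theta0_vals, theta1_vals and the cost function, with different lengths on the two axes so that the mismatch shows up in the matrix size.

```octave
% Small illustrative grid (values are assumptions, not from ex1)
a_vals = [1 2 3];            % plays the role of theta0_vals (x-axis)
b_vals = [10 20 30 40];      % plays the role of theta1_vals (y-axis)
f = @(a, b) a + b;           % stand-in for the cost function

% Filled the same way as J_vals: the first index follows the x-axis values
Z = zeros(length(a_vals), length(b_vals));
for i = 1:length(a_vals)
    for j = 1:length(b_vals)
        Z(i, j) = f(a_vals(i), b_vals(j));
    end
end

% Layout used by meshgrid: rows follow y, columns follow x
[A, B] = meshgrid(a_vals, b_vals);   % both are 4 x 3
Z_expected = f(A, B);

disp(size(Z));                       % prints 3 4: rows follow x here
disp(isequal(Z', Z_expected));       % prints 1: the transpose matches
```

In ex1.m both axes have 100 points, so the untransposed matrix has the right size and the mistake would only show up as swapped axes in the plot.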

% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
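Here logspace(-2, 3, 20) generates 20 logarithmically spaced values from 10^-2 to 10^3, which are used as the contour levels.


The red x sits at the center of the innermost contour, that is, at the lowest point of J. This shows that the theta found by gradient descent agrees with the minimum of the cost function.

That completes the required part of the assignment. There are also optional exercises, which I will analyze as well.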
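To see what these levels look like, the call can be inspected on its own. A small sketch (the variable names levels and ratios are illustrative):

```octave
levels = logspace(-2, 3, 20);   % 20 points from 10^-2 to 10^3

fprintf('%d levels from %g to %g\n', numel(levels), levels(1), levels(end));

% Neighbouring levels differ by a constant factor, not a constant step
ratios = levels(2:end) ./ levels(1:end-1);
fprintf('ratio between neighbouring levels: %g\n', ratios(1));
```

The constant ratio is what packs more contour lines near the minimum, where J is small, than a linear spacing over the same range would.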
