【Machine Learning】Gradient Descent

Latest recommended article published 2024-07-17 00:50:53

lyozhou

Tags: Machine Learning, notes

Article link: https://blog.csdn.net/lyozhou/article/details/9386525

【Overview】

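The previous note, "Linear Regression with One Variable", ended with a 3-D plot of J(theta) as a function of theta0 and theta1. We need the values of theta0 and theta1 at which J(theta) is smallest, so we use gradient descent: start from an arbitrary point (theta0, theta1) and trace the path it follows as it moves down into the "valley". The procedure works as follows: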
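The update that gradient descent repeats can be written as below. This is a sketch assuming the course's setup, which the code later in this note implements: hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$, cost $J$ equal to half the mean squared error, and learning rate $\alpha$ (written a in the text). Both parameters should be updated simultaneously, i.e. both derivatives are computed from the old values before either parameter changes.

```latex
\text{repeat until convergence:}\qquad
\theta_j := \theta_j - \alpha \,\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1),
\qquad j = 0, 1,
\qquad\text{where}\quad
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
```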

【Principle】

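Differentiating J(theta0, theta1) amounts to finding the slope of the tangent line. Suppose J(theta1) is a parabola. If we stand at the red point in the figure, to the right of the minimum, the tangent slope k there is positive, so the gradient descent update subtracts a*k and theta1 decreases. If we stand at the blue point, to the left of the minimum, the slope k is negative, so theta1 - a*k = theta1 + a*|k| and theta1 increases. Either way, theta1 moves toward the "valley".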
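A minimal sketch of the two cases, assuming a hypothetical parabola J(theta1) = (theta1 - 3)^2 with its minimum at theta1 = 3 and a = 0.1 (not the regression cost from the course): the same rule theta1 - a*k moves theta1 left when the slope is positive and right when it is negative.

```matlab
% Hypothetical cost: a parabola with its minimum at theta1 = 3
J  = @(t) (t - 3).^2;
dJ = @(t) 2 * (t - 3);           % slope of the tangent line at t
a  = 0.1;                        % learning rate

% Right of the minimum: slope k > 0, so subtracting a*k moves left
t_right = 5;
k = dJ(t_right);                 % k = 4
t_right_new = t_right - a * k;   % 5 - 0.4 = 4.6

% Left of the minimum: slope k < 0, so subtracting a*k moves right
t_left = 1;
k = dJ(t_left);                  % k = -4
t_left_new = t_left - a * k;     % 1 + 0.4 = 1.4

fprintf('right: %.2f -> %.2f\n', t_right, t_right_new);
fprintf('left:  %.2f -> %.2f\n', t_left, t_left_new);
```

This prints `right: 5.00 -> 4.60` and `left:  1.00 -> 1.40`; in both cases J gets smaller.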

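Here a is the learning rate. If it is set too small, theta changes slowly; if it is set too large, each update jumps too far and overshoots the minimum (and can even diverge).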
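A minimal sketch of the effect of the learning rate, using a hypothetical one-dimensional cost J(theta) = theta^2 (minimum at 0) instead of the regression cost. Each update multiplies theta by (1 - 2*alpha), so a tiny alpha barely moves theta, a moderate alpha converges, and alpha > 1 makes |theta| grow at every step.

```matlab
% Hypothetical cost J(theta) = theta^2, whose derivative is 2*theta
dJ = @(t) 2 * t;
alphas = [0.01, 0.1, 1.1];       % too small, reasonable, too large
num_iters = 20;
final_theta = zeros(size(alphas));

for idx = 1:numel(alphas)
    t = 1;                       % same starting point for every run
    for iter = 1:num_iters
        t = t - alphas(idx) * dJ(t);
    end
    final_theta(idx) = t;
    fprintf('alpha = %.2f: theta after %d steps = %g\n', ...
            alphas(idx), num_iters, t);
end
```

After 20 steps the three runs end at roughly 0.67, 0.012 and 38: too slow, converging, and diverging.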

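Even if a is fixed, theta will not keep overshooting as long as a is not too large: as theta approaches the minimum, the derivative k of J(theta) gets smaller, so the step a*k shrinks as well. Wikipedia lists these characteristics of gradient descent: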

1. It slows down as it approaches a minimum.

2. Line search can run into some problems.

3. It may descend in a "zigzag" pattern.
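A sketch of point 1 and of the fixed-a argument above, again on a hypothetical cost J(theta) = theta^2 with a = 0.1: the learning rate never changes, yet the actual step a*k shrinks by a factor of 0.8 each iteration because the slope k shrinks. This relies on a being small enough; for this cost, a > 1 would overshoot further each time instead.

```matlab
% Hypothetical cost J(theta) = theta^2 with a fixed learning rate
dJ = @(t) 2 * t;
a = 0.1;
t = 1;
num_iters = 10;
step = zeros(num_iters, 1);

for iter = 1:num_iters
    k = dJ(t);               % slope shrinks as t approaches the minimum
    step(iter) = a * k;      % actual step length taken this iteration
    t = t - step(iter);
end

disp(step');                 % 0.2, 0.16, 0.128, ... each 0.8 times the last
```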

【Matlab】

1. I hadn't reviewed calculus in a long time and had forgotten the formulas, so I derived the gradient the clumsy, brute-force way.
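A sketch of that derivation, assuming the same cost $J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}(\theta_0 + \theta_1 x^{(i)} - y^{(i)})^2$: the chain rule brings down the exponent 2, which cancels the $\frac{1}{2}$, and the inner derivative is 1 for $\theta_0$ and $x^{(i)}$ for $\theta_1$. Stacking the examples into a matrix $X$ whose first column is all ones gives the vectorized form used in the code.

```latex
\begin{aligned}
\frac{\partial J}{\partial \theta_0}
  &= \frac{1}{2m}\sum_{i=1}^{m} 2\bigl(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\bigr)\cdot 1
   = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) \\
\frac{\partial J}{\partial \theta_1}
  &= \frac{1}{2m}\sum_{i=1}^{m} 2\bigl(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\bigr)\cdot x^{(i)}
   = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x^{(i)} \\
\nabla_\theta J
  &= \frac{1}{m} X^{\top}\bigl(X\theta - y\bigr),
  \qquad
  \theta := \theta - \frac{\alpha}{m} X^{\top}\bigl(X\theta - y\bigr)
\end{aligned}
```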

2. With the derivation in hand, the code is straightforward:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha
 
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1); 
for iter = 1:num_iters
 
    % Perform a single gradient step on the parameter vector theta.
    % Both components are updated simultaneously from the old theta.
    % X' is needed because the partial derivative for theta_j multiplies
    % each error term by feature j (feature 0 is the column of ones).
    theta = theta - (alpha/m) * (X' * (X * theta - y));
    % Save the cost J in every iteration (computeCost is the cost
    % function from the same course exercise)
    J_history(iter) = computeCost(X, y, theta);
end
end
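To see why `X'` appears in the update line, the sketch below uses a hypothetical 4-example data set (not the course data) and computes the gradient both in vectorized form and component by component with explicit sums. The two agree, because row j of `X'` holds feature j of every example.

```matlab
% Hypothetical data: first column of X is all ones (for theta0)
X = [1 1; 1 2; 1 3; 1 4];
y = [2; 4; 5; 8];
theta = [0.5; 1];
m = length(y);

% Vectorized gradient, as in the update line of gradientDescent
grad_vec = (1/m) * (X' * (X * theta - y));

% Same gradient, one component at a time with explicit sums
err = X * theta - y;                 % h_theta(x^(i)) - y^(i) for each example
grad_loop = zeros(2, 1);
for j = 1:2
    grad_loop(j) = sum(err .* X(:, j)) / m;
end

disp([grad_vec, grad_loop]);         % both columns: -1.75 and -5.5
```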

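3. Test it with the driver script provided by the course. How can we check that the algorithm is correct? The simplest way is to verify that J(theta) decreases at every step.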

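data = load('ex1data1.txt');
y = data(:,2);
m = length(y); % number of training examples (y must be defined first)
X = [ones(m, 1), data(:,1)]; % add a column of ones for theta0
theta = zeros(2, 1); % initialize fitting parameters
iterations = 1500;
alpha = 0.01;
[theta, J_history] = gradientDescent(X, y, theta, alpha, iterations);
% J should decrease at every iteration if alpha is small enough
fprintf('J decreased at every step: %d\n', all(diff(J_history) <= 0));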
% print theta to screen
fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));
% Plot the training data and the linear fit
plot(X(:,2), y, 'rx', 'MarkerSize', 10); % training data
hold on; % keep the data visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure
 
% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] *theta;
fprintf('For population = 35,000, we predict a profit of %f\n',...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n',...
    predict2*10000);
 
fprintf('Program paused. Press enter to continue.\n');
pause;
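Since `ex1data1.txt` may not be at hand, here is a self-contained sketch of the same check on synthetic data. The data, the deterministic "noise" and the `cost` helper are assumptions made for illustration. It repeats the update from `gradientDescent` inline, records J after every step, and tests that J never increases.

```matlab
% Hypothetical synthetic data: y = 1 + 2x plus small deterministic noise
x = (1:20)' / 2;
noise = 0.1 * sin(1:20)';
y = 1 + 2 * x + noise;
m = length(y);
X = [ones(m, 1), x];

% Same cost as computeCost: half the mean squared error
cost = @(theta) sum((X * theta - y).^2) / (2 * m);

theta = zeros(2, 1);
alpha = 0.01;
num_iters = 1500;
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = cost(theta);
end

is_decreasing = all(diff(J_history) <= 0);
fprintf('J decreased at every step: %d\n', is_decreasing);
fprintf('theta = %f %f\n', theta(1), theta(2));
```

With alpha = 0.01 every step lowers J here; with alpha = 0.1 on the same data the check fails, because J grows instead of shrinking.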

4. The resulting fit: