Machine Learning - Gradient Descent in Practice

This article covers some practical techniques for implementing Gradient Descent, including feature scaling, mean normalization, and choosing the learning rate.


Gradient descent in practice


1. Feature Scaling


Feature Scaling

Idea: Make sure features are on a similar scale, so that gradient descent converges much faster.

  E.g.     x1 = size (0-2000 feet²)

              x2 = number of bedrooms (1-5)



Get every feature into approximately a −1 ≤ xi ≤ 1 range.

Ranges of roughly this size are OK (e.g. 0 ≤ x1 ≤ 3, or −2 ≤ x2 ≤ 0.5).

Ranges that are orders of magnitude larger or smaller are not good (e.g. −100 ≤ x3 ≤ 100, or −0.0001 ≤ x4 ≤ 0.0001).

Mean normalization 

Replace xi with xi − μi to make features have approximately zero mean (do not apply this to x0 = 1), where μi is the mean value of feature i over the training set. In practice, mean normalization is combined with scaling: replace xi with (xi − μi)/si, where si is the range (max − min) or the standard deviation of feature i.

E.g.     x1 = (size − 1000) / 2000

             x2 = (#bedrooms − 2) / 5

(assuming an average size of 1000 feet² and an average of 2 bedrooms).
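
For concreteness, here is a minimal MATLAB/Octave sketch of these two steps. The function name featureNormalize and the choice of the standard deviation as the scale si are assumptions for illustration, not something fixed by the notes above.

function [X_norm, mu, sigma] = featureNormalize(X)
% FEATURENORMALIZE Mean-normalize and scale every column (feature) of X.
% X should not contain the column of ones for x0; add that afterwards.
mu = mean(X);                               % 1 x n row vector of feature means
sigma = std(X);                             % 1 x n row vector of feature standard deviations
X_norm = bsxfun(@minus, X, mu);             % mean normalization: subtract each feature's mean
X_norm = bsxfun(@rdivide, X_norm, sigma);   % feature scaling: divide by each feature's scale
end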




2. Learning rate


Gradient descent 

  • “Debugging”: How to make sure gradient descent is working correctly.
  • How to choose learning rate α

Making sure gradient descent is working correctly.

  • J(θ) should decrease on every iteration. A common check is to plot J(θ) against the number of iterations and verify that the curve keeps falling (see the plotting sketch below).


Example automatic convergence test:

Declare convergence if J(θ) decreases by less than 10⁻³ in one iteration.
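
As a rough illustration, this check can be done in MATLAB/Octave as below; the sketch assumes J_history is the per-iteration cost vector returned by the gradientDescentMulti function given in Section 3.

% Plot the learning curve: cost J(theta) versus iteration number.
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J(\theta)');

% Automatic convergence test: first iteration whose decrease in J is below 1e-3.
decreases = -diff(J_history);
converged_at = find(decreases < 1e-3, 1);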

  • If gradient descent is not working (J(θ) is increasing or oscillating), use a smaller α.


  • For sufficiently small α , J(θ) should decrease on every iteration.
  • But if α is too small, gradient descent can be slow to converge.

Summary: 

  • If α is too small: slow convergence.
  • If α is too large: J(θ) may not decrease on every iteration; may not converge.

To choose α, try values such as ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ... (roughly a factor of 3 between successive candidates).
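
One way to compare these candidates is to run a fixed, small number of iterations with each value and plot the resulting learning curves together; the curve that falls fastest without diverging indicates a good α. The sketch below assumes the gradientDescentMulti function from Section 3 and that the design matrix X (including the column of ones) and the labels y are already in the workspace.

alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1];
num_iters = 100;
figure; hold on;
for k = 1:numel(alphas)
    theta0 = zeros(size(X, 2), 1);               % restart from the same initial parameters
    [~, J_hist] = gradientDescentMulti(X, y, theta0, alphas(k), num_iters);
    plot(1:num_iters, J_hist, 'DisplayName', sprintf('alpha = %g', alphas(k)));
end
xlabel('Number of iterations');
ylabel('Cost J(\theta)');
legend('show');
hold off;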


3. MATLAB Code for Gradient Descent with Multiple Variables

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
% X is the training data (design matrix, m x n, with a leading column of ones)
% y is the vector of labels for the training data
% theta is the vector of parameters
% alpha is the learning rate
% num_iters is the number of iterations

% Initialize some values
m = length(y);                     % number of training examples
J_history = zeros(num_iters, 1);

    for iter = 1:num_iters

        % Perform a single gradient step on the parameter vector theta.
        % Every component is computed from the old theta before any is updated.
        theta_tmp = zeros(size(theta));
        for i = 1:size(X, 2)
            theta_tmp(i) = theta(i) - alpha * sum((X*theta - y) .* X(:, i)) / m;
        end
        theta = theta_tmp;

        % Save the cost J in every iteration
        J_history(iter) = computeCostMulti(X, y, theta);   % compute J(theta)

    end
end


function J = computeCostMulti(X, y, theta)
% COMPUTECOSTMULTI Compute cost for linear regression with multiple variables

% Initialize some useful values
m = length(y);                     % number of training examples

% Compute the cost of a particular choice of theta:
% J(theta) = 1/(2m) * sum((X*theta - y).^2), written in vectorized form
J = 0.5 * (X*theta - y)' * (X*theta - y) / m;

end
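
To tie the pieces together, a hypothetical end-to-end call might look like the following. The variable data and the featureNormalize helper sketched in Section 1 are assumptions for illustration, not part of the original code.

% Hypothetical usage: "data" is an m x 3 matrix whose first two columns are
% the features (size in feet², number of bedrooms) and whose last column is
% the target value (e.g. house price).
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

[X, mu, sigma] = featureNormalize(X);    % feature scaling / mean normalization (Section 1)
X = [ones(m, 1), X];                     % add the intercept term x0 = 1

alpha = 0.01;                            % learning rate chosen as in Section 2
num_iters = 400;
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);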

