Andrew Ng Machine Learning Notes (5) -- Multivariate Linear Regression

Based on: Andrew Ng's Machine Learning course.

1. Multiple Features

Linear regression with multiple variables is also known as “multivariate linear regression”.

Notation | Meaning
$x_j^{(i)}$ | value of feature $j$ in the $i$-th training example
$x^{(i)}$ | the input (features) of the $i$-th training example
$m$ | the number of training examples
$n$ | the number of features
  • The multivariable form of the hypothesis function accommodating these multiple features is as follows:
    $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$, where $x_0 \equiv 1$

  • Using the definition of matrix multiplication, our multivariable hypothesis function can be concisely represented as:
    $h_\theta(x) = \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} = \theta^T x$
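
To make this concrete, here is a small Octave sketch (the toy values and variable names are my own, not from the course) that stacks the training examples into a design matrix, prepends $x_0 = 1$, and computes all predictions at once:

% Toy data: m = 3 examples, n = 2 features (illustrative values only).
X_raw = [2104 3; 1600 3; 2400 3];
theta = [1; 0.1; 0.5];            % theta_0, theta_1, theta_2
m = size(X_raw, 1);
X = [ones(m, 1) X_raw];           % prepend x0 = 1 to every example
h = X * theta;                    % m x 1 predictions, h(i) = theta' * x^(i)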

2. Gradient Descent For Multiple Variables

The gradient descent equation itself is generally the same form; we just have to repeat it for our ‘n’ features:

  • repeat until convergence: {
      $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
     for $j := 0, \ldots, n$
    }
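
Below is a minimal, self-contained Octave sketch of this update in vectorized form; the toy data, learning rate, and variable names are illustrative assumptions:

% Vectorized gradient descent for multiple features.
X = [1 1; 1 2; 1 3];      % design matrix with x0 = 1 plus one feature (toy data)
y = [1; 2; 3];            % targets
theta = zeros(2, 1);
alpha = 0.1;
m = length(y);
for iter = 1:1500
    h = X * theta;                                 % m x 1 predictions
    theta = theta - (alpha / m) * (X' * (h - y));  % update all theta_j simultaneously
end
% theta should end up close to [0; 1], i.e. h_theta(x) = x for this toy data.
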
1) Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.

  • The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally:
    • $-1 \leq x_i \leq 1$
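
One common way to achieve this is mean normalization: subtract each feature's mean and divide by its standard deviation. A hedged Octave sketch with made-up housing-style numbers:

% Scale each column of X_raw so every feature has mean 0 and std 1.
X_raw = [2104 3; 1600 3; 2400 2; 1416 2];   % illustrative: size, number of bedrooms
mu = mean(X_raw);                            % 1 x n vector of per-feature means
sigma = std(X_raw);                          % 1 x n vector of per-feature standard deviations
X_norm = (X_raw - mu) ./ sigma;              % broadcasting; use bsxfun on very old Octave
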
2) Learning Rate
  • This is the gradient descent algorithm:

    • $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
  • We need to choose the value of $\alpha$ so that gradient descent converges. A common check is to plot the cost $J(\theta)$ against the number of iterations and confirm that it decreases on every iteration.

  • If α is too small: slow convergence.
  • If α is too large: may not decrease on every iteration and thus may not converge.
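
To judge whether $\alpha$ is well chosen, it helps to record $J(\theta)$ after every iteration and plot it. Here is an illustrative Octave sketch (the toy data and names are my own assumptions):

% Record the cost after each iteration to see whether alpha is reasonable.
X = [1 1; 1 2; 1 3];  y = [1; 2; 3];        % toy data, illustrative only
theta = zeros(2, 1);  alpha = 0.1;  m = length(y);
num_iters = 100;
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);   % cost after this step
end
plot(1:num_iters, J_history);
xlabel('Number of iterations');
ylabel('J(\theta)');    % should decrease every iteration; if it grows, reduce alpha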

3. Polynomial Regression

Our hypothesis function need not be linear (a straight line) if that does not fit the data well.

  • For example:
    • $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
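
As a hedged illustration, this quadratic hypothesis can be fit by simply adding $x^2$ as an extra column of the design matrix (the data below is made up):

% Fit a quadratic by treating x and x.^2 as two features.
x = [1; 2; 3; 4; 5];                  % single raw input feature (illustrative)
y = [2.1; 4.3; 8.2; 14.5; 22.0];      % roughly quadratic targets, made up for illustration
X_poly = [ones(length(x), 1), x, x .^ 2];
% x and x.^2 have very different ranges, so scale them before gradient descent;
% here the closed-form solution from the next section is used instead, which needs no scaling.
theta = pinv(X_poly' * X_poly) * X_poly' * y;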

4. Normal Equation

We can use the normal equation to compute the optimal value of $\theta$ directly, in closed form.

  • $\theta = (X^T X)^{-1} X^T Y$

In Octave or MATLAB:

pinv(X'*X)*X'*Y

The function pinv() computes the pseudoinverse of a matrix, so we can still obtain $\theta$ even when $X^T X$ is not invertible.
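
For completeness, a small illustrative sketch of the whole normal-equation workflow in Octave (the data and the new example are made-up assumptions):

% Normal equation: closed-form solution, no alpha, no iterations, no feature scaling needed.
X_raw = [2104 3; 1600 3; 2400 2; 1416 2];  % illustrative features: size, bedrooms
Y = [400; 330; 369; 232];                  % illustrative targets, e.g. prices
m = size(X_raw, 1);
X = [ones(m, 1) X_raw];                    % add the intercept column x0 = 1
theta = pinv(X' * X) * X' * Y;             % theta = (X^T X)^{-1} X^T Y via the pseudoinverse
price = [1 1650 3] * theta;                % predict for a new example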

So what is the difference between gradient descent and the normal equation?

Difference | Gradient Descent | Normal Equation
Need to choose $\alpha$ | Yes | No
Need many iterations | Yes | No
When $n$ is large | Works well | Works slowly