CS229 Lecture Note(1): Linear Regression

1. LMS Algorithm

  • The Ordinary Least Squares Regression Model:

    h_\theta(x) = \theta^T x

  • Cost Function:

    J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
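
    For concreteness, a minimal NumPy sketch of the hypothesis and cost function might look like the following (the function names and array shapes are illustrative assumptions, not part of the original notes):

    import numpy as np

    def hypothesis(theta, X):
        # h_theta(x) = theta^T x, evaluated for every row x of the (m, n) design matrix X
        return X @ theta

    def cost(theta, X, y):
        # J(theta) = 1/2 * sum of squared residuals over the m training examples
        residuals = hypothesis(theta, X) - y
        return 0.5 * np.sum(residuals ** 2)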

  • Gradient Descent Algorithm:

    \theta := \theta - \alpha \nabla_\theta J(\theta)

  • LMS (least mean squares) update rule (also called the Widrow-Hoff learning rule):

    \theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}

  • Batch Gradient Descent vs. Stochastic Gradient Descent

    
    # Runnable NumPy version of the two update schemes.
    import numpy as np

    # BGD: each update sums the LMS term over all m training examples.
    def batch_gradient_descent(X, y, alpha, n_iters):
        theta = np.zeros(X.shape[1])
        for _ in range(n_iters):                 # repeat until convergence
            theta = theta + alpha * X.T @ (y - X @ theta)
        return theta

    # SGD: each update uses a single training example at a time.
    def stochastic_gradient_descent(X, y, alpha, n_epochs):
        theta = np.zeros(X.shape[1])
        for _ in range(n_epochs):                # loop
            for i in range(X.shape[0]):          # for i = 1 to m
                theta = theta + alpha * (y[i] - X[i] @ theta) * X[i]
        return theta
  • Normal Equation Solution:

    \theta = (X^T X)^{-1} X^T \vec{y}
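
    A minimal NumPy sketch of this closed-form solution (the function name is an illustrative assumption; in practice np.linalg.lstsq is the more numerically robust choice):

    import numpy as np

    def normal_equation(X, y):
        # Solve (X^T X) theta = X^T y directly rather than forming the matrix inverse
        return np.linalg.solve(X.T @ X, X.T @ y)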

2. Probabilistic Interpretation

  • Predictive Probability Assumption: a Gaussian Distribution

    p(y \mid x; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - \theta^T x)^2}{2\sigma^2} \right), \quad \text{i.e.}\ \ y \mid x; \theta \sim \mathcal{N}(\theta^T x, \sigma^2)

  • Likelihood Function of θ: the probability of the observed data y (under the i.i.d. assumption)

    L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)

  • Maximum Likelihood Method: choose θ to maximize L(θ), or equivalently the log likelihood ℓ(θ):

    \ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2
    \theta = \arg\max_\theta \ell(\theta)

The least-squares regression model therefore corresponds to maximum likelihood estimation of θ under a Gaussian noise assumption on the data.
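
As a quick numerical illustration of this correspondence: since the negative log-likelihood equals J(θ)/σ² plus a constant, the θ that maximizes the likelihood coincides with the least-squares solution. The toy data, seed, step size, and variable names below are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
sigma = 0.1

# Least-squares solution via the normal equation
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Maximum-likelihood solution: gradient descent on the negative log-likelihood
theta_ml = np.zeros(3)
for _ in range(5000):
    grad_nll = X.T @ (X @ theta_ml - y) / sigma**2
    theta_ml -= 1e-5 * grad_nll

print(np.allclose(theta_ls, theta_ml, atol=1e-4))  # expected: True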

3. Locally Weighted Linear Regression

  • Motivation: remove the need for careful feature selection (a poor choice of features leads to underfitting or overfitting)

  • Parametric vs. Non-parametric learning algorithm

  • LWR algorithm:
    To make a prediction at a query point x:

    1. Fit θ to minimize $\sum_i w^{(i)} (y^{(i)} - \theta^T x^{(i)})^2$, where $w^{(i)} = \exp\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right)$

    2. Output $\theta^T x$.
    3. Hence, the (errors on the) training examples close to the query point x are given a much higher weight when fitting θ (local linearity); see the sketch below.
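
    A minimal NumPy sketch of LWR for a single query point, assuming an (m, n) design matrix X; the function name and the default bandwidth τ are illustrative assumptions:

    import numpy as np

    def lwr_predict(X, y, x_query, tau=0.5):
        # Gaussian weights: training points near the query dominate the local fit
        diffs = X - x_query
        w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * tau ** 2))
        W = np.diag(w)
        # Weighted normal equation: (X^T W X) theta = X^T W y
        theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        # Output theta^T x at the query point
        return x_query @ theta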
