Stanford ML - Lecture 3 - Logistic regression

1. Classification

2. Hypothesis Representation

  • Logistic Regression Model

       h_\theta(x) = g(\theta^T x), \quad g(z) = \frac{1}{1 + e^{-z}}

the function g(z) above is called the sigmoid function or logistic function; it maps any real number into the interval (0, 1)
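A minimal NumPy sketch of this hypothesis (the function names `sigmoid` and `hypothesis` are my own, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for a single feature vector x."""
    return sigmoid(theta @ x)
```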
  • Interpretation of Hypothesis output
h_\theta(x) is the estimated probability that y = 1 on input x, i.e. h_\theta(x) = P(y = 1 \mid x; \theta); consequently P(y = 0 \mid x; \theta) = 1 - h_\theta(x)

3. Decision Boundary

       predict y = 1 if h_\theta(x) \ge 0.5, which holds exactly when \theta^T x \ge 0
       predict y = 0 if h_\theta(x) < 0.5, i.e. when \theta^T x < 0

the set of points where \theta^T x = 0 forms the decision boundary; with the raw features it is linear, and adding polynomial features gives non-linear decision boundaries
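As a small illustration (the parameter values and the `predict` helper below are hypothetical, chosen so the boundary is the circle x_1^2 + x_2^2 = 1):

```python
import numpy as np

def predict(theta, x):
    """Predict y = 1 exactly when theta^T x >= 0 (equivalently h_theta(x) >= 0.5)."""
    return int(theta @ x >= 0)

# Polynomial features [1, x1, x2, x1^2, x2^2] with theta = [-1, 0, 0, 1, 1]
# give the non-linear decision boundary x1^2 + x2^2 = 1 (a circle).
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
x1, x2 = 0.5, 0.5
features = np.array([1.0, x1, x2, x1**2, x2**2])
print(predict(theta, features))  # 0: the point lies inside the circle
```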

4. Cost function

       \mathrm{Cost}(h_\theta(x), y) =
         -\log(h_\theta(x))        if y = 1
         -\log(1 - h_\theta(x))    if y = 0

this cost is 0 when the prediction matches the label exactly and grows without bound as the prediction approaches the wrong label; unlike the squared-error cost, it makes J(\theta) convex for logistic regression
5. Simplified cost function and gradient descent


logistic regression cost function (the two cases above simplified into a single expression):

       J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]

To fit parameters \theta: choose \theta to minimize J(\theta), i.e. solve \min_\theta J(\theta)

To make a prediction given a new x: output h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}, interpreted as P(y = 1 \mid x; \theta)

Gradient descent

       repeat {
         \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}
       }

the algorithm looks identical to linear regression. Why? The gradient of J(\theta) has the same form for both models; the two algorithms nevertheless differ because the definition of h_\theta(x) has changed: it is now the sigmoid of \theta^T x rather than \theta^T x itself (a runnable sketch follows the Q&A below)

Q: Suppose you are running gradient descent to fit a logistic regression model with parameter \theta \in \mathbb{R}^{n+1}. Which of the following is a reasonable way to make sure the learning rate \alpha is set properly and that gradient descent is running correctly?
A: Plot J(\theta) as a function of the number of iterations and make sure J(\theta) is decreasing on every iteration.
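A runnable NumPy sketch of this training loop (the function names, learning rate, iteration count, and toy data are my own choices, not from the lecture); it records J(\theta) at every iteration so it can be plotted as the answer above suggests:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[y*log(h) + (1-y)*log(1-h)]."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Simultaneous updates theta_j := theta_j - alpha*(1/m)*sum((h-y)*x_j)."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []  # J(theta) per iteration, for convergence monitoring
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= alpha * (X.T @ (h - y)) / m
        history.append(cost(theta, X, y))
    return theta, history

# Toy data; the first column x_0 = 1 is the intercept feature.
X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 1.0], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta, history = gradient_descent(X, y)
assert history[-1] <= history[0]  # J(theta) should have gone down
```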

6. Advanced optimization

optimization algorithms:
  • Gradient descent
  • Conjugate gradient
  • BFGS
  • L-BFGS
the last three algorithms have the following advantages (a usage sketch follows this list):
  • no need to manually pick the learning rate \alpha
  • often faster than gradient descent
disadvantages:
  • more complex
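For a sense of how such an optimizer is called in practice, here is a sketch using SciPy's L-BFGS implementation (the helper names and toy data are mine): you supply J(\theta) and its gradient, and the algorithm chooses step sizes itself, so no \alpha is picked by hand.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient in one call."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / m
    return J, grad

X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 1.0], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
# jac=True tells minimize that cost_and_grad returns (cost, gradient).
res = minimize(cost_and_grad, np.zeros(X.shape[1]), args=(X, y),
               method='L-BFGS-B', jac=True)
theta = res.x  # fitted parameters, no learning rate tuned by hand
```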

7. Multi-class classification: one-vs-all

  • train a logistic regression classifier h_\theta^{(i)}(x) for each class i to predict the probability that y = i
  • on a new input x, to make a prediction, pick the class i that maximizes h_\theta^{(i)}(x) (see the sketch below)
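A minimal sketch of one-vs-all (the helper names, hyperparameters, and toy data are mine; each class gets its own binary classifier trained with the gradient descent update from section 5):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_one_vs_all(X, y, num_classes, alpha=0.1, iters=1000):
    """Train one binary logistic regression classifier per class."""
    thetas = np.zeros((num_classes, X.shape[1]))
    for c in range(num_classes):
        target = (y == c).astype(float)  # 1 for class c, 0 otherwise
        for _ in range(iters):
            h = sigmoid(X @ thetas[c])
            thetas[c] -= alpha * (X.T @ (h - target)) / len(target)
    return thetas

def predict_one_vs_all(thetas, x):
    """Pick the class whose classifier reports the highest probability."""
    return int(np.argmax(sigmoid(thetas @ x)))

# Toy 3-class data; x_0 = 1 is the intercept feature.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 2])
thetas = fit_one_vs_all(X, y, num_classes=3)
print(predict_one_vs_all(thetas, np.array([1.0, 0.2])))  # expect class 0
```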

8. The problem of overfitting

  • if we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.
  • the solutions for overfitting
    1. reduce number of features
      • manually select which features to keep
      • model selection algorithm
    2. regularization
      • keep all the features, but reduce magnitude/value of parameters
      • works well when we have a lot of features, each of which contributes a bit to predicting y

9. Cost function

  • Regularization
    • small values for parameters \theta_1, \theta_2, \ldots, \theta_n (achieved by adding the penalty term \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 to the cost function; by convention \theta_0 is not penalized)
      • "simpler" hypothesis
      • less prone to overfitting

10. Regularized linear regression

       J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]

  • in the above function, \lambda \sum_{j=1}^{n} \theta_j^2 is the regularization term
  • \lambda is the regularization parameter; it controls the following trade-off
    • the first term should fit the training set well
    • the regularization term keeps the parameter values small
  • Gradient descent

       \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)}
       \theta_j := \theta_j \left(1 - \alpha \frac{\lambda}{m}\right) - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}, \quad j = 1, \ldots, n

    \theta_0 is not regularized; the factor \left(1 - \alpha \frac{\lambda}{m}\right) < 1 shrinks each \theta_j slightly on every iteration before the usual gradient step
  • Normal equation

       \theta = \left( X^T X + \lambda L \right)^{-1} X^T y, \quad L = \mathrm{diag}(0, 1, 1, \ldots, 1)

the matrix L in the above equation is an (n+1)-by-(n+1) matrix: the identity with the top-left entry zeroed out, so that \theta_0 is not penalized

X^T X may be non-invertible/singular (e.g. when m \le n), but as long as \lambda > 0, the matrix X^T X + \lambda L is invertible
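A direct NumPy sketch of this closed-form solution (the data and function name are mine; np.linalg.solve is used rather than an explicit matrix inverse for numerical stability):

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y, where L is the identity
    matrix with L[0, 0] = 0 so the intercept theta_0 is not penalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
y = np.array([1.0, 2.0, 2.5])
theta = regularized_normal_equation(X, y, lam=1.0)
```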

11. Regularized logistic regression

  • cost function

       J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
  • gradient descent

       \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)}
       \theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right], \quad j = 1, \ldots, n

    the updates look the same as for regularized linear regression, but again h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
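A short NumPy sketch of one such regularized update step (the helper and parameter names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_step(theta, X, y, alpha, lam):
    """One gradient descent step; the intercept theta_0 is not penalized."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    penalty = (lam / m) * theta
    penalty[0] = 0.0  # do not regularize theta_0
    return theta - alpha * (grad + penalty)
```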

From: http://blog.csdn.net/abcjennifer/article/details/7716281
