Andrew Ng Machine Learning Notes - Course 1, Week 1

Course 1: Supervised Machine Learning: Regression and Classification

Week 1: Introduction to Machine Learning

supervised learning vs. unsupervised learning

supervised learning:

algorithms that learn mappings from x to y. You give your learning algorithm examples to learn from, which include the "right answers" (output labels).
e.g.

| input (X) | output (Y) | application |
| --- | --- | --- |
| email | spam? (0/1) | spam filtering |
| audio | text transcript | speech recognition |
| English | Spanish | machine translation |
| ad, user info | click? (0/1) | online advertising |
| image, radar info | position of other cars | self-driving car |
| image of phone | defect? (0/1) | visual inspection |

Regression: predict a number from infinitely many possible outputs
Classification: predict categories from a small number of possible outputs

unsupervised learning:

given data that isn't associated with any output label y, find some structure, pattern, or something otherwise interesting in the unlabeled data

Clustering: group similar data points together. e.g. Google news, DNA microarray, grouping customers
Anomaly Detection: find unusual data points. e.g. fraud detection
Dimensionality Reduction: compress data using fewer numbers

Regression model

Linear Regression with one variable

Notation:
$x$ = "input" variable, feature
$y$ = "output" variable, "target" variable
$m$ = number of training examples
$(x, y)$ = single training example
$(x^{(i)}, y^{(i)})$ = i-th training example

Univariate linear regression: linear regression with one variable, $f_{w,b}(x) = wx + b$
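As a quick illustration, here is a minimal sketch (my own, not course code; the numbers are made up) of the model function in Python:

```python
def predict(x, w, b):
    """Return the univariate linear model f_{w,b}(x) = w*x + b."""
    return w * x + b

# Made-up example: w = 200, b = 100, x = 1.5
print(predict(1.5, w=200, b=100))  # 400.0
```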

Cost Function:
squared-error cost function
$$J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)^2$$
where $\hat{y}^{(i)} = f_{w,b}(x^{(i)})$

The cost surface $J(w,b)$ is bowl-shaped for the squared-error cost function.
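A small sketch (my own, not the official lab code) of computing this cost with NumPy; the training data below is made up:

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost J(w,b) = (1/2m) * sum_i (f_{w,b}(x^(i)) - y^(i))^2."""
    m = x.shape[0]
    y_hat = w * x + b                      # predictions f_{w,b}(x^(i))
    return np.sum((y_hat - y) ** 2) / (2 * m)

# Toy data lying exactly on y = 2x + 1, so the cost is 0 at w=2, b=1
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([3.0, 5.0, 7.0])
print(compute_cost(x_train, y_train, w=2.0, b=1.0))  # 0.0
print(compute_cost(x_train, y_train, w=0.0, b=0.0))  # a larger value
```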

Train the model with gradient descent

Gradient Descent:
repeat until convergence:
$$w = w - \alpha \frac{\partial}{\partial w} J(w,b), \qquad b = b - \alpha \frac{\partial}{\partial b} J(w,b)$$
where $\alpha$ is the learning rate
Note: simultaneously update $w$ and $b$. "Simultaneously" means that you calculate the partial derivatives for all the parameters before updating any of the parameters.
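A sketch of what a simultaneous update looks like in code; `compute_gradient` here is a hypothetical placeholder whose concrete form for linear regression is given further below:

```python
def gradient_descent_step(w, b, alpha, compute_gradient):
    """One gradient descent step with a simultaneous update of w and b."""
    dj_dw, dj_db = compute_gradient(w, b)  # evaluate BOTH derivatives at the old (w, b)
    w = w - alpha * dj_dw                  # only then update the parameters
    b = b - alpha * dj_db
    return w, b
```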

Choosing a different starting point (even just a few steps away from the original starting point) may lead gradient descent to reach a different local minimum.


Learning Rate:
If $\alpha$ is too small, gradient descent will still work but may be slow.
If $\alpha$ is too large, gradient descent may overshoot and never reach the minimum; it may fail to converge, or even diverge.

If already at a local minimum, gradient descent leaves $w$ unchanged (since the slope is 0).

Gradient descent can reach a local minimum with a fixed learning rate because, as we get nearer to a local minimum, the derivative automatically gets smaller, so gradient descent automatically takes smaller steps.

Gradient Descent for Linear Regression:
$$w = w - \alpha \frac{\partial}{\partial w} J(w,b), \qquad b = b - \alpha \frac{\partial}{\partial b} J(w,b)$$
where
$$\frac{\partial}{\partial w} J(w,b) = \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x^{(i)}, \qquad \frac{\partial}{\partial b} J(w,b) = \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$
The squared-error cost function is convex (bowl-shaped), so it has a single global minimum and no other local minima. As long as the learning rate is chosen appropriately, gradient descent will always converge to the global minimum.

“Batch” gradient descent: each step of gradient descent uses all the training examples.
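Putting the pieces together, here is a minimal sketch (an assumed implementation, not the course's lab code) of batch gradient descent for univariate linear regression, using the derivative formulas above; the toy data is made up:

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Partial derivatives of J(w,b) for univariate linear regression."""
    m = x.shape[0]
    err = (w * x + b) - y              # f_{w,b}(x^(i)) - y^(i)
    dj_dw = np.sum(err * x) / m
    dj_db = np.sum(err) / m
    return dj_dw, dj_db

def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.01, num_iters=1000):
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)  # uses ALL m examples ("batch")
        w -= alpha * dj_dw                           # simultaneous update
        b -= alpha * dj_db
    return w, b

# Toy data on y = 2x + 1; gradient descent should approach w ≈ 2, b ≈ 1
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.0, 5.0, 7.0, 9.0])
print(gradient_descent(x_train, y_train, alpha=0.05, num_iters=5000))
```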
