Andrew Ng · Machine Learning || chap17 Large scale machine learning (brief notes)

This post discusses why machine learning on large datasets matters and focuses on three gradient descent methods: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. It gives the update rule for each, shows how to check convergence of stochastic gradient descent, covers online learning applications such as optimizing pricing on a shipping-service website, and explains the role of MapReduce and data parallelism when processing large amounts of data.

17 Large scale machine learning

17-1 Learning with large datasets

Machine learning and data

Classify between confusable words. E.g., (to, two, too), (then, than)

“It’s not who has the best algorithm that wins. It’s who has the most data.”

Learning with large datasets


17-2 Stochastic gradient descent

Linear regression with gradient descent


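For reference (the slide image is not reproduced here), a minimal recap of the standard setup this section builds on, in the usual course notation:

$h_\theta(x) = \sum_{j=0}^{n}\theta_j x_j$, $\quad J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$

Batch gradient descent repeatedly updates, simultaneously for every $j = 0, \cdots, n$:

$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$

Every single update sums over all $m$ examples, which becomes expensive when $m$ is very large; that cost motivates the stochastic version below.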

Stochastic gradient descent

  1. Randomly shuffle (reorder) training examples

  2. Repeat{
    for $i := 1, \cdots, m$ {
      $\theta_j := \theta_j - \alpha(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$
      (for every $j = 0, \cdots, n$)
    }
  }
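As a concrete illustration, here is a minimal NumPy sketch of this update loop for linear regression; the data `X`, `y` and the hyperparameters (`alpha`, `num_epochs`) are placeholders for illustration, not values from the lecture.

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, num_epochs=10):
    """Stochastic gradient descent for linear regression.

    X: (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    y: (m,) vector of targets.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    rng = np.random.default_rng(0)

    for _ in range(num_epochs):
        # 1. Randomly shuffle (reorder) the training examples.
        order = rng.permutation(m)
        # 2. Sweep through the examples one at a time.
        for i in order:
            error = X[i] @ theta - y[i]      # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]    # update every theta_j at once
    return theta
```

Each update looks at a single example, so the cost of one parameter update does not grow with $m$.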

17-3 Mini-batch gradient descent

Mini-batch gradient descent
Batch gradient descent: Use all m examples in each iteration
Stochastic gradient descent: Use 1 example in each iteration
Mini-batch gradient descent: Use b examples in each iteration

example:

Say $b = 10$, $m = 1000$.
Repeat{
  for $i = 1, 11, 21, 31, \cdots, 991$ {
    $\theta_j := \theta_j - \alpha\frac{1}{10}\sum_{k=i}^{i+9}(h_\theta(x^{(k)}) - y^{(k)})x_j^{(k)}$
    (for every $j = 0, \cdots, n$)
  }
}
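A minimal NumPy sketch of the same loop; the batch size `b` and learning rate are assumed placeholder values.

```python
import numpy as np

def mini_batch_gradient_descent(X, y, alpha=0.01, b=10, num_epochs=10):
    """Mini-batch gradient descent for linear regression.

    X: (m, n+1) design matrix with a leading column of ones; y: (m,) targets.
    b: mini-batch size (b=1 gives stochastic GD, b=m gives batch GD).
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    rng = np.random.default_rng(0)

    for _ in range(num_epochs):
        order = rng.permutation(m)
        for start in range(0, m, b):
            batch = order[start:start + b]
            errors = X[batch] @ theta - y[batch]        # shape (b,)
            grad = X[batch].T @ errors / len(batch)     # averaged over the mini-batch
            theta -= alpha * grad
    return theta
```

Because each update averages over $b$ examples, the sum can be vectorized, which is why mini-batch GD is often faster in practice than purely stochastic updates.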

17-4 Stochastic gradient descent convergence

Checking for convergence

Batch gradient descent:

Plot $J_{train}(\theta)$ as a function of the number of iterations of gradient descent.
$J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$

Stochastic gradient descent:

$cost(\theta, (x^{(i)}, y^{(i)})) = \frac{1}{2}(h_\theta(x^{(i)}) - y^{(i)})^2$

During learning, compute $cost(\theta, (x^{(i)}, y^{(i)}))$ before updating $\theta$ using $(x^{(i)}, y^{(i)})$.
Every 1000 iterations (say), plot $cost(\theta, (x^{(i)}, y^{(i)}))$ averaged over the last 1000 examples processed by the algorithm.

Learning rate $\alpha$ is typically held constant. It can be slowly decreased over time if we want $\theta$ to converge (e.g. $\alpha = \frac{const1}{iterationNumber + const2}$).
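A sketch of how this monitoring might be wired into the SGD loop. The averaging window of 1000 matches the lecture's suggestion; the function name, data, and default hyperparameters are assumptions for illustration.

```python
import numpy as np

def sgd_with_cost_monitoring(X, y, alpha0=0.1, window=1000, num_epochs=5,
                             const1=None, const2=1.0):
    """SGD for linear regression, logging cost averaged over the last `window` examples.

    If const1 is given, the learning rate decays as const1 / (t + const2).
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    rng = np.random.default_rng(0)
    recent_costs, avg_cost_history = [], []
    t = 0  # total number of updates so far

    for _ in range(num_epochs):
        for i in rng.permutation(m):
            error = X[i] @ theta - y[i]
            # Compute cost(theta, (x^(i), y^(i))) BEFORE updating theta with this example.
            recent_costs.append(0.5 * error ** 2)
            alpha = alpha0 if const1 is None else const1 / (t + const2)
            theta -= alpha * error * X[i]
            t += 1
            if t % window == 0:
                # Record the cost averaged over the last `window` examples (this is what gets plotted).
                avg_cost_history.append(np.mean(recent_costs))
                recent_costs = []
    return theta, avg_cost_history
```

Plotting `avg_cost_history` shows whether the averaged cost is drifting downward (converging), merely noisy (try averaging over a larger window), or increasing (decrease $\alpha$).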

17-5 Online learning

Online learning
Shipping service website where user comes, specifies origin and destination, you offer to ship their package for some asking price, and users sometimes choose to use your shipping service (y=1) sometimes not (y=0).

Features $x$ capture properties of the user, of the origin/destination, and the asking price. We want to learn $p(y=1|x;\theta)$ to optimize the price.
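A minimal sketch of the online update: logistic regression trained one example at a time as users arrive, with each example used once and then discarded. The feature layout and learning rate are assumptions for illustration.

```python
import numpy as np

theta = np.zeros(3)   # assumed features: [1, asking_price, shipping_distance]
alpha = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def on_user_event(x, y):
    """Update theta from a single (x, y) pair, then discard the example.

    x: feature vector for this user/offer; y: 1 if the user bought shipping, else 0.
    """
    global theta
    p = sigmoid(x @ theta)            # current estimate of p(y=1 | x; theta)
    theta -= alpha * (p - y) * x      # one logistic-regression gradient step

# Example stream of (features, outcome) pairs as users visit the site (made-up numbers):
on_user_event(np.array([1.0, 25.0, 120.0]), y=1)
on_user_event(np.array([1.0, 60.0, 45.0]), y=0)
```

Because old examples are never revisited, the model can adapt as user preferences change over time.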

Other online learning example:
Product search (learning to search)
User searches for “Android phone 1080p camera”
Have 100 phones in store. Will return 10 results.

$x$ = features of the phone: how many words in the user query match the name of the phone, how many words in the query match the description of the phone, etc.
$y = 1$ if user clicks on link, $y = 0$ otherwise.
Learn $p(y=1|x;\theta)$.
Use to show user the 10 phones they’re most likely to click on.
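A sketch of how the learned model would be used at query time: score every candidate phone and show the 10 with the highest predicted click probability. The `phones` list and `extract_features` function are hypothetical placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def top_10_results(phones, query, theta, extract_features):
    """Rank candidate phones by predicted p(y=1 | x; theta) and return the top 10.

    phones: list of ~100 candidate phones; extract_features(phone, query) -> feature vector x.
    """
    scores = [sigmoid(extract_features(phone, query) @ theta) for phone in phones]
    ranked = sorted(zip(scores, range(len(phones))), reverse=True)
    return [phones[i] for _, i in ranked[:10]]
```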

Other examples: Choosing special offers to show user; customized selection of news articles; product recommendation

17-6 Map-reduce and data parallelism


Map-reduce

Batch gradient descent: $\theta_j := \theta_j - \alpha\frac{1}{400}\sum_{i=1}^{400}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$


Map-reduce and summation over the training set
Many learning algorithms can be expressed as computing sums of functions over the training set
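As a sketch of the idea: each "machine" computes the partial gradient sum over its share of the examples (the map step), and a central step adds the partial sums and applies the usual update (the reduce step). The 4-way split matches the lecture's 400-example illustration; the `multiprocessing` setup is an assumption for illustration.

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient_sum(args):
    """Map step: one machine/core computes its share of the gradient sum."""
    X_part, y_part, theta = args
    errors = X_part @ theta - y_part
    return X_part.T @ errors            # sum over this worker's examples only

def map_reduce_gradient_step(X, y, theta, alpha, num_machines=4):
    """One batch gradient descent update with the summation split across workers.

    Note: on some platforms, code that creates a Pool must run under
    `if __name__ == "__main__":`.
    """
    X_chunks = np.array_split(X, num_machines)
    y_chunks = np.array_split(y, num_machines)
    with Pool(num_machines) as pool:
        partial_sums = pool.map(partial_gradient_sum,
                                [(Xc, yc, theta) for Xc, yc in zip(X_chunks, y_chunks)])
    # Reduce step: combine the partial sums, then apply the usual batch update.
    total = np.sum(partial_sums, axis=0)
    return theta - alpha * total / len(y)
```

The same partitioning works across the cores of a single machine, which is what the multi-core heading below refers to: as long as the algorithm's main work is a sum over the training set, it can be parallelized this way.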

Multi-core machines
