Andrew Ng · Machine Learning || chap17 Large scale machine learning (brief notes)

This post discusses why machine learning on large datasets matters and focuses on three gradient descent methods: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. It gives the update rule for each, shows how to check convergence of stochastic gradient descent, covers online learning applications such as optimizing pricing on a shipping-service website, and explains the role of MapReduce and data parallelism when processing large amounts of data.

17 Large scale machine learning

17-1 Learning with large datasets

Machine learning and data

Classify between confusable words. E.g., (to, two, too), (then, than)

“It’s not who has the best algorithm that wins. It’s who has the most data.”

Learning with large datasets


17-2 Stochastic gradient descent

Linear regression with gradient descent


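For reference (the slide image is not reproduced here), a minimal recap of the standard setup this section builds on, in the usual course notation:

$h_\theta(x) = \sum_{j=0}^{n}\theta_j x_j$, $\quad J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$

Batch gradient descent repeatedly updates, simultaneously for every $j = 0, \cdots, n$:

$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$

Every single update sums over all $m$ examples, which becomes expensive when $m$ is very large; that cost motivates the stochastic version below.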

Stochastic gradient descent

  1. Randomly shuffle (reorder) training examples

  2. Repeat{
    for $i := 1, \cdots, m$ {
      $\theta_j := \theta_j - \alpha(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$
      (for every $j = 0, \cdots, n$)
    }
  }
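As a concrete illustration, here is a minimal NumPy sketch of this update loop for linear regression; the data `X`, `y` and the hyperparameters (`alpha`, `num_epochs`) are placeholders for illustration, not values from the lecture.

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, num_epochs=10):
    """Stochastic gradient descent for linear regression.

    X: (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    y: (m,) vector of targets.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    rng = np.random.default_rng(0)

    for _ in range(num_epochs):
        # 1. Randomly shuffle (reorder) the training examples.
        order = rng.permutation(m)
        # 2. Sweep through the examples one at a time.
        for i in order:
            error = X[i] @ theta - y[i]      # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]    # update every theta_j at once
    return theta
```

Each update looks at a single example, so the cost of one parameter update does not grow with $m$.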

17-3 Mini-batch gradient descent

Mini-batch gradient descent
Batch gradient descent: Use all m examples in each iteration
Stochastic gradient descent: Use 1 example in each iteration
Mini-batch gradient descent: Use b examples in each iteration

example:

Say $b = 10$, $m = 1000$.
Repeat{
  for $i = 1, 11, 21, 31, \cdots, 991$ {
    $\theta_j := \theta_j - \alpha\frac{1}{10}\sum_{k=i}^{i+9}(h_\theta(x^{(k)}) - y^{(k)})x_j^{(k)}$
    (for every $j = 0, \cdots, n$)
  }
}
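A minimal NumPy sketch of the same loop; the batch size `b` and learning rate are assumed placeholder values.

```python
import numpy as np

def mini_batch_gradient_descent(X, y, alpha=0.01, b=10, num_epochs=10):
    """Mini-batch gradient descent for linear regression.

    X: (m, n+1) design matrix with a leading column of ones; y: (m,) targets.
    b: mini-batch size (b=1 gives stochastic GD, b=m gives batch GD).
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    rng = np.random.default_rng(0)

    for _ in range(num_epochs):
        order = rng.permutation(m)
        for start in range(0, m, b):
            batch = order[start:start + b]
            errors = X[batch] @ theta - y[batch]        # shape (b,)
            grad = X[batch].T @ errors / len(batch)     # averaged over the mini-batch
            theta -= alpha * grad
    return theta
```

Because each update averages over $b$ examples, the sum can be vectorized, which is why mini-batch GD is often faster in practice than purely stochastic updates.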

17-4 Stochastic gradient descent convergence

Checking for convergence

Batch gradient descent:

Plot $J_{train}(\theta)$ as a function of the number of iterations of gradient descent.
$J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$

Stochastic gradient descent:

$cost(\theta, (x^{(i)}, y^{(i)})) = \frac{1}{2}(h_\theta(x^{(i)}) - y^{(i)})^2$

During learning, compute $cost(\theta, (x^{(i)}, y^{(i)}))$ before updating $\theta$ using $(x^{(i)}, y^{(i)})$.
Every 1000 iterations (say), plot $cost(\theta, (x^{(i)}, y^{(i)}))$ averaged over the last 1000 examples processed by the algorithm.

Learning rate $\alpha$ is typically held constant. It can be slowly decreased over time if we want $\theta$ to converge (e.g. $\alpha = \frac{const1}{iterationNumber + const2}$).
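A sketch of how this monitoring might be wired into the SGD loop. The averaging window of 1000 matches the lecture's suggestion; the function name, data, and default hyperparameters are assumptions for illustration.

```python
import numpy as np

def sgd_with_cost_monitoring(X, y, alpha0=0.1, window=1000, num_epochs=5,
                             const1=None, const2=1.0):
    """SGD for linear regression, logging cost averaged over the last `window` examples.

    If const1 is given, the learning rate decays as const1 / (t + const2).
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    rng = np.random.default_rng(0)
    recent_costs, avg_cost_history = [], []
    t = 0  # total number of updates so far

    for _ in range(num_epochs):
        for i in rng.permutation(m):
            error = X[i] @ theta - y[i]
            # Compute cost(theta, (x^(i), y^(i))) BEFORE updating theta with this example.
            recent_costs.append(0.5 * error ** 2)
            alpha = alpha0 if const1 is None else const1 / (t + const2)
            theta -= alpha * error * X[i]
            t += 1
            if t % window == 0:
                # Record the cost averaged over the last `window` examples (this is what gets plotted).
                avg_cost_history.append(np.mean(recent_costs))
                recent_costs = []
    return theta, avg_cost_history
```

Plotting `avg_cost_history` shows whether the averaged cost is drifting downward (converging), merely noisy (try averaging over a larger window), or increasing (decrease $\alpha$).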

17-5 Online learning

Online learning
Shipping service website where user comes, specifies origin and destination, you offer to ship their package for some asking price, and users sometimes choose to use your shipping service (y=1) sometimes not (y=0).

Features $x$ capture properties of the user, of the origin/destination, and the asking price. We want to learn $p(y=1|x;\theta)$ to optimize the price.
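A minimal sketch of the online update: logistic regression trained one example at a time as users arrive, with each example used once and then discarded. The feature layout and learning rate are assumptions for illustration.

```python
import numpy as np

theta = np.zeros(3)   # assumed features: [1, asking_price, shipping_distance]
alpha = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def on_user_event(x, y):
    """Update theta from a single (x, y) pair, then discard the example.

    x: feature vector for this user/offer; y: 1 if the user bought shipping, else 0.
    """
    global theta
    p = sigmoid(x @ theta)            # current estimate of p(y=1 | x; theta)
    theta -= alpha * (p - y) * x      # one logistic-regression gradient step

# Example stream of (features, outcome) pairs as users visit the site (made-up numbers):
on_user_event(np.array([1.0, 25.0, 120.0]), y=1)
on_user_event(np.array([1.0, 60.0, 45.0]), y=0)
```

Because old examples are never revisited, the model can adapt as user preferences change over time.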

Other online learning example:
Product search (learning to search)
User searches for “Android phone 1080p camera”
Have 100 phones in store. Will return 10 results.

$x$ = features of the phone: how many words in the user query match the name of the phone, how many words in the query match the description of the phone, etc.
$y = 1$ if user clicks on link, $y = 0$ otherwise.
Learn $p(y=1|x;\theta)$.
Use to show user the 10 phones they’re most likely to click on.
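A sketch of how the learned model would be used at query time: score every candidate phone and show the 10 with the highest predicted click probability. The `phones` list and `extract_features` function are hypothetical placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def top_10_results(phones, query, theta, extract_features):
    """Rank candidate phones by predicted p(y=1 | x; theta) and return the top 10.

    phones: list of ~100 candidate phones; extract_features(phone, query) -> feature vector x.
    """
    scores = [sigmoid(extract_features(phone, query) @ theta) for phone in phones]
    ranked = sorted(zip(scores, range(len(phones))), reverse=True)
    return [phones[i] for _, i in ranked[:10]]
```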

Other examples: Choosing special offers to show user; customized selection of news articles; product recommendation

17-6 Map-reduce and data parallelism


Map-reduce

Batch gradient descent: $\theta_j := \theta_j - \alpha\frac{1}{400}\sum_{i=1}^{400}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$


Map-reduce and summation over the training set
Many learning algorithms can be expressed as computing sums of functions over the training set
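As a sketch of the idea: each "machine" computes the partial gradient sum over its share of the examples (the map step), and a central step adds the partial sums and applies the usual update (the reduce step). The 4-way split matches the lecture's 400-example illustration; the `multiprocessing` setup is an assumption for illustration.

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient_sum(args):
    """Map step: one machine/core computes its share of the gradient sum."""
    X_part, y_part, theta = args
    errors = X_part @ theta - y_part
    return X_part.T @ errors            # sum over this worker's examples only

def map_reduce_gradient_step(X, y, theta, alpha, num_machines=4):
    """One batch gradient descent update with the summation split across workers.

    Note: on some platforms, code that creates a Pool must run under
    `if __name__ == "__main__":`.
    """
    X_chunks = np.array_split(X, num_machines)
    y_chunks = np.array_split(y, num_machines)
    with Pool(num_machines) as pool:
        partial_sums = pool.map(partial_gradient_sum,
                                [(Xc, yc, theta) for Xc, yc in zip(X_chunks, y_chunks)])
    # Reduce step: combine the partial sums, then apply the usual batch update.
    total = np.sum(partial_sums, axis=0)
    return theta - alpha * total / len(y)
```

The same partitioning works across the cores of a single machine, which is what the multi-core heading below refers to: as long as the algorithm's main work is a sum over the training set, it can be parallelized this way.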

Multi-core machines
