Machine Learning Series: Coursera Week 10 Large Scale Machine Learning

Contents

1. Gradient Descent with Large Datasets

1.1 Learning with large datasets

1.2 Stochastic gradient descent

1.3 Mini-Batch Gradient Descent

1.4 Stochastic gradient descent convergence

2. Advanced Topics

2.1 Online learning

2.2 Map-Reduce and Data Parallelism


1. Gradient Descent with Large Datasets

1.1 Learning with large datasets

Learning with large datasets:

m = 100,000,000

plot the learning curves like this:

fig. 1

(from Coursera Week 10, Learning with large datasets)

===> When the learning curves indicate high variance, training on more data can reduce the generalization error.

1.2 Stochastic gradient descent

Linear regression with gradient descent:

Repeat {
  θ_j = θ_j - α(1/m)Σ_{i=1..m}(h(x^(i)) - y^(i))x_j^(i)   (simultaneously for every j)
}

===> This is also called batch gradient descent (every update considers all m training examples).
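
As a reference point, here is a minimal NumPy sketch of this batch update (the function name and the alpha/num_iters parameters are illustrative, and X is assumed to already contain a bias column):

import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, num_iters=100):
    # X: (m, n) feature matrix with a bias column already included; y: (m,) targets
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # each update sums the gradient contribution of all m examples
        grad = X.T @ (X @ theta - y) / m
        theta = theta - alpha * grad
    return theta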

stochastic gradient descent:

cost(θ, (x^(i), y^(i))) = (1/2)(h(x^(i)) - y^(i))^2

(1) Randomly shuffle (reorder) the training examples

(2) Repeat {
  for i = 1, ..., m {
    θ_j = θ_j - α(h(x^(i)) - y^(i))x_j^(i)   (for every j)
  }
}
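
A minimal NumPy sketch of the stochastic gradient descent loop above, under the same assumptions as before (X includes a bias column; alpha and num_epochs are illustrative choices):

import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, num_epochs=10):
    # X: (m, n) feature matrix with a bias column already included; y: (m,) targets
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        # (1) randomly shuffle (reorder) the training examples
        for i in np.random.permutation(m):
            # (2) update theta using the single example (x^(i), y^(i))
            error = X[i] @ theta - y[i]
            theta = theta - alpha * error * X[i]
    return theta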

1.3 Mini-Batch Gradient Descent

Batch Gradient Descent: use all m examples in each iteration

Stochastic Gradient Descent: use 1 example in each iteration

Mini-Batch Gradient Descent: use b examples in each iteration

b = mini-batch size; typically b = 10, with a common range of 2 to 100

say b = 10, m = 1000:

Repeat {
  for i = 1, 11, 21, 31, ..., 991 {
    θ_j = θ_j - α(1/10)Σ_{k=i..i+9}(h(x^(k)) - y^(k))x_j^(k)   (for every j)
  }
}
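
A minimal NumPy sketch of the mini-batch loop, under the same assumptions as before, with an illustrative batch size of b = 10:

import numpy as np

def mini_batch_gradient_descent(X, y, alpha=0.01, b=10, num_epochs=10):
    # X: (m, n) feature matrix with a bias column already included; y: (m,) targets
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        order = np.random.permutation(m)
        for start in range(0, m, b):
            idx = order[start:start + b]                  # the next b examples
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)    # average over the mini-batch
            theta = theta - alpha * grad
    return theta

With vectorized operations, the sum over a mini-batch can be computed almost as fast as a single-example update, which is the main advantage over stochastic gradient descent.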

1.4 Stochastic gradient descent convergence

checking for convergence:

During learning, compute cost(θ, (x^(i), y^(i))) before updating θ using (x^(i), y^(i)).

Every 1000 iterations (say), plot the cost averaged over the last 1000 examples processed by the algorithm.

fig. 2

(from Coursera Week 10, Stochastic gradient descent convergence)

Learning rate α is typically held constant. It can be slowly decreased over time if we want θ to converge (e.g. α = const1 / (#iteration + const2)). However, this often turns the problem into choosing const1 and const2, which adds extra complexity, so in practice α is usually just kept constant.
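
A sketch of this convergence check layered on the SGD loop: the per-example cost is computed before each update, the average over the last 1000 examples is recorded for plotting, and the optional decay schedule uses illustrative constants const1 and const2:

import numpy as np

def sgd_with_convergence_check(X, y, const1=1.0, const2=1000.0, num_epochs=10):
    # X: (m, n) feature matrix with a bias column already included; y: (m,) targets
    m, n = X.shape
    theta = np.zeros(n)
    recent_costs, avg_costs = [], []
    iteration = 0
    for _ in range(num_epochs):
        for i in np.random.permutation(m):
            error = X[i] @ theta - y[i]
            # compute cost(theta, (x^(i), y^(i))) BEFORE updating theta
            recent_costs.append(0.5 * error ** 2)
            # optional decay: alpha = const1 / (#iteration + const2)
            alpha = const1 / (iteration + const2)
            theta = theta - alpha * error * X[i]
            iteration += 1
            if iteration % 1000 == 0:
                # average cost over the last 1000 examples, ready for plotting
                avg_costs.append(np.mean(recent_costs))
                recent_costs = []
    return theta, avg_costs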

2. Advanced Topics

2.1 Online learning

Shipping service website where a user comes and specifies an origin and destination, you offer to ship their package for some asking price, and the user sometimes chooses to use your shipping service (y = 1) or not (y = 0).

Features x capture properties of the user, of the origin/destination, and the asking price. We want to learn p(y = 1 | x; θ) to optimize the price.

Repeat forever {
  Get (x, y) corresponding to the current user.
  Update θ using (x, y):
    θ_j = θ_j - α(h(x) - y)x_j   (for every j)
}

Online learning can adapt to changing user tastes, and it allows us to learn from a continuous stream of data, since we use each example once and then never need to process it again.
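
A sketch of the online update for the shipping example, assuming h(x) is the logistic (sigmoid) hypothesis so that p(y = 1 | x; θ) is modeled directly; get_user_example is a hypothetical placeholder for the stream of incoming (x, y) pairs:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_online_learning(get_user_example, n_features, alpha=0.1):
    # get_user_example() is a hypothetical callback returning the next (x, y)
    # pair from the live stream of users; x is an (n_features,) vector.
    theta = np.zeros(n_features)
    while True:                               # "Repeat forever"
        x, y = get_user_example()
        h = sigmoid(x @ theta)                # models p(y = 1 | x; theta)
        theta = theta - alpha * (h - y) * x   # use the example once, then discard it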

2.2 Map-Reduce and Data Parallelism

fig. 3

(from Coursera Week 10, Map-Reduce and Data Parallelism)

fig. 4

(from Coursera Week 10, Map-Reduce and Data Parallelism)

fig. 5

(from Coursera Week 10, Map-Reduce and Data Parallelism)
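
A minimal sketch of the idea shown in the figures: the sum in the batch gradient is split across several workers (simulated here with Python's multiprocessing), each worker computes a partial sum over its slice of the training set, and a central step adds the partial sums and applies the update. All names are illustrative.

import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    # each "machine" computes the gradient sum over its own slice of the data
    X_part, y_part, theta = args
    return X_part.T @ (X_part @ theta - y_part)

def map_reduce_gradient_step(X, y, theta, alpha=0.01, n_workers=4):
    # split the training set into n_workers chunks (the "map" step)
    X_chunks = np.array_split(X, n_workers)
    y_chunks = np.array_split(y, n_workers)
    with Pool(n_workers) as pool:
        partial_sums = pool.map(partial_gradient,
                                [(Xc, yc, theta) for Xc, yc in zip(X_chunks, y_chunks)])
    # combine the partial sums on the "central server" (the "reduce" step)
    grad = sum(partial_sums) / X.shape[0]
    return theta - alpha * grad

In a real cluster each chunk would live on a different machine (or a different core of one machine), but the structure of the computation is the same.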
