1. Learning with large datasets
- It's not who has the best algorithm that wins. It's who has the most data
2. Stochastic gradient descent
- batch gradient descent
  - repeat { θ_j := θ_j − α·(1/m)·Σ_{i=1..m} (h_θ(x^(i)) − y^(i))·x_j^(i) } (each iteration sums over all m training examples)
- stochastic gradient descent
  - 1. randomly shuffle the dataset
  - 2. repeat { for i = 1..m: θ_j := θ_j − α·(h_θ(x^(i)) − y^(i))·x_j^(i) } (each update uses a single example, so progress is noisy but cheap)
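The contrast between the two update rules can be sketched for linear regression. This is a minimal sketch; the synthetic data, seeds, step sizes, and iteration counts are illustrative assumptions, not values from the lecture:

```python
import numpy as np

# Hypothetical tiny linear-regression problem to contrast the two update rules.
rng = np.random.default_rng(0)
m = 200
X = np.c_[np.ones(m), rng.normal(size=(m, 1))]       # intercept column + 1 feature
true_theta = np.array([4.0, -2.5])
y = X @ true_theta + rng.normal(scale=0.1, size=m)

def batch_gd(X, y, alpha=0.1, iters=200):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / len(y)        # uses ALL m examples per step
        theta -= alpha * grad
    return theta

def stochastic_gd(X, y, alpha=0.01, epochs=20, seed=1):
    theta = np.zeros(X.shape[1])
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    for _ in range(epochs):
        rng.shuffle(idx)                             # randomly shuffle the dataset
        for i in idx:                                # update on ONE example at a time
            theta -= alpha * (X[i] @ theta - y[i]) * X[i]
    return theta
```

Both recover roughly the same θ here, but batch GD pays a full pass over the data for every single parameter update, which is what makes it expensive on large datasets.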
3. Mini-batch gradient descent
- batch gradient descent: use all examples in each iteration
- stochastic gradient descent: use 1 example in each iteration
- mini-batch gradient descent: use b examples in each iteration (b = mini-batch size, e.g. b = 10; typical range 2–100)
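The middle ground above can be sketched as follows; the choice b = 10 and the other hyperparameters are illustrative assumptions:

```python
import numpy as np

# Hypothetical data for a mini-batch gradient descent sketch.
rng = np.random.default_rng(0)
m = 200
X = np.c_[np.ones(m), rng.normal(size=m)]
theta_true = np.array([1.0, 3.0])
y = X @ theta_true + rng.normal(scale=0.1, size=m)

def minibatch_gd(X, y, b=10, alpha=0.05, epochs=50, seed=1):
    theta = np.zeros(X.shape[1])
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(y), b):            # sweep the data b examples at a time
            batch = idx[start:start + b]
            grad = X[batch].T @ (X[batch] @ theta - y[batch]) / len(batch)
            theta -= alpha * grad
    return theta
```

With b examples per update the gradient can be computed as one vectorized operation, which is why mini-batch can outperform pure SGD when a good linear-algebra library is available.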
4. Stochastic gradient descent convergence
- Learning rate α is typically held constant. It can be slowly decreased over time (e.g. α = const1 / (iterationNumber + const2)) if we want θ to converge rather than oscillate around the minimum
- To check convergence, compute cost(θ, (x^(i), y^(i))) just before each update and plot that cost averaged over the last ~1000 examples
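The decaying-learning-rate schedule and the averaged-cost convergence check can be sketched together. The values of const1, const2, the window size of 1000, and the synthetic stream are all illustrative assumptions:

```python
import numpy as np
from collections import deque

# Hypothetical data stream for a linear-regression SGD convergence sketch.
rng = np.random.default_rng(0)
m = 5000
X = np.c_[np.ones(m), rng.normal(size=m)]
theta_true = np.array([2.0, -1.0])
y = X @ theta_true + rng.normal(scale=0.1, size=m)

def sgd_with_decay(X, y, const1=2.0, const2=50.0, window=1000):
    theta = np.zeros(X.shape[1])
    recent = deque(maxlen=window)            # costs on the last `window` examples
    avg_costs = []
    for t, (x_i, y_i) in enumerate(zip(X, y)):
        err = x_i @ theta - y_i
        recent.append(0.5 * err ** 2)        # record cost BEFORE updating on this example
        alpha = const1 / (t + const2)        # slowly decreasing learning rate
        theta -= alpha * err * x_i
        if (t + 1) % window == 0:
            avg_costs.append(sum(recent) / len(recent))  # plot these to monitor convergence
    return theta, avg_costs

theta, avg_costs = sgd_with_decay(X, y)
```

The averaged-cost curve smooths out the noise of individual updates; a downward trend indicates the algorithm is converging even though single-example costs jump around.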
5. Online learning
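In online learning, examples arrive from a continuous stream; the model takes one SGD-style update on each example and then discards it, which lets it adapt to changing user behavior. A minimal sketch with logistic regression on a hypothetical click-through stream (the generator, weights, and step size are all invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0])               # hypothetical "true" click model

def stream(n):
    """Hypothetical stream of (features, clicked) events, e.g. for CTR prediction."""
    for _ in range(n):
        x = rng.normal(size=2)
        clicked = rng.random() < sigmoid(x @ true_w)
        yield x, int(clicked)

w = np.zeros(2)
alpha = 0.1
for x, y in stream(20000):
    # one gradient step on the incoming example, then the example is thrown away
    w -= alpha * (sigmoid(x @ w) - y) * x
```

Because no dataset is stored, a constant learning rate is often kept deliberately, so the model keeps tracking the stream if user preferences drift.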
6. Map-reduce and data parallelism
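Map-reduce applies when the learning step is a sum over the training set: each machine computes the partial sum over its shard of the data (the "map"), and a central server adds the partial results and updates θ (the "reduce"). A sketch simulating 4 machines in one process; the data and hyperparameters are illustrative assumptions:

```python
import numpy as np

# Hypothetical data for a map-reduce batch gradient descent sketch.
rng = np.random.default_rng(0)
m = 400
X = np.c_[np.ones(m), rng.normal(size=m)]
theta_true = np.array([0.5, 2.0])
y = X @ theta_true + rng.normal(scale=0.1, size=m)

def partial_gradient(X_part, y_part, theta):
    """'Map' step: one machine sums gradient terms over its shard."""
    return X_part.T @ (X_part @ theta - y_part)

def mapreduce_batch_gd(X, y, machines=4, alpha=0.1, iters=200):
    theta = np.zeros(X.shape[1])
    X_shards = np.array_split(X, machines)   # each shard lives on one machine
    y_shards = np.array_split(y, machines)
    for _ in range(iters):
        # 'Reduce' step: the central server adds the partial sums and updates theta
        total = sum(partial_gradient(Xs, ys, theta)
                    for Xs, ys in zip(X_shards, y_shards))
        theta -= alpha * total / len(y)
    return theta
```

The result is numerically identical to ordinary batch gradient descent; the speedup (up to the number of machines, minus communication cost) comes purely from computing the shard sums in parallel.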