Warm up
- First chapters of Reinforcement Learning: An Introduction, Sutton & Barto, Second Edition (pdf, also available as an ebook)
- Dave Silver’s course and lecture videos on reinforcement learning
Schedule
Date | Topics | Lecturer | Readings | Additional Material |
Wed Jan 18 | Course Introduction | Katerina | ||
Mon Jan 23 | Intro to MDPs, POMDPs | Katerina | Sutton & Barto Ch 3 | |
Wed Jan 25 | Solving known MDPs: Dynamic Programming, Value Iteration, Policy Iteration, Policy Evaluation | Katerina | Sutton & Barto Ch 4 | |
Mon Jan 30 | Monte Carlo Learning: value function estimation and optimization | Russ | Sutton & Barto Ch 5 | |
Wed Feb 1 | Temporal Difference Learning: value function estimation and optimization, Q learning, SARSA | Russ | Sutton & Barto Ch 6 | |
Mon Feb 6 | Planning and Learning(1): Tabular methods, Dyna, Monte Carlo Tree Search | Katerina | Sutton & Barto Ch 8 | A Survey of Monte Carlo Tree Search Methods http://www.cameronius.com/cv/mcts-survey-master.pdf |
Wed Feb 8 | Value function approximation, Deep Learning, Convnets, backpropagation | Russ | ||
Mon Feb 13 | Value function approximation, Deep Learning, Convnets, backpropagation | Russ | ||
Wed Feb 15 | Deep Q Learning: Double Q learning, replay memory | Russ | ||
Mon Feb 20 | Policy Gradients (1): REINFORCE, Natural Policy Gradients, Variance reduction in gradient estimation, Actor-Critic, Deep Actor-Critic, TRPO | Russ | Sutton & Barto Ch 13 | |
Wed Feb 22 | Policy Gradients (2) | Russ | ||
Mon Feb 27 | Policy Gradients (3) | Russ | ||
Wed Mar 1 | Closer look at Continuous Actions, Variational Autoencoders, multimodal stochastic policies | Russ | ||
Mon Mar 6 | Exploration(1) | Katerina | Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models https://arxiv.org/abs/1507.00814, Variational Information Maximizing Exploration https://arxiv.org/abs/1605.09674, visitation counts, hashing | |
Wed Mar 8 | Imitation learning(1): mimicking experts, behaviour cloning | Katerina | An Invitation to Imitation http://www.ri.cmu.edu/publication_view.html?pub_id=7891, Generative adversarial imitation learning https://arxiv.org/abs/1606.03476 | |
Mon Mar 13 | Spring break! | |||
Wed Mar 15 | Spring break! | |||
Mon Mar 20 | Imitation learning(2): Learning reward functions from demonstration, IOC, IRL | | A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning http://www.jmlr.org/proceedings/papers/v15/ross11a/ross11a.pdf, Generative adversarial imitation learning https://arxiv.org/abs/1606.03476, Maximum entropy inverse reinforcement learning http://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf, Learning to search: Functional gradient techniques for imitation learning http://www.ri.cmu.edu/publication_view.html?pub_id=6410 | |
Wed Mar 22 | Intro to optimal control, Differential Dynamic Programming, LQR, iterative-LQR | Katerina | ||
Mon Mar 27 | Imitation learning(3): learning from optimal controllers, self trials | Katerina | End-to-End Training of Deep Visuomotor Policies https://arxiv.org/pdf/1504.00702.pdf, PLATO: Policy Learning using Adaptive Trajectory Optimization, https://arxiv.org/pdf/1603.00622v3.pdf | |
Wed Mar 29 | Planning and Learning(2): Learning Forward/Backward Models from experience, Planning with learned forward models, simulation to real world adaptation | Katerina | SE3-Nets: Learning Rigid Body Motion using Deep Neural Networks https://arxiv.org/pdf/1606.02378v2.pdf | |
Mon Apr 3 | Planning and Learning(3) | |||
Wed Apr 5 | Case studies: AlphaGo, deep math | Katerina | ||
Mon Apr 10 | Modular / Hierarchical RL (1): compositionality, temporal abstraction | |||
Wed Apr 12 | Modular / Hierarchical RL (2): Multi-task learning, curriculum learning | Russ | ||
Mon Apr 17 | Exploration(2): Learning and exploration in 3D environments, Long Term Memory | Russ | ||
Wed Apr 19 | Learning Motor Control: inspiration from Psychology | | Sutton & Barto Ch 14, 15 | |
Mon Apr 24 | Frontiers/Open Problems | Katerina | ||
Wed Apr 26 | Project Presentations | |||
Mon May 1 | Project Presentations | |||
Wed May 3 | Project Presentations |
Log
Week 1:
Jan 18 - Introduction
- Slide 1
- 1/23/2017;
- First chapters of Reinforcement Learning: An Introduction, Sutton & Barto, Second Edition
- Chapter 1: 1/19/2017;
- Lecture 1 & 2 from Dave Silver’s course and lecture videos on reinforcement learning
- Lecture 1: 1/17/2017;
Week 2:
Jan 23 - Intro to MDPs, POMDPs
- Slide
- Sutton & Barto Ch 3
- 3.1, 3.2, 3.3: 1/23/2017;
Jan 25 - Solving known MDPs: Dynamic Programming, Value Iteration, Policy Iteration, Policy Evaluation
- Slide
- Sutton & Barto Ch 4
- 4.1: 1/25/2017;
- Implement Markov Decision Processes in Python
- AIMA Python file: mdp.py (code taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig)
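As a companion to the Jan 25 topics (solving a known MDP with value iteration), here is a minimal Python sketch. It is not the AIMA mdp.py code and does not use its API. The two-state MDP, the names `TRANSITIONS`, `q_value`, `value_iteration` and `greedy_policy`, and the discount factor 0.9 are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical two-state MDP, invented for this example.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

TRANSITIONS: Transitions = {
    "s0": {
        "stay": [(1.0, "s0", 0.0)],
        "go": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 2.0)],
        "go": [(1.0, "s0", 0.0)],
    },
}


def q_value(transitions: Transitions, values: Dict[str, float],
            state: str, action: str, gamma: float) -> float:
    """Expected one-step return of taking `action` in `state`."""
    return sum(p * (r + gamma * values[s2])
               for p, s2, r in transitions[state][action])


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8) -> Dict[str, float]:
    """Repeat the Bellman optimality backup until values change by < tol."""
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {
            s: max(q_value(transitions, values, s, a, gamma)
                   for a in transitions[s])
            for s in transitions
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            return values


def greedy_policy(transitions: Transitions, values: Dict[str, float],
                  gamma: float = 0.9) -> Dict[str, str]:
    """Pick, in each state, the action with the highest one-step lookahead."""
    return {
        s: max(transitions[s],
               key=lambda a: q_value(transitions, values, s, a, gamma))
        for s in transitions
    }


if __name__ == "__main__":
    v = value_iteration(TRANSITIONS)
    print(v)
    print(greedy_policy(TRANSITIONS, v))
```

For this toy MDP the fixed point can be checked by hand: staying in s1 forever gives V(s1) = 2 / (1 - 0.9) = 20, and choosing "go" in s0 gives V(s0) = 15.2 / 0.82, roughly 18.54, so the greedy policy is "go" in s0 and "stay" in s1.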