CMU 10703: Deep Reinforcement Learning and Control, Spring 2017

Homepage

Warm up

Schedule

 

| Date | Topics | Lecturer | Readings | Additional Material |
|---|---|---|---|---|
| Wed Jan 18 | Course Introduction | Katerina | | |
| Mon Jan 23 | Intro to MDPs, POMDPs | Katerina | Sutton & Barto Ch 3 | |
| Wed Jan 25 | Solving known MDPs: Dynamic Programming, Value Iteration, Policy Iteration, Policy Evaluation | Katerina | Sutton & Barto Ch 4 | |
| Mon Jan 30 | Monte Carlo Learning: value function estimation and optimization | Russ | Sutton & Barto Ch 5 | |
| Wed Feb 1 | Temporal Difference Learning: value function estimation and optimization, Q-learning, SARSA | Russ | Sutton & Barto Ch 6 | |
| Mon Feb 6 | Planning and Learning (1): tabular methods, Dyna, Monte Carlo Tree Search | Katerina | Sutton & Barto Ch 8 | A Survey of Monte Carlo Tree Search Methods: http://www.cameronius.com/cv/mcts-survey-master.pdf |
| Wed Feb 8 | Value function approximation, Deep Learning, Convnets, backpropagation | Russ | | |
| Mon Feb 13 | Value function approximation, Deep Learning, Convnets, backpropagation | Russ | | |
| Wed Feb 15 | Deep Q-Learning: Double Q-learning, replay memory | Russ | | |
| Mon Feb 20 | Policy Gradients (1): REINFORCE, Natural Policy Gradients, variance reduction in gradient estimation, Actor-Critic, Deep Actor-Critic, TRPO | Russ | Sutton & Barto Ch 13 | |
| Wed Feb 22 | Policy Gradients (2) | Russ | | |
| Mon Feb 27 | Policy Gradients (3) | Russ | | |
| Wed Mar 1 | Closer look at continuous actions, Variational Autoencoders, multimodal stochastic policies | Russ | | |
| Mon Mar 6 | Exploration (1) | Katerina | | Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models: https://arxiv.org/abs/1507.00814; Variational Information Maximizing Exploration: https://arxiv.org/abs/1605.09674; visitation counts, hashing |
| Wed Mar 8 | Imitation Learning (1): mimicking experts, behaviour cloning | Katerina | | An Invitation to Imitation: http://www.ri.cmu.edu/publication_view.html?pub_id=7891; Generative Adversarial Imitation Learning: https://arxiv.org/abs/1606.03476 |
| Mon Mar 13 | Spring break! | | | |
| Wed Mar 15 | Spring break! | | | |
| Mon Mar 20 | Imitation Learning (2): learning reward functions from demonstration, IOC, IRL | | | A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning: http://www.jmlr.org/proceedings/papers/v15/ross11a/ross11a.pdf; Generative Adversarial Imitation Learning: https://arxiv.org/abs/1606.03476; Maximum Entropy Inverse Reinforcement Learning: http://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf; Learning to Search: Functional Gradient Techniques for Imitation Learning: http://www.ri.cmu.edu/publication_view.html?pub_id=6410 |
| Wed Mar 22 | Intro to optimal control, Differential Dynamic Programming, LQR, iterative LQR | Katerina | | |
| Mon Mar 27 | Imitation Learning (3): learning from optimal controllers, self trials | Katerina | | End-to-End Training of Deep Visuomotor Policies: https://arxiv.org/pdf/1504.00702.pdf; PLATO: Policy Learning using Adaptive Trajectory Optimization: https://arxiv.org/pdf/1603.00622v3.pdf |
| Wed Mar 29 | Planning and Learning (2): learning forward/backward models from experience, planning with learned forward models, simulation-to-real-world adaptation | Katerina | | SE3-Nets: Learning Rigid Body Motion using Deep Neural Networks: https://arxiv.org/pdf/1606.02378v2.pdf |
| Mon Apr 3 | Planning and Learning (3) | | | |
| Wed Apr 5 | Case studies: AlphaGo, deep math | Katerina | | |
| Mon Apr 10 | Modular / Hierarchical RL (1): compositionality, temporal abstraction | | | |
| Wed Apr 12 | Modular / Hierarchical RL (2): multi-task learning, curriculum learning | Russ | | |
| Mon Apr 17 | Exploration (2): learning and exploration in 3D environments, long-term memory | Russ | | |
| Wed Apr 19 | Learning Motor Control: inspiration from psychology | | Sutton & Barto Ch 14, 15 | |
| Mon Apr 24 | Frontiers / Open Problems | Katerina | | |
| Wed Apr 26 | Project Presentations | | | |
| Mon May 1 | Project Presentations | | | |
| Wed May 3 | Project Presentations | | | |

Log

Week 1:

Jan 18 - Introduction

Week 2:

Jan 23 - Intro to MDPs, POMDPs

  • Slide
  • Sutton & Barto Ch 3
    • 3.1, 3.2, 3.3: 1/23/2017;

Jan 25 - Solving known MDPs: Dynamic Programming, Value Iteration, Policy Iteration, Policy Evaluation

  • Slide
  • Sutton & Barto Ch 4
    • 4.1: 1/25/2017;
  • Implement Markov Decision Processes in Python
    • AIMA Python file: mdp.py (code taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig)
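As a companion to the Ch 4 material, here is a minimal value iteration sketch on a toy two-state MDP. This is an independent illustration, not the interface of AIMA's mdp.py; the transition table, rewards, and discount factor are made up.

```python
# Toy MDP: P[s][a] is a list of (probability, next_state, reward) triples.
# State 1 can "stay" and collect reward 1 forever; state 0 can "go" to it.
GAMMA = 0.9
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "go": [(1.0, 0, 0.0)]},
}

def value_iteration(P, gamma, theta=1e-8):
    """Iterate the Bellman optimality backup until values change < theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Backup: V(s) <- max_a sum_{s'} p(s'|s,a) [r + gamma V(s')]
            v = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

V = value_iteration(P, GAMMA)
```

With these numbers both states converge to a value of 1 / (1 - 0.9) = 10: state 1 earns reward 1 per step by staying, and state 0 reaches it in one step with reward 1.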

 

Reposted from: https://www.cnblogs.com/casperwin/p/6295396.html
