Featured Course | MIPT Launches an Advanced Deep Reinforcement Learning Course [AI Frontier]

Follow: Decision Intelligence and Machine Learning, focused on distilled, no-fluff AI content

Author | DeepRL

Source | https://deeppavlov.ai

Reported by | Deep Reinforcement Learning Lab

The Moscow Institute of Physics and Technology (MIPT) has launched an advanced reinforcement learning course focused on recent research progress in deep reinforcement learning. It covers exploration strategies in RL, imitation and inverse RL, hierarchical RL, evolutionary strategies in RL, distributional RL, RL for combinatorial optimization, multi-agent RL, RL at scale, multitask and transfer RL, and memory mechanisms in RL. It is well worth a close read.

Part 1: Lectures

RL#1: 13.02.2020: Exploration in RL

Sergey Ivanov

  • Random Network Distillation [1]

  • Intrinsic Curiosity Module [2,3]

  • Episodic Curiosity through Reachability [4]

RL#2: 20.02.2020: Imitation and Inverse RL

Just Heuristic

  • Imitation Learning [5]

  • Inverse RL [6,7]

  • Learning from Human Preferences [8]

RL#3: 27.02.2020: Hierarchical Reinforcement Learning

Petr Kuderov

  • A framework for temporal abstraction in RL [9]

  • The Option-Critic Architecture [10]

  • FeUdal Networks for Hierarchical RL [11]

  • Data-Efficient Hierarchical RL [12]

  • Meta Learning Shared Hierarchies [13] 

RL#4: 5.03.2020: Evolutionary Strategies in RL

Evgenia Elistratova

  • Evolution Strategies as a Scalable Alternative to RL [14]

  • Improving Exploration in Evolution Strategies for Deep RL [15]

  • Paired Open-Ended Trailblazer (POET) [16]

  • Sim-to-Real: Learning Agile Locomotion For Quadruped Robots [17]

RL#5: 12.03.2020: Distributional Reinforcement Learning

Pavel Shvechikov

  • A Distributional Perspective on RL [18]

  • Distributional RL with Quantile Regression [19]

  • Implicit Quantile Networks for Distributional RL [20]

  • Fully Parameterized Quantile Function for Distributional RL [21]

RL#6: 19.03.2020: RL for Combinatorial Optimization

Taras Khakhulin

  • RL for Solving the Vehicle Routing Problem [22]

  • Attention, Learn to Solve Routing Problems! [23]

  • Learning Improvement Heuristics for Solving the Travelling Salesman Problem [24]

  • Learning Combinatorial Optimization Algorithms over Graphs [25]

RL#7: 26.03.2020: RL as Probabilistic Inference

Pavel Termichev

  • RL and Control as Probabilistic Inference: Tutorial and Review [26]

  • RL with Deep Energy-Based Policies [27]

  • Soft Actor-Critic [28]

  • Variational Bayesian RL with Regret Bounds [29]

RL#8: 9.04.2020: Multi-Agent Reinforcement Learning

Sergey Sviridov

  • Stabilising Experience Replay for Deep Multi-Agent RL [30]

  • Counterfactual Multi-Agent Policy Gradients [31]

  • Value-Decomposition Networks For Cooperative Multi-Agent Learning [32]

  • Monotonic Value Function Factorisation for Deep Multi-Agent RL [33]

  • Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [34]

RL#9: 16.04.2020:  Model-Based Reinforcement Learning

Evgeny Kashin

  • DL for Real-Time Atari Game Play Using Offline MCTS Planning [35]

  • Mastering Chess and Shogi by Self-Play with a General RL Algorithm [36]

  • World Models [37]

  • Model-Based RL for Atari [38]

  • Learning Latent Dynamics for Planning from Pixels [39] 

RL#10: 23.04.2020: Reinforcement Learning at Scale

Aleksandr Panin

  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [40]

  • HOGWILD!: A Lock-Free Approach to Parallelizing SGD [41]

  • GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [42]

  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [43]

  • Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [44]

RL#11: 30.04.2020: Multitask & Transfer RL

Dmitry Nikulin

  • Universal Value Function Approximators [45]

  • Hindsight Experience Replay [46]

  • PathNet: Evolution Channels Gradient Descent in Super Neural Networks [47]

  • Progressive Neural Networks [48]

  • Learning an Embedding Space for Transferable Robot Skills [49]

RL#12: 07.05.2020: Memory in Reinforcement Learning

Artyom Sorokin

  • Recurrent Experience Replay in Distributed RL [50]

  • AMRL: Aggregated Memory For RL [51]

  • Unsupervised Predictive Memory in a Goal-Directed Agent [52]

  • Stabilizing Transformers for RL [53]

  • Model-Free Episodic Control [54]

  • Neural Episodic Control [55]

RL#13: 14.05.2020: Distributed RL in the Wild

Sergey Kolesnikov

  • Asynchronous Methods for Deep RL [56]

  • IMPALA: Scalable Distributed DRL with Importance Weighted Actor-Learner Architectures [57]

  • Distributed Prioritized Experience Replay [58]

  • Making Efficient Use of Demonstrations to Solve Hard Exploration Problems [59]

  • SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference [60]

Part 2: Projects

【1】Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives (Hierarchical RL)

Implement the paper on the test environment of your choice.

【2】 HIRO with Hindsight Experience Replay (Hierarchical RL)

Add Hindsight Experience Replay to the HIRO algorithm. Compare with HIRO.
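For reference, hindsight relabeling itself is independent of HIRO and can be implemented separately from the hierarchy. The sketch below is a minimal illustration assuming transitions are stored as (state, action, next_state, goal) tuples and that a sparse goal_reward(next_state, goal) function is available; neither is HIRO's actual interface.

```python
import numpy as np

def her_relabel(episode, goal_reward, k=4):
    """'Future'-style hindsight relabeling (illustrative sketch).

    episode:     list of (state, action, next_state, goal) tuples from one rollout
    goal_reward: function (next_state, goal) -> reward, assumed sparse
    Returns the original transitions plus relabeled copies.
    """
    relabeled = []
    T = len(episode)
    for t, (s, a, s_next, goal) in enumerate(episode):
        # Keep the original transition with its original goal.
        relabeled.append((s, a, s_next, goal, goal_reward(s_next, goal)))
        # Sample k achieved states from the future of the episode as substitute goals.
        future_idx = np.random.randint(t, T, size=k)
        for i in future_idx:
            new_goal = episode[i][2]           # an achieved next_state becomes the goal
            r = goal_reward(s_next, new_goal)  # recompute the reward under the new goal
            relabeled.append((s, a, s_next, new_goal, r))
    return relabeled
```

In a HIRO-style setup the relabeling would be applied to the lower-level controller's goal-conditioned transitions; the comparison against plain HIRO then isolates the effect of the extra relabeled data.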

【3】 Meta Learning Shared Hierarchies in PyTorch (Hierarchical RL)

Implement the paper in PyTorch (the authors' implementation uses TensorFlow). Check its results on a test environment of your choice (not from the paper).

【4】Fast deep Reinforcement learning using online adjustments from the past (Memory in RL)

Try to reproduce the paper or implement the algorithm on a different environment.

Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.

【5】Episodic Reinforcement Learning with Associative Memory (Memory in RL)

Try to reproduce the paper or implement the algorithm on a different environment (a minimal episodic-memory sketch follows the bonus points below).

Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.
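For projects 4 and 5, a useful point of comparison is a tabular episodic memory in the spirit of model-free episodic control. The sketch below makes simplifying assumptions: states are already embedded as vectors, and the capacity, neighbour count, and eviction rule are placeholders rather than the papers' settings.

```python
import numpy as np

class EpisodicMemory:
    """Episodic value table in the spirit of model-free episodic control (sketch)."""

    def __init__(self, capacity=10000):
        self.keys, self.values = [], []   # state embeddings and discounted returns
        self.capacity = capacity

    def store(self, key, ret):
        # Append the embedding and its observed return; a full MFEC implementation
        # would instead keep the max return for (approximately) repeated keys.
        self.keys.append(np.asarray(key, dtype=np.float32))
        self.values.append(float(ret))
        if len(self.keys) > self.capacity:  # evict the oldest entry
            self.keys.pop(0)
            self.values.pop(0)

    def estimate(self, key, k=11):
        # Value estimate: average return of the k nearest stored states.
        if not self.keys:
            return 0.0
        dists = np.linalg.norm(np.stack(self.keys) - np.asarray(key, dtype=np.float32), axis=1)
        nearest = np.argsort(dists)[:k]
        return float(np.mean([self.values[i] for i in nearest]))
```

Keeping one such memory per discrete action gives a crude Q-estimate that can serve as the NEC-style or basic-DRL comparison point asked for in the bonus items.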

【6】Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization (Inverse RL)

Implement the algorithm and test it on Atari games. Compare results with common baselines.

【7】Non-Monotonic Sequential Text Generation on TF/Chainer (Imitation Learning)

Implement the paper in TensorFlow or Chainer.

【8】Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Evolution Strategies)

Implement the algorithm and test it on ViZDoom or gym-minigrid. Compare results with available baselines.
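For orientation, the core of the OpenAI-style evolution-strategies update fits in a few lines. The sketch below is illustrative only: evaluate(params) is an assumed user-supplied function returning an episode return for a flat parameter vector, and the constants are placeholders.

```python
import numpy as np

def es_step(params, evaluate, sigma=0.1, lr=0.01, pop_size=50):
    """One antithetic evolution-strategies update (minimal sketch).

    params:   flat numpy vector of policy parameters
    evaluate: function params -> scalar episode return (assumed, e.g. one rollout)
    """
    noise = np.random.randn(pop_size, params.size)
    returns = np.empty(2 * pop_size)
    for i, eps in enumerate(noise):
        returns[2 * i] = evaluate(params + sigma * eps)      # positive perturbation
        returns[2 * i + 1] = evaluate(params - sigma * eps)  # mirrored perturbation
    # Standardize returns to reduce the variance of the gradient estimate.
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    diffs = adv[0::2] - adv[1::2]                            # antithetic differences
    grad = (diffs[:, None] * noise).mean(axis=0) / sigma
    return params + lr * grad
```

Running es_step in a loop, with rollouts distributed across workers that only exchange random seeds and returns, is what makes the method easy to scale; constant factors here are folded into the learning rate.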

【9】Improving Exploration in Evolution Strategies for DRL via a Population of Novelty-Seeking Agents (Evolution Strategies)

Implement the algorithm and test it on ViZDoom or gym-minigrid. Compare results with available baselines.

【10】Comparative study of intrinsic motivations (Exploration in RL)

Using MountainCar-v0, compare (a minimal RND sketch follows this list):

1) curiosity on forward dynamics model loss;
2) curiosity on inverse dynamics model loss;
3) ICM;
4) RND.
Bonus points:
* Add intrinsic motivation to an off-policy RL algorithm (e.g. DQN or QR-DQN);
* Try MountainCarContinuous-v0.
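As a rough starting point for item 4, an RND-style bonus is the prediction error of a trainable network against a frozen, randomly initialized target network. This is a minimal sketch assuming flat vector observations (obs_dim), such as MountainCar's two-dimensional state; the layer sizes and learning rate are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Random Network Distillation intrinsic reward (minimal sketch)."""

    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        # Frozen, randomly initialized target network.
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Predictor network trained to match the target's output.
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def intrinsic_reward(self, obs):
        # Novel states are predicted poorly, so the error serves as a curiosity bonus.
        with torch.no_grad():
            return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

    def update(self, obs):
        # Train the predictor on visited states; frequently visited states lose their bonus.
        loss = (self.predictor(obs) - self.target(obs)).pow(2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()
```

In practice the bonus is usually normalized by a running estimate of its standard deviation and added to the extrinsic reward with a small coefficient before being fed to the RL algorithm under comparison.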

【11】Solving Unity Pyramids (Exploration in RL)

Try to reproduce this experiment using any intrinsic motivation you like.

【12】RND Exploratory Behavior (Exploration in RL)

There was a study of exploratory behaviors for curiosity-based intrinsic motivation. Choose any environment, e.g. some Atari game, and discover exploratory behavior of RND.

【13】 Learning Improvement Heuristics for Solving the Travelling Salesman Problem (RL for Combinatorial Opt.)

Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.

【14】Dynamic Attention Model for Vehicle Routing Problems (RL for Combinatorial Opt.)

Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.

【15】Variational RL with Regret Bounds (Variational RL)

Try to reproduce K-learning algorithm from the paper. Pick a finite discrete environment of your choice. Use this paper as an addition to the main one.

Bonus points:
* Compare with the exact version of soft actor-critic or soft Q-learning from here. Hint: use a message-passing algorithm;
* Propose approximate K-learning algorithm with the use of function approximators (neural networks).

Part 3: Course Resources

Course homepage: https://deeppavlov.ai/rl_course_2020

Bilibili: https://www.bilibili.com/video/av668428103/

YouTube: https://www.youtube.com/playlist?list=PLt1IfGj6-_-eXjZDFBfnAhAJmCyX227ir

Contact & Cooperation

Please add the WeChat ID yan_kylin_phenix and note your name + affiliation + field + location. Serious inquiries only.
