[Deep RL] [OpenAI] [Part 1]

OpenAI documentation:

https://spinningup.openai.com/en/latest/index.html

GitHub repository:

https://github.com/openai/spinningup/blob/master/docs/user/running.rst

 

Welcome to Spinning Up in Deep RL!

User Documentation

Introduction

What This Is

Welcome to Spinning Up in Deep RL! This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).

For the unfamiliar: reinforcement learning (RL) is a machine learning approach for teaching agents how to solve tasks by trial and error. Deep RL refers to the combination of RL with deep learning.
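To make the trial-and-error loop concrete, here is a minimal sketch of an agent interacting with an environment. The gym package, the CartPole-v0 environment, and the random policy are illustrative assumptions, not something the text above prescribes.

    import gym

    # Illustrative choices: gym and CartPole-v0 stand in for any
    # environment; a real agent would act from a learned policy.
    env = gym.make("CartPole-v0")
    obs = env.reset()
    total_reward = 0.0

    for _ in range(200):
        # Random actions stand in for "trial and error" here.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break

    print("episode return:", total_reward)
    env.close()

An RL algorithm's job is to replace the random action choice with a policy that improves as reward accumulates.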

This module contains a variety of helpful resources, including:

  • a short introduction to RL terminology, kinds of algorithms, and basic theory,
  • an essay about how to grow into an RL research role,
  • a curated list of important papers organized by topic,
  • a well-documented code repo of short, standalone implementations of key algorithms,
  • and a few exercises to serve as warm-ups.


Why We Built This

One of the single most common questions that we hear is

 

If I want to contribute to AI safety, how do I get started?

 

At OpenAI, we believe that deep learning generally—and deep reinforcement learning specifically—will play central roles in the development of powerful AI technology. To ensure that AI is safe, we have to come up with safety strategies and algorithms that are compatible with this paradigm. As a result, we encourage everyone who asks this question to study these fields.

 

However, while there are many resources to help people quickly ramp up on deep learning, deep reinforcement learning is more challenging to break into. To begin with, a student of deep RL needs to have some background in math, coding, and regular deep learning. Beyond that, they need both a high-level view of the field—an awareness of what topics are studied in it, why they matter, and what’s been done already—and careful instruction on how to connect algorithm theory to algorithm code.

 

The high-level view is hard to come by because of how new the field is. There is not yet a standard deep RL textbook, so most of the knowledge is locked up in either papers or lecture series, which can take a long time to parse and digest. And learning to implement deep RL algorithms is typically painful, because either

  • the paper that publishes an algorithm omits or inadvertently obscures key design details,
  • or widely-public implementations of an algorithm are hard to read, hiding how the code lines up with the algorithm.

 

While fantastic repos like rllab, Baselines, and rllib make it easier for researchers who are already in the field to make progress, they build algorithms into frameworks in ways that involve many non-obvious choices and trade-offs, which makes them hard to learn from. Consequently, the field of deep RL has a pretty high barrier to entry—for new researchers as well as practitioners and hobbyists.

 

So our package here is designed to serve as the missing middle step for people who are excited by deep RL, and would like to learn how to use it or make a contribution, but don’t have a clear sense of what to study or how to transmute algorithms into code. We’ve tried to make this as helpful a launching point as possible.


 

That said, practitioners aren’t the only people who can (or should) benefit from these materials. Solving AI safety will require people with a wide range of expertise and perspectives, and many relevant professions have no connection to engineering or computer science at all. Nonetheless, everyone involved will need to learn enough about the technology to make informed decisions, and several pieces of Spinning Up address that need.

How This Serves Our Mission

OpenAI’s mission is to ensure the safe development of AGI and the broad distribution of benefits from AI more generally. Teaching tools like Spinning Up help us make progress on both of these objectives.

 

To begin with, we move closer to broad distribution of benefits any time we help people understand what AI is and how it works. This empowers people to think critically about the many issues we anticipate will arise as AI becomes more sophisticated and important in our lives.

 

Also, critically, we need people to help us work on making sure that AGI is safe. This requires a skill set which is currently in short supply because of how new the field is. We know that many people are interested in helping us, but don’t know how—here is what you should study! If you can become an expert on this material, you can make a difference on AI safety.

 

Code Design Philosophy

The algorithm implementations in the Spinning Up repo are designed to be

  • as simple as possible while still being reasonably good,
  • and highly-consistent with each other to expose fundamental similarities between algorithms.

They are almost completely self-contained, with virtually no common code shared between them (except for logging, saving, loading, and MPI utilities), so that an interested person can study each algorithm separately without having to dig through an endless chain of dependencies to see how something is done. The implementations are patterned so that they come as close to pseudocode as possible, to minimize the gap between theory and code.
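As a taste of how these self-contained implementations are used, here is a script-launch sketch following the example in the linked running.rst; the argument names (env_fn, ac_kwargs, logger_kwargs) come from that document and may vary across versions.

    from spinup import ppo
    import tensorflow as tf
    import gym

    # Environment is passed as a zero-argument function that builds it.
    env_fn = lambda: gym.make('LunarLander-v2')

    # Network architecture for the actor-critic, per the running.rst example.
    ac_kwargs = dict(hidden_sizes=[64, 64], activation=tf.nn.relu)

    # Where to write logs and what to call the experiment.
    logger_kwargs = dict(output_dir='path/to/output_dir', exp_name='experiment_name')

    ppo(env_fn=env_fn, ac_kwargs=ac_kwargs, steps_per_epoch=5000, epochs=250,
        logger_kwargs=logger_kwargs)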

 

Importantly, they’re all structured similarly, so if you clearly understand one, jumping into the next is painless.

 

We tried to minimize the number of tricks used in each algorithm’s implementation, and minimize the differences between otherwise-similar algorithms. To give some examples of removed tricks: we omit regularization terms present in the original Soft Actor-Critic code, as well as observation normalization from all algorithms. For an example of where we’ve removed differences between algorithms: our implementations of DDPG, TD3, and SAC all follow a convention laid out in the original TD3 code, where all gradient descent updates are performed at the ends of episodes (instead of happening all throughout the episode).
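For illustration, here is a minimal sketch (not Spinning Up's actual code) of the update convention just described: transitions are collected through the episode, and all gradient updates run when the episode ends. The names env, policy, replay_buffer, and update are hypothetical stand-ins.

    def run_episode_then_update(env, policy, replay_buffer, update, max_ep_len=1000):
        # Collect one episode of experience without doing any updates.
        obs, ep_len = env.reset(), 0
        for _ in range(max_ep_len):
            act = policy(obs)
            next_obs, rew, done, _ = env.step(act)
            replay_buffer.store(obs, act, rew, next_obs, done)
            obs = next_obs
            ep_len += 1
            if done:
                break
        # TD3-style convention: one gradient step per environment step taken,
        # all performed here at the end of the episode rather than inline.
        for _ in range(ep_len):
            update(replay_buffer.sample_batch())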

 

All algorithms are “reasonably good” in the sense that they achieve roughly the intended performance, but don’t necessarily match the best reported results in the literature on every task. Consequently, be careful if using any of these implementations for scientific benchmarking comparisons. Details on each implementation’s specific performance level can be found on our benchmarks page.

 

Support Plan

We plan to support Spinning Up to ensure that it serves as a helpful resource for learning about deep reinforcement learning. The exact nature of long-term (multi-year) support for Spinning Up is yet to be determined, but in the short run, we commit to:

 

High-bandwidth support for the first three weeks after release (Nov 8, 2018 to Nov 29, 2018).

  • We’ll move quickly on bug-fixes, question-answering, and modifications to the docs to clear up ambiguities.
  • We’ll work hard to streamline the user experience, in order to make it as easy as possible to self-study with Spinning Up.

Approximately six months after release (in April 2019), we’ll do a serious review of the state of the package based on feedback we receive from the community, and announce any plans for future modification, including a long-term roadmap.

 

Additionally, as discussed in the blog post, we are using Spinning Up in the curriculum for our upcoming cohorts of Scholars and Fellows. Any changes and updates we make for their benefit will immediately become public as well.

Words

  • Spinning Up: getting up to speed (here, the name of the resource; not literal rotation)
  • Approximately: roughly, about
  • Cohorts: groups of people starting together (e.g., a class of Scholars or Fellows)
  • Curriculum: course of study
  • Ambiguity: unclear or double meaning
  • Roadmap: plan of future work
  • Convention: established practice
  • Benchmarking: systematic performance comparison
  • Virtually: almost, nearly
  • Pseudocode: informal, code-like description of an algorithm
  • Fundamental: basic, essential
  • Objective: goal
  • Sophisticated: complex, advanced
  • Anticipate: expect, foresee
  • Expertise: specialized knowledge
  • Distribution: spreading, allocation
  • Broad: wide
  • Mission: purpose, goal
  • Nonetheless: even so
  • Perspectives: viewpoints
  • Professions: occupations
  • Informed decision: decision based on adequate knowledge
  • Terminology: the specialized vocabulary of a field
  • Essay: a short piece of writing
  • Curated: carefully selected (curated list: hand-picked list)
  • Implementations: concrete realizations (of algorithms) in code
  • Warm-ups: introductory exercises
  • Compatible: able to work together
  • Paradigm: model, framework
  • Break into: enter (a field)
  • Ramp up: get up to speed quickly
  • Parse: analyze, work through
  • Inadvertently: unintentionally
  • Obscures: hides, makes unclear
  • Line up with: correspond to, match
  • Trade-off: a balance between competing concerns
  • Practitioner: someone who applies a field professionally
  • Transmute: transform
  • Launching: starting (launching point: starting point)

 

 
