[RL: From Getting Started to Giving Up] [Part 27] [Google Dopamine]

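Dopamine is a TensorFlow-based reinforcement learning framework from Google. It offers flexibility, stability, and reproducibility, and is particularly suited to rapid prototyping and experimentation. The framework includes implementations of algorithms such as DQN and C51, along with a set of tools for presenting experiments and analyzing data.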

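Besides OpenAI, other major companies are also working on reinforcement learning (deep reinforcement learning in particular). This post introduces two open-source projects, Dopamine from Google and Horizon from Facebook. Their GitHub addresses are https://github.com/google/dopamine and https://github.com/facebookresearch/Horizon respectively.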

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).

Our design principles are:

Easy experimentation: Make it easy for new users to run benchmark experiments. [Author's note: what does this have to do with benchmarks?]
Flexible development: Make it easy for new users to try out research ideas.
Compact and reliable: Provide implementations for a few, battle-tested algorithms.
Reproducible: Facilitate reproducibility in results. In particular, our setup follows the recommendations given by Machado et al. (2018).

In the spirit of these principles, this first version focuses on supporting the state-of-the-art, single-GPU Rainbow agent (Hessel et al., 2018) applied to Atari 2600 game-playing (Bellemare et al., 2013). Specifically, our Rainbow agent implements the three components identified as most important by Hessel et al.:

n-step Bellman updates (see e.g. Mnih et al., 2016)
Prioritized experience replay (Schaul et al., 2015)
Distributional reinforcement learning (C51; Bellemare et al., 2017)
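To make the first item in the list concrete, here is a minimal sketch of an n-step Bellman target in plain Python. The function name `n_step_target` and its arguments are hypothetical and chosen for illustration; this is not Dopamine's own code, only the textbook formula: sum the first n discounted rewards, then bootstrap from a value estimate at step n.

```python
def n_step_target(rewards, bootstrap_value, gamma):
    """Return sum_{k<n} gamma^k * r_k + gamma^n * bootstrap_value.

    rewards: the n rewards observed after the current state (hypothetical input).
    bootstrap_value: value estimate for the state reached after n steps.
    gamma: discount factor.
    """
    target = bootstrap_value
    # Fold backwards from the last reward: G = r + gamma * G.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Three rewards of 1.0, then bootstrap from an estimated value of 10.0.
print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

With a single reward in the list this reduces to the usual one-step target `r + gamma * V(s')`.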

For completeness, we also provide an implementation of DQN (Mnih et al., 2015). For additional details, please see our documentation.

We provide a set of Colaboratory notebooks which demonstrate how to use Dopamine.

Basic information

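Existing RL frameworks do not combine the flexibility and stability that researchers need to iterate on RL methods effectively, and thus to explore new research directions that may have no immediately obvious benefit. Google therefore released a TensorFlow-based framework that aims to give RL researchers flexibility, stability, and reproducibility. This release also includes a set of colabs that show how to use the whole framework.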

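The code is compact (about 15 Python files). This is achieved by focusing on the Arcade Learning Environment (a mature, well-understood benchmark) and on four value-based agents: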

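DQN: DeepMind's Deep Q-Network, which combines Q-learning with a deep neural network
C51
A carefully curated, simplified version of the Rainbow agent
The Implicit Quantile Network (IQN) agent

For new researchers, it is important to be able to benchmark their ideas quickly against established methods. For that reason, the full training data of the four agents is provided for the 60 games supported by the Arcade Learning Environment, as Python pickle files (for agents trained with this framework) and as JSON data files (for comparison with agents trained in other frameworks). There is also a website where you can quickly view the training runs of all agents on all 60 games.

First, on Ubuntu I ran into a problem where these two packages could not be installed: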

sudo apt-get install cmake zlib1g-dev

This was mainly a permissions problem:

python tests/atari_init_test.py is mainly used to check whether the installation succeeded.

(easytensor27) chen@chen-ThinkStation-D30:~/dopamine$ export PYTHONPATH=${PYTHONPATH}:.
(easytensor27) chen@chen-ThinkStation-D30:~/dopamine$ python tests/atari_init_test.py
2018-09-06 14:51:45.927207: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-09-06 14:51:46.254906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: Quadro K5200 major: 3 minor: 5 memoryClockRate(GHz): 0.771
pciBusID: 0000:05:00.0
totalMemory: 7.43GiB freeMemory: 6.93GiB
2018-09-06 14:51:46.254964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-09-06 14:51:53.170582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-06 14:51:53.170646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-09-06 14:51:53.170661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-09-06 14:51:53.171022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6707 MB memory) -> physical GPU (device: 0, name: Quadro K5200, pci bus id: 0000:05:00.0, compute capability: 3.5)
I0906 14:51:53.344151 140619312887552 tf_logging.py:115] Creating DQNAgent agent with the following parameters:
I0906 14:51:53.344964 140619312887552 tf_logging.py:115] 	 gamma: 0.990000
I0906 14:51:53.345103 140619312887552 tf_logging.py:115] 	 update_horizon: 1.000000
I0906 14:51:53.345235 140619312887552 tf_logging.py:115] 	 min_replay_history: 20000
I0906 14:51:53.345361 140619312887552 tf_logging.py:115] 	 update_period: 4
I0906 14:51:53.345479 140619312887552 tf_logging.py:115] 	 target_update_period: 8000
I0906 14:51:53.345597 140619312887552 tf_logging.py:115] 	 epsilon_train: 0.010000
I0906 14:51:53.345712 140619312887552 tf_logging.py:115] 	 epsilon_eval: 0.001000
I0906 14:51:53.345829 140619312887552 tf_logging.py:115] 	 epsilon_decay_period: 250000
I0906 14:51:53.345943 140619312887552 tf_logging.py:115] 	 tf_device: /gpu:0
I0906 14:51:53.346056 140619312887552 tf_logging.py:115] 	 use_staging: True
I0906 14:51:53.346170 140619312887552 tf_logging.py:115] 	 optimizer: <tensorflow.python.training.rmsprop.RMSPropOptimizer object at 0x7fe3d692cb10>
I0906 14:51:53.359112 140619312887552 tf_logging.py:115] Creating a OutOfGraphReplayBuffer replay memory with the following parameters:
I0906 14:51:53.359287 140619312887552 tf_logging.py:115] 	 observation_shape: 84
I0906 14:51:53.359469 140619312887552 tf_logging.py:115] 	 stack_size: 4
I0906 14:51:53.359643 140619312887552 tf_logging.py:115] 	 replay_capacity: 100
I0906 14:51:53.359824 140619312887552 tf_logging.py:115] 	 batch_size: 32
I0906 14:51:53.359994 140619312887552 tf_logging.py:115] 	 update_horizon: 1
I0906 14:51:53.360157 140619312887552 tf_logging.py:115] 	 gamma: 0.990000
I0906 14:51:55.830522 140619312887552 tf_logging.py:115] Beginning training...
W0906 14:51:55.830992 140619312887552 tf_logging.py:125] num_iterations (0) < start_iteration(0)
..
----------------------------------------------------------------------
Ran 2 tests in 10.286s
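The log above lists `epsilon_train: 0.010000`, `epsilon_decay_period: 250000`, and `min_replay_history: 20000`. As a rough illustration of how such parameters usually interact in DQN-style agents, here is a minimal sketch of a linear epsilon schedule. It assumes that exploration stays fully random during a warm-up equal to `min_replay_history` and then decays linearly to `epsilon_train`. The function name `linear_epsilon` is hypothetical, and Dopamine's own schedule may differ in detail.

```python
def linear_epsilon(step, decay_period, warmup_steps, epsilon_final):
    """Epsilon is 1.0 during warm-up, then decays linearly to epsilon_final."""
    steps_left = decay_period + warmup_steps - step
    bonus = (1.0 - epsilon_final) * steps_left / decay_period
    # Clip so the result always stays within [epsilon_final, 1.0].
    bonus = min(max(bonus, 0.0), 1.0 - epsilon_final)
    return epsilon_final + bonus


# Parameter values taken from the log above; the linear shape is an assumption.
for step in (0, 20000, 145000, 270000, 1000000):
    print(step, linear_epsilon(step, 250000, 20000, 0.01))
```

Under these assumptions the agent acts fully at random for the first 20000 steps and reaches the final epsilon of 0.01 at step 270000.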

I am still only gradually moving over to Windows, because a lot of development tooling does not support Windows, or supports it poorly.

pip install absl-py atari-py gin-config gym opencv-python

sudo apt-get install python-opencv

The standard Atari 2600 experiment is run with:

python -um dopamine.discrete_domains.train \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/dqn/configs/dqn.gin'

You can edit dopamine/agents/dqn/configs/dqn.gin to set the run parameters.
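Besides editing the .gin file, single parameters can usually be overridden from the command line. The sketch below only builds the argument list for the training command shown above and prints it; it does not run anything. The helper `build_train_command` is hypothetical, and the `--gin_bindings` flag is stated here as an assumption about the training script (check its `--help` output for your version). The override `DQNAgent.gamma=0.95` is just an illustrative value.

```python
import shlex


def build_train_command(base_dir, gin_file, gin_bindings=()):
    """Build the argv list for the training command shown above."""
    argv = [
        "python", "-um", "dopamine.discrete_domains.train",
        f"--base_dir={base_dir}",
        f"--gin_files={gin_file}",
    ]
    # Assumption: each override is passed as its own --gin_bindings flag.
    for binding in gin_bindings:
        argv.append(f"--gin_bindings={binding}")
    return argv


argv = build_train_command(
    "/tmp/dopamine",
    "dopamine/agents/dqn/configs/dqn.gin",
    gin_bindings=["DQNAgent.gamma=0.95"],
)
# Print a shell-quoted version that can be copied into a terminal.
print(shlex.join(argv))
```

Keeping overrides on the command line leaves the checked-in config untouched, which makes it easier to compare runs against the default settings.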

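My impression is that it is aimed at algorithm research, whereas we care more about how to put things into production. Most importantly, it has no graphical interface, so in practice it is not very intuitive.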

Installing and starting the notebook on Ubuntu

sudo pip install jupyter

The command to start it is: jupyter notebook

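Google has its own Colaboratory (Colab) project, a free Jupyter notebook environment that runs in the cloud and requires no setup. The colab folder in Dopamine contains a number of files for reference:
utils.py contains a set of methods for presenting experiments.
agents.ipynb shows how to create a DQN agent or define a new agent.
load_statistics.ipynb shows how to load and analyze the statistics data.
tensorboard.ipynb shows how to visualize results with TensorBoard.
cartpole.ipynb gives an example of training on Cartpole with DQN and C51.
Although the example below cannot be run, it shows roughly what the framework Google provides lets us do: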