
RLkit

Reinforcement learning framework and algorithms implemented in PyTorch.

Implemented algorithms:

Reinforcement Learning with Imagined Goals (RIG)

Special case of Skew-Fit: set power = 0
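Skew-Fit reweights sampled goals by their estimated density raised to a power; setting the power to 0 makes every weight equal, which recovers RIG's uniform goal sampling. A minimal sketch of that reduction (function and argument names are illustrative, not RLkit's API):

```python
import numpy as np

def skew_fit_weights(densities, power):
    """Normalized Skew-Fit sampling weights, w_i proportional to p(x_i)^power.

    Negative powers over-sample rare (low-density) goals; power = 0
    makes every weight equal, recovering RIG's uniform goal sampling.
    """
    w = np.asarray(densities, dtype=float) ** power
    return w / w.sum()

# power = 0 collapses to uniform weights regardless of the densities
print(skew_fit_weights([0.1, 0.2, 0.7], 0.0))
```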

Temporal Difference Models (TDMs)

Only implemented in v0.1.2-. See Legacy Documentation section below.

Hindsight Experience Replay (HER)
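The core of HER is relabeling stored transitions with goals that were actually achieved later in the same episode, so that failed trajectories still produce useful reward signal. A sketch of the "future" relabeling strategy, under assumed names (this is not RLkit's API):

```python
import random

def her_relabel(transition, future_states, reward_fn):
    """Relabel a transition with a goal achieved later in the episode
    (the 'future' strategy) and recompute its reward.

    `transition` is (state, action, reward, next_state, goal); all names
    here are illustrative.
    """
    s, a, _, s_next, _ = transition
    new_goal = random.choice(future_states)  # pick an achieved future state
    return (s, a, reward_fn(s_next, new_goal), s_next, new_goal)
```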

Soft Actor Critic (SAC)

Includes the "min of Q" method, the entropy-constrained implementation, the reparameterization trick, and a numerical tanh-Normal Jacobian calculation.
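Two of those pieces can be sketched briefly (function names and tensor shapes are illustrative, not RLkit's API): the reparameterized sample from a tanh-squashed Normal with a numerically stable log-density correction, and the "min of Q" Bellman target with an entropy bonus:

```python
import torch
import torch.nn.functional as F

def tanh_normal_sample(mean, log_std):
    """Reparameterized sample from a tanh-squashed Normal.

    The change-of-variables term log(1 - tanh(u)^2) is computed via the
    numerically stable identity 2 * (log 2 - u - softplus(-2u)).
    """
    std = log_std.exp()
    u = mean + std * torch.randn_like(std)  # reparameterization trick
    action = torch.tanh(u)
    log_prob = torch.distributions.Normal(mean, std).log_prob(u)
    log_prob = log_prob - 2.0 * (torch.log(torch.tensor(2.0)) - u
                                 - F.softplus(-2.0 * u))
    return action, log_prob.sum(-1, keepdim=True)

def min_q_target(q1_next, q2_next, reward, done, next_log_pi, alpha,
                 gamma=0.99):
    """'Min of Q' Bellman target with an entropy bonus on the next action."""
    next_q = torch.min(q1_next, q2_next) - alpha * next_log_pi
    return reward + gamma * (1.0 - done) * next_q
```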

Twin Delayed Deep Deterministic Policy Gradient (TD3)

To get started, check out the example scripts, linked above.

What's New

Version 0.2

04/25/2019

Use new multiworld code that requires explicit environment registration.

Make installation easier by adding setup.py and using default conf.py.

04/16/2019

Log how many train steps were called

Log env_info and agent_info.

04/05/2019-04/15/2019

Add rendering

Fix SAC bug to account for future entropy (#41, #43)

Add online algorithm mode (#42)

04/05/2019

The initial release for 0.2 has the following major changes:

Remove Serializable class and use default pickle scheme.

Remove PyTorchModule class and use native torch.nn.Module directly.
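In practice, the two changes above mean a policy is now just a plain `torch.nn.Module` that round-trips through standard `pickle` with no custom serialization hooks, for example:

```python
import pickle
import torch
import torch.nn as nn

# A policy is just a plain torch.nn.Module; no PyTorchModule wrapper
# or Serializable mixin is needed for saving and loading.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))

# The default pickle scheme round-trips the module, weights included.
restored = pickle.loads(pickle.dumps(policy))
```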

Switch to batch-style training rather than online training.

Makes code more amenable to parallelization.

Implementing the online version is straightforward.
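The batch-style structure can be sketched as follows (a minimal illustration, not RLkit's actual loop; all names are assumptions): each epoch collects a block of environment experience, then runs a block of gradient steps, instead of interleaving one gradient step per environment step as in online training.

```python
def batch_rl_train(collect, replay_buffer, train_step,
                   num_epochs, steps_per_epoch, train_steps_per_epoch):
    """Batch-style training: alternate a block of environment collection
    with a block of gradient steps. Because the two phases are separated,
    collection can be parallelized independently of training.
    """
    for _ in range(num_epochs):
        replay_buffer.extend(collect(steps_per_epoch))  # gather experience
        for _ in range(train_steps_per_epoch):          # then train
            train_step(replay_buffer.sample())
```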

Refactor training code to be its own object, rather than being integrated inside of RLAlgorithm.

Refactor sampling code to be its own object, rather than being integrated inside of RLAlgorithm.
