[gym] [Discrete action space] [Cart Pole]

Contents

Cart Pole

Description

Action Space

Observation Space

Rewards

Starting State

Episode End

Arguments


Cart Pole

[Animation: cart_pole.gif]

This environment is part of the Classic Control environments; the Classic Control page contains general information about the environment.

Action Space: Discrete(2)

Observation Space: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)

Import: gymnasium.make("CartPole-v1")

Description

This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in “Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems”. A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum is placed upright on the cart, and the goal is to balance the pole by applying forces to the cart in the left and right directions.

Action Space

The action is an ndarray with shape (1,) that takes a value in {0, 1}, indicating the direction of the fixed force with which the cart is pushed.

  • 0: Push cart to the left

  • 1: Push cart to the right

Note: The change in velocity produced by the applied force is not fixed; it depends on the angle at which the pole is pointing. The center of gravity of the pole changes the amount of energy needed to move the cart underneath it.
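
To illustrate the note above, here is a minimal sketch of the cart-pole equations of motion in the form commonly used for this problem. The physical constants (gravity, masses, pole half-length, force magnitude) are not stated in this post; they are assumptions that I believe match the Gymnasium reference implementation, so check the library source before relying on them. The function name cart_acceleration is hypothetical.

```python
import math

# Assumed constants (not given in this post; verify against the library source).
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
FORCE_MAG = 10.0


def cart_acceleration(action, theta, theta_dot=0.0):
    """Return the cart's linear acceleration for action 0 (left) or 1 (right)."""
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Intermediate term shared by both accelerations.
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    # Angular acceleration of the pole.
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    # Linear acceleration of the cart, which depends on the pole angle.
    return temp - pole_mass_length * theta_acc * cos_t / total_mass


# Same push to the right, two different pole angles (radians).
upright = cart_acceleration(action=1, theta=0.0)
tilted = cart_acceleration(action=1, theta=0.2)
print(upright, tilted)
```

Under these assumed constants the two printed accelerations differ, even though the applied force is identical in both calls.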

Observation Space

The observation is an ndarray with shape (4,) whose values correspond to the following positions and velocities:

Num  Observation            Min                    Max
0    Cart Position          -4.8                   4.8
1    Cart Velocity          -Inf                   Inf
2    Pole Angle             ~ -0.418 rad (-24°)    ~ 0.418 rad (24°)
3    Pole Angular Velocity  -Inf                   Inf
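
As a small convenience sketch, the four indices can be given names so that code does not depend on remembering the order. The name CartPoleObservation and the sample values are made up for illustration; they are not part of the library.

```python
from collections import namedtuple

# Field order follows the observation indices 0..3 listed above.
CartPoleObservation = namedtuple(
    "CartPoleObservation",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)

raw_observation = [0.01, -0.02, 0.03, 0.04]  # made-up values for illustration
obs = CartPoleObservation(*raw_observation)
print(obs.cart_position, obs.pole_angle)
```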

Note: While the ranges above denote the possible values of each element of the observation space, they do not reflect the allowed values of the state space in an unterminated episode. In particular:

  • The cart x-position (index 0) can take values in (-4.8, 4.8), but the episode terminates if the cart leaves the (-2.4, 2.4) range.

  • The pole angle can be observed in (-.418, .418) radians (or ±24°), but the episode terminates if the pole angle is not in the range (-.2095, .2095) radians (or ±12°).
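
The numbers above suggest that each observation bound is twice the corresponding termination threshold, which a short calculation confirms. That the bounds are defined this way, rather than merely coinciding, is my reading of the figures in this post, so treat it as an assumption.

```python
import math

X_THRESHOLD = 2.4                   # termination limit for the cart position
THETA_THRESHOLD = math.radians(12)  # termination limit for the pole angle, in radians

# Observation bounds taken as twice the termination thresholds (assumption).
x_bound = 2 * X_THRESHOLD
theta_bound = 2 * THETA_THRESHOLD
print(x_bound, round(theta_bound, 4), round(THETA_THRESHOLD, 4))
```

The angle bound computed this way agrees with the 4.1887903e-01 entry in the Box definition to within float32 precision.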

Rewards

Since the goal is to keep the pole upright for as long as possible, a reward of +1 is given for every step taken, including the termination step. The reward threshold is 500 for v1 and 200 for v0.
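
Because every step earns +1, the total reward of an episode equals its length. The following sketch plays one episode with random actions to show this; it assumes the gymnasium package is installed, and the helper name run_random_episode is hypothetical.

```python
def run_random_episode(seed=0):
    """Play one CartPole-v1 episode with random actions.

    Returns the total reward and the number of steps taken.
    """
    # Imported here so the function can be defined without gymnasium installed.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    observation, info = env.reset(seed=seed)
    env.action_space.seed(seed)  # make the random actions reproducible

    total_reward = 0.0
    steps = 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # 0 = push left, 1 = push right
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1

    env.close()
    return total_reward, steps
```

With the default reward described above, calling run_random_episode() should return a total reward equal to the step count, and never more than 500 for v1.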

Starting State

All observations are assigned a uniformly random value in (-0.05, 0.05).
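
A minimal stand-in for this sampling step, using only the standard library, might look as follows. The function name sample_initial_state is hypothetical, and the real environment uses its own random number generator rather than the random module.

```python
import random


def sample_initial_state(low=-0.05, high=0.05, rng=None):
    """Draw the four state variables independently from a uniform distribution."""
    rng = rng or random.Random()
    return [rng.uniform(low, high) for _ in range(4)]


state = sample_initial_state(rng=random.Random(0))
print(state)
```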

Episode End

The episode ends if any one of the following occurs:

  1. Termination: Pole Angle is outside the ±12° range

  2. Termination: Cart Position is outside the ±2.4 range (center of the cart reaches the edge of the display)

  3. Truncation: Episode length is greater than 500 (200 for v0)
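
The three conditions above can be sketched as a single check. The function name episode_status is hypothetical, and the sketch assumes the episode is cut off once it has run for the maximum number of steps (500 for v1, 200 for v0), which is how I understand the step limit to be enforced.

```python
import math

X_THRESHOLD = 2.4
THETA_THRESHOLD = math.radians(12)
MAX_EPISODE_STEPS = {"v1": 500, "v0": 200}


def episode_status(cart_position, pole_angle, step_count, version="v1"):
    """Return (terminated, truncated) for the state reached after step_count steps."""
    # Termination: the cart or the pole has left its allowed range.
    terminated = (
        abs(cart_position) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD
    )
    # Truncation: the step limit has been reached (assumed cutoff, see lead-in).
    truncated = step_count >= MAX_EPISODE_STEPS[version]
    return terminated, truncated


print(episode_status(cart_position=0.0, pole_angle=0.0, step_count=10))
print(episode_status(cart_position=2.5, pole_angle=0.0, step_count=10))
```

Keeping termination and truncation as separate flags mirrors the distinction the list makes between failing the task and running out of steps.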

Arguments

import gymnasium as gym
gym.make('CartPole-v1')

On reset, the options parameter allows the user to change the bounds used to determine the new random state.
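
A hedged sketch of how this might be used is shown below. This post does not name the option keys; I am assuming they are "low" and "high", so confirm this against the documentation linked below. The helper name reset_with_bounds is hypothetical, and the gymnasium package must be installed.

```python
def reset_with_bounds(low, high, seed=0):
    """Reset CartPole-v1 with custom bounds for the random starting state."""
    # Imported here so the function can be defined without gymnasium installed.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    # Assumption: the bounds are passed under the keys "low" and "high".
    observation, info = env.reset(seed=seed, options={"low": low, "high": high})
    env.close()
    return observation
```

For example, reset_with_bounds(-0.01, 0.01) should return a starting observation whose four values all lie within the narrower range, if the key names assumed here are right.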

https://gymnasium.farama.org/environments/classic_control/cart_pole/
