Cart Pole
This environment is part of the Classic Control environments, whose documentation contains general information about the environment.
Description
This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in “Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems”. A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum is placed upright on the cart, and the goal is to balance the pole by applying forces to the cart in the left and right directions.
Action Space
The action is an ndarray with shape (1,) which can take values {0, 1}, indicating the direction of the fixed force the cart is pushed with.
- 0: Push cart to the left
- 1: Push cart to the right
Note: The change in velocity produced by the applied force is not fixed; it depends on the angle at which the pole is pointing. The pole's center of gravity changes the amount of energy needed to move the cart underneath it.
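To make the note above concrete, here is a minimal sketch of the standard cart-pole equations of motion, evaluated for the same push at two pole angles. The physical constants (gravity, masses, pole half-length, force magnitude) are assumptions chosen as commonly used values; they are not stated on this page, and the function name `cart_acceleration` is hypothetical.

```python
import math

# Assumed constants: typical textbook values, NOT taken from this page.
GRAVITY = 9.8           # m/s^2
CART_MASS = 1.0         # kg
POLE_MASS = 0.1         # kg
POLE_HALF_LENGTH = 0.5  # m, distance from the joint to the pole's center of mass
FORCE_MAG = 10.0        # N, magnitude of the fixed push


def cart_acceleration(action, theta, theta_dot=0.0):
    """Linear acceleration of the cart for a discrete action (0 = left, 1 = right)."""
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Intermediate term shared by both accelerations.
    temp = (force + pole_mass_length * theta_dot**2 * sin_t) / total_mass
    # Angular acceleration of the pole.
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t**2 / total_mass)
    )
    # The cart's acceleration depends on the pole angle through theta_acc and cos_t.
    return temp - pole_mass_length * theta_acc * cos_t / total_mass


upright = cart_acceleration(1, theta=0.0)  # push right, pole vertical
tilted = cart_acceleration(1, theta=0.2)   # push right, pole tilted by 0.2 rad
print(upright, tilted)
```

Under these assumed constants, the same rightward push yields a different cart acceleration when the pole is tilted than when it is upright, which is the effect the note describes.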
Observation Space
The observation is an ndarray with shape (4,) with the values corresponding to the following positions and velocities:
Num | Observation | Min | Max |
---|---|---|---|
0 | Cart Position | -4.8 | 4.8 |
1 | Cart Velocity | -Inf | Inf |
2 | Pole Angle | ~ -0.418 rad (-24°) | ~ 0.418 rad (24°) |
3 | Pole Angular Velocity | -Inf | Inf |
Note: While the ranges above denote the possible values of each element of the observation space, they do not reflect the allowed values of the state space in an unterminated episode. In particular:
- The cart x-position (index 0) can take values between (-4.8, 4.8), but the episode terminates if the cart leaves the (-2.4, 2.4) range.
- The pole angle can be observed between (-.418, .418) radians (or ±24°), but the episode terminates if the pole angle is not in the range (-.2095, .2095) radians (or ±12°).
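The numbers in the two bullets above can be cross-checked with a little arithmetic. The sketch below converts the ±12° termination limit to radians and shows that the observation bounds quoted in the table are twice the termination limits. That factor of two is read off from the figures on this page, not from a stated rule, and the variable names are illustrative.

```python
import math

# Termination limits as given in the text above.
CART_TERMINATION_LIMIT = 2.4     # cart position units
POLE_TERMINATION_LIMIT_DEG = 12  # degrees

# Convert the pole limit to radians (about 0.209).
pole_limit_rad = math.radians(POLE_TERMINATION_LIMIT_DEG)

# Observation bounds from the table are double the termination limits.
obs_cart_bound = 2 * CART_TERMINATION_LIMIT
obs_pole_bound = 2 * pole_limit_rad  # about 0.418 rad, i.e. 24 degrees

print(pole_limit_rad, obs_cart_bound, obs_pole_bound)
```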
Rewards
Since the goal is to keep the pole upright for as long as possible, a reward of +1 is given for every step taken, including the termination step. The reward threshold is 500 for v1 and 200 for v0.
Starting State
All observations are assigned a uniformly random value in (-0.05, 0.05).
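As a rough illustration of this initialization, the sketch below draws four values uniformly from the stated range. It uses Python's standard `random` module as a stand-in for whatever random generator the environment uses internally, and `sample_start_state` is a hypothetical helper, not part of the library.

```python
import random


def sample_start_state(rng, low=-0.05, high=0.05):
    """Draw [x, x_dot, theta, theta_dot], each uniformly from the given range."""
    return [rng.uniform(low, high) for _ in range(4)]


rng = random.Random(0)  # seeded for reproducibility
state = sample_start_state(rng)
print(state)
```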
Episode End
The episode ends if any one of the following occurs:
- Termination: Pole Angle is greater than ±12°
- Termination: Cart Position is greater than ±2.4 (center of the cart reaches the edge of the display)
- Truncation: Episode length is greater than 500 (200 for v0)
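The three conditions above can be sketched as a small helper that reports termination and truncation separately. `episode_status` is a hypothetical function written for illustration, and it assumes that truncation is flagged on the step that reaches the limit; the environment's own bookkeeping may differ in detail.

```python
import math

X_LIMIT = 2.4                    # cart position limit from the list above
THETA_LIMIT = math.radians(12)   # pole angle limit, 12 degrees in radians


def episode_status(observation, steps_taken, max_steps=500):
    """Return (terminated, truncated) for one observation and a step count."""
    x, _x_dot, theta, _theta_dot = observation
    # Termination: the cart or the pole has left its allowed range.
    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    # Truncation (assumed): the step limit has been reached.
    truncated = steps_taken >= max_steps
    return terminated, truncated


print(episode_status((0.0, 0.0, 0.0, 0.0), steps_taken=10))
print(episode_status((2.5, 0.0, 0.0, 0.0), steps_taken=10))
print(episode_status((0.0, 0.0, 0.0, 0.0), steps_taken=200, max_steps=200))
```

Keeping the two flags separate mirrors the distinction the list draws between failing the task (termination) and simply running out of time (truncation).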
Arguments
    import gymnasium as gym
    gym.make('CartPole-v1')
On reset, the options parameter allows the user to change the bounds used to determine the new random state.
https://gymnasium.farama.org/environments/classic_control/cart_pole/