Introduction to the MuJoCo Humanoid Environment

Ffffp

Last modified: 2023-07-16 15:49:58

Tags: Artificial Intelligence

First published: 2023-07-16 13:20:28

Article link: https://blog.csdn.net/qq_56535363/article/details/131747994

These notes record my own learning process and are for reference only. If you spot any mistakes, corrections are welcome.

Description

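This environment is based on the one introduced by Tassa, Erez and Todorov in "Synthesis and stabilization of complex behaviors through online trajectory optimization". The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a pair of legs and a pair of arms. Each leg consists of two links, and so does each arm (the joints between the links represent the knees and the elbows respectively). The goal of the environment is to make the robot walk forward as fast as possible without falling over.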

Action Space

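The action space is a Box(-0.4, 0.4, (17,), float32). Each action represents a torque applied at a hinge joint.
For the Box data type, see the Box class in the Python Gym library.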

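Num	Action	Control Min	Control Max	Name (in corresponding XML file)	Joint	Unit
0	Torque applied on the hinge in the y-coordinate of the abdomen	-0.4	0.4	abdomen_y	hinge	torque (N m)
1	Torque applied on the hinge in the z-coordinate of the abdomen	-0.4	0.4	abdomen_z	hinge	torque (N m)
2	Torque applied on the hinge in the x-coordinate of the abdomen	-0.4	0.4	abdomen_x	hinge	torque (N m)
3	Torque applied on the rotor between the torso/abdomen and the right hip (x-coordinate)	-0.4	0.4	right_hip_x (right_thigh)	hinge	torque (N m)
4	Torque applied on the rotor between the torso/abdomen and the right hip (z-coordinate)	-0.4	0.4	right_hip_z (right_thigh)	hinge	torque (N m)
5	Torque applied on the rotor between the torso/abdomen and the right hip (y-coordinate)	-0.4	0.4	right_hip_y (right_thigh)	hinge	torque (N m)
6	Torque applied on the rotor between the right hip/thigh and the right shin	-0.4	0.4	right_knee	hinge	torque (N m)
7	Torque applied on the rotor between the torso/abdomen and the left hip (x-coordinate)	-0.4	0.4	left_hip_x (left_thigh)	hinge	torque (N m)
8	Torque applied on the rotor between the torso/abdomen and the left hip (z-coordinate)	-0.4	0.4	left_hip_z (left_thigh)	hinge	torque (N m)
9	Torque applied on the rotor between the torso/abdomen and the left hip (y-coordinate)	-0.4	0.4	left_hip_y (left_thigh)	hinge	torque (N m)
10	Torque applied on the rotor between the left hip/thigh and the left shin	-0.4	0.4	left_knee	hinge	torque (N m)
11	Torque applied on the rotor between the torso and the right upper arm (coordinate-1)	-0.4	0.4	right_shoulder1	hinge	torque (N m)
12	Torque applied on the rotor between the torso and the right upper arm (coordinate-2)	-0.4	0.4	right_shoulder2	hinge	torque (N m)
13	Torque applied on the rotor between the right upper arm and the right lower arm	-0.4	0.4	right_elbow	hinge	torque (N m)
14	Torque applied on the rotor between the torso and the left upper arm (coordinate-1)	-0.4	0.4	left_shoulder1	hinge	torque (N m)
15	Torque applied on the rotor between the torso and the left upper arm (coordinate-2)	-0.4	0.4	left_shoulder2	hinge	torque (N m)
16	Torque applied on the rotor between the left upper arm and the left lower arm	-0.4	0.4	left_elbow	hinge	torque (N m)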
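
As a rough illustration of the table, the sketch below clamps a 17-element action to the listed control limits before it would be passed to the environment. `ACTUATOR_NAMES` and `clip_action` are hypothetical helpers written for these notes, not part of Gym; rows 0-2 are the abdomen hinges described in the Action column.

```python
# Actuator names in action-index order (rows 0-2 are the abdomen hinges).
ACTUATOR_NAMES = [
    "abdomen_y", "abdomen_z", "abdomen_x",
    "right_hip_x", "right_hip_z", "right_hip_y", "right_knee",
    "left_hip_x", "left_hip_z", "left_hip_y", "left_knee",
    "right_shoulder1", "right_shoulder2", "right_elbow",
    "left_shoulder1", "left_shoulder2", "left_elbow",
]

CTRL_MIN, CTRL_MAX = -0.4, 0.4  # control limits listed in the table


def clip_action(action):
    """Clamp every torque command to [CTRL_MIN, CTRL_MAX] (hypothetical helper)."""
    if len(action) != len(ACTUATOR_NAMES):
        raise ValueError(
            f"expected {len(ACTUATOR_NAMES)} values, got {len(action)}"
        )
    return [min(max(a, CTRL_MIN), CTRL_MAX) for a in action]


raw = [0.0] * 17
raw[6] = 1.0    # above the limit for right_knee
raw[13] = -0.9  # below the limit for right_elbow
clipped = clip_action(raw)
print(ACTUATOR_NAMES[6], clipped[6])    # right_knee 0.4
print(ACTUATOR_NAMES[13], clipped[13])  # right_elbow -0.4
```

Keeping the names in a list indexed like the action vector makes it easy to check which joint a given action component drives.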

Observation Space

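An observation consists of the position values of the robot's different body parts, followed by the velocities of those individual parts (their derivatives), with all positions ordered before all velocities.
By default, the observation does not include the x and y coordinates of the torso. They can be included by passing exclude_current_positions_from_observation=False. In that case the observation space has 378 dimensions, where the first two dimensions are the x and y coordinates of the torso. Regardless of whether exclude_current_positions_from_observation is set to True or False, the x and y coordinates are returned in info as x_position and y_position respectively.
By default, the observation is an ndarray of shape (376,) whose elements correspond to the following: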

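Num	Observation	Min	Max	Name (in corresponding XML file)	Joint	Unit
0	z-coordinate of the torso (centre)	-Inf	Inf	root	free	position (m)
1	w-orientation of the torso (centre)	-Inf	Inf	root	free	angle (rad)
2	x-orientation of the torso (centre)	-Inf	Inf	root	free	angle (rad)
3	y-orientation of the torso (centre)	-Inf	Inf	root	free	angle (rad)
4	z-orientation of the torso (centre)	-Inf	Inf	root	free	angle (rad)
5	z-angle of the abdomen (in lower_waist)	-Inf	Inf	abdomen_z	hinge	angle (rad)
6	y-angle of the abdomen (in lower_waist)	-Inf	Inf	abdomen_y	hinge	angle (rad)
7	x-angle of the abdomen (in pelvis)	-Inf	Inf	abdomen_x	hinge	angle (rad)
8	x-coordinate of the angle between pelvis and right hip (in right_thigh)	-Inf	Inf	right_hip_x	hinge	angle (rad)
9	z-coordinate of the angle between pelvis and right hip (in right_thigh)	-Inf	Inf	right_hip_z	hinge	angle (rad)
10	y-coordinate of the angle between pelvis and right hip (in right_thigh)	-Inf	Inf	right_hip_y	hinge	angle (rad)
11	angle between right hip and right shin (in right_knee)	-Inf	Inf	right_knee	hinge	angle (rad)
12	x-coordinate of the angle between pelvis and left hip (in left_thigh)	-Inf	Inf	left_hip_x	hinge	angle (rad)
13	z-coordinate of the angle between pelvis and left hip (in left_thigh)	-Inf	Inf	left_hip_z	hinge	angle (rad)
14	y-coordinate of the angle between pelvis and left hip (in left_thigh)	-Inf	Inf	left_hip_y	hinge	angle (rad)
15	angle between left hip and left shin (in left_knee)	-Inf	Inf	left_knee	hinge	angle (rad)
16	coordinate-1 (multi-axis) angle between torso and right arm (in right_upper_arm)	-Inf	Inf	right_shoulder1	hinge	angle (rad)
17	coordinate-2 (multi-axis) angle between torso and right arm (in right_upper_arm)	-Inf	Inf	right_shoulder2	hinge	angle (rad)
18	angle between right upper arm and right lower arm	-Inf	Inf	right_elbow	hinge	angle (rad)
19	coordinate-1 (multi-axis) angle between torso and left arm (in left_upper_arm)	-Inf	Inf	left_shoulder1	hinge	angle (rad)
20	coordinate-2 (multi-axis) angle between torso and left arm (in left_upper_arm)	-Inf	Inf	left_shoulder2	hinge	angle (rad)
21	angle between left upper arm and left lower arm	-Inf	Inf	left_elbow	hinge	angle (rad)
22	x-coordinate velocity of the torso (centre)	-Inf	Inf	root	free	velocity (m/s)
23	y-coordinate velocity of the torso (centre)	-Inf	Inf	root	free	velocity (m/s)
24	z-coordinate velocity of the torso (centre)	-Inf	Inf	root	free	velocity (m/s)
25	x-coordinate angular velocity of the torso (centre)	-Inf	Inf	root	free	angular velocity (rad/s)
26	y-coordinate angular velocity of the torso (centre)	-Inf	Inf	root	free	angular velocity (rad/s)
27	z-coordinate angular velocity of the torso (centre)	-Inf	Inf	root	free	angular velocity (rad/s)
28	z-coordinate of the angular velocity of the abdomen (in lower_waist)	-Inf	Inf	abdomen_z	hinge	angular velocity (rad/s)
29	y-coordinate of the angular velocity of the abdomen (in lower_waist)	-Inf	Inf	abdomen_y	hinge	angular velocity (rad/s)
30	x-coordinate of the angular velocity of the abdomen (in pelvis)	-Inf	Inf	abdomen_x	hinge	angular velocity (rad/s)
31	x-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)	-Inf	Inf	right_hip_x	hinge	angular velocity (rad/s)
32	z-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)	-Inf	Inf	right_hip_z	hinge	angular velocity (rad/s)
33	y-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)	-Inf	Inf	right_hip_y	hinge	angular velocity (rad/s)
34	angular velocity of the angle between right hip and right shin (in right_knee)	-Inf	Inf	right_knee	hinge	angular velocity (rad/s)
35	x-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)	-Inf	Inf	left_hip_x	hinge	angular velocity (rad/s)
36	z-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)	-Inf	Inf	left_hip_z	hinge	angular velocity (rad/s)
37	y-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)	-Inf	Inf	left_hip_y	hinge	angular velocity (rad/s)
38	angular velocity of the angle between left hip and left shin (in left_knee)	-Inf	Inf	left_knee	hinge	angular velocity (rad/s)
39	coordinate-1 (multi-axis) of the angular velocity of the angle between torso and right arm (in right_upper_arm)	-Inf	Inf	right_shoulder1	hinge	angular velocity (rad/s)
40	coordinate-2 (multi-axis) of the angular velocity of the angle between torso and right arm (in right_upper_arm)	-Inf	Inf	right_shoulder2	hinge	angular velocity (rad/s)
41	angular velocity of the angle between right upper arm and right lower arm	-Inf	Inf	right_elbow	hinge	angular velocity (rad/s)
42	coordinate-1 (multi-axis) of the angular velocity of the angle between torso and left arm (in left_upper_arm)	-Inf	Inf	left_shoulder1	hinge	angular velocity (rad/s)
43	coordinate-2 (multi-axis) of the angular velocity of the angle between torso and left arm (in left_upper_arm)	-Inf	Inf	left_shoulder2	hinge	angular velocity (rad/s)
44	angular velocity of the angle between left upper arm and left lower arm	-Inf	Inf	left_elbow	hinge	angular velocity (rad/s)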

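In addition, after all the position- and velocity-based values in the table, the observation contains (in order):

cinert: mass and inertia of each rigid body relative to the center of mass (an intermediate result of the transition). It has shape 14*10 (nbody * 10) and therefore adds another 140 elements to the state space.

cvel: center-of-mass-based velocity. It has shape 14*6 (nbody * 6) and therefore adds another 84 elements to the state space.

qfrc_actuator: constraint force generated as the actuator force. It has shape (23,) (nv * 1) and therefore adds another 23 elements to the state space.

cfrc_ext: the center-of-mass-based external force on each body. It has shape 14*6 (nbody * 6) and therefore adds another 84 elements to the state space.

Here nbody is the number of bodies in the robot and nv is the number of degrees of freedom (= dim(qvel)).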
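
To check that these sizes add up to the 376-dimensional default observation, here is a small sketch that builds index ranges for each component. It assumes the components are laid out contiguously in the order listed; `COMPONENT_SIZES` and `observation_slices` are hypothetical names used only in these notes, and the labels `qpos`/`qvel` stand for the 22 position and 23 velocity rows of the table.

```python
NBODY = 14  # number of bodies, as stated in the text
NV = 23     # number of degrees of freedom, dim(qvel)

# (name, number of elements), in the order described in the text.
COMPONENT_SIZES = [
    ("qpos", 22),            # table rows 0-21 (x and y of the torso excluded)
    ("qvel", NV),            # table rows 22-44
    ("cinert", NBODY * 10),
    ("cvel", NBODY * 6),
    ("qfrc_actuator", NV),
    ("cfrc_ext", NBODY * 6),
]


def observation_slices(sizes):
    """Return ({name: slice}, total_length) for contiguous components."""
    slices, start = {}, 0
    for name, size in sizes:
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


slices, total = observation_slices(COMPONENT_SIZES)
print(total)           # 376
print(slices["cvel"])  # slice(185, 269, None)
```

With an observation `obs` from the environment, `obs[slices["qvel"]]` would then pick out the velocity block under the same layout assumption.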

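The (x, y, z) coordinates are translational degrees of freedom, while the orientation is a rotational degree of freedom expressed as a quaternion. More about free joints can be found in the MuJoCo documentation.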

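Note: the contact-force problem described here no longer exists in the Humanoid-v4 environment. For Humanoid versions before v4, there have been reports that using a mujoco-py version > 2.0 results in the contact forces always being 0. If you want to report results that involve contact forces, we therefore recommend using a mujoco-py version < 2.0 with the Humanoid environment (if contact forces are not used in your experiments, a version > 2.0 is fine).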

Reward

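The reward consists of four parts:
healthy_reward: for every timestep in which the humanoid is healthy (as defined in "Episode End"), it receives a reward of fixed value healthy_reward.

forward_reward: a reward for walking forward, computed as forward_reward_weight * (average center of mass after the action - average center of mass before the action) / dt. dt is the time between actions and depends on the frame_skip parameter (default 5); with a frame time of 0.003, the default is dt = 5 * 0.003 = 0.015. This reward is positive if the humanoid moves forward (in the positive x direction). The center-of-mass calculation is defined in the Humanoid .py file.

ctrl_cost: a negative reward that penalizes the humanoid if its control forces are too large. If there are nu actuators/controls, the control vector has shape nu x 1. It is computed as ctrl_cost_weight * sum(control²).

contact_cost: a negative reward that penalizes the humanoid if the external contact forces are too large. It is computed by clipping contact_cost_weight * sum(external contact force²) to the interval specified by contact_cost_range.

The total reward returned is reward = healthy_reward + forward_reward - ctrl_cost - contact_cost. In addition, info contains the individual reward terms.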
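
The four terms can be put together in a few lines. The following is only a sketch of the formula as described here, not the environment's actual source: `humanoid_reward` and the keys of the returned dictionary are hypothetical, the default `contact_cost_range=(-inf, 10.0)` is an assumption (it is not listed in the parameter table of these notes), and the healthy term is simplified to "paid only while healthy". The forward term uses the position after the action minus the position before it, so that motion in the positive x direction is rewarded, as the text states.

```python
def humanoid_reward(x_before, x_after, action, contact_forces,
                    dt=0.015,
                    forward_reward_weight=1.25,
                    ctrl_cost_weight=0.1,
                    contact_cost_weight=5e-7,
                    contact_cost_range=(float("-inf"), 10.0),  # assumed default
                    healthy_reward=5.0,
                    is_healthy=True):
    """Combine the four reward terms described in the text (illustrative only)."""
    # Forward progress of the center of mass along x, per unit time.
    forward_reward = forward_reward_weight * (x_after - x_before) / dt

    # Quadratic penalty on the control signal.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)

    # Quadratic penalty on the external contact forces, clipped to the range.
    raw_contact = contact_cost_weight * sum(f * f for f in contact_forces)
    low, high = contact_cost_range
    contact_cost = min(max(raw_contact, low), high)

    # Simplification: the healthy bonus is paid only while the robot is healthy.
    healthy = healthy_reward if is_healthy else 0.0

    reward = healthy + forward_reward - ctrl_cost - contact_cost
    info = {  # hypothetical key names
        "healthy_reward": healthy,
        "forward_reward": forward_reward,
        "ctrl_cost": ctrl_cost,
        "contact_cost": contact_cost,
    }
    return reward, info


# One step that moves the center of mass 0.015 m forward with small torques.
reward, info = humanoid_reward(0.0, 0.015, [0.1] * 17, [0.0] * 84)
print(round(reward, 3))        # 6.233
print(info["forward_reward"])  # 1.25
```

With the default weights, standing still already earns 5.0 per step, so the forward term has to outweigh the control cost for walking to pay off.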

Starting State

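All observations start in the state (0.0, 0.0, 1.4, 1.0, 0.0 … 0.0), with uniform noise in the range [-reset_noise_scale, reset_noise_scale] added to the position and velocity values (the values in the table) for stochasticity. Note that the initial z coordinate is intentionally chosen to be high, so that the humanoid starts out standing up. The initial orientation is designed to make it face forward.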
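
A minimal sketch of this reset procedure, using only the standard library. `sample_initial_state` is a hypothetical helper, and the vector lengths (24 positions including the torso x and y, 23 velocities) are assumptions derived from the observation description in these notes.

```python
import random


def sample_initial_state(init_qpos, init_qvel, reset_noise_scale=1e-2, rng=None):
    """Add uniform noise in [-scale, scale] to every position and velocity."""
    if rng is None:
        rng = random.Random()
    low, high = -reset_noise_scale, reset_noise_scale
    qpos = [q + rng.uniform(low, high) for q in init_qpos]
    qvel = [v + rng.uniform(low, high) for v in init_qvel]
    return qpos, qvel


# Nominal standing pose from the text: x, y, z = 0.0, 0.0, 1.4, then 1.0
# (first quaternion component), and zeros for everything else.
init_qpos = [0.0, 0.0, 1.4, 1.0] + [0.0] * 20  # assumed length 24
init_qvel = [0.0] * 23                         # assumed length 23

qpos, qvel = sample_initial_state(init_qpos, init_qvel, rng=random.Random(0))
print(len(qpos), len(qvel))  # 24 23
```

Passing a seeded `random.Random` keeps the perturbation reproducible, which is handy when comparing runs.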

Episode End

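The humanoid is considered unhealthy if the z position of the torso is no longer within the closed interval specified by the healthy_z_range argument.
If terminate_when_unhealthy=True (the default), the episode ends when either of the following happens:
1. Truncation: the episode duration reaches 1000 timesteps
2. Termination: the humanoid is unhealthy
If terminate_when_unhealthy=False, the episode ends only when 1000 timesteps are exceeded.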
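
The two ending conditions can be sketched as follows. `is_healthy` and `episode_flags` are hypothetical helpers; the health check follows the closed-interval description given here, and `max_episode_steps=1000` is the time limit mentioned in the list.

```python
def is_healthy(z, healthy_z_range=(1.0, 2.0)):
    """True if the torso height lies in the closed interval healthy_z_range."""
    min_z, max_z = healthy_z_range
    return min_z <= z <= max_z


def episode_flags(z, step_count,
                  terminate_when_unhealthy=True,
                  healthy_z_range=(1.0, 2.0),
                  max_episode_steps=1000):
    """Return (terminated, truncated) for the current step (illustrative only)."""
    terminated = terminate_when_unhealthy and not is_healthy(z, healthy_z_range)
    truncated = step_count >= max_episode_steps
    return terminated, truncated


print(episode_flags(0.8, 10))    # (True, False): the robot has fallen
print(episode_flags(1.3, 1000))  # (False, True): time limit reached
print(episode_flags(0.8, 10, terminate_when_unhealthy=False))  # (False, False)
```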

Arguments

Parameter	Type	Default	Description
`xml_file`	str	`"humanoid.xml"`	Path to a MuJoCo model
`forward_reward_weight`	float	`1.25`	Weight for forward_reward term (see section on reward)
`ctrl_cost_weight`	float	`0.1`	Weight for ctrl_cost term (see section on reward)
`contact_cost_weight`	float	`5e-7`	Weight for contact_cost term (see section on reward)
`healthy_reward`	float	`5.0`	Constant reward given if the humanoid is “healthy” after timestep
`terminate_when_unhealthy`	bool	`True`	If true, issue a done signal if the z-coordinate of the torso is no longer in the `healthy_z_range`
`healthy_z_range`	tuple	`(1.0, 2.0)`	The humanoid is considered healthy if the z-coordinate of the torso is in this range
`reset_noise_scale`	float	`1e-2`	Scale of random perturbations of initial position and velocity (see section on Starting State)
`exclude_current_positions_from_observation`	bool	`True`	Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies

Reference: Gymlibrary