[RL robotic environments] - [Robosuite] (2)

Abstract

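This post walks through the Stack environment that ships with Robosuite, as groundwork for building custom environments later.

Key points

  1. Building the robot, the workspace table, and the objects
  2. How to customize the size of an object and its initial position (including orientation)
  3. Placing a virtual (non-physical) object at the target position [not covered in this post]
  4. How to build the obs information
  5. How to reset the environment
  6. Setting the reward and the termination condition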

Dependencies: functions and classes

from collections import OrderedDict

import numpy as np

from robosuite.environments.manipulation.single_arm_env import SingleArmEnv
from robosuite.models.arenas import TableArena
from robosuite.models.objects import BoxObject
from robosuite.models.tasks import ManipulationTask
from robosuite.utils.mjcf_utils import CustomMaterial
from robosuite.utils.observables import Observable, sensor
from robosuite.utils.placement_samplers import UniformRandomSampler
from robosuite.utils.transform_utils import convert_quat
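
Name | Kind | Purpose
SingleArmEnv | class | Base class for single-arm setups; a single-arm environment must inherit from it
TableArena | class | Defines the table (desk), i.e. the workspace
BoxObject | class | Defines a box-shaped object such as a cube (spheres are handled by a separate class, BallObject)
ManipulationTask | class | Defines the task; bundles the robot, the table, and the objects; the objects may be None
CustomMaterial | class | Defines the material of an object
Observable | class | Defines an observation. It holds a name and a sensor function: name is the key of the value in the obs dict, and the sensor function defines how the value is read from sim.data / obs_cache
sensor | function | Decorator used to wrap a sensor function
UniformRandomSampler | class | Manages how objects are initialized; samples object positions and rotations uniformly
convert_quat | function | Converts the component order of a quaternion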
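
To make the last row concrete: MuJoCo stores quaternions as (w, x, y, z), while the observables later in this post request the "xyzw" order. The snippet below is a minimal NumPy sketch of that reordering. The helper name reorder_quat is hypothetical and only stands in for the idea; it is not robosuite's convert_quat.

```python
import numpy as np


def reorder_quat(q, to="xyzw"):
    """Reorder a quaternion between (w, x, y, z) and (x, y, z, w).

    Hypothetical helper for illustration; not the robosuite implementation.
    """
    q = np.asarray(q, dtype=float)
    if to == "xyzw":
        # input is (w, x, y, z): move w to the end
        return q[[1, 2, 3, 0]]
    if to == "wxyz":
        # input is (x, y, z, w): move w to the front
        return q[[3, 0, 1, 2]]
    raise ValueError("to must be 'xyzw' or 'wxyz'")


identity_wxyz = np.array([1.0, 0.0, 0.0, 0.0])  # identity rotation, MuJoCo order
identity_xyzw = reorder_quat(identity_wxyz, to="xyzw")
```

Applying the two conversions one after the other returns the original quaternion, which is a quick sanity check when mixing data from sim.data with code that expects the other convention.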

Class initialization

class Stack(SingleArmEnv):
    """
    This class corresponds to the stacking task for a single robot arm.
    Args:
        robots (str or list of str): Specification for specific robot arm(s) to be instantiated within this env
            (e.g: "Sawyer" would generate one arm; ["Panda", "Panda", "Sawyer"] would generate three robot arms)
            Note: Must be a single single-arm robot!
        env_configuration (str): Specifies how to position the robots within the environment (default is "default").
            For most single arm environments, this argument has no impact on the robot setup.
        controller_configs (str or list of dict): If set, contains relevant controller parameters for creating a
            custom controller. Else, uses the default controller for this specific task. Should either be single
            dict if same controller is to be used for all robots or else it should be a list of the same length as
            "robots" param
        gripper_types (str or list of str): type of gripper, used to instantiate
            gripper models from gripper factory. Default is "default", which is the default grippers(s) associated
            with the robot(s) the 'robots' specification. None removes the gripper, and any other (valid) model
            overrides the default gripper. Should either be single str if same gripper type is to be used for all
            robots or else it should be a list of the same length as "robots" param
        initialization_noise (dict or list of dict): Dict containing the initialization noise parameters.
            The expected keys and corresponding value types are specified below:
            :`'magnitude'`: The scale factor of uni-variate random noise applied to each of a robot's given initial
                joint positions. Setting this value to `None` or 0.0 results in no noise being applied.
                If "gaussian" type of noise is applied then this magnitude scales the standard deviation applied,
                If "uniform" type of noise is applied then this magnitude sets the bounds of the sampling range
            :`'type'`: Type of noise to apply. Can either specify "gaussian" or "uniform"
            Should either be single dict if same noise value is to be used for all robots or else it should be a
            list of the same length as "robots" param
            :Note: Specifying "default" will automatically use the default noise settings.
                Specifying None will automatically create the required dict with "magnitude" set to 0.0.
        table_full_size (3-tuple): x, y, and z dimensions of the table.
        table_friction (3-tuple): the three mujoco friction parameters for
            the table.
        use_camera_obs (bool): if True, every observation includes rendered image(s)
        use_object_obs (bool): if True, include object (cube) information in
            the observation.
        reward_scale (None or float): Scales the normalized reward function by the amount specified.
            If None, environment reward remains unnormalized
        reward_shaping (bool): if True, use dense rewards.
        placement_initializer (ObjectPositionSampler): if provided, will
            be used to place objects on every reset, else a UniformRandomSampler
            is used by default.
        has_renderer (bool): If true, render the simulation state in
            a viewer instead of headless mode.
        has_offscreen_renderer (bool): True if using off-screen rendering
        render_camera (str): Name of camera to render if `has_renderer` is True. Setting this value to 'None'
            will result in the default angle being applied, which is useful as it can be dragged / panned by
            the user using the mouse
        render_collision_mesh (bool): True if rendering collision meshes in camera. False otherwise.
        render_visual_mesh (bool): True if rendering visual meshes in camera. False otherwise.
        render_gpu_device_id (int): corresponds to the GPU device id to use for offscreen rendering.
            Defaults to -1, in which case the device will be inferred from environment variables
            (GPUS or CUDA_VISIBLE_DEVICES).
        control_freq (float): how many control signals to receive in every second. This sets the amount of
            simulation time that passes between every action input.
        horizon (int): Every episode lasts for exactly @horizon timesteps.
        ignore_done (bool): True if never terminating the environment (ignore @horizon).
        hard_reset (bool): If True, re-loads model, sim, and render object upon a reset call, else,
            only calls sim.reset and resets all robosuite-internal variables
        camera_names (str or list of str): name of camera to be rendered. Should either be single str if
            same name is to be used for all cameras' rendering or else it should be a list of cameras to render.
            :Note: At least one camera must be specified if @use_camera_obs is True.
            :Note: To render all robots' cameras of a certain type (e.g.: "robotview" or "eye_in_hand"), use the
                convention "all-{name}" (e.g.: "all-robotview") to automatically render all camera images from each
                robot's camera list).
        camera_heights (int or list of int): height of camera frame. Should either be single int if
            same height is to be used for all cameras' frames or else it should be a list of the same length as
            "camera names" param.
        camera_widths (int or list of int): width of camera frame. Should either be single int if
            same width is to be used for all cameras' frames or else it should be a list of the same length as
            "camera names" param.
        camera_depths (bool or list of bool): True if rendering RGB-D, and RGB otherwise. Should either be single
            bool if same depth setting is to be used for all cameras or else it should be a list of the same length as
            "camera names" param.
        camera_segmentations (None or str or list of str or list of list of str): Camera segmentation(s) to use
            for each camera. Valid options are:
                `None`: no segmentation sensor used
                `'instance'`: segmentation at the class-instance level
                `'class'`: segmentation at the class level
                `'element'`: segmentation at the per-geom level
            If not None, multiple types of segmentations can be specified. A [list of str / str or None] specifies
            [multiple / a single] segmentation(s) to use for all cameras. A list of list of str specifies per-camera
            segmentation setting(s) to use.
    Raises:
        AssertionError: [Invalid number of robots specified]
    """
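
Argument | Type | Purpose
robots | str | One of the single-arm robots from the previous post: 'Sawyer', 'Panda', 'Jaco', 'Kinova3', 'IIWA', 'UR5e'
env_configuration | str | Robot configuration; it does not seem to have any effect here, so the default is normally kept
controller_configs | str or list of dict | Controller to use: 'JOINT_VELOCITY', 'JOINT_TORQUE', 'JOINT_POSITION', 'OSC_POSITION', 'OSC_POSE', 'IK_POSE'
gripper_types | str or list of str | See the previous post
initialization_noise | dict or list of dict | Randomizes the robot's initial joint positions; 'magnitude': None / 0.0 / sampling bound (uniform) / standard-deviation scale (gaussian); 'type': "uniform" or "gaussian"
table_full_size | 3-tuple | (x, y, z)
table_friction | 3-tuple | Friction coefficients
use_camera_obs | bool | Whether obs contains rendered images
use_object_obs | bool | Whether obs contains object information
reward_scale | float or None | Scales the reward
reward_shaping | bool | Whether to use a dense reward
placement_initializer | class | Places the objects on every reset
has_renderer | bool | Whether to render the simulation in a viewer
has_offscreen_renderer | bool | Whether to render off-screen
render_camera | str | Name of the camera to render
render_collision_mesh | bool | Whether to render collision meshes
render_visual_mesh | bool | Whether to render visual meshes (visible-only geometry with no physical body)
render_gpu_device_id | int | GPU device id to use
control_freq | float | Number of control signals per second
horizon | int | Number of timesteps in each episode
ignore_done | bool | Whether to ignore termination
hard_reset | bool | If True, the model, sim, renderer, etc. are rebuilt on every reset; if False, only a reset is performed
camera_names, camera_heights, camera_widths, camera_depths, camera_segmentations | | Camera-related settings
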
        # settings for table top
        self.table_full_size = table_full_size
        self.table_friction = table_friction
        self.table_offset = np.array((0, 0, 0.8))

        # reward configuration
        self.reward_scale = reward_scale
        self.reward_shaping = reward_shaping

        # whether to use ground-truth object states
        self.use_object_obs = use_object_obs

        # object placement initializer
        self.placement_initializer = placement_initializer
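
The initialization_noise argument described in the docstring can be pictured with a small NumPy sketch: the magnitude scales a standard-normal sample for "gaussian" noise and bounds the sample for "uniform" noise. The function name noisy_init_qpos and the seven-joint example are assumptions made for illustration; this is not the code robosuite runs internally.

```python
import numpy as np


def noisy_init_qpos(init_qpos, magnitude, noise_type, rng):
    """Return initial joint positions with optional noise applied.

    Hypothetical helper that mirrors the docstring semantics only.
    """
    init_qpos = np.asarray(init_qpos, dtype=float)
    if magnitude is None or magnitude == 0.0:
        # None or 0.0 means no noise at all
        return init_qpos.copy()
    if noise_type == "gaussian":
        # magnitude scales the standard deviation
        noise = rng.standard_normal(init_qpos.shape) * magnitude
    elif noise_type == "uniform":
        # magnitude sets the bounds of the sampling range
        noise = rng.uniform(-1.0, 1.0, init_qpos.shape) * magnitude
    else:
        raise ValueError("noise_type must be 'gaussian' or 'uniform'")
    return init_qpos + noise


rng = np.random.default_rng(0)
base_qpos = np.zeros(7)  # assumed 7-joint arm, for illustration
uniform_qpos = noisy_init_qpos(base_qpos, 0.02, "uniform", rng)
```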

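Building the model

Before reset can be used, the environment has to run:

  1. _load_model (creates the robot, table, and objects, and assembles them into self.model);
  2. _setup_references (records the MuJoCo ids of the objects, which are used to access data in sim.data);
  3. _setup_observables (builds the mapping functions that produce obs, turning sim.data into the obs dict);

After that, _reset_internal performs the model reset inside env.reset.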
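
The ordering above can be sketched with a toy class that only records which hook ran. LifecycleDemo is a hypothetical stand-in that follows the order as described in this post; the exact call sequence inside robosuite may differ between versions, so treat this as a mental model rather than the library's implementation.

```python
class LifecycleDemo:
    """Toy class that logs the setup hooks in the order described above."""

    def __init__(self):
        self.calls = []
        self._load_model()         # build robot, table, objects
        self._setup_references()   # cache MuJoCo ids
        self._setup_observables()  # register sensor functions

    def _load_model(self):
        self.calls.append("_load_model")

    def _setup_references(self):
        self.calls.append("_setup_references")

    def _setup_observables(self):
        self.calls.append("_setup_observables")

    def _reset_internal(self):
        self.calls.append("_reset_internal")

    def reset(self):
        # the per-episode reset only runs the internal reset in this toy
        self._reset_internal()
        return {}


demo = LifecycleDemo()
demo.reset()
```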

_load_model

  1. Set up the robot and the table

super()._load_model()

# -------------robot settings---------------
# Adjust base pose accordingly
xpos = self.robots[0].robot_model.base_xpos_offset["table"](self.table_full_size[0])
# shift the robot base to that position
self.robots[0].robot_model.set_base_xpos(xpos)

# -------------table workspace--------------
# load model for table top workspace
mujoco_arena = TableArena(
    table_full_size=self.table_full_size,
    table_friction=self.table_friction,
    table_offset=self.table_offset,
)

# Arena always gets set to zero origin
mujoco_arena.set_origin([0, 0, 0])

  2. Set up the objects
# -------- textures available for the cubes --------------
texture = ["WoodRed","WoodGreen","WoodBlue","WoodLight","WoodDark","WoodTiles","WoodPanels"]

# initialize objects of interest
# (the attribute definitions below were not examined in detail)
tex_attrib = {
    "type": "cube",
}
mat_attrib = {
    "texrepeat": "1 1",
    "specular": "0.4",
    "shininess": "0.1",
}
redwood = CustomMaterial(
    texture="WoodRed",
    tex_name="redwood",
    mat_name="redwood_mat",
    tex_attrib=tex_attrib,
    mat_attrib=mat_attrib,
)
greenwood = CustomMaterial(
    texture="WoodGreen",
    tex_name="greenwood",
    mat_name="greenwood_mat",
    tex_attrib=tex_attrib,
    mat_attrib=mat_attrib,
)

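# Open question: when the size is randomized, how can only cubes be generated?
# (Here size_min equals size_max and all three entries match, so each box is a cube.)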
self.cubeA = BoxObject(
    name="cubeA",
    size_min=[0.02, 0.02, 0.02],
    size_max=[0.02, 0.02, 0.02],
    rgba=[1, 0, 0, 1],  # color and opacity (RGBA)
    material=redwood,
)

self.cubeB = BoxObject(
    name="cubeB",
    size_min=[0.025, 0.025, 0.025],
    size_max=[0.025, 0.025, 0.025],
    rgba=[0, 1, 0, 1],
    material=greenwood,
)
cubes = [self.cubeA, self.cubeB]

# add the objects to the placement initializer
# Create placement initializer
if self.placement_initializer is not None:
    self.placement_initializer.reset()
    self.placement_initializer.add_objects(cubes)
else:
    self.placement_initializer = UniformRandomSampler(
        name="ObjectSampler",
        mujoco_objects=cubes,
        x_range=[-0.08, 0.08],
        y_range=[-0.08, 0.08],
        rotation=None,
        ensure_object_boundary_in_range=False,
        ensure_valid_placement=True,
        reference_pos=self.table_offset,
        z_offset=0.01,
    )
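
To illustrate what a uniform placement sampler with a validity check does, here is a self-contained rejection-sampling sketch. The function sample_placements, the circular footprint test, and the random yaw are assumptions chosen for illustration; UniformRandomSampler has its own implementation and extra options, and a real sampler would also account for the object's own height when setting z.

```python
import numpy as np


def sample_placements(radii, x_range, y_range, table_z, z_offset, rng, max_tries=1000):
    """Sample one (position, quaternion) pair per object without overlaps.

    Hypothetical sketch: each object is treated as a disc of the given radius.
    """
    placed = []   # (x, y, radius) of objects already placed
    results = []
    for radius in radii:
        for _ in range(max_tries):
            x = rng.uniform(*x_range)
            y = rng.uniform(*y_range)
            # accept only if the disc does not touch any earlier disc
            if all(np.hypot(x - px, y - py) > radius + pr for px, py, pr in placed):
                break
        else:
            raise RuntimeError("could not find a collision-free placement")
        yaw = rng.uniform(0.0, 2.0 * np.pi)  # random rotation about the z axis
        quat_wxyz = np.array([np.cos(yaw / 2), 0.0, 0.0, np.sin(yaw / 2)])
        placed.append((x, y, radius))
        results.append((np.array([x, y, table_z + z_offset]), quat_wxyz))
    return results


rng = np.random.default_rng(0)
placements = sample_placements(
    radii=[0.03, 0.036],          # rough footprints of cubeA and cubeB
    x_range=(-0.08, 0.08),
    y_range=(-0.08, 0.08),
    table_z=0.8,                  # z component of table_offset
    z_offset=0.01,
    rng=rng,
)
```

Passing your own sampler through placement_initializer is the hook the source exposes for changing the initial positions and angles; the ranges above are simply the defaults shown in the listing.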
  3. Build the model
# task includes arena, robot, and objects of interest
# self.model generates the corresponding MJCF model

self.model = ManipulationTask(
    mujoco_arena=mujoco_arena,
    mujoco_robots=[robot.robot_model for robot in self.robots],
    mujoco_objects=cubes,
)
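
On the earlier question of how to get only cubes when the size is randomized: one possible approach is to sample a single scalar and use it for all three dimensions, passing the same triple as both size_min and size_max. The sketch below only builds the keyword arguments; sample_cube_half_size is a hypothetical helper, and the assumption that the values are half-extents follows MuJoCo's convention for box geoms.

```python
import numpy as np


def sample_cube_half_size(low, high, rng):
    """Sample one half-extent and repeat it so the box is a cube."""
    s = float(rng.uniform(low, high))
    return [s, s, s]


rng = np.random.default_rng(0)
half_size = sample_cube_half_size(0.02, 0.025, rng)

# Keyword arguments that could be handed to BoxObject in _load_model.
# size_min == size_max pins the size, so no per-axis sampling can break the cube.
box_kwargs = dict(
    name="cubeA",
    size_min=half_size,
    size_max=half_size,
    rgba=[1, 0, 0, 1],
)
```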

_setup_references

Records the ids (MuJoCo ids) of the important objects so that their state can later be read from sim.data.

"""
Sets up references to important components. A reference is typically an
index or a list of indices that point to the corresponding elements
in a flatten array, which is how MuJoCo stores physical simulation data.
"""
# set up the robot-related references
super()._setup_references()

# Additional object references from this env
# set up the references for the objects; they are used later in _setup_observables
self.cubeA_body_id = self.sim.model.body_name2id(self.cubeA.root_body)
self.cubeB_body_id = self.sim.model.body_name2id(self.cubeB.root_body)
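
The idea behind these references can be shown with plain NumPy: simulation data lives in flat arrays with one row per body, and a name-to-id lookup done once gives the row index to read on every step. The body names and positions below are made up for illustration and do not come from a real MuJoCo model.

```python
import numpy as np

# Hypothetical body list; a real model has many more bodies
body_names = ["world", "table", "cubeA_main", "cubeB_main"]
name2id = {name: i for i, name in enumerate(body_names)}

# One row of xyz position per body, in the same order as body_names
body_xpos = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.8],
    [0.05, -0.02, 0.82],
    [-0.04, 0.03, 0.825],
])

# Look the ids up once (what _setup_references does), then index every step
cubeA_body_id = name2id["cubeA_main"]
cubeB_body_id = name2id["cubeB_main"]
cubeA_pos = body_xpos[cubeA_body_id]
cubeB_pos = body_xpos[cubeB_body_id]
```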

_setup_observables

observables = super()._setup_observables()  

This returns observables (a dict) that already contains the robot information. The object information to be observed is then added to observables through sensor and Observable; observables is the dict that backs obs.


        # low-level object information
        if self.use_object_obs:
            # Get robot prefix and define observables modality
            pf = self.robots[0].robot_model.naming_prefix
            modality = "object"

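self.use_object_obs decides whether the information about the objects is included in the output.
pf = self.robots[0].robot_model.naming_prefix gives the prefix used by the robot's entries.
f"{pf}eef_pos" retrieves the position of the robot's end effector; the other entries follow the same format.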


            # position and rotation of the first cube
            @sensor(modality=modality)
            def cubeA_pos(obs_cache):
                return np.array(self.sim.data.body_xpos[self.cubeA_body_id])

            @sensor(modality=modality)
            def cubeA_quat(obs_cache):
                return convert_quat(np.array(self.sim.data.body_xquat[self.cubeA_body_id]), to="xyzw")

            @sensor(modality=modality)
            def cubeB_pos(obs_cache):
                return np.array(self.sim.data.body_xpos[self.cubeB_body_id])

            @sensor(modality=modality)
            def cubeB_quat(obs_cache):
                return convert_quat(np.array(self.sim.data.body_xquat[self.cubeB_body_id]), to="xyzw")

            @sensor(modality=modality)
            def gripper_to_cubeA(obs_cache):
                return (
                    obs_cache["cubeA_pos"] - obs_cache[f"{pf}eef_pos"]
                    if "cubeA_pos" in obs_cache and f"{pf}eef_pos" in obs_cache
                    else np.zeros(3)
                )

            @sensor(modality=modality)
            def gripper_to_cubeB(obs_cache):
                return (
                    obs_cache["cubeB_pos"] - obs_cache[f"{pf}eef_pos"]
                    if "cubeB_pos" in obs_cache and f"{pf}eef_pos" in obs_cache
                    else np.zeros(3)
                )

            @sensor(modality=modality)
            def cubeA_to_cubeB(obs_cache):
                return (
                    obs_cache["cubeB_pos"] - obs_cache["cubeA_pos"]
                    if "cubeA_pos" in obs_cache and "cubeB_pos" in obs_cache
                    else np.zeros(3)
                )

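@sensor(modality=modality) is a function decorator: every sensor function must be decorated with sensor, and the decorated function is then passed to an Observable, which handles reading the value. [Access to obs_cache is order-dependent: it contains the robot's entries by default, and any object entry you want to read must have been added earlier when the observables were created.]
Also, when reading from obs_cache, handle the case where the required key is not in the cache yet (the sensors above fall back to np.zeros(3)).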



            sensors = [cubeA_pos, cubeA_quat, cubeB_pos, cubeB_quat, gripper_to_cubeA, gripper_to_cubeB, cubeA_to_cubeB]
            names = [s.__name__ for s in sensors]

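The order of sensors matters here: a later function can use the values produced by earlier functions (stored in obs_cache).
names holds one name per function; it is used below to create the observables and provides the keys of the obs dict.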

            # Create observables
            for name, s in zip(names, sensors):
                observables[name] = Observable(
                    name=name,
                    sensor=s,
                    sampling_rate=self.control_freq,
                )

The created Observable objects are added to observables, so the obs returned by later reset and step calls will include this information.
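
The effect of sensor ordering can be reproduced in a few lines without robosuite. In the sketch below, toy_sensor and evaluate are hypothetical stand-ins for the decorator and the observable update loop; the point is only that a sensor reading obs_cache gets real data when its dependency ran first and falls back to zeros otherwise.

```python
import numpy as np

# Fake simulator state, for illustration only
state = {"eef": np.array([0.0, 0.0, 1.0]), "cubeA": np.array([0.1, 0.0, 0.8])}


def toy_sensor(modality):
    """Hypothetical decorator that tags a function with its modality."""
    def decorator(func):
        func.modality = modality
        return func
    return decorator


@toy_sensor("object")
def cubeA_pos(obs_cache):
    return state["cubeA"].copy()


@toy_sensor("object")
def gripper_to_cubeA(obs_cache):
    # depends on two entries that must already be in the cache
    if "cubeA_pos" in obs_cache and "eef_pos" in obs_cache:
        return obs_cache["cubeA_pos"] - obs_cache["eef_pos"]
    return np.zeros(3)


def evaluate(sensors, obs_cache):
    """Run the sensors in list order, storing each result under its name."""
    for s in sensors:
        obs_cache[s.__name__] = s(obs_cache)
    return obs_cache


# robot entry is assumed to be present before the object sensors run
in_order = evaluate([cubeA_pos, gripper_to_cubeA], {"eef_pos": state["eef"].copy()})
out_of_order = evaluate([gripper_to_cubeA, cubeA_pos], {"eef_pos": state["eef"].copy()})
```

With the dependency listed first, gripper_to_cubeA holds the real offset; with the order swapped it silently returns zeros, which is why the sensors list is written in dependency order.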

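Environment reset

Used to reset the environment. With hard_reset, everything is rebuilt starting from the model; a regular reset re-samples the object positions and orientations, re-initializes the robot, and so on.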

def _reset_internal(self):
        """
        Resets simulation internal configurations.
        """
        super()._reset_internal()

The call above initializes the robot.


        # Reset all object positions using initializer sampler if we're not directly loading from an xml
        if not self.deterministic_reset:

            # Sample from the placement initializer for all objects
            object_placements = self.placement_initializer.sample()

            # Loop through all objects and reset their positions
            for obj_pos, obj_quat, obj in object_placements.values():
                self.sim.data.set_joint_qpos(obj.joints[0], np.concatenate([np.array(obj_pos), np.array(obj_quat)]))

The block above initializes the position and orientation of each object.

Open question: can the size of an object be changed here?

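Defining the reward for the RL environment

The model setup above covers building the environment and obs. What remains is producing the reward (reward, staged_rewards) and checking whether the task has ended (_check_success). In practice, though, both can be defined in a gym wrapper instead of inside the Robosuite env.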

reward

def reward(self, action=None):
    reward = 0.0  # placeholder: compute the value from self.sim.data
    return reward

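The reward function does not take obs as input; it can read data directly from sim.data.
Pros: MuJoCo's built-in contact information can be used, so the reward can draw on more information.
Cons: obs is not passed in. My personal recommendation is to define the reward in a gym wrapper, based on obs.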
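
As a rough sketch of the wrapper-side option, the function below computes a dense reaching term from the obs dict alone. The name reaching_reward, the scale value, and the tanh shaping are illustrative assumptions, not the reward used by Stack; only the gripper_to_cubeA key comes from the observables defined earlier.

```python
import numpy as np


def reaching_reward(obs, scale=10.0):
    """Dense reward in (0, 1]: 1 when the gripper is at cubeA, lower when far.

    Hypothetical shaping term computed purely from obs.
    """
    dist = np.linalg.norm(obs["gripper_to_cubeA"])
    return float(1.0 - np.tanh(scale * dist))


near = reaching_reward({"gripper_to_cubeA": np.array([0.01, 0.0, 0.0])})
far = reaching_reward({"gripper_to_cubeA": np.array([0.5, 0.0, 0.0])})
```

Because the function only needs obs, it can live in a wrapper and be swapped without touching the environment class.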

check_success

def _check_success(self):
    success = False  # placeholder: evaluate the task condition here
    return success  # bool

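Used to decide whether the task has finished; it works together with ignore_done and horizon from the initialization.
Rules:

  1. ignore_done disables both _check_success and horizon.
  2. When horizon is reached, done=True and info carries no extra information, so task completion cannot be told apart from running out of time.

I therefore recommend using an effectively unlimited horizon in the env and distinguishing task completion from a timeout in a gym wrapper (info will then differ).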
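
The recommendation can be sketched without gym or robosuite. CountingEnv is a toy stand-in with horizon-only termination and an empty info dict, and SuccessTimeoutWrapper is a hypothetical wrapper that adds its own step limit and reports success and timeout separately in info. Both class names and the info keys are assumptions for illustration.

```python
class CountingEnv:
    """Toy env: done depends only on horizon, info is always empty."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0
        self.height = 0

    def reset(self):
        self.t = 0
        self.height = 0
        return {"height": self.height}

    def step(self, action):
        self.t += 1
        self.height += action
        done = self.t >= self.horizon
        return {"height": self.height}, 0.0, done, {}

    def _check_success(self):
        return self.height >= 3


class SuccessTimeoutWrapper:
    """Ends the episode on success or on its own step limit, and says which."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        success = bool(self.env._check_success())
        timeout = (not success) and self.steps >= self.max_steps
        info = dict(info, success=success, timeout=timeout)
        return obs, reward, done or success or timeout, info


def run(action, max_steps=5):
    """Roll out a constant action and return (steps taken, final info)."""
    env = SuccessTimeoutWrapper(CountingEnv(horizon=10**6), max_steps=max_steps)
    env.reset()
    while True:
        _, _, done, info = env.step(action)
        if done:
            return env.steps, info
```

Calling run(1) ends through the success branch and run(0) ends through the timeout branch, so a learning algorithm can treat the two cases differently, for example when bootstrapping value targets.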

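Summary

  1. Functions to modify when creating an environment

Function | What to change
__init__ | Define the table settings and the object settings
_load_model | Set up the robot, table, and objects to form self.model, plus the placement_initializer used to initialize the objects
_setup_references | Look up the MuJoCo ids of the objects so they can be used in _setup_observables
_setup_observables | Set up obs by passing the object state from sim.data into observables
_reset_internal | Initialize the robot and the objects

  2. Functions tied to the env

Function | What to change
reward | Set the reward based on sim.data
_check_success | Set the termination condition of the env

  3. The visualize function is not covered here.

Next post: building a custom RoboSuite environment.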
