Embodied AI, VLA track: single-/dual-arm VLA model training on simulation data (RDT and openpi)

Series contents

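Embodied AI, VLA track: model fine-tuning (single arm) (completed 24.12.26)
That post mainly covers data conversion for real-robot scenarios, plus fine-tuning and deploying the openVLA and RDT models.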

Preface

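Building on the previous post and on some of my current project work, this article covers data collection and processing on a simulation platform, followed by model fine-tuning. The simulation platform used throughout is:
RoboTwin
which I use for data collection, conversion, and training. (I am not a contributor to version 1.0; I joined after 1.0 was released. By the way, 2.0 is coming soon~)
Since the internal beta has not been released yet, all data shown here was collected with 1.0~
I then convert the data generated by RoboTwin into the formats supported by RDT and openpi and fine-tune both models.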

I. Deploying RoboTwin

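To deploy RoboTwin, just follow its official installation guide (INSTALLATION.md).
Note that in order to deploy RDT, you should apply the following changes of mine: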

  1. Set up the RDT environment (python3.10 && torch2.1.0)
    RoboticsDiffusionTransformer
conda create -n RoboTwin python=3.10.0
conda activate RoboTwin
# Install pytorch
# Look up https://pytorch.org/get-started/previous-versions/ with your cuda version for a correct command
pip install torch==2.1.0 torchvision==0.16.0  --index-url https://download.pytorch.org/whl/cu121
pip install packaging==24.0
# Install flash-attn
pip install flash-attn --no-build-isolation
# Install other prerequisites
pip install -r requirements.txt
  2. Set up RoboTwin
pip install sapien==3.0.0b1 scipy==1.10.1 mplib==0.1.1 gymnasium==0.29.1 trimesh==4.4.3 open3d==0.18.0 imageio==2.34.2 pydantic zarr openai huggingface_hub==0.25.0

Compared with RoboTwin's INSTALLATION.md, the steps above drop torch 2.4.1 and use the torch 2.1.0 required by RDT instead; for the remaining steps, follow INSTALLATION.md directly.
  3. Test whether the setup succeeded

python scripts/test_render.py

If it prints render ok, the setup succeeded.
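Besides the render test, it can help to confirm that the pinned versions actually ended up in the RoboTwin environment (for example, that a later requirements install did not pull torch 2.4.1 back in). A minimal sketch using only the standard library; the package list simply mirrors the commands above and is an assumption you should adapt:

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pins taken from the install commands above; extend or trim to taste.
EXPECTED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "packaging": "24.0",
    "sapien": "3.0.0b1",
    "mplib": "0.1.1",
}


def check_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for each mismatch; installed is None if missing."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # Wheels from the cu121 index report e.g. "2.1.0+cu121"; compare only the part before "+".
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for pkg, (want, have) in problems.items():
        print(f"{pkg}: expected {want}, found {have}")
    print("all pinned versions match" if not problems else f"{len(problems)} mismatch(es)")
```

Run it with the RoboTwin environment's python after the installs; an empty mismatch list means the pins held.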

II. Collecting data with RoboTwin

Collecting data

Collecting data with RoboTwin is very simple.
In the RoboTwin project root, run:

# e.g. bash run_task.sh shoe_place 0
# all valid task_name values can be found in envs/
bash run_task.sh ${task_name} ${gpu_id}

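If you want to watch the rendering of each collected action (a desktop is required), edit the render_freq parameter in the corresponding ${task_name}.yml under task_config. Recommended values are 5, 10, or 15; the higher the value, the slower the generation. For large-scale collection, set it to 0 (turns off visual rendering).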
Some adjustable parameters in ${task_name}.yml:
use_seed: false
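(RoboTwin collection has two stages: 1. collect seeds that complete the task successfully; 2. render those seeds and save the rendering results. If you have already collected successful seeds, they are saved as a json file under the specified path, and you can render directly from those seeds without collecting them again.)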
head_camera_type: L515
(We use D435 for everything ourselves, because our real robot uses a D435.)
episode_num: 100
(how many episodes to collect; adjust as needed)
depth: true
(I recommend false, because current VLAs do not use depth, and it also slows generation down.)
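Editing these keys by hand works fine; when preparing many tasks for batch collection, a small helper can patch them instead. A sketch that assumes the keys are top-level `key: value` lines in ${task_name}.yml (it raises KeyError otherwise, and it does not handle nested keys or trailing comments); it uses plain regex so no YAML library is needed:

```python
import re
from typing import Any, Dict


def _yaml_scalar(value: Any) -> str:
    """Format a Python value as a plain YAML scalar (booleans as true/false)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def set_top_level_keys(text: str, updates: Dict[str, Any]) -> str:
    """Replace the value of each top-level `key: value` line; indented (nested) keys are untouched."""
    for key, value in updates.items():
        pattern = re.compile(rf"^({re.escape(key)}:)[ \t]*.*$", re.MULTILINE)
        text, count = pattern.subn(lambda m: f"{m.group(1)} {_yaml_scalar(value)}", text, count=1)
        if count == 0:
            raise KeyError(f"top-level key {key!r} not found")
    return text


# Hypothetical usage before a large-scale collection run:
# from pathlib import Path
# cfg = Path("task_config/shoe_place.yml")
# cfg.write_text(set_top_level_keys(cfg.read_text(),
#     {"render_freq": 0, "depth": False, "head_camera_type": "D435", "episode_num": 100}))
```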
If collection succeeds, you will find a ${task_name} folder under data/ containing episode_num episodes; each episode holds a number of {%d}.pkl files, each storing the data for one frame.
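Before converting, it is worth a quick look at what was collected. A small sketch, assuming the episode{i}/{j}.pkl layout that pkl2hdf5_rdt.py reads (the exact folder name under data/ may carry extra suffixes depending on the version):

```python
import os
import pickle
import re
from typing import Any, Dict


def count_frames(task_dir: str) -> Dict[str, int]:
    """Map each episode folder (episode0, episode1, ...) to its number of {j}.pkl frames."""
    counts: Dict[str, int] = {}
    for name in os.listdir(task_dir):
        path = os.path.join(task_dir, name)
        if re.fullmatch(r"episode\d+", name) and os.path.isdir(path):
            counts[name] = sum(1 for f in os.listdir(path) if re.fullmatch(r"\d+\.pkl", f))
    # os.listdir returns names in arbitrary order, so sort the episodes numerically.
    return dict(sorted(counts.items(), key=lambda kv: int(kv[0][len("episode"):])))


def load_frame(task_dir: str, episode: int, frame: int) -> Any:
    """Unpickle one frame, e.g. to inspect data['joint_action'] or data['observation'].keys()."""
    with open(os.path.join(task_dir, f"episode{episode}", f"{frame}.pkl"), "rb") as f:
        return pickle.load(f)
```

Episodes with unusually few frames are usually worth inspecting before they go into training.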

Converting the data to hdf5

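In the previous post I gave a Python script that converts .npy to .hdf5. This time I give a script that batch-converts RoboTwin-collected data into the hdf5 format supported by RDT.
Since version 2.0 has not finished testing yet, you need to git clone the RDT code under ./policy: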

cd policy
git clone https://github.com/thu-ml/RoboticsDiffusionTransformer.git RDT
mkdir RDT/processed_data
cd ..

Then, in the RoboTwin environment, run the Python script pkl2hdf5_rdt.py:

import sys
sys.path.append('./policy/RDT/')

import os
import h5py
import numpy as np
import pickle
import cv2
import argparse
# from scripts.encode_lang_batch_tpp import encode_lang

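def images_encoding(imgs):
    """JPEG-encode each frame and null-pad all byte strings to the same length."""
    encode_data = []
    padded_data = []
    max_len = 0
    for i in range(len(imgs)):
        success, encoded_image = cv2.imencode('.jpg', imgs[i])
        if not success:
            raise RuntimeError(f"JPEG encoding failed for frame {i}")
        jpeg_data = encoded_image.tobytes()
        encode_data.append(jpeg_data)
        max_len = max(max_len, len(jpeg_data))
    # padding: right-pad every entry with b'\0' to max_len so it fits dtype S{max_len}
    for i in range(len(imgs)):
        padded_data.append(encode_data[i].ljust(max_len, b'\0'))
    return padded_data, max_len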

def data_transform(path, episode_num, save_path):
    begin = 0
    folders = os.listdir(path)
    assert episode_num <= len(folders), "data num not enough"

    if not os.path.exists(save_path):
        os.makedirs(save_path)
    
    for i in range(episode_num):  # iterate over all episodes
        subfolder_name = f"episode{i}"
        subfolder_path = os.path.join(path, subfolder_name)
        # buffers for the data written to this episode's hdf5 file
        qpos = []
        actions = []
        cam_high = []
        cam_right_wrist = []
        cam_left_wrist = []

        if os.path.isdir(subfolder_path):  # make sure it is a directory
            episode = []
            pkl_files = [f for f in os.listdir(subfolder_path) if f.endswith('.pkl')]  # collect all .pkl files
            last_state = None
            for j in range(0, len(pkl_files)): 
                pkl_file_path = os.path.join(subfolder_path, f'{j}.pkl')
                with open(pkl_file_path, 'rb') as pkl_f:
                    data = pickle.load(pkl_f)

                state = np.array(data['joint_action'])  # joints angle       
                state = state.astype(np.float32)
                state[6] /= 0.045
                state[13] /= 0.045
                qpos.append(state)
                
                action = state
                actions.append(action)

                # if j == 0:
                #     pass
                # elif j == len(pkl_files)-1:
                #     action = state - last_state
                #     actions.append(action)
                #     actions.append(action)  # the last step has nothing to predict, so reuse the last delta as its target
                # else:
                #     action = state - last_state
                #     actions.append(action)

                camera_high= data['observation']['head_camera']['rgb']
                camera_high = camera_high[:,:,::-1]
                camera_high_resized = cv2.resize(camera_high, (640,480))
                cam_high.append(camera_high_resized)
                
                camera_right_wrist = data['observation']['right_camera']['rgb']
                camera_right_wrist = camera_right_wrist[:,:,::-1]
                camera_right_wrist_resized = cv2.resize(camera_right_wrist, (640,480))
                cam_right_wrist.append(camera_right_wrist_resized)
           
                camera_left_wrist = data['observation']['left_camera']['rgb']
                camera_left_wrist = camera_left_wrist[:,:,::-1]
                camera_left_wrist_resized = cv2.resize(camera_left_wrist, (640,480))
                cam_left_wrist.append(camera_left_wrist_resized)
                # last_state = state

        hdf5path = os.path.join(save_path, f'episode_{i}.hdf5')
        with h5py.File(hdf5path, 'w') as f:
            f.create_dataset('action', data=np.array(actions))
            obs = f.create_group('observations')
            obs.create_dataset('qpos', data=np.array(qpos))
            image = obs.create_group('images')
            # JPEG-encode the images and store them in frame order
            cam_high_enc, len_high = images_encoding(cam_high)
            cam_right_wrist_enc, len_right = images_encoding(cam_right_wrist)
            cam_left_wrist_enc, len_left = images_encoding(cam_left_wrist)
            image.create_dataset('cam_high', data=cam_high_enc, dtype=f'S{len_high}')
            image.create_dataset('cam_right_wrist', data=cam_right_wrist_enc, dtype=f'S{len_right}')
            image.create_dataset('cam_left_wrist', data=cam_left_wrist_enc, dtype=f'S{len_left}')

        begin += 1
        print(f"proccess {i} success!")
    return begin

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Process some episodes.')
    parser.add_argument('task_name', type=str, default='block_hammer_beat',
                        help='The name of the task (e.g., block_hammer_beat)')
    parser.add_argument('setting', type=str)
    parser.add_argument('expert_data_num', type=int, default=50,
                        help='Number of episodes to process (e.g., 50)')
    args = parser.parse_args()
    
    task_name = args.task_name
    num = args.expert_data_num
    setting = args.setting
    
    data_path_name = task_name + '_' + setting
    begin = 0
    print(f'read data from path:{os.path.join("data/", data_path_name)}')
    begin = data_transform(os.path.join("data/",data_path_name), num, f"./policy/RDT/processed_data/{task_name}_{setting}_{num}")
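# task_name: the task name, e.g. shoe_place
# setting: I don't quite remember whether version 1.0 has this argument; if it doesn't, edit the main block and delete everything related to setting, e.g.:
# data_path_name = task_name
# begin = data_transform(os.path.join("data/", data_path_name), num, f"./policy/RDT/processed_data/{task_name}_{num}")
# expert_data_num: how many episodes you want to convert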
python pkl2hdf5_rdt.py ${task_name} ${setting} ${expert_data_num}

If all goes well, you will find a {task_name}_{setting}_{num} folder under ./policy/RDT/processed_data containing the corresponding episode_{%d}.hdf5 files.
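pkl2hdf5_rdt.py stores each camera stream as JPEG bytes in a fixed-width dataset (dtype S{max_len}), so shorter frames are null-padded. The numpy-only sketch below (h5py uses the same kind of fixed-width byte-string dtype) illustrates why that is safe: numpy strips trailing b"\0" when you read an element back, and a JPEG stream ends with the FF D9 marker rather than a zero byte. When reading the hdf5 files, each entry can then be decoded with cv2.imdecode(np.frombuffer(buf, np.uint8), cv2.IMREAD_COLOR).

```python
from typing import List, Tuple

import numpy as np


def pack_fixed_width(chunks: List[bytes]) -> Tuple[np.ndarray, int]:
    """Store variable-length byte strings in one array of dtype S<max_len> (null-padded)."""
    max_len = max(len(c) for c in chunks)
    return np.array(chunks, dtype=f"S{max_len}"), max_len


# Two fake "JPEG" payloads: SOI marker, body, EOI marker.
frames = [b"\xff\xd8abc\xff\xd9", b"\xff\xd8a\xff\xd9"]
packed, width = pack_fixed_width(frames)
print(packed.dtype, width)         # every entry occupies `width` bytes
print(packed[1] == frames[1])      # the null padding is stripped on access
print(packed.tobytes()[width:])    # raw storage of entry 1 still holds the padding bytes
```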

III. Model training

This article trains two widely recognized open-source VLA models, RDT and openpi, on RoboTwin data.

Training RDT

The latest RoboTwin release already integrates RDT; here is policy/RDT/README.md:

Deploy RDT on RoboTwin

1. Environment Setup

The conda environment for RDT with RoboTwin is identical to the official RDT environment. Please follow the RDT official documentation to install it, installing directly over the RoboTwin virtual environment created from INSTALLATION.md.

# Make sure python version == 3.10
conda activate RoboTwin

# Install pytorch
# Look up https://pytorch.org/get-started/previous-versions/ with your cuda version for a correct command
pip install torch==2.1.0 torchvision==0.16.0  --index-url https://download.pytorch.org/whl/cu121

# Install packaging
pip install packaging==24.0
pip install ninja
# Verify Ninja --> should return exit code "0"
ninja --version; echo $?
# Install flash-attn
pip install flash-attn==2.7.2.post1 --no-build-isolation

# Install other prerequisites
pip install -r requirements.txt
# If you are using a PyPI mirror, you may encounter issues when downloading tfds-nightly and tensorflow. 
# Please use the official source to download these packages.
# pip install tfds-nightly==4.9.4.dev202402070044 -i  https://pypi.org/simple
# pip install tensorflow==2.15.0.post1 -i  https://pypi.org/simple
2. Download Model
# In the RoboTwin/policy directory
cd ../weights
mkdir RDT && cd RDT
# Download the models used by RDT
huggingface-cli download google/t5-v1_1-xxl --local-dir t5-v1_1-xxl
huggingface-cli download google/siglip-so400m-patch14-384 --local-dir siglip-so400m-patch14-384
huggingface-cli download robotics-diffusion-transformer/rdt-1b --local-dir rdt-1b
3. Generate HDF5 Data

First, create the processed_data and training_data folders in the policy/RDT directory:

mkdir processed_data && mkdir training_data

To generate the raw data that will be converted to HDF5, run the following command from the RoboTwin/ directory:

cd ../..
bash run_task.sh ${task_name} ${gpu_id}

The data will be saved by default in the RoboTwin/data/${task_name}_${camera_type}_pkl directory.

Then, run the following in the RoboTwin/policy/RDT directory:

cd policy/RDT
# task_name: the already generated data, default located in data/${task_name}
# head_camera_type: default to D435
# expert_data_num: the number of data to be converted to hdf5
# gpu_id: the GPU used for language encoding, default 0
# After running, the data will be saved to policy/RDT/processed_data by default
bash process_data_rdt.sh $task_name $head_camera_type $expert_data_num $gpu_id

If successful, you will find the ${task_name}_${expert_data_num} folder under policy/RDT/processed_data, with the following data structure:

`processed_data/${task_name}_${expert_data_num}:`
`instructions/lang_embed_{%d}.pt`
`episode_{%d}.hdf5`
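A quick sanity check of that output folder can save a failed training run later. A sketch using only pathlib; it assumes the episode indices should run contiguously from 0 (as the conversion writes episode_{i} for i in range(expert_data_num)) and only checks that at least one language embedding exists, since this README does not say how many lang_embed files to expect:

```python
import re
from pathlib import Path
from typing import List, Set


def _indices(folder: Path, pattern: str) -> Set[int]:
    """Collect the integer index from every file name in `folder` that fully matches `pattern`."""
    if not folder.is_dir():
        return set()
    found = set()
    for p in folder.iterdir():
        m = re.fullmatch(pattern, p.name)
        if m:
            found.add(int(m.group(1)))
    return found


def check_processed_dir(folder: str) -> List[str]:
    """Return human-readable problems found in processed_data/${task_name}_${expert_data_num}."""
    root = Path(folder)
    episodes = _indices(root, r"episode_(\d+)\.hdf5")
    embeds = _indices(root / "instructions", r"lang_embed_(\d+)\.pt")
    problems = []
    if not episodes:
        problems.append("no episode_*.hdf5 files")
    elif episodes != set(range(len(episodes))):
        problems.append(f"episode indices are not contiguous from 0: {sorted(episodes)}")
    if not embeds:
        problems.append("no instructions/lang_embed_*.pt files")
    return problems
```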
4. Generate Configuration File
cd policy/RDT
# model_name: the name you want to save your model as, it is recommended to use ${task_name_1}_${num_1}_${task_name_2}_${num_2}... for easy record-keeping
bash generate.sh ${model_name}

This will create a folder named ${model_name} under training_data and a configuration file ${model_name}.yml under model_config.

Move all the data you wish to use for training into training_data/${model_name}. If you have multiple tasks with different data, move each of them in the same way.

Example folder structure:

`training_data/${model_name}:`
`${task_1}/episode_{%d}.hdf5`
`${task_1}/instructions/lang_embed_{%d}.pt`
`${task_2}/episode_{%d}.hdf5`
`${task_2}/instructions/lang_embed_{%d}.pt`
`...`
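Moving the folders by hand is fine; when combining several tasks, a helper like the following keeps the layout consistent. It is a sketch, not part of RDT: it copies by default (so processed_data stays intact for other models), the task folder names are your choice, and it refuses to overwrite an existing task folder:

```python
import shutil
from pathlib import Path
from typing import Dict, List


def assemble_training_data(model_dir: str, sources: Dict[str, str], move: bool = False) -> List[Path]:
    """Copy (or move) processed task folders into training_data/${model_name}/<task>/."""
    root = Path(model_dir)
    root.mkdir(parents=True, exist_ok=True)  # generate.sh may already have created it
    placed = []
    for task, src in sources.items():
        dst = root / task
        if dst.exists():
            raise FileExistsError(f"{dst} already exists")
        if move:
            shutil.move(str(src), str(dst))
        else:
            shutil.copytree(src, dst)  # copies episode_*.hdf5 and instructions/ recursively
        placed.append(dst)
    return placed


# Hypothetical example:
# assemble_training_data("training_data/shoe_place_100",
#                        {"shoe_place": "processed_data/shoe_place_D435_100"})
```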

In model_config/${model_name}.yml, you need to manually set the GPU to be used. For a single GPU, set it to 0.

5. Finetune model

Once the training parameters are set, you can start training with:

bash finetune.sh ${model_name}
6. Eval on RoboTwin

Once the model fine-tuning is complete, you can test your model’s performance on the RoboTwin simulation platform. RoboTwin offers more than 20 tasks to choose from, and you can find them in the RoboTwin/task_config directory.

bash eval.sh $task_name $head_camera_type $model_name $checkpoint_id $seed $gpu_id

Training openpi

The latest RoboTwin release already integrates openpi; here is policy/openpi/README.md:

OpenPI on RoboTwin Usage

1. Environment Setup

Follow the official OpenPI website to configure the environment. The OpenPI + RoboTwin environment has already been pre-configured in a file, so no additional setup is needed.

GIT_LFS_SKIP_SMUDGE=1 uv sync

Install pytorch3d:

conda deactivate
source .venv/bin/activate
# At this point, you should be in the (openpi) environment
pip install portalocker tabulate yacs iopath fvcore
cd ../../third_party/pytorch3d_simplified/
pip install .
# if `pip install .` fails:
python setup.py install
pip uninstall pytorch3d
pip install .

cd ../../policy/openpi/
bash

Note that the uv environment only takes effect when the current directory is the openpi root directory.
Alternatively, you can activate it with the following command:

source .venv/bin/activate

Next, locate mplib within the (openpi) environment:

uv run where_is_package.py

Then, based on the printed output, modify the corresponding mplib as needed:
Modification Reference
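If you just want the folder that mplib lives in (for example outside uv, or to double-check the script's output), the standard library can report it without importing the package. This is a stand-in sketch, not the contents of where_is_package.py; run it with the (openpi) environment's interpreter so it finds that environment's mplib:

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory containing an installed top-level package or module, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.submodule_search_locations:  # a package: its first search location is its folder
        return Path(list(spec.submodule_search_locations)[0])
    return Path(spec.origin).parent if spec.origin else None


if __name__ == "__main__":
    print(package_dir("mplib"))
```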

2. Generate Data

We have already generated HDF5 data in the conda environment, and you can refer to the section in the RoboTwin/policy/RDT/README.md for generating HDF5 data.
After generating the HDF5 data, we can directly generate the LerobotDataset format data for OpenPI.
Unlike the data generation process in RDT, we need to manually move the /data/instructions/${task_name}.json file to the corresponding ${task_%d}/ directory and rename it as instructions.json.

# hdf5_path: The path to the generated HDF5 data (e.g., ./training_data/empty_cup_place_500_hdf5/)
# dataset_name: The name of the dataset (e.g., empty_cup_place_500)
bash generate.sh ${hdf5_path} ${dataset_name}

training_data/${hdf5_path}:
${task_1}/episode_{%d}.hdf5
${task_1}/instructions.json
${task_2}/episode_{%d}.hdf5
${task_2}/instructions.json
...

Here, the instructions.json corresponds to the task instructions, located in RoboTwin/data/instructions/ as ${task_name}.json.
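The copy-and-rename step can be scripted. A sketch, assuming the paths from the README above (RoboTwin/data/instructions/${task_name}.json and one folder per task under training_data/${hdf5_path}); it copies instead of moving, so the original json stays available for other datasets:

```python
import json
import shutil
from pathlib import Path


def install_instructions(instructions_dir: str, hdf5_root: str, task_name: str, task_folder: str) -> Path:
    """Copy <instructions_dir>/<task_name>.json to <hdf5_root>/<task_folder>/instructions.json."""
    src = Path(instructions_dir) / f"{task_name}.json"
    dst_dir = Path(hdf5_root) / task_folder
    if not dst_dir.is_dir():
        raise FileNotFoundError(f"task folder {dst_dir} does not exist")
    json.loads(src.read_text(encoding="utf-8"))  # fail early if the source is not valid JSON
    dst = dst_dir / "instructions.json"
    shutil.copyfile(src, dst)
    return dst


# Hypothetical example (run from RoboTwin/policy/openpi; adjust the relative paths):
# install_instructions("../../data/instructions", "training_data/empty_cup_place_500_hdf5",
#                      "empty_cup_place", "empty_cup_place")
```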
Generating the dataset can take some time—about half an hour for 100 sets, so feel free to take a break.

note!

If you don’t have enough disk space under the ~/.cache path, please use the following command to set a different cache directory with sufficient space:

export LEROBOT_HOME=/path/to/your/cache

This is because generating the LeRobot dataset requires a large amount of disk space, and the dataset is written to $LEROBOT_HOME.
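Since the dataset lands in $LEROBOT_HOME, a quick free-space check before starting a long conversion can save a wasted run. A sketch using shutil.disk_usage; the candidate paths and the size budget are placeholders, since the README gives no size figure:

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Optional


def free_gib(path: str) -> float:
    """Free space in GiB on the filesystem that holds `path` (or its nearest existing parent)."""
    p = Path(path).expanduser().resolve()
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / 1024 ** 3


def pick_cache_dir(candidates: Iterable[str], required_gib: float) -> Optional[str]:
    """Return the first candidate with at least `required_gib` GiB free, or None."""
    for c in candidates:
        if free_gib(c) >= required_gib:
            return c
    return None


if __name__ == "__main__":
    # Both the second path and the 200 GiB budget are placeholders; size them to your dataset.
    target = pick_cache_dir([os.path.expanduser("~/.cache"), "/data/lerobot_cache"], required_gib=200)
    print(f"export LEROBOT_HOME={target}" if target else "no candidate has enough free space")
```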

3. Write the Corresponding train_config

In src/openpi/training/config.py, there is a dictionary called _CONFIGS. You can modify the four pre-configured PI0 configurations I’ve written:
pi0_base_aloha_robotwin_lora
pi0_fast_aloha_robotwin_lora
pi0_base_aloha_robotwin_full
pi0_fast_aloha_robotwin_full

You only need to set repo_id to your dataset.
If you change the name in TrainConfig, make sure it still contains fast when you choose the pi_fast_base model.

4. Finetune model

Simply modify the repo_id to fine-tune the model:

# train_config_name: The name corresponding to the config in _CONFIGS, such as pi0_base_aloha_full
# model_name: You can choose any name for your model
# gpu_use: if not using fsdp_devices, set it to a single gpu_id such as 0; otherwise set it like 0,1,2,3
bash finetune.sh ${train_config_name} ${model_name} ${gpu_use}
| Training mode | Memory required | Example GPU |
| --- | --- | --- |
| Fine-Tuning (LoRA) | > 48 GB | A6000 (48G) |
| Fine-Tuning (Full) | > 100 GB | A100 (80GB) / H100 |

If your GPU memory is insufficient, please set the fsdp_devices parameter according to the following GPU memory reference, or reduce the batch_size parameter:
The default batch_size is 32 in the table below.

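| GPU memory | Model type | GPU num | fsdp_devices | Example GPU |
| --- | --- | --- | --- | --- |
| 24G | lora | 2 | 2 | 4090 (24G) |
| 40G | lora | 2 | 2 | A100 (40G) |
| 48G | lora | 1 | 1 | A6000 (48G) |
| 40G | full | 2 | 4 | 4090 (24G) |
| 80G | full | 2 | 2 | 4090 (24G) |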
5. Eval on RoboTwin

Once the model fine-tuning is complete, you can test your model’s performance on the RoboTwin simulation platform. RoboTwin offers more than 20 tasks to choose from, and you can find them in the RoboTwin/task_config directory.

bash eval.sh $task_name $head_camera_type $train_config_name $model_name $checkpoint_id $seed $gpu_id

IV. Simulation testing on RoboTwin

First, here are two openpi deployment demos~
The RoboTwin-with-VLA branch is being proofread and will be released soon~

pi0_robotwin_empty_cup_place_success_demo

pi0_robotwin_empty_cup_place_success_demo

Some notes on simulated data collection and testing

Enabling video saving

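In RoboTwin/task_config/{task_name}.yml you can enable eval_video_log. It saves images at a fixed interval, stitches the frames into a video with ffmpeg, and writes the result under RoboTwin/eval_video.
This lets you see, on a server, why an evaluation failed or succeeded.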
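If you want to stitch frames yourself (for example frames you dumped while debugging), the ffmpeg call looks roughly like the one below. This is a generic image-sequence-to-H.264 sketch, not RoboTwin's exact command; the frame naming pattern and frame rate are assumptions, and libx264 must be available in your ffmpeg build:

```python
import shlex
import subprocess
from typing import List


def ffmpeg_cmd(frame_pattern: str, out_path: str, fps: int = 10) -> List[str]:
    """Build an ffmpeg argv that encodes numbered frames (printf-style pattern) into an mp4."""
    return [
        "ffmpeg", "-y",            # overwrite out_path if it already exists
        "-framerate", str(fps),    # frame rate of the input image sequence
        "-i", frame_pattern,       # e.g. "frames/frame_%04d.png"
        "-c:v", "libx264",         # H.264 video codec
        "-pix_fmt", "yuv420p",     # pixel format most players accept
        out_path,
    ]


def stitch(frame_pattern: str, out_path: str, fps: int = 10) -> None:
    """Run ffmpeg (must be on PATH) and raise CalledProcessError if it fails."""
    cmd = ffmpeg_cmd(frame_pattern, out_path, fps)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```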

Visualizing generation

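In RoboTwin/task_config/{task_name}.yml, you can set render_freq to watch the arm's motion in real time during data generation and evaluation.
Note that this does not work on a server, because it renders directly to the screen.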

How do you train with your own robot arm?

Training your own robot arm

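Assuming you have already converted your own arm's data into LeRobot format following the libero (single-arm) / aloha (dual-arm) format (the only thing to watch is that your arm's degrees of freedom may differ from libero's), remember to change LiberoOutput's output dimensions to [:8]; then you can set up your own train_config.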
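Changing LiberoOutput to [:8] amounts to slicing the model's padded action chunk down to your arm's dimensions. A minimal numpy sketch of that slicing; the 8 assumes something like 7 joints plus 1 gripper, and the 32-dim padded width in the example is an illustrative assumption about the pi0 config you use, not something this post specifies:

```python
import numpy as np


def to_robot_actions(model_actions: np.ndarray, robot_dims: int = 8) -> np.ndarray:
    """Keep the first `robot_dims` columns of a (horizon, padded_dim) action chunk."""
    actions = np.asarray(model_actions)
    if actions.ndim != 2 or actions.shape[1] < robot_dims:
        raise ValueError(f"expected shape (horizon, >= {robot_dims}), got {actions.shape}")
    return actions[:, :robot_dims]


# Example: a 50-step chunk padded to 32 dims (both numbers are illustrative).
chunk = np.zeros((50, 32), dtype=np.float32)
print(to_robot_actions(chunk).shape)
```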

Caveats
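  1. If you have a single arm, fill the wrist_image into the left-arm slot and fill the state into the right-arm slot as [:action_dims]; this gives better fine-tuning results. (Because the action space is shared, fill it this way whether your single arm is on the left or the right; do not pad zeros in front.)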

  2. Remember to change this in config.py: delta_action_mask = _transforms.make_bool_mask(action_dims-1, -1, action_dims-1, -1)

  3. If you use aloha_policy, set adapt_to_pi=false
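To see what note 2 produces: as we understand openpi's make_bool_mask, a positive count appends that many True entries and a negative count appends that many False entries. Below is a standalone re-implementation of that behavior (not imported from openpi), plus a simplified illustration of how such a mask turns absolute joint targets into deltas while the two gripper dimensions (the False positions) stay absolute. action_dims = 7 (6 joints + 1 gripper per arm) is a hypothetical value:

```python
from typing import Sequence, Tuple

import numpy as np


def make_bool_mask(*dims: int) -> Tuple[bool, ...]:
    """Standalone re-implementation: n > 0 appends n True values, n < 0 appends |n| False values."""
    mask = []
    for d in dims:
        mask.extend([d > 0] * abs(d))
    return tuple(mask)


def absolute_to_delta(actions: np.ndarray, state: np.ndarray, mask: Sequence[bool]) -> np.ndarray:
    """Simplified illustration: subtract the current state only on masked (True) dims."""
    m = np.asarray(mask, dtype=bool)
    n = m.shape[0]
    state = np.asarray(state, dtype=np.float32)
    out = np.array(actions, dtype=np.float32, copy=True)
    out[:, :n] -= np.where(m, state[:n], np.float32(0.0))  # broadcast over the action horizon
    return out


action_dims = 7  # hypothetical: 6 joints + 1 gripper per arm
delta_action_mask = make_bool_mask(action_dims - 1, -1, action_dims - 1, -1)
```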
