[GR00T Fine-Tuning Notes] Fine-tuning NVIDIA Isaac GR00T on a 4090D and running a quick test on a 4070TiS

Latest recommended article published on 2025-04-25 18:39:33

Chase.Liu

Category: Robotics
Tags: notes, learning, robotics

Article link: https://blog.csdn.net/CSDN_Lcy/article/details/147078153

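Fine-tuning NVIDIA Isaac GR00T on a 4090D and running a quick test on a 4070TiS

Basic Testing

Everything below was tested successfully in the following environments.

Machine 1 (presumably the 4070TiS, going by the title): not enough VRAM for fine-tuning, but inference runs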

item	version
ubuntu	Ubuntu 22.04.5 LTS
cuda	12.4

[Screenshot: nvidia-smi]

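Machine 2 (presumably the 4090D, going by the title): not enough VRAM for full fine-tuning, but a simple fine-tuning test runs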

item	version
ubuntu	Ubuntu 22.04.3 LTS
cuda	12.6

[Screenshot: nvidia-smi]
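
The screenshots above record GPU memory on each machine. As a minimal sketch, the same information can be read programmatically through `nvidia-smi`'s CSV query mode. The helper names `parse_gpu_csv` and `query_gpus` are made up for this example, and the sample GPU name and numbers in the usage note are illustrative, not taken from the original screenshots.

```python
import shutil
import subprocess

# nvidia-smi's machine-readable query mode: one CSV row per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'name, total_mib, used_mib' rows into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({
            "name": name,
            "total_mib": int(total),
            "free_mib": int(total) - int(used),
        })
    return gpus


def query_gpus():
    """Return GPU info, or an empty list when nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []
```

Calling `query_gpus()` before launching inference or fine-tuning gives a quick view of how much VRAM is actually free; `parse_gpu_csv("NVIDIA GeForce RTX 4090 D, 24564, 1024")` shows the shape of the result on an illustrative row.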

Preparing the Python Environment

Install CUDA 12.4 first to avoid the pitfalls described below.

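# Run these from the root of the cloned Isaac-GR00T repository
conda create -n gr00t python=3.10
conda activate gr00t
pip install --upgrade setuptools
# Editable install of the repository in the current directory
pip install -e .
# This is the step that failed when the installed CUDA version was 12.1
pip install --no-build-isolation flash-attn==2.7.1.post4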

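The last step failed with an error:

[Screenshot: error]

The installed CUDA version was 12.1; reinstalling with CUDA 12.4 fixed it.

[Screenshot: cuda]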
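
Since the failure came down to the installed CUDA toolkit version, a small pre-flight check can catch it before the slow `flash-attn` install. This is a sketch under the assumption that `nvcc` is on `PATH` and prints its usual `release X.Y` line; `parse_nvcc_release` and `installed_cuda_release` are names invented for this example.

```python
import re
import shutil
import subprocess

REQUIRED = (12, 4)  # the CUDA version that worked in these notes


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Return the toolkit version reported by nvcc, or None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)
```

Comparing `installed_cuda_release()` against `REQUIRED` before running `pip install` flags the 12.1 situation described above. Note that `nvcc` reports the toolkit version, which can differ from the driver-side CUDA version shown by `nvidia-smi`.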

After installing the new CUDA version, set up the Python environment again.

[Screenshot: error]

pip install --no-build-isolation flash-attn==2.7.1.post4 --use-pep517

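Installing this way is too slow; it is better to download the wheel manually and install it directly.

Record of the failed flash_attn installation:

[Screenshot: flash-attention]

Two wheels were tried here; the abiFALSE one finally installed successfully: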

flash_attn-2.7.1.post4+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

[Screenshot: flash_attn]
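
The wheel filename encodes the CUDA major version, the PyTorch version, the C++11 ABI flag, and the Python tag, and a mismatch in any of them is a plausible reason one of the two wheels failed. The sketch below parses those fields so they can be compared with the local environment. The naming scheme is inferred only from the single filename above, and the helper names are hypothetical. `torch.compiled_with_cxx11_abi()` is used to read the ABI flag of the installed PyTorch, when PyTorch is available.

```python
import re
import sys

# Pattern inferred from the one filename in these notes (an assumption).
WHEEL_PATTERN = re.compile(
    r"flash_attn-(?P<version>[^+]+)\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)-(?P<python>cp\d+)-cp\d+-(?P<platform>.+)\.whl"
)


def parse_flash_attn_wheel(filename):
    """Split a prebuilt flash_attn wheel name into its compatibility fields."""
    match = WHEEL_PATTERN.fullmatch(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["cxx11abi"] = info.pop("abi") == "TRUE"
    return info


def local_python_tag():
    """CPython tag of the running interpreter, e.g. 'cp310' for Python 3.10."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def local_cxx11_abi():
    """ABI flag of the installed PyTorch, or None when PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.compiled_with_cxx11_abi()
```

A wheel is a candidate when `info["python"] == local_python_tag()` and `info["cxx11abi"] == local_cxx11_abi()`. The CUDA and PyTorch fields still need to be checked by hand against the installed versions.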

Basic Run

The server and client can run on different machines; use --host to give the client the server's address.

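# On the server machine: load the model and start the inference server
python scripts/inference_service.py --model_path nvidia/GR00T-N1-2B --server
# Client on the same machine as the server
python scripts/inference_service.py --client
# Client on a different machine, pointing at the server's address
python scripts/inference_service.py --client --host 192.168.50.178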
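
When the client runs on a different machine, a quick TCP reachability check helps tell a network or firewall problem apart from a problem in the service itself. This is a generic sketch using only the standard library. The port value in the usage note is a placeholder assumption; use whatever port the server actually listens on.

```python
import socket


def server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

For example, `server_reachable("192.168.50.178", 5555)` (port 5555 is assumed here, not taken from these notes) returns `False` if the server is not running or the port is blocked. It only confirms that the port accepts connections, not that the model has finished loading.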

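Warning

[Screenshot: warning]

This turned out to be only a warning and can be ignored for now. The model download then starts; watch out for network problems at this point.

[Screenshot: download]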
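
Because the download is sensitive to network problems, one option is to fetch the model ahead of time with retries, so the server later starts from the local cache. The sketch below assumes the model ID `nvidia/GR00T-N1-2B` resolves to a Hugging Face Hub repository and that `huggingface_hub` is installed. `with_retries` and `download_model` are hypothetical helpers, not part of the GR00T scripts.

```python
import time


def with_retries(fetch, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting base_delay * 2**n between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # network errors come in many types
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


def download_model(repo_id="nvidia/GR00T-N1-2B"):
    """Pre-download the model snapshot into the local Hugging Face cache."""
    # Imported lazily so the retry helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return with_retries(lambda: snapshot_download(repo_id=repo_id))
```

Running `download_model()` once before starting the server returns the local snapshot path. Interrupted downloads are resumed by `huggingface_hub` on the next attempt, so retries do not start from scratch.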

With that, the basic functionality works.

Fine-Tuning

Run a fine-tuning test:

python scripts/gr00t_finetune.py --dataset-path ./demo_data/robot_sim.PickNPlace --num-gpus 1 --max-steps 500 --output-dir /tmp/gr00t-1/finetuned-model --data-config gr1_arms_only

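Before fine-tuning:

[Screenshot]

Single-GPU fine-tuning: out of VRAM

[Screenshot]

Dual-GPU fine-tuning: still out of VRAM

[Screenshot]

Model parallelism does not seem to be available out of the box here; the script probably still uses data parallelism, which would explain why a second GPU does not help. Check the script to confirm.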
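
If the script does use data parallelism, as guessed above, each GPU holds a full replica of the model, gradients, and optimizer state, so adding a GPU raises the global batch size without lowering per-GPU memory. The toy calculation below contrasts that with an idealized even sharding of model state. The function names and the 30 GiB figure in the usage note are illustrative assumptions, not measurements.

```python
def per_gpu_model_state_gib(model_state_gib, num_gpus, strategy):
    """Model-state memory each GPU must hold under a given parallelism strategy."""
    if strategy == "data_parallel":
        # Every GPU keeps a complete copy of weights, gradients, and optimizer state.
        return model_state_gib
    if strategy == "ideal_sharding":
        # Idealized model-state sharding: the state is split evenly across GPUs.
        return model_state_gib / num_gpus
    raise ValueError(f"unknown strategy: {strategy}")


def global_batch_size(per_gpu_batch, num_gpus):
    """With data parallelism the per-GPU batch is multiplied by the GPU count."""
    return per_gpu_batch * num_gpus
```

With a hypothetical 30 GiB of model state, `per_gpu_model_state_gib(30.0, 2, "data_parallel")` is still 30.0 GiB per GPU, while the idealized sharded case would be 15.0 GiB. Activation memory is not modeled here.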

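The changes below only force the pipeline to run end to end for testing; their effect on the resulting model quality is unknown.

batch_size 4

Change the training parameters in ./scripts/gr00t_finetune.py as follows: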

    # Training parameters
    batch_size: int = 4
    """Batch size per GPU for training."""

    tune_llm: bool = False
    """Whether to fine-tune the language model backbone."""

    tune_visual: bool = False
    """Whether to fine-tune the vision tower."""

    tune_projector: bool = False
    """Whether to fine-tune the projector."""

    tune_diffusion_model: bool = True
    """Whether to fine-tune the diffusion model."""
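
Freezing the language model, vision tower, and projector reduces memory because frozen parameters need neither gradients nor optimizer state. A rough back-of-the-envelope sketch follows, assuming an AdamW-style optimizer with fp32 weights, gradients, and two moment tensors for trainable parameters (16 bytes each) and bf16 storage for frozen ones (2 bytes each). Those byte counts are assumptions about the training setup, and any parameter counts passed in are placeholders, not the real GR00T component sizes.

```python
BYTES_TRAINABLE = 16  # fp32 weight + fp32 gradient + two fp32 AdamW moments (assumed)
BYTES_FROZEN = 2      # bf16 weight only (assumed)


def model_state_gib(param_counts, tuned):
    """Estimate memory for weights, gradients, and optimizer state in GiB.

    param_counts maps component name to parameter count; tuned maps component
    name to True when that component is fine-tuned. Activations are not included.
    """
    total_bytes = 0
    for name, count in param_counts.items():
        per_param = BYTES_TRAINABLE if tuned.get(name, False) else BYTES_FROZEN
        total_bytes += count * per_param
    return total_bytes / 2**30
```

Under these assumptions, every component switched from tuned to frozen saves 14 bytes per parameter. Activation memory, which scales with `batch_size`, comes on top of this estimate, which is why lowering the batch size also helps.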

Dual GPU

[Screenshot]

Single GPU

[Screenshot]

The run produces the following directory structure:

gr00t-1/
└── finetuned-model
    ├── checkpoint-500
    │   ├── config.json
    │   ├── experiment_cfg
    │   │   └── metadata.json
    │   ├── model-00001-of-00002.safetensors
    │   ├── model-00002-of-00002.safetensors
    │   ├── model.safetensors.index.json
    │   ├── optimizer.pt
    │   ├── rng_state_0.pth
    │   ├── rng_state_1.pth
    │   ├── scheduler.pt
    │   └── trainer_state.json
    ├── config.json
    ├── experiment_cfg
    │   └── metadata.json
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── model.safetensors.index.json
    ├── runs
    ├── trainer_state.json
    └── training_args.bin

5 directories, 17 files
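
The weights are saved as two safetensors shards plus `model.safetensors.index.json`, whose `weight_map` maps each tensor name to the shard file that stores it. As a small sanity check before copying a checkpoint to another machine, the sketch below lists any shard named in the index that is missing from the directory. `missing_shards` is a name invented for this example.

```python
import json
from pathlib import Path


def missing_shards(checkpoint_dir):
    """Return shard filenames referenced by the index but absent on disk."""
    checkpoint_dir = Path(checkpoint_dir)
    index_path = checkpoint_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text())
    # weight_map: tensor name -> shard filename; many tensors share one shard.
    shard_names = sorted(set(index["weight_map"].values()))
    return [name for name in shard_names if not (checkpoint_dir / name).is_file()]
```

An empty list from `missing_shards("/tmp/gr00t-1/finetuned-model/checkpoint-500")` means every shard listed in the index is present. It does not verify that the files are intact.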

Run an inference test with the fine-tuned model:

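# On the server machine: serve the fine-tuned checkpoint
python scripts/inference_service.py --model_path "/tmp/gr00t-1/finetuned-model/checkpoint-500" --embodiment_tag "new_embodiment" --server
# Client on the same machine as the server
python scripts/inference_service.py --client
# Client on a different machine, pointing at the server's address
python scripts/inference_service.py --client --host 192.168.50.178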
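
The checkpoint path above is hard-coded to step 500. When `--max-steps` changes, a small helper can pick the newest `checkpoint-<step>` directory instead. This is a sketch that assumes the `checkpoint-<step>` naming shown in the directory tree; `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    best_step, best_path = -1, None
    for child in Path(output_dir).iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            # Compare numerically so checkpoint-1000 beats checkpoint-500.
            if step > best_step:
                best_step, best_path = step, child
    return best_path
```

The result of `latest_checkpoint("/tmp/gr00t-1/finetuned-model")` can then be passed to `--model_path` when starting the server.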