Embodied AI Fundamentals Roadmap

Reference (Bilibili video): https://www.bilibili.com/video/BV1d5ukedEsi/

GitHub: https://github.com/yunlongdong/Awesome-Embodied-AI

Scene Understanding

Image

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SAM | Segmentation | https://arxiv.org/abs/2304.02643 | GitHub - facebookresearch/segment-anything |
| YOLO-World | Open-vocabulary detection | https://arxiv.org/abs/2401.17270 | GitHub - AILab-CVC/YOLO-World |

Tutorial: a beginner-friendly guide to the Segment Anything (SAM) paper and demo (CSDN blog)
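
A minimal usage sketch for SAM's promptable segmentation, assuming the `segment-anything` package is installed and a ViT-H checkpoint has been downloaded; the checkpoint path, image file, and point prompt below are placeholders.

```python
# Minimal sketch: prompt-based segmentation with SAM.
# Assumes `pip install segment-anything` plus a downloaded ViT-H checkpoint;
# the checkpoint path and image file are placeholders.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground point prompt (x, y); label 1 means foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # HxW boolean mask with the highest predicted IoU
```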

Point Cloud

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SAM3D | Segmentation | https://arxiv.org/abs/2306.03908 | GitHub - Pointcept/SegmentAnything3D |
| PointMixer | Understanding | https://arxiv.org/abs/2111.11187 | GitHub - LifeBeyondExpectations/PointMixer |

Multi-Modal Grounding

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| GPT4V | MLM (Image + Language -> Language) | https://arxiv.org/abs/2303.08774 | |
| Claude3-Opus | MLM (Image + Language -> Language) | Introducing the next generation of Claude (Anthropic announcement) | |
| GLaMM | Pixel grounding | https://arxiv.org/abs/2311.03356 | GitHub - mbzuai-oryx/groundingLMM |
| All-Seeing | Pixel grounding | https://arxiv.org/abs/2402.19474 | GitHub - OpenGVLab/all-seeing |
| LEO | 3D | https://arxiv.org/abs/2311.12871 | GitHub - embodied-generalist/embodied-generalist |

ICML'24 open source: LEO, the first embodied generalist agent in the 3D world (Bilibili)
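
For the MLM (Image + Language -> Language) entries above, a hosted model is typically queried through a chat API. A minimal sketch using the OpenAI Python client follows; the model name and image URL are assumptions, and any vision-capable chat model can be substituted.

```python
# Minimal sketch: Image + Language -> Language grounding via a hosted MLM.
# The model name and image URL are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable chat model works here
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "List the graspable objects on the table and where they are."},
            {"type": "image_url", "image_url": {"url": "https://example.com/tabletop.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```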

Data Collection

From Video

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| Vid2Robot | | https://vid2robot.github.io/vid2robot.pdf | |
| RT-Trajectory | | https://arxiv.org/abs/2311.01977 | |
| MimicPlay | | https://mimic-play.github.io/assets/MimicPlay.pdf | GitHub - j96w/MimicPlay |

Hardware

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| UMI | Two-Fingers | https://arxiv.org/abs/2402.10329 | GitHub - real-stanford/universal_manipulation_interface |
| DexCap | Five-Fingers | https://dex-cap.github.io/assets/DexCap_paper.pdf | GitHub - j96w/DexCap |
| HIRO Hand | Hand-over-hand | https://sites.google.com/view/hiro-hand | |

Generative Simulation

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| MimicGen | | https://arxiv.org/abs/2310.17596 | GitHub - NVlabs/mimicgen_environments |
| RoboGen | | https://arxiv.org/abs/2311.01455 | GitHub - Genesis-Embodied-AI/RoboGen |
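
The core trick behind MimicGen-style generation is re-targeting recorded end-effector poses from a source demonstration into the frame of an object's new pose. A toy sketch of just that transformation step, with made-up poses, is below; it is not the released MimicGen pipeline.

```python
# Toy sketch of MimicGen-style trajectory re-targeting: keep the end-effector
# poses fixed relative to the object, and express them in the new scene where
# the object sits at a different pose. Poses are 4x4 homogeneous transforms in
# the world frame; the data here is made up for illustration.
import numpy as np

def retarget(ee_traj_src, T_obj_src, T_obj_new):
    """Map each source end-effector pose into the new object pose's frame."""
    T_rel = T_obj_new @ np.linalg.inv(T_obj_src)
    return [T_rel @ T_ee for T_ee in ee_traj_src]

# Example: in the new scene the object is translated by +0.2 m along x.
T_obj_src = np.eye(4)
T_obj_new = np.eye(4)
T_obj_new[0, 3] = 0.2
ee_traj_src = [np.eye(4) for _ in range(5)]   # placeholder recorded trajectory
ee_traj_new = retarget(ee_traj_src, T_obj_src, T_obj_new)
```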

Action Output

Action planning

Generative Imitation Learning

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| Diffusion Policy | | https://arxiv.org/abs/2303.04137 | GitHub - real-stanford/diffusion_policy |
| ACT | | https://arxiv.org/abs/2304.13705 | GitHub - tonyzhaozh/act |
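
Diffusion Policy treats the visuomotor policy as a conditional denoiser over a short action chunk. A minimal sketch of the sampling loop, assuming the `diffusers` DDPM scheduler, is below; the noise-prediction network and observation features are placeholders rather than the released implementation.

```python
# Toy sketch of diffusion-based action generation: start from Gaussian noise
# over an action chunk and iteratively denoise it, conditioned on the current
# observation. The noise-prediction network below is a zero-returning stand-in
# for the trained model.
import torch
from diffusers import DDPMScheduler

horizon, action_dim = 16, 7
scheduler = DDPMScheduler(num_train_timesteps=100)
scheduler.set_timesteps(10)  # fewer steps at inference time

def noise_pred_net(noisy_actions, timestep, obs_emb):
    """Placeholder for the trained conditional noise-prediction network."""
    return torch.zeros_like(noisy_actions)

obs_emb = torch.zeros(1, 256)                   # placeholder observation features
actions = torch.randn(1, horizon, action_dim)   # start from pure Gaussian noise

for t in scheduler.timesteps:
    noise_pred = noise_pred_net(actions, t, obs_emb)
    actions = scheduler.step(noise_pred, t, actions).prev_sample
# `actions` now holds a denoised action chunk; execute a few steps, then
# re-plan in a receding-horizon fashion.
```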

Affordance Map

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| CLIPort | Pick & place | https://arxiv.org/pdf/2109.12098.pdf | GitHub - cliport/cliport |
| Robo-Affordances | Contact & post-contact trajectories | https://arxiv.org/abs/2304.08488 | GitHub - shikharbahl/vrb |
| Robo-ABC | | https://arxiv.org/abs/2401.07487 | GitHub - TEA-Lab/Robo-ABC |
| Where2Explore | Few-shot learning from semantic similarity | https://proceedings.neurips.cc/paper_files/paper/2023/file/0e7e2af2e5ba822c9ad35a37b31b5dd4-Paper-Conference.pdf | |
| Move as You Say, Interact as You Can | Affordance-to-motion via a diffusion model | https://arxiv.org/pdf/2403.18036.pdf | |
| AffordanceLLM | Grounding affordance with an LLM | https://arxiv.org/pdf/2401.06341.pdf | |
| Environment-aware Affordance | | https://proceedings.neurips.cc/paper_files/paper/2023/file/bf78fc727cf882df66e6dbc826161e86-Paper-Conference.pdf | |
| OpenAD | Open-vocabulary affordance detection from point clouds | https://www.csc.liv.ac.uk/~anguyen/assets/pdfs/2023_OpenAD.pdf | GitHub - Fsoft-AIC/Open-Vocabulary-Affordance-Detection-in-3D-Point-Clouds |
| RLAfford | End-to-end affordance learning with RL | https://gengyiran.github.io/pdf/RLAfford.pdf | |
| General Flow | Collect affordance from video | https://general-flow.github.io/general_flow.pdf | GitHub - michaelyuancb/general_flow |
| PreAffordance | Pre-grasping planning | https://arxiv.org/pdf/2404.03634.pdf | |
| SceneFun3D | Fine-grained functionality & affordance in 3D scenes | https://aycatakmaz.github.io/data/SceneFun3D-preprint.pdf | GitHub - SceneFun3D/scenefun3d |
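
Most of the methods above share one interface: predict a per-pixel (or per-point) affordance score, then act at the best-scoring location. A minimal sketch of that pattern follows; `predict_affordance` is a stand-in for whichever model from the table is used.

```python
# Minimal sketch of the affordance-map pattern: score every pixel for
# "graspability" and act at the highest-scoring location. The predictor here
# returns random scores purely for illustration.
import numpy as np

def predict_affordance(rgb):
    """Placeholder: return an HxW affordance heatmap in [0, 1]."""
    h, w, _ = rgb.shape
    return np.random.rand(h, w)

rgb = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder camera image
heatmap = predict_affordance(rgb)
v, u = np.unravel_index(np.argmax(heatmap), heatmap.shape)
print(f"pick pixel: (u={u}, v={v}), score={heatmap[v, u]:.2f}")
# In practice (u, v) is back-projected with depth and camera intrinsics to a 3D grasp point.
```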

Question&Answer from LLM

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| COPA | | https://arxiv.org/abs/2403.08248 | |
| ManipLLM | | https://arxiv.org/abs/2312.16217 | |
| ManipVQA | | https://arxiv.org/pdf/2403.11289.pdf | GitHub - SiyuanHuang95/ManipVQA |

Language Corrections

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| OLAF | | https://arxiv.org/pdf/2310.17555 | |
| YAY Robot | | https://arxiv.org/abs/2403.12910 | GitHub - yay-robot/yay_robot |

Planning from LLM

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SayCan | API Level | https://arxiv.org/abs/2204.01691 | GitHub - google-research/google-research (saycan) |
| VILA | Prompt Level | https://arxiv.org/abs/2311.17842 | |
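
SayCan selects the next skill by combining the LLM's task-grounding score (is this skill useful for the instruction?) with a value function's affordance score (can this skill succeed in the current state?). A toy sketch with made-up scores is below.

```python
# Toy sketch of SayCan-style skill selection: multiply the LLM usefulness
# score by the value function's success estimate and pick the best skill.
# Both scoring functions are placeholders with made-up numbers.
def llm_usefulness(instruction: str, skill: str) -> float:
    """Placeholder for the LLM's score that `skill` helps with `instruction`."""
    return {"pick up the sponge": 0.7, "go to the sink": 0.2, "open the drawer": 0.1}[skill]

def value_function(skill: str) -> float:
    """Placeholder for the learned success probability of `skill` in the current state."""
    return {"pick up the sponge": 0.9, "go to the sink": 0.8, "open the drawer": 0.3}[skill]

instruction = "wipe the table"
skills = ["pick up the sponge", "go to the sink", "open the drawer"]
best = max(skills, key=lambda s: llm_usefulness(instruction, s) * value_function(s))
print("next skill:", best)  # highest combined score wins
```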
