Dex-Net 2.0 Paper Translation

1. Introduction

1) Main contributions of this paper
1. The Dex-Net 2.0 dataset: 6.7 million synthetic point clouds paired with parallel-jaw grasps generated from 1,500 3D models, with grasp quality computed via grasp wrench space (GWS) analysis.
2. A Grasp Quality Convolutional Neural Network (GQ-CNN) that predicts a set of robust grasp plans.
3. A grasping policy that ranks the candidate robust grasp plans and executes the highest-ranked (most robust) one.
2)相关工作
为了执行机械手爪对物体的抓取,通常的方法是预先计算被抓物体的状态(形状、大小、物体位姿、相机位姿、摩擦系数等),使用点云注册的方法对预先计算的抓取物体进行索引:使用视觉和几何相似度将匹配点云与数据库中的已知3D对象模型进行匹配,并执行最优的抓取规划。
一个鲁棒性抓取规划(RGP),需要最大限度的提高规划的鲁棒性,或者在度量和控制的误差相对较大下尽量提高规划的准确性。
因此,dex-net 2.0数据集对dex-net 1.0版本进行了扩展,极大提高了RGP的采样复杂度(结合了点云和候选鲁棒性抓取),之后通过训练一个卷积神经网络模型来达到最优大区的目的。
3) Pipeline overview
[Figure: Dex-Net 2.0 pipeline]

2. Problem Statement

[Figure: problem setup]
1) Assumptions
A parallel-jaw gripper of known geometry
Rigid objects that can be singulated on a planar work surface
A single-view (2.5D) point cloud captured with a depth camera
A single depth camera with known intrinsics and pose
2) Notation
State x = (O, T_o, T_c, γ): O is the geometry of the object to be grasped, T_o the object pose, T_c the camera pose, and γ the friction coefficient.
Image y: a depth image (2.5D point cloud).
Grasp g = (p, ψ): one grasp plan, where p is the 3D position of the gripper (in the object frame) and ψ is the rotation of the gripper relative to the antipodal grasp point pair.
Success metric S: a binary measure of whether a grasp plan succeeds, defined as

$$S(g, x) = \begin{cases} 1, & E_Q(g, x) > \delta \ \text{and}\ \mathrm{collfree}(g, x) \\ 0, & \text{otherwise} \end{cases}$$

where E_Q is the epsilon quality, a robustness measure that accounts for pose errors due to uncertainty in the friction coefficient and the gripper pose, and collfree(g, x) indicates that executing grasp g in state x is collision-free.
Here y is the observed value and x is the true (latent) state.
Definition of the robustness function:

$$Q(g, y) = \mathbb{E}[S \mid g, y]$$

i.e., the expected value of the success metric S under the joint distribution over grasps and observations (depth images / point clouds).
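As a rough illustration of this definition, the expectation can be approximated by Monte Carlo sampling over states consistent with the observation. The sketch below is only illustrative: `sample_state` and `grasp_success` are hypothetical callables standing in for a state sampler and the success metric S defined above, not the paper's implementation.

```python
import numpy as np

def estimate_robustness(grasp, observation, sample_state, grasp_success, n_samples=100):
    """Monte Carlo estimate of Q(g, y) = E[S | g, y].

    sample_state(observation): hypothetical sampler drawing a full state x
    (object shape, object/camera pose, friction) consistent with the observed
    depth image y.
    grasp_success(grasp, state): returns the binary success metric S.
    """
    successes = [grasp_success(grasp, sample_state(observation))
                 for _ in range(n_samples)]
    return float(np.mean(successes))
```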
Our ultimate goal is to learn a robustness function Q_θ(g, y) such that the grasp policy satisfies

$$\pi_\theta(y) = \operatorname*{argmax}_{g \in \mathcal{C}} Q_\theta(g, y),$$

where C is the set of antipodal candidate grasp point pairs.
That is, we train a network whose parameters θ minimize

$$\theta^* = \operatorname*{argmin}_{\theta \in \Theta} \sum_{i=1}^{N} \mathcal{L}(S_i, Q_\theta(g_i, y_i)),$$

where Θ is the set of network parameters and L is the cross-entropy loss.
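A minimal sketch of the resulting policy and training objective, written directly from the two formulas above. Here `q_theta` is an assumed callable returning the predicted robustness of a grasp given an observation, and the binary cross-entropy is spelled out in NumPy rather than taken from any specific library API.

```python
import numpy as np

def grasp_policy(candidates, observation, q_theta):
    """pi_theta(y) = argmax over g in C of Q_theta(g, y)."""
    scores = [q_theta(g, observation) for g in candidates]
    return candidates[int(np.argmax(scores))]

def cross_entropy_loss(labels, predictions, eps=1e-7):
    """Sum over the training set of L(S_i, Q_theta(g_i, y_i))."""
    p = np.clip(np.asarray(predictions, dtype=float), eps, 1.0 - eps)
    s = np.asarray(labels, dtype=float)
    return float(-np.sum(s * np.log(p) + (1.0 - s) * np.log(1.0 - p)))
```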

3. Dataset Generation

[Figure: dataset generation pipeline]
1) Starting from 1,500 raw 3D mesh models, hundreds of grasp candidates perpendicular to the surface are generated on each model following the Dex-Net 1.0 approach, and the corresponding antipodal point pairs are found.
2) Depth images of the objects are rendered from the models. To remove the need for the network to learn rotational invariance, each image is rotated so that the line connecting the antipodal point pair is aligned with the image x-axis, scaled so that the jaw distance has a fixed size, and cropped to a 32×32 region that becomes one dataset example.
3) This procedure yields roughly 6.7 million grasp image examples.
Because of external noise, measurement error, camera-parameter error, and other factors, the state x is modeled as a sample from a joint distribution p(x) over object shape, object pose, camera pose, and friction coefficient.
Under this distribution, each of the 6.7 million examples is split into positive and negative samples by thresholding the epsilon quality E_Q (the robustness measure accounting for pose errors due to uncertainty in the friction coefficient and the gripper pose); a sketch of this labeling loop is given below.
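The sketch below makes the sampling-and-thresholding step concrete. All helper callables (`sample_state`, `render_depth`, `sample_grasp`, `epsilon_quality`, `collision_free`) and the threshold `delta` are hypothetical placeholders standing in for Dex-Net's actual tooling, not details taken from the text.

```python
def generate_dataset(mesh_models, samples_per_model,
                     sample_state, render_depth, sample_grasp,
                     epsilon_quality, collision_free, delta=0.002):
    """Sketch of dataset generation: sample states, render depth images,
    sample antipodal grasps, and label each grasp positive only if its
    epsilon quality exceeds delta and the grasp is collision-free."""
    dataset = []
    for mesh in mesh_models:
        for _ in range(samples_per_model):
            state = sample_state(mesh)      # object pose, camera pose, friction
            image = render_depth(state)     # synthetic 2.5D depth image
            grasp = sample_grasp(state)     # antipodal point pair on the mesh surface
            label = int(epsilon_quality(grasp, state) > delta
                        and collision_free(grasp, state))
            dataset.append((image, grasp, label))
    return dataset
```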

4. GQ-CNN

[Figure: GQ-CNN architecture]
1) Inputs and outputs
Inputs: 1. The aligned image: for each grasp candidate centered at pixel (i, j), the image is rotated so that the line connecting the antipodal point pair is parallel to the image x-axis (recording the rotation angle), scaled to a fixed size, and cropped to a 32×32 patch (a sketch of this alignment and cropping step is given below).
2. The grasp depth z at the grasp point.
Output: the robustness estimate Q_θ(g, y); ranking the candidates by this score yields the initial most robust grasp plan.
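A hedged sketch of the alignment and cropping just described: given a depth image and a grasp parameterized by its pixel center and in-plane rotation, the image is rotated so the jaw axis is horizontal and a 32×32 window centered on the grasp is cropped. The use of scipy, the interpolation settings, and the wrap-around shift are assumptions for illustration, not details from the text.

```python
import numpy as np
from scipy.ndimage import rotate

def aligned_crop(depth_image, center_row, center_col, angle_deg, crop_size=32):
    """Rotate the depth image about the grasp center so the antipodal axis is
    parallel to the image x-axis, then crop a crop_size x crop_size window."""
    rows, cols = depth_image.shape
    # Shift the grasp center to the image center before rotating
    # (border wrap-around is ignored in this sketch).
    shifted = np.roll(depth_image,
                      (rows // 2 - center_row, cols // 2 - center_col),
                      axis=(0, 1))
    rotated = rotate(shifted, angle_deg, reshape=False, order=1, mode='nearest')
    half = crop_size // 2
    r0, c0 = rows // 2 - half, cols // 2 - half
    return rotated[r0:r0 + crop_size, c0:c0 + crop_size]
```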
2) The special role of the first convolutional layer
[Figure: visualization of the first-layer filters]
The first-layer convolution kernels capture image gradient information. These gradients mark depth discontinuities from which potential collisions between the gripper and the object can be inferred; combining collfree with Q_θ then identifies the optimal grasp plan.
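To illustrate why gradient-like filters are informative here, the sketch below applies simple horizontal and vertical difference kernels to a depth image; large gradient magnitude marks depth edges, which are the regions where jaw collisions are most likely. These generic kernels are examples only, not the learned GQ-CNN weights.

```python
import numpy as np
from scipy.ndimage import convolve

def depth_gradient_magnitude(depth_image):
    """Approximate depth-image gradients with simple difference kernels;
    large values indicate depth edges relevant to gripper collisions."""
    kx = np.array([[-1.0, 0.0, 1.0]])  # horizontal difference
    ky = kx.T                          # vertical difference
    gx = convolve(depth_image, kx, mode='nearest')
    gy = convolve(depth_image, ky, mode='nearest')
    return np.hypot(gx, gy)
```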
3) Preprocessing and data augmentation
1. Preprocessing: the images are normalized.
2. To augment the dataset, the Dex-Net 2.0 images are reflected horizontally.
3. Zero-mean Gaussian noise is added to approximate sensor noise.
(A short sketch of these three steps follows below.)
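A minimal sketch of the three steps above applied to a batch of 32×32 depth crops. The noise standard deviation and the 50% flip probability are assumed placeholder values, not parameters from the paper.

```python
import numpy as np

def preprocess_batch(images, mean, std, noise_std=0.005, rng=None):
    """Normalize, randomly mirror horizontally, and add zero-mean Gaussian noise."""
    rng = rng or np.random.default_rng()
    batch = (np.asarray(images, dtype=np.float32) - mean) / std   # normalization
    flip = rng.random(len(batch)) < 0.5                           # horizontal-flip augmentation
    batch[flip] = batch[flip][:, :, ::-1]
    return batch + rng.normal(0.0, noise_std, size=batch.shape)   # approximate sensor noise
```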
4) Requirements on the objects to be grasped
1. Adversarial (geometrically challenging) geometry
2. Volume smaller than the workspace
3. Mass under 0.25 kg, with grasp points at least 1 cm above the table (constraints imposed by the YuMi gripper)

5. GQ-CNN Accuracy Comparison

For the four datasets Adv-Synth (189K datapoints), Adv-Phys (400K datapoints), Dex-Net-Small (670K datapoints), and Dex-Net-Large (6.7M datapoints), 80% of each is used for training and 20% for validation. The accuracy comparison is shown below:
[Table: classification accuracy on the four datasets]
Here the true positive rate (TPR), computed as TPR = TP / (TP + FN), is the fraction of all positive instances that the classifier identifies as positive. The false positive rate (FPR), computed as FPR = FP / (FP + TN), is the fraction of all negative instances that the classifier incorrectly labels as positive.
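For reference, a small sketch computing these two rates from binary label and prediction arrays (both assumed to contain 0/1 values):

```python
import numpy as np

def tpr_fpr(labels, predictions):
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN) for binary arrays."""
    labels = np.asarray(labels).astype(bool)
    predictions = np.asarray(predictions).astype(bool)
    tp = np.sum(labels & predictions)
    fn = np.sum(labels & ~predictions)
    fp = np.sum(~labels & predictions)
    tn = np.sum(~labels & ~predictions)
    return tp / (tp + fn), fp / (fp + tn)
```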

6. GQ-CNN Performance Comparison Against Baselines

Success Rate: the fraction of trials in which the gripper, after moving and rotating into position, successfully lifts an object that was placed arbitrarily by hand.
Precision: the success rate on grasps predicted to be robust, using 50% predicted robustness as the threshold.
Robust Grasp Rate: the fraction of executed grasp plans whose predicted robustness exceeds the 50% threshold.
Planning Time: the time from receiving the depth image to executing the grasp action.
(A small sketch of the precision and robust-grasp-rate computations is given below.)
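The precision and robust-grasp-rate definitions above can be made concrete with a short sketch. Here `q_values` are predicted robustness scores and `successes` are binary physical outcomes; both are hypothetical inputs, and the computation follows the definitions as translated above rather than the paper's evaluation code.

```python
import numpy as np

def precision_and_robust_rate(q_values, successes, threshold=0.5):
    """Precision: success rate among grasps predicted robust (Q > threshold).
    Robust grasp rate: fraction of executed grasps predicted robust."""
    q = np.asarray(q_values, dtype=float)
    s = np.asarray(successes, dtype=bool)
    robust = q > threshold
    precision = float(s[robust].mean()) if robust.any() else float('nan')
    robust_rate = float(robust.mean())
    return precision, robust_rate
```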

[Table: performance comparison across methods]

7. Two Causes of Grasp Failure and Analysis

The main causes of failure:
1. Small parts of the object are missing from the RGB-D image (not captured by the depth sensor).
2. The collision-free regions are not correctly identified.

▲: More precise depth sensors and better collision-region analysis methods are needed.

