Robotic Vision-Based Grasping: Papers and Code Resources

Latest recommended article published on 2025-03-04 19:05:32

light169

Article link: https://blog.csdn.net/light169/article/details/105100416

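This article surveys recent vision-based robotic grasping techniques, covering 2D planar grasp and 6DoF grasp methods, including the latest advances in using deep learning for object localization, pose estimation, grasp detection, and motion planning.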

Vision-based Robotic Grasping: Papers and Codes

According to the kind of grasp, vision-based robotic grasping methods can be roughly divided into two kinds: 2D planar grasp and 6DoF grasp. This repository summarizes these methods from recent years, most of which use deep learning. Earlier review papers are listed first.

0. Review Papers

[arXiv] 2019-Deep Learning for 3D Point Clouds: A Survey, [paper]

[arXiv] 2019-A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms, [paper]

[arXiv] 2019-Vision-based Robotic Grasping from Object Localization, Pose Estimation, Grasp Detection to Motion Planning: A Review, [paper]

[MTI] 2018-Review of Deep Learning Methods in Robotic Grasp Detection, [paper]

[ToR] 2016-Data-Driven Grasp Synthesis - A Survey, [paper]

[RAS] 2012-An overview of 3D object grasp synthesis algorithms - A Survey, [paper]

1. 2D Planar Grasp

Grasp Representation: The grasp is represented as an oriented 2D box, and the gripper is constrained to approach from a single direction.
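As a rough illustration of the oriented-box representation, the sketch below converts a grasp rectangle into its four corner points. The five-parameter form (x, y, theta, width, height) and the function name are assumptions chosen for illustration; the papers listed here do not all share one parameterization.

```python
import math


def grasp_rectangle_corners(x, y, theta, width, height):
    """Return the four corners of an oriented grasp rectangle.

    Assumed convention: (x, y) is the rectangle centre in pixels, theta is
    the rotation in radians relative to the image x-axis, width is the
    gripper opening, and height is the jaw size.
    """
    c, s = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    # Rotate each corner offset by theta, then shift it to the centre.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in offsets]


corners = grasp_rectangle_corners(100.0, 50.0, math.pi / 2, 40.0, 20.0)
```

Corner coordinates in this form are what rectangle-overlap metrics typically consume, which is one reason to keep the conversion separate from the network output.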

1.1 RGB or RGB-D based methods

These methods directly regress the oriented 2D box from RGB or RGB-D images. With RGB-D input, the depth image is treated as an additional channel, so the pipeline is otherwise the same as for RGB-based methods.
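Treating depth as an extra channel can be sketched as follows. This is a minimal pure-Python illustration with nested lists; the function name and the tiny 2x2 image are hypothetical, and a real pipeline would normally use an array library.

```python
def stack_rgbd(rgb, depth):
    """Append each pixel's depth value to its (r, g, b) triple.

    rgb is a nested list of (r, g, b) tuples and depth a nested list of
    floats with the same height and width.
    """
    same_height = len(rgb) == len(depth)
    same_width = all(len(r) == len(d) for r, d in zip(rgb, depth))
    if not (same_height and same_width):
        raise ValueError("rgb and depth must have the same height and width")
    return [
        [(*pixel, d) for pixel, d in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]


rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (10, 20, 30)]]
depth = [[0.5, 0.6],
         [0.7, 0.8]]
rgbd = stack_rgbd(rgb, depth)  # every pixel now has four channels
```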

2020:

[arXiv] Online Self-Supervised Learning for Object Picking: Detecting Optimum Grasping Position using a Metric Learning Approach, [paper]

[arXiv] A Multi-task Learning Framework for Grasping-Position Detection and Few-Shot Classification, [paper]

[arXiv] Rigid-Soft Interactive Learning for Robust Grasping, [paper]

[arXiv] Optimizing Correlated Graspability Score and Grasp Regression for Better Grasp Prediction, [paper]

[arXiv] Domain Independent Unsupervised Learning to grasp the Novel Objects, [paper]

[arXiv] Real-time Grasp Pose Estimation for Novel Objects in Densely Cluttered Environment, [paper]

[arXiv] Semi-supervised Grasp Detection by Representation Learning in a Vector Quantized Latent Space, [paper]

2019:

[arXiv] Antipodal Robotic Grasping using Generative Residual Convolutional Neural Network, [paper]

[IROS] Domain Independent Unsupervised Learning to grasp the Novel Objects, [paper]

[Sensors] Vision for Robust Robot Manipulation, [paper]

[arXiv] Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly, [paper] [code]

[IROS] GRIP: Generative Robust Inference and Perception for Semantic Robot Manipulation in Adversarial Environments, [paper]

[arXiv] Efficient Fully Convolution Neural Network for Generating Pixel Wise Robotic Grasps With High Resolution Images, [paper]

[arXiv] A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection, [paper]

[IROS] ROI-based Robotic Grasp Detection for Object Overlapping Scenes, [paper]

[IROS] SilhoNet: An RGB Method for 6D Object Pose Estimation, [paper]

[ICRA] Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter, [paper] [code]

2018:

[arXiv] Real-Time, Highly Accurate Robotic Grasp Detection using Fully Convolutional Neural Networks with High-Resolution Images, [paper]

[arXiv] Real-world Multi-object, Multi-grasp Detection, [paper]

[ICRA] Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching, [paper] [code]

2017:

[IROS] Robotic Grasp Detection using Deep Convolutional Neural Networks, [paper]

2016:

[ICRA] Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours, [paper]

2015:

[ICRA] Real-time grasp detection using convolutional neural networks, [paper] [code]

2014:

[IJRR] Deep Learning for Detecting Robotic Grasps, [paper]

Datasets:

Cornell dataset: consists of 1035 images of 280 different objects.

1.2 Depth-based methods

These methods obtain the grasp pose indirectly, in two steps: grasp candidate generation and grasp quality evaluation. The candidate with the highest score is selected as the final grasp.
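The generate-then-evaluate pattern can be sketched as below. The candidate grid and the scoring function `toy_quality` are hypothetical stand-ins: in the listed papers the quality score comes from a learned network or an analytic metric, not from this hand-written formula.

```python
import math


def generate_candidates(xs, ys, n_angles):
    """Enumerate planar grasp candidates (x, y, theta) on a regular grid."""
    return [
        (x, y, math.pi * k / n_angles)  # a parallel-jaw grasp repeats every pi
        for x in xs for y in ys for k in range(n_angles)
    ]


def toy_quality(grasp, target=(2.0, 1.0), preferred_theta=math.pi / 2):
    """Hypothetical score: higher when close to target and aligned in angle."""
    x, y, theta = grasp
    dist = math.hypot(x - target[0], y - target[1])
    angle_err = abs(math.sin(theta - preferred_theta))  # 0 when aligned
    return 1.0 / (1.0 + dist + angle_err)


def select_grasp(candidates, quality_fn):
    """Return the candidate with the highest quality score."""
    return max(candidates, key=quality_fn)


candidates = generate_candidates(xs=[0.0, 1.0, 2.0, 3.0],
                                 ys=[0.0, 1.0, 2.0], n_angles=4)
best = select_grasp(candidates, toy_quality)
```

Keeping `quality_fn` as a parameter means the same selection step works unchanged if the toy score is replaced by a trained evaluator.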

2019:

[IROS] GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier, [paper]

[ICRA] Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter, [paper]

[ICRA] MetaGrasp: Data Efficient Grasping by Affordance Interpreter Network, [paper]

[IROS] GlassLoc: Plenoptic Grasp Pose Detection in Transparent Clutter, [paper]

2018:

[RSS] Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach, [paper]

[BMVC] EnsembleNet: Improving Grasp Detection using an Ensemble of Convolutional Neural Networks, [paper]

2017:

[RSS] Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics, [paper] [code]

Dataset:

Dex-Net, a synthetic dataset of 6.7 million point clouds, grasps, and robust analytic grasp metrics generated from thousands of 3D models.

Jacquard Dataset: "Jacquard: A Large Scale Dataset for Robotic Grasp Detection", in IEEE International Conference on Intelligent Robots and Systems, 2018, [paper]

1.3 Target object localization in 2D

To provide a better input for computing the oriented 2D box or generating grasp candidates, the target object's mask should be computed first. Current deep learning-based 2D detection and 2D segmentation methods can help with this.
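One simple way to use such a mask is to crop the image to the object's extent before grasp detection. The sketch below assumes a binary mask stored as a nested list; the helper names are hypothetical and the mask itself would come from one of the detectors or segmenters listed in this section.

```python
def mask_bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of nonzero pixels.

    Bounds are inclusive. Returns None when the mask is empty.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    n_cols = len(mask[0])
    cols = [c for c in range(n_cols) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image, box):
    """Crop a nested-list image to an inclusive bounding box."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
image = [[10 * r + c for c in range(5)] for r in range(4)]
box = mask_bounding_box(mask)
patch = crop(image, box)  # image region covering the target object
```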

1.3.1 2D detection:

For detailed paper lists, refer to hoya012 or amusi.

Survey papers

2020:

[arXiv] Deep Domain Adaptive Object Detection: a Survey, [paper]

[IJCV] Deep Learning for Generic Object Detection: A Survey, [paper]

2019:

[arXiv] Object Detection in 20 Years: A Survey, [paper]

[arXiv] Object Detection with Deep Learning: A Review, [paper]

[arXiv] A Review of Object Detection Models based on Convolutional Neural Network, [paper]

[arXiv] A Review of methods for Textureless Object Recognition, [paper]

a. Two-stage methods

2020:

[arXiv] Any-Shot Object Detection, [paper]

[arXiv] Frustratingly Simple Few-Shot Object Detection, [paper]

[arXiv] Rethinking the Route Towards Weakly Supervised Object Localization, [paper]

[arXiv] Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN, [paper]

[arXiv] Unsupervised Image-generation Enhanced Adaptation for Object Detection in Thermal images, [paper]

[arXiv] PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal and NIR to Visible Image Transformation, [paper]

[arXiv] SpotNet: Self-Attention Multi-Task Network for Object Detection, [paper]

[arXiv] Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning, [paper]

[arXiv] FedVision: An Online Visual Object Detection Platform Powered by Federated Learning, [paper]

2019:

[arXiv] Combining Deep Learning and Verification for Precise Object Instance Detection, [paper]

[arXiv] cmSalGAN: RGB-D Salient Object Detection with Cross-View Generative Adversarial Networks, [paper]

[arXiv] OpenLORIS-Object: A Dataset and Benchmark towards Lifelong Object Recognition, [paper] [project]

[IROS] Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition, [paper]

[IROS] Recurrent Convolutional Fusion for RGB-D Object Recognition, [paper] [code]

[ICCVW] An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Detection, [paper]

2017:

[arXiv] Light-Head R-CNN: In Defense of Two-Stage Object Detector, [paper] [code]

2016:

[NeurIPS] R-FCN: Object Detection via Region-based Fully Convolutional Networks, [paper] [code]

[TPAMI] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, [paper] [code]

[ECCV] Visual relationship detection with language priors, [paper]

2015:

[ICCV] Fast R-CNN, [paper] [code]

2014:

[ECCV] SPPNet: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, [paper] [code]

[CVPR] R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation, [paper] [code]

b. Single-stage methods

2020:

[arXiv] CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection, [paper]

[arXiv] Extended Feature Pyramid Network for Small Object Detection, [paper]

[arXiv] Real Time Detection of Small Objects, [paper]

[arXiv] OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features, [paper]

2019:

[arXiv] CenterNet: Objects as Points, [paper]

[arXiv] CenterNet: Keypoint Triplets for Object Detection, [paper]

[ECCV] CornerNet: Detecting Objects as Paired Keypoints, [paper]

[arXiv] FCOS: Fully Convolutional One-Stage Object Detection, [paper]

[arXiv] Bottom-up Object Detection by Grouping Extreme and Center Points, [paper]

2018:

[arXiv] YOLOv3: An Incremental Improvement, [paper] [code]

2017:

[CVPR] YOLO9000: Better, Faster, Stronger, [paper] [code]

2016:

[CVPR] YOLO: You only look once: Unified, real-time object detection, [paper] [code]

[ECCV] SSD: Single Shot MultiBox Detector, [paper] [code]

[ECCV] LIFT: Learned Invariant Feature Transform, [paper]

2015:

[CVPR] MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching, [paper]

2014:

[ICLR] OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, [paper] [code]

Dataset:

PASCAL VOC: The PASCAL Visual Object Classes (VOC) Challenge, [paper]

ILSVRC: ImageNet large scale visual recognition challenge, [paper]

Microsoft COCO: Common Objects in Context, is a large-scale object detection, segmentation, and captioning dataset, [paper]

Open Images: a collaborative release of ~9 million images annotated with labels spanning thousands of object categories, [paper]

1.3.2 2D instance segmentation:

2020:

[arXiv] Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection, [paper]

[arXiv] Weakly-Supervised Salient Object Detection via Scribble Annotations, [paper]

[arXiv] 1st Place Solutions for OpenImage2019 - Object Detection and Instance Segmentation, [paper]

[arXiv] Deep Affinity Net: Instance Segmentation via Affinity, [paper]

[arXiv] PointINS: Point-based Instance Segmentation, [paper]

[arXiv] Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection, [paper]

[arXiv] Highly Efficient Salient Object Detection with 100K Parameters, [paper]

[arXiv] Conditional Convolutions for Instance Segmentation, [paper]

[arXiv] Global Context-Aware Progressive Aggregation Network for Salient Object Detection, [paper]

[arXiv] Fully Convolutional Networks for Automatically Generating Image Masks to Train Mask R-CNN, [paper]

[arXiv] Cross-layer Feature Pyramid Network for Salient Object Detection, [paper]

[arXiv] Towards Bounding-Box Free Panoptic Segmentation, [paper]

[arXiv] Self-Supervised Object-in-Gripper Segmentation from Robotic Motions, [paper]

[arXiv] Real-time Semantic Background Subtraction, [paper]

[arXiv] Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey, [paper]

[arXiv] FourierNet: Compact mask representation for instance segmentation using differentiable shape decoders, [paper]

[arXiv] Segmenting unseen industrial components in a heavy clutter using RGB-D fusion and synthetic data, [paper]

[arXiv] Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects, [paper]

[arXiv] Joint Learning of Instance and Semantic Segmentation for Robotic Pick-and-Place with Heavy Occlusions in Clutter, [paper]

[arXiv] PointRend: Image Segmentation as Rendering, [paper]

[arXiv] Image Segmentation Using Deep Learning: A Survey, [paper]

2019:

[arXiv] CenterMask: Real-Time Anchor-Free Instance Segmentation, [paper] [code]

[arXiv] SAIS: Single-stage Anchor-free Instance Segmentation, [paper]

[arXiv] YOLACT++: Better Real-time Instance Segmentation, [paper] [code]

[ICCV] YOLACT: Real-time Instance Segmentation, [paper] [code]

[ICCV] TensorMask: A Foundation for Dense Object Segmentation, [paper] [code]

[CASE] Deep Workpiece Region Segmentation for Bin Picking, [paper]

2018:

[CVPR] PANet: Path Aggregation Network for Instance Segmentation, [paper] [code]

[CVPR] MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features, [paper]

2017:

[ICCV] Mask R-CNN, [paper] [code]

[IROS] SegICP: Integrated Deep Semantic Segmentation and Pose Estimation, [paper]

[CVPR] Fully Convolutional Instance-aware Semantic Segmentation, [paper]

2016:

[ECCV] SharpMask: Learning to Refine Object Segments, [paper] [code]

[BMVC] MultiPathNet: A MultiPath Network for Object Detection, [paper] [code]

[CVPR] MNC: Instance-aware Semantic Segmentation via Multi-task Network Cascades, [paper]

2015:

[NeurIPS] DeepMask: Learning to Segment Object Candidates, [paper] [code]

[CVPR] Hypercolumns for Object Segmentation and Fine-grained Localization, [paper]

2014:

[ECCV] SDS: Simultaneous Detection and Segmentation, [paper]

1.3.3 2D panoptic segmentation:

2020:

[arXiv] Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation, [paper]

2019:

[CVPR] An End-to-End Network for Panoptic Segmentation, [paper]

[CVPR] Panoptic Segmentation, [paper]

[CVPR] Panoptic Feature Pyramid Networks, [paper]

[CVPR] UPSNet: A Unified Panoptic Segmentation Network, [paper]

[IV] Single Network Panoptic Segmentation for Street Scene Understanding, [paper] [code]

[ITSC] Multi-task Network for Panoptic Segmentation in Automated Driving, [paper]

2. 6DoF Grasp

Grasp Representation: The grasp is represented as a 6DoF pose in 3D space, so the gripper can approach the object from various angles. The input to this task is a 3D point cloud from RGB-D sensors, and the task has two stages. In the first stage, the target object is extracted from the scene. In the second stage, if a 3D model of the object is available, the object's 6D pose can be computed; if no 3D model is available, the 6DoF grasp pose is computed by other methods.
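When a 3D model exists, one common use of the estimated object pose is to carry a grasp defined on the model into the camera frame by composing homogeneous transforms. The sketch below is an assumption-laden illustration: the frame names, the numeric poses, and the restriction to yaw-only rotations are chosen for brevity, whereas a full 6DoF pose allows arbitrary 3D rotation.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose_from_yaw(yaw, translation):
    """Build a 4x4 homogeneous transform from a z-axis rotation and a shift."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


# Hypothetical values: object pose in the camera frame (from pose estimation)
# and a grasp pose annotated in the object's model frame.
T_cam_obj = pose_from_yaw(math.pi / 2, (0.5, 0.0, 1.0))
T_obj_grasp = pose_from_yaw(0.0, (0.1, 0.0, 0.0))

# Grasp pose expressed in the camera frame.
T_cam_grasp = matmul4(T_cam_obj, T_obj_grasp)
grasp_position = [T_cam_grasp[i][3] for i in range(3)]
```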

2.1 Target object extraction in 3D

The straightforward way is to run 2D detection or segmentation and use the point cloud from the corresponding depth region. This was already covered in Section 1.3. In the following, only 3D detection and 3D instance segmentation are summarized.
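Taking the point cloud from the depth region of a 2D detection can be sketched with a pinhole back-projection. The intrinsics (fx, fy, cx, cy), the box format, and the toy depth map below are assumptions for illustration; real values come from camera calibration and from the 2D detector.

```python
def backproject_box(depth, box, fx, fy, cx, cy):
    """Back-project the depth pixels inside a 2D box to 3D camera points.

    depth[v][u] is the depth in metres at column u and row v, and
    box = (u_min, v_min, u_max, v_max) uses inclusive pixel bounds.
    """
    u_min, v_min, u_max, v_max = box
    points = []
    for v in range(v_min, v_max + 1):
        for u in range(u_min, u_max + 1):
            z = depth[v][u]
            if z <= 0:  # skip missing or invalid depth readings
                continue
            # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


depth = [[0.0, 1.0, 1.0],
         [2.0, 2.0, 0.0],
         [1.0, 1.0, 1.0]]
points = backproject_box(depth, box=(1, 0, 2, 1),
                         fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Replacing the box test with a per-pixel mask lookup gives the segmentation-based variant of the same idea.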

2.1.1 3D detection

These methods can be divided into three kinds: RGB-based methods, point cloud-based methods, and fusion methods that consume both images and point clouds. Most of these works focus on autonomous driving.

a. RGB-based methods

Most of these methods estimate depth images from RGB images and then perform 3D detection.

2020:

[arXiv] Confidence Guided Stereo 3D Object Detection with Split Depth Estimation, [paper]

[arXiv] Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras, [paper]

[arXiv] ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection, [paper]

[arXiv] MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships, [paper]

[arXiv] Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image, [paper]

[arXiv] SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation, [paper]

[arXiv] siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection, [paper]

[AAAI] Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation, [paper]

[arXiv] SDOD: Real-time Segmenting and Detecting 3D Objects by Depth, [paper]

[arXiv] DSGN: Deep Stereo Geometry Network for 3D Object Detection, [paper]

[arXiv] RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving, [paper]

2019:

[NeurIPS] PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points, [paper]

[arXiv] Single-Stage Monocular 3D Object Detection with Virtual Cameras, [paper]

[arXiv] Environment reconstruction on depth images using Generative Adversarial Networks, [paper] [code]

[arXiv] Learning Depth-Guided Convolutions for Monocular 3D Object Detection, [paper]

[arXiv] RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving, [paper]

[IROS] Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition, [paper]

[arXiv] Task-Aware Monocular Depth Estimation for 3D Object Detection, [paper]

[CVPR] Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving, [paper] [code]

[AAAI] MonoGRNet: A Geometric Reasoning Network for 3D Object Localization, [paper] [code]

[ICCV] Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving, [paper]

[ICCV] M3D-RPN: Monocular 3D Region Proposal Network for Object Detection, [paper]

[ICCVW] Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud, [paper]

[arXiv] Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss, [paper]

[arXiv] Monocular 3D Object Detection via Geometric Reasoning on Keypoints, [paper]

b. Point cloud-based methods

These methods use only 3D point cloud data.

2020:

[arXiv] 3D Object Detection From LiDAR Data Using Distance Dependent Feature Extraction, [paper]

[arXiv] HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection, [paper]

[arXiv] Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud, [paper]

[arXiv] PointTrackNet: An End-to-End Network for 3-D Object Detection and Tracking from Point Clouds, [paper]

[arXiv] 3DSSD: Point-based 3D Single Stage Object Detector, [paper]

[arXiv] SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud, [paper]

[arXiv] Investigating the Importance of Shape Features, Color Constancy, Color Spaces and Similarity Measures in Open-Ended 3D Object Recognition, [paper]

[arXiv] Probabilistic 3D Multi-Object Tracking for Autonomous Driving, [paper]

[AAAI] TANet: Robust 3D Object Detection from Point Clouds with Triple Attention, [paper]

2019:

[arXiv] Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection, [paper] [code]

[arXiv] SESS: Self-Ensembling Semi-Supervised 3D Object Detection, [paper]

[arXiv] Deep SCNN-based Real-time Object Detection for Self-driving Vehicles Using LiDAR Temporal Data, [paper]

[arXiv] Pillar in Pillar: Multi-Scale and Dynamic Feature Extraction for 3D Object Detection in Point Clouds, [paper]

[arXiv] What You See is What You Get: Exploiting Visibility for 3D Object Detection, [paper]

[NeurIPSW] Patch Refinement -- Localized 3D Object Detection, [paper]

[CoRL] End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds, [paper]

[ICCV] Deep Hough Voting for 3D Object Detection in Point Clouds, [paper] [code]

[arXiv] Part-A2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud, [paper]

[ICCV] STD: Sparse-to-Dense 3D Object Detector for Point Cloud, [paper]

[CVPR] PointPillars: Fast Encoders for Object Detection from Point Clouds, [paper]

[arXiv] StarNet: Targeted Computation for Object Detection in Point Clouds, [paper]

2018:

[CVPR] PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud, [paper] [code]

[CVPR] PIXOR: Real-time 3D Object Detection from Point Clouds, [paper] [code]

[ECCVW] Complex-YOLO: Real-time 3D Object Detection on Point Clouds, [paper] [code]

[ECCVW] YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud, [paper]

c. Fusion methods

These methods use both RGB images and depth images or point clouds. They include early fusion, late fusion, and dense fusion methods.

2020:

[arXiv] ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes, [paper]

[arXiv] JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset, [paper]

[AAAI] PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module, [paper]

2019:

[arXiv] PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection, [paper]

[arXiv] Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots, [paper]

[arXiv] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language, [paper]

[arXiv] Relation Graph Network for 3D Object Detection in Point Clouds, [paper]

[arXiv] PointPainting: Sequential Fusion for 3D Object Detection, [paper]

[ICCV] Transferable Semi-Supervised 3D Object Detection From RGB-D Data, [paper]

[arXiv] Adaptive and Azimuth-Aware Fusion Network of Multimodal Local Features for 3D Object Detection, [paper]

[arXiv] Frustum VoxNet for 3D object detection from RGB-D or Depth images, [paper]

[IROS] Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection, [paper]

[CVPR] Multi-Task Multi-Sensor Fusion for 3D Object Detection, [paper]

2018:

[CVPR] Frustum PointNets for 3D Object Detection from RGB-D Data, [paper] [code]

[ECCV] Deep Continuous Fusion for Multi-Sensor 3D Object Detection, [paper]

[IROS] Joint 3D Proposal Generation and Object Detection from View Aggregation, [paper] [code]

[CVPR] PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation, [paper]

[ICRA] A General Pipeline for 3D Detection of Vehicles, [paper]

2017:

[CVPR] Multi-View 3D Object Detection Network for Autonomous Driving, [paper] [code]

2.1.2 3D segmentation

2020:

[arXiv] OccuSeg: Occupancy-aware 3D Instance Segmentation, [paper]

[arXiv] Learning to Segment 3D Point Clouds in 2D Image Space, [paper]

[arXiv] Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds, [paper]

[arXiv] 3DCFS: Fast and Robust Joint 3D Semantic-Instance Segmentation via Coupled Feature Selection, [paper]

[arXiv] SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor, [paper]

[RAL] From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds, [paper]

[arXiv] Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation, [paper]

[AAAI] JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds, [paper] [code]

[WACV] FuseSeg: LiDAR Point Cloud Segmentation Fusing Multi-Modal Data, [paper]

2019:

[CVPR] SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation, [paper]

[arXiv] Point2Node: Correlation Learning of Dynamic-Node for Point Cloud Feature Modeling, [paper]

[arXiv] LatticeNet: Fast Point Cloud Segmentation Using Permutohedral Lattices, [paper]

[arXiv] Learning to Optimally Segment Point Clouds, [paper]

[arXiv] Point Cloud Instance Segmentation using Probabilistic Embeddings, [paper]

[NeurIPS] Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations, [paper]

[arXiv] Addressing the Sim2Real Gap in Robotic 3D Object Classification, [paper]

[NeurIPS] 3D-BoNet: Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds, [paper] [code]

[IROS] LDLS: 3-D Object Segmentation Through Label Diffusion From 2-D Images, [paper]

[arXiv] GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud, [paper]

[CoRL] The Best of Both Modes: Separately Leveraging RGB and Depth for Unseen Object Instance Segmentation, [paper] [code]

[IJARS] Fast geometry-based computation of grasping points on three-dimensional point clouds, [paper]

2018:

[arXiv] PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation, [paper]

2.1.3 3D deep learning networks

Some of these works are taken from awesome-point-cloud-analysis by Yongcheng Liu; thanks to him.

2020:

[arXiv] Review: deep learning on 3D point clouds, [paper]

[arXiv] Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors, [paper]

2019:

[arXiv] Quaternion Equivariant Capsule Networks for 3D Point Clouds, [paper]

[arXiv] Geometry Sharing Network for 3D Point Cloud Classification and Segmentation, [paper]

[arXiv] Geometric Capsule Autoencoders for 3D Point Clouds, [paper]

[arXiv] Utility Analysis of Network Architectures for 3D Point Cloud Processing, [paper]

[arXiv] Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research, [paper] [code]

[ICCV] DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing, [paper] [code]

[TOG] Dynamic Graph CNN for Learning on Point Clouds, [paper] [code]

[ICCV] DeepGCNs: Can GCNs Go as Deep as CNNs?, [paper] [code]

[ICCV] KPConv: Flexible and Deformable Convolution for Point Clouds, [paper] [code]

[MM] SRINet: Learning Strictly Rotation-Invariant Representations for Point Cloud Classification and Segmentation, [paper]

[CVPR] PointConv: Deep Convolutional Networks on 3D Point Clouds, [paper] [code]

[CVPR] PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing, [paper] [code]

[CVPR] Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN, [paper] [code]

[arXiv] SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing, [paper]

[arXiv] PyramNet: Point Cloud Pyramid Attention Network and Graph Embedding Module for Classification and Segmentation, [paper]

[ICCV] Interpolated Convolutional Networks for 3D Point Cloud Understanding, [paper]

[arXiv] A survey on Deep Learning Advances on Different 3D Data Representations, [paper]

2018:

[TOG] MCCNN: Monte Carlo Convolution for Learning on Non-Uniformly Sampled Point Clouds, [paper] [code]

[NeurIPS] PointCNN: Convolution On X-Transformed Points, [paper] [code]

[CVPR] Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling, [paper] [code]

[CVPR] SO-Net: Self-Organizing Network for Point Cloud Analysis, [paper] [code]

[CVPR] SPLATNet: Sparse Lattice Networks for Point Cloud Processing, [paper] [code]

[arXiv] Point Convolutional Neural Networks by Extension Operators, [paper]

2017:

[ICCV] Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models, [paper] [code]

[CVPR] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, [paper] [code]

[NeurIPS] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, [paper] [code]

[CVPR] SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation, [paper]

2.2 6D object pose estimation (when 3D models exist)

2.2.1 RGB-D based methods

These methods can be divided into four kinds: correspondence-based, template-based, voting-based, and regression-based methods.

2020:

[arXiv] A Review on Object Pose Recovery: from 3D Bounding Box Detectors to Full 6D Pose Estimators, [paper]

2016:

[ECCVW] A Summary of the 4th International Workshop on Recovering 6D Object Pose, [paper]

a. Correspondence-based methods

2020:

[arXiv] LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts, [paper]

[arXiv] Table-Top Scene Analysis Using Knowledge-Supervised MCMC, [paper]

[arXiv] AprilTags 3D: Dynamic Fiducial Markers for Robust Pose Estimation in Highly Reflective Environments and Indirect Communication in Swarm Robotics, [paper]

[AAAI] LCD: Learned Cross-Domain Descriptors for 2D-3D Matching, [paper] [project]

2019:

[CVPR] Segmentation-driven 6D Object Pose Estimation, [paper]

2018:

[arXiv] Estimating 6D Pose From Localizing Designated Surface Keypoints, [paper]

2017:

[ICRA] 6-DoF Object Pose from Semantic Keypoints, [paper]

2012:

[3DIMPVT] 3D Object Detection and Localization using Multimodal Point Pair Features, [paper]

b. Template-based methods

2019:

[arXiv] Real-time Background-aware 3D Textureless Object Pose Estimation, [paper]

2017:

[arXiv] End-to-end Learning of Deep Visual Representations for Image Retrieval, [paper]

2012:

[ACCV] Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes, [paper]

c. Voting-based methods

2017:

[TPAMI] Robust 3D Object Tracking from Monocular Images Using Stable Parts, [paper]

2014:

[ECCV] Learning 6D Object Pose Estimation using 3D Object Coordinates, [paper]

[ECCV] Latent-Class Hough Forests for 3D Object Detection and Pose Estimation, [paper]

d. Regression-based methods

1) Direct methods

2020:

[arXiv] Neural Mesh Refiner for 6-DoF Pose Estimation, [paper]

[arXiv] MobilePose: Real-Time Pose Estimation for Unseen Objects with Weak Shape Supervision, [paper]

[arXiv] Robust 6D Object Pose Estimation by Learning RGB-D Features, [paper]

[arXiv] 6D Object Pose Regression via Supervised Learning on Point Clouds, [paper]

[arXiv] HybridPose: 6D Object Pose Estimation under Hybrid Representations, [paper]

2019:

[arXiv] P2GNet: Pose-Guided Point Cloud Generating Networks for 6-DoF Object Pose Estimation, [paper]

[arXiv] ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation, [paper]

[arXiv] PointPoseNet: Accurate Object Detection and 6 DoF Pose Estimation in Point Clouds, [paper]

[RSS] PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking, [paper]

[arXiv] Multi-View Matching Network for 6D Pose Estimation, [paper]

[arXiv] Single-Stage 6D Object Pose Estimation, [paper]

[arXiv] Fast 3D Pose Refinement with RGB Images, [paper]

[arXiv] MaskedFusion: Mask-based 6D Object Pose Detection, [paper]

[CoRL] Scene-level Pose Estimation for Multiple Instances of Densely Packed Objects, [paper]

[IROS] Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images, [paper]

[IROSW] Motion-Nets: 6D Tracking of Unknown Objects in Unseen Environments using RGB, [paper]

[ICCV] DPOD: 6D Pose Object Detector and Refiner, [paper]

[ICCV] Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation, [paper]

[ICCV] Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data, [paper]

[arXiv] Active 6D Multi-Object Pose Estimation in Cluttered Scenarios with Deep Reinforcement Learning, [paper]

[arXiv] Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction, [paper]

[arXiv] Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images, [paper]

[ICHR] Refining 6D Object Pose Predictions using Abstract Render-and-Compare, [paper]

[CVPR] DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion, [paper] [code]

[arXiv] Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image, [paper]

2018:

[ECCV] Implicit 3D Orientation Learning for 6D Object Detection From RGB Images, [paper] [code]

[ECCV] DeepIM: Deep Iterative Matching for 6D Pose Estimation, [paper] [code]

[RSS] PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes, [paper] [code]

[IROS] Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks, [paper]

2017:

[ICCV] SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again, [paper] [code]

2) Indirect methods (first regress feature points, then apply a PnP method)

2020:

[arXiv] Learning 2D–3D Correspondences To Solve The Blind Perspective-n-Point Problem, [paper]

[arXiv] PnP-Net: A hybrid Perspective-n-Point Network, [paper]

[arXiv] Object 6D Pose Estimation with Non-local Attention, [paper]

[arXiv] 6DoF Object Pose Estimation via Differentiable Proxy Voting Loss, [paper]

[arXiv] YOLOff: You Only Learn Offsets for robust 6DoF object pose estimation, [paper]

2019:

[arXiv] DPOD: 6D Pose Object Detector and Refiner, [paper]

[arXiv] W-PoseNet: Dense Correspondence Regularized Pixel Pair Pose Regression, [paper]

[arXiv] KeyPose: Multi-view 3D Labeling and Keypoint Estimation for Transparent Objects, [paper]

[arXiv] PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation, [paper]

[ICCV] CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation, [paper]

[CVPR] PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation, [paper] [code]

2018:

[CVPR] Real-Time Seamless Single Shot 6D Object Pose Prediction, [paper] [code]

2017:

[ICCV] BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth, [paper]

e. Category-level 6D pose estimation methods

2020:

[arXiv] CPS: Class-level 6D Pose and Shape Estimation From Monocular Images, [paper]

[arXiv] Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation, [paper]

2019:

[arXiv] Category-Level Articulated Object Pose Estimation, [paper]

[arXiv] LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation, [paper]

[arXiv] 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints, [paper] [code]

[arXiv] Self-Supervised 3D Keypoint Learning for Ego-motion Estimation, [paper]

[CVPR] Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation, [paper] [code]

[arXiv] Instance- and Category-level 6D Object Pose Estimation, [paper]

[arXiv] kPAM: KeyPoint Affordances for Category-Level Robotic Manipulation, [paper]

f. 3D shape reconstruction from images

2020:

[arXiv] Instant recovery of shape from spectrum via latent space connections, [paper]

[arXiv] Self-supervised Single-view 3D Reconstruction via Semantic Consistency, [paper]

[arXiv] Meta3D: Single-View 3D Object Reconstruction from Shape Priors in Memory, [paper]

[arXiv] STD-Net: Structure-preserving and Topology-adaptive Deformation Network for 3D Reconstruction from a Single Image, [paper]

[arXiv] Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data, [paper]

[arXiv] Deep NRSfM++: Towards 3D Reconstruction in the Wild, [paper]

[arXiv] Learning to Correct 3D Reconstructions from Multiple Views, [paper]

2019:

[arXiv] Boundary Cues for 3D Object Shape Recovery, [paper]

[arXiv] Learning to Generate Dense Point Clouds with Textures on Multiple Categories, [paper]

[arXiv] Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction, [paper]

[arXiv] Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision, [paper]

[arXiv] SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization, [paper]

[arXiv] 3D-GMNet: Learning to Estimate 3D Shape from A Single Image As A Gaussian Mixture, [paper]

[arXiv] Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction, [paper]

g. 3D shape rendering

2019:

[arXiv] SynSin: End-to-end View Synthesis from a Single Image, [paper] [project]

[arXiv] Neural Point Cloud Rendering via Multi-Plane Projection, [paper]

[arXiv] Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool, [paper]

Datasets:

HomebrewedDB: RGB-D Dataset for 6D Pose Estimation of 3D Objects, ICCVW, 2019 [paper]

YCB Datasets: The YCB Object and Model Set: Towards Common Benchmarks for Manipulation Research, IEEE International Conference on Advanced Robotics (ICAR), 2015 [paper]

T-LESS Datasets: T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects, IEEE Winter Conference on Applications of Computer Vision (WACV), 2017 [paper]

2.2.2 3D point cloud

The partial-view point cloud is aligned to the complete shape to obtain the 6D pose. Generally, coarse registration is conducted first to provide an initial alignment, and then dense registration methods such as ICP (Iterative Closest Point) refine it to obtain the final 6D pose.
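
The refinement step can be illustrated with a minimal point-to-point ICP sketch. It assumes the coarse registration has already brought the partial view close to the model (here the initial guess is simply the identity), uses brute-force nearest neighbours, and solves each step with the SVD-based (Kabsch) rigid fit. The function names `best_rigid_transform` and `icp`, the grid-shaped model, and the chosen misalignment are hypothetical and only serve the illustration.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst is approximately src @ R.T + t (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source, target, R_init=None, t_init=None, max_iters=50, tol=1e-10):
    """Point-to-point ICP: align a partial `source` cloud to a complete `target` cloud."""
    R = np.eye(3) if R_init is None else R_init
    t = np.zeros(3) if t_init is None else t_init
    prev_err = np.inf
    for _ in range(max_iters):
        moved = source @ R.T + t
        # Brute-force nearest neighbour in the target for every moved source point.
        d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = d2[np.arange(len(moved)), nn].mean()
        # Re-estimate the full transform from the current correspondences.
        R, t = best_rigid_transform(source, target[nn])
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t


# Synthetic example: a grid-shaped "complete model" and a partial, slightly misaligned view.
axis = np.linspace(0.0, 0.75, 4)
model = np.array([[x, y, z] for x in axis for y in axis for z in axis])
visible = model[model[:, 2] >= 0.4]                 # only part of the shape is observed
a = np.deg2rad(3.0)
R_gt = np.array([[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a), np.cos(a), 0.0],
                 [0.0, 0.0, 1.0]])
t_gt = np.array([0.02, -0.01, 0.015])
partial_view = (visible - t_gt) @ R_gt              # so that visible = partial_view @ R_gt.T + t_gt
R_est, t_est = icp(partial_view, model)
print("rotation error:", np.abs(R_est - R_gt).max())
print("translation error:", np.abs(t_est - t_gt).max())
```

ICP only converges to the correct pose when the initial alignment is good enough for most nearest-neighbour matches to be right, which is why a coarse registration stage precedes it. For large clouds the brute-force search would normally be replaced by a spatial index such as a k-d tree.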

Survey

2020:

[arXiv] When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs), [paper]

[arXiv] Least Squares Optimization: from Theory to Practice, [paper]

a. RANSAC-based methods

2020:

[arXiv] Robust, Occlusion-aware Pose Estimation for Objects Grasped by Adaptive Hands, [paper]

[arXiv] Non-iterative One-step Solution for Point Set Registration Problem on Pose Estimation without Correspondence, [paper]

2016:

[TPAMI] Go-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration, [paper] [code]

2014:

[SGP] Super 4PCS Fast Global Pointcloud Registration via Smart Indexing, [paper] [code]

b. 3D feature-based methods

2020:

[arXiv] End-to-End Learning Local Multi-view Descriptors for 3D Point Clouds, [paper]

[arXiv] D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features, [paper]

[arXiv] Self-supervised Point Set Local Descriptors for Point Cloud Registration, [paper]

[arXiv] StickyPillars: Robust feature matching on point clouds using Graph Neural Networks, [paper]

2019:

[arXiv] 3DRegNet: A Deep Neural Network for 3D Point Registration, [paper] [code]

[CVPR] The Perfect Match: 3D Point Cloud Matching with Smoothed Densities, [paper]

2018:

[arXiv] Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation, [paper]

[ECCV] 3DFeat-Net: Weakly Supervised Local 3D Features for Point Cloud Registration, [paper] [code]

2017:

[CVPR] 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions, [paper] [code]

2016:

[arXiv] Lessons from the Amazon Picking Challenge, [paper]

[arXiv] Team Delft's Robot Winner of the Amazon Picking Challenge 2016, [paper]

c. Deep learning-based methods

2020:

[arXiv] TEASER: Fast and Certifiable Point Cloud Registration, [paper] [code]

[arXiv] Plane Pair Matching for Efficient 3D View Registration, [paper]

[arXiv] LRF-Net: Learning Local Reference Frames for 3D Local Shape Description and Matching, [paper]

[arXiv] Learning multiview 3D point cloud registration, [paper]

2019:

[arXiv] One Framework to Register Them All: PointNet Encoding for Point Cloud Alignment, [paper]

[arXiv] DeepICP: An End-to-End Deep Neural Network for 3D Point Cloud Registration, [paper]

[NeurIPS] PRNet: Self-Supervised Learning for Partial-to-Partial Registration, [paper]

[CVPR] PointNetLK: Robust & Efficient Point Cloud Registration using PointNet, [paper] [code]

[ICCV] End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans, [paper]

[arXiv] Iterative Matching Point, [paper]

[arXiv] Deep Closest Point: Learning Representations for Point Cloud Registration, [paper] [code]

[arXiv] PCRNet: Point Cloud Registration Network using PointNet Encoding, [paper] [code]

2017:

[ICRA] Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge, [paper] [code]

d. Point cloud de-noising

2019:

[arXiv] CNN-based Lidar Point Cloud De-Noising in Adverse Weather, [paper]

e. Point cloud sampling

2019:

[arXiv] SampleNet: Differentiable Point Cloud Sampling, [paper] [code]

2.3 Deep learning-based methods (No existing 3D models)

In this situation, no 3D models exist, and the 6-DoF grasps are estimated from the available partial data. This can be done by estimating grasps directly from the partial-view point cloud, or indirectly after shape completion.

2.3.1 Estimating 6-DoF grasps from partial view point cloud

2020:

[arXiv] EGAD! an Evolved Grasping Analysis Dataset for diversity and reproducibility in robotic manipulation, [paper]

[arXiv] REGNet: REgion-based Grasp Network for Single-shot Grasp Detection in Point Clouds, [paper]

[RAL] GRASPA 1.0: GRASPA is a Robot Arm graSping Performance benchmArk, [paper] [code]

[arXiv] GraspNet: A Large-Scale Clustered and Densely Annotated Dataset for Object Grasping, [paper]

2019:

[ISRR] A Billion Ways to Grasp: An Evaluation of Grasp Sampling Schemes on a Dense, Physics-based Grasp Data Set, [paper] [project]

[arXiv] 6-DOF Grasping for Target-driven Object Manipulation in Clutter, [paper]

[IROS] Grasping Unknown Objects Based on Gripper Workspace Spheres, [paper]

[arXiv] Learning to Generate 6-DoF Grasp Poses with Reachability Awareness, [paper]

[CoRL] S4G: Amodal Single-view Single-Shot SE(3) Grasp Detection in Cluttered Scenes, [paper]

[ICCV] 6-DoF GraspNet: Variational Grasp Generation for Object Manipulation, [paper]

[ICRA] PointNetGPD: Detecting Grasp Configurations from Point Sets, [paper] [code]

2017:

[IJRR] Grasp Pose Detection in Point Clouds, [paper] [code]

2.3.2 Grasp affordance

2020:

[arXiv] Learning to Grasp 3D Objects using Deep Residual U-Nets, [paper]

2019:

[IROS] Detecting Robotic Affordances on Novel Objects with Regional Attention and Attributes, [paper]

[IROS] Learning Grasp Affordance Reasoning through Semantic Relations, [paper]

[arXiv] Automatic pre-grasps generation for unknown 3D objects, [paper]

[IECON] A novel object slicing based grasp planner for 3D object grasping using underactuated robot gripper, [paper]

2018:

[ICRA] AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection, [paper]

[arXiv] Workspace Aware Online Grasp Planning, [paper]

2.3.3 Shape completion assisted grasp

2020:

[arXiv] PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions, [paper]

[arXiv] Multimodal Shape Completion via Conditional Generative Adversarial Networks, [paper]

[arXiv] Symmetry Detection of Occluded Point Cloud Using Deep Learning, [paper]

[arXiv] Robotic Grasping through Combined Image-Based Grasp Proposal and 3D Reconstruction, [paper]

[arXiv] PF-Net: Point Fractal Network for 3D Point Cloud Completion, [paper]

[arXiv] Hypernetwork approach to generating point clouds, [paper]

[arXiv] Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion, [paper]

[arXiv] PolyGen: An Autoregressive Generative Model of 3D Meshes, [paper]

[arXiv] BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images, [paper]

[arXiv] Implicit Geometric Regularization for Learning Shapes, [paper]

[arXiv] The Whole Is Greater Than the Sum of Its Nonrigid Parts, [paper]

2019:

[arXiv] ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation, [paper]

[arXiv] kPAM-SC: Generalizable Manipulation Planning using KeyPoint Affordance and Shape Completion, [paper] [code]

[arXiv] Data-Efficient Learning for Sim-to-Real Robotic Grasping using Deep Point Cloud Prediction Networks, [paper]

[arXiv] Inferring Occluded Geometry Improves Performance when Retrieving an Object from Dense Clutter, [paper]

[IROS] Robust Grasp Planning Over Uncertain Shape Completions, [paper]

[arXiv] Multi-Modal Geometric Learning for Grasping and Manipulation, [paper]

2018:

[ICRA] Learning 6-DOF Grasping Interaction via Deep Geometry-aware 3D Representations, [paper]

[IROS] 3D Shape Perception from Monocular Vision, Touch, and Shape Priors, [paper]

2016:

[IROS] Shape Completion Enabled Robotic Grasping, [paper]

2.3.4 Depth completion and Estimation

2020:

[arXiv] 3dDepthNet: Point Cloud Guided Depth Completion Network for Sparse Depth and Single Color Image, [paper]

[arXiv] Depth Estimation by Learning Triangulation and Densification of Sparse Points for Multi-view Stereo, [paper]

[arXiv] Monocular Depth Estimation Based On Deep Learning: An Overview, [paper]

[arXiv] Scene Completeness-Aware Lidar Depth Completion for Driving Scenario, [paper]

[arXiv] Fast Depth Estimation for View Synthesis, [paper]

[arXiv] Active Depth Estimation: Stability Analysis and its Applications, [paper]

[arXiv] Uncertainty depth estimation with gated images for 3D reconstruction, [paper]

[arXiv] Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion from 3D Geometry, [paper]

[arXiv] A-TVSNet: Aggregated Two-View Stereo Network for Multi-View Stereo Depth Estimation, [paper]

[arXiv] Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields, [paper]

[ICLR] Semantically-Guided Representation Learning for Self-Supervised Monocular Depth, [paper]

[arXiv] 3D Gated Recurrent Fusion for Semantic Scene Completion, [paper]

[arXiv] Applying Depth-Sensing to Automated Surgical Manipulation with a da Vinci Robot, [paper]

[arXiv] Fast Generation of High Fidelity RGB-D Images by Deep-Learning with Adaptive Convolution, [paper]

[arXiv] DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data, [paper]

[arXiv] Depth Map Estimation of Dynamic Scenes Using Prior Depth Information, [paper]

[arXiv] FIS-Nets: Full-image Supervised Networks for Monocular Depth Estimation, [paper]

[ICRA] Depth Based Semantic Scene Completion with Position Importance Aware Loss, [paper]

[arXiv] ResDepth: Learned Residual Stereo Reconstruction, [paper]

[arXiv] Single Image Depth Estimation Trained via Depth from Defocus Cues, [paper]

[arXiv] RoutedFusion: Learning Real-time Depth Map Fusion, [paper]

[arXiv] Don't Forget The Past: Recurrent Depth Estimation from Monocular Video, [paper]

[AAAI] Morphing and Sampling Network for Dense Point Cloud Completion, [paper] [code]

[AAAI] CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion, [paper]

2019:

[arXiv] Normal Assisted Stereo Depth Estimation, [paper]

[arXiv] Geometry-Aware Generation of Adversarial and Cooperative Point Clouds, [paper]

[arXiv] DeepSFM: Structure From Motion Via Deep Bundle Adjustment, [paper]

[CVIU] On the Benefit of Adversarial Training for Monocular Depth Estimation, [paper]

[ICCV] Learning Joint 2D-3D Representations for Depth Completion, [paper]

[ICCV] Deep Optics for Monocular Depth Estimation and 3D Object Detection, [paper]

[arXiv] Deep Classification Network for Monocular Depth Estimation, [paper]

[ICCV] Depth Completion from Sparse LiDAR Data with Depth-Normal Constraints, [paper]

[arXiv] Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era, [paper]

[arXiv] Real-time Vision-based Depth Reconstruction with NVidia Jetson, [paper]

[IROS] Self-supervised 3D Shape and Viewpoint Estimation from Single Images for Robotics, [paper]

[arXiv] Mesh R-CNN, [paper]

[arXiv] Monocular depth estimation: a survey, [paper]

2018:

[3DV] PCN: Point Completion Network, [paper] [code]

[NeurIPS] Learning to Reconstruct Shapes from Unseen Classes, [paper] [code]

[ECCV] Learning Shape Priors for Single-View 3D Completion and Reconstruction, [paper] [code]

[CVPR] Deep Depth Completion of a Single RGB-D Image, [paper] [code]

2.3.5 Point cloud upsampling and denoising

2020:

[arXiv] Non-Local Part-Aware Point Cloud Denoising, [paper]

[arXiv] PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling, [paper]

2019:

[arXiv] PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks, [paper] [code]

[ICCV] PU-GAN: a Point Cloud Upsampling Adversarial Network, [paper] [code]

[CVPR] Patch-based Progressive 3D Point Set Upsampling, [paper] [code]

2018:

[CVPR] PU-Net: Point Cloud Upsampling Network, [paper] [code]

3. Grasp Transfer

3.1 Task-oriented manipulation

2020:

[arXiv] Development of a Robotic System for Automated Decaking of 3D-Printed Parts, [paper]

[arXiv] Team O2AS at the World Robot Summit 2018: An Approach to Robotic Kitting and Assembly Tasks using General Purpose Grippers and Tools, [paper]

[arXiv] Towards Mobile Multi-Task Manipulation in a Confined and Integrated Environment with Irregular Objects, [paper]

[arXiv] Autonomous Industrial Assembly using Force, Torque, and RGB-D sensing, [paper]

2019:

[arXiv] KETO: Learning Keypoint Representations for Tool Manipulation, [paper]

[arXiv] Learning Task-Oriented Grasping from Human Activity Datasets, [paper]

3.2 Grasp transfer between shape parts

2020:

[arXiv] DGCM-Net: Dense Geometrical Correspondence Matching Network for Incremental Experience-based Robotic Grasping, [paper]

2019:

[arXiv] Using Synthetic Data and Deep Networks to Recognize Primitive Shapes for Object Grasping, [paper]

[ICRA] Transferring Grasp Configurations using Active Learning and Local Replanning, [paper]

2017:

[AIP] Fast grasping of unknown objects using principal component analysis, [paper]

2015:

[RAS] Category-based task specific grasping, [paper]

3.3 Non-rigid shape matching

3.3.1 Non-rigid registration

2020:

[arXiv] MINA: Convex Mixed-Integer Programming for Non-Rigid Shape Alignment, [paper]

2019:

[arXiv] Non-Rigid Point Set Registration Networks, [paper] [code]

2018:

[RAL] Transferring Category-based Functional Grasping Skills by Latent Space Non-Rigid Registration, [paper]

[RAS] Learning Postural Synergies for Categorical Grasping through Shape Space Registration, [paper]

[RAS] Autonomous Dual-Arm Manipulation of Familiar Objects, [paper]

3.3.2 Shape correspondence

2020:

[arXiv] SAPIEN: A SimulAted Part-based Interactive ENvironment, [paper]

[TVCG] Voting for Distortion Points in Geometric Processing, [paper]

[arXiv] SketchDesc: Learning Local Sketch Descriptors for Multi-view Correspondence, [paper]

2019:

[arXiv] Fine-grained Object Semantic Understanding from Correspondences, [paper]

[IROS] Multi-step Pick-and-Place Tasks Using Object-centric Dense Correspondences, [code]

[arXiv] Unsupervised cycle-consistent deformation for shape matching, [paper]

[arXiv] ZoomOut: Spectral Upsampling for Efficient Shape Correspondence, [paper]

[C&G] Partial correspondence of 3D shapes using properties of the nearest-neighbor field, [paper]

3.4 3D part segmentation

2020:

[ICLR] Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories, [paper]

2019:

[arXiv] Skeleton Extraction from 3D Point Clouds by Decomposing the Object into Parts, [paper]

[arXiv] Neural Shape Parsers for Constructive Solid Geometry, [paper]

[arXiv] PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes, [paper]

[CVPR] PartNet: A Recursive Part Decomposition Network for Fine-grained and Hierarchical Shape Segmentation, [paper] [code]

[C&G] Autoencoder-based part clustering for part-in-whole retrieval of CAD models, [paper]

2016:

[SiggraphAsia] A Scalable Active Framework for Region Annotation in 3D Shape Collections, [paper]

4. Dexterous Grippers

2020:

[arXiv] The State of Service Robots: Current Bottlenecks in Object Perception and Manipulation, [paper]

[arXiv] Selecting and Designing Grippers for an Assembly Task in a Structured Approach, [paper]

[arXiv] A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU, [paper]

[arXiv] Robust High-Transparency Haptic Exploration for Dexterous Telemanipulation, [paper]

[arXiv] Tactile Dexterity: Manipulation Primitives with Tactile Feedback, [paper]

[arXiv] Deep Differentiable Grasp Planner for High-DOF Grippers, [paper]

[arXiv] Multi-Fingered Grasp Planning via Inference in Deep Neural Networks, [paper]

[RAL] Benchmarking In-Hand Manipulation, [paper]

2019:

[arXiv] GraphPoseGAN: 3D Hand Pose Estimation from a Monocular RGB Image via Adversarial Learning on Graphs, [paper]

[arXiv] HMTNet: 3D Hand Pose Estimation from Single Depth Image Based on Hand Morphological Topology, [paper]

[arXiv] UniGrasp: Learning a Unified Model to Grasp with N-Fingered Robotic Hands, [paper]

[ScienceRobotics] On the choice of grasp type and location when handing over an object, [paper]

[arXiv] Solving Rubik's Cube with a Robot Hand, [paper]

[IJARS] Fast geometry-based computation of grasping points on three-dimensional point clouds, [paper] [code]

[arXiv] Learning better generative models for dexterous, single-view grasping of novel objects, [paper]

[arXiv] DexPilot: Vision Based Teleoperation of Dexterous Robotic Hand-Arm System, [paper]

[IROS] Optimization Model for Planning Precision Grasps with Multi-Fingered Hands, [paper]

[IROS] Generating Grasp Poses for a High-DOF Gripper Using Neural Networks, [paper]

[arXiv] Deep Dynamics Models for Learning Dexterous Manipulation, [paper]

[CVPR] Learning joint reconstruction of hands and manipulated objects, [paper]

[CVPR] H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions, [paper]

[IROS] Efficient Grasp Planning and Execution with Multi-Fingered Hands by Surface Fitting, [paper]

[arXiv] Efficient Bimanual Manipulation Using Learned Task Schemas, [paper]

[ICRA] High-Fidelity Grasping in Virtual Reality using a Glove-based System, [paper] [code]

5. Simulation to Reality

2020:

[arXiv] On the Effectiveness of Virtual Reality-based Training for Robotic Setup, [paper]

[arXiv] LiDARNet: A Boundary-Aware Domain Adaptation Model for Lidar Point Cloud Semantic Segmentation, [paper]

[arXiv] Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey, [paper]

[arXiv] Learning Machines from Simulation to Real World, [paper]

[arXiv] Sim2Real2Sim: Bridging the Gap Between Simulation and Real-World in Flexible Object Manipulation, [paper]

2019:

[arXiv] Self-supervised 6D Object Pose Estimation for Robot Manipulation, [paper]

[arXiv] Accept Synthetic Objects as Real: End-to-End Training of Attentive Deep Visuomotor Policies for Manipulation in Clutter, [paper]

[RSSW] Generative grasp synthesis from demonstration using parametric mixtures, [paper]

2018:

[RSS] Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision, [paper]

[CoRL] Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects, [paper]

[arXiv] Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation, [paper]

2017:

[arXiv] Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping, [paper]

6. Multi-source

2020:

[arXiv] Understanding Contexts Inside Robot and Human Manipulation Tasks through a Vision-Language Model and Ontology System in a Video Stream, [paper]

[ToR] A Transfer Learning Approach to Cross-modal Object Recognition: from Visual Observation to Robotic Haptic Exploration, [paper]

[arXiv] Accurate Vision-based Manipulation through Contact Reasoning, [paper]

2019:

[arXiv] RoboSherlock: Cognition-enabled Robot Perception for Everyday Manipulation Tasks, [paper]

[ICRA] Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, [paper]

[CVPR] ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging, [paper] [code]

2018:

[arXiv] Learning to Grasp without Seeing, [paper]

7. Learning from Demonstration

2020:

[arXiv] SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks, [paper]

[arXiv] A Geometric Perspective on Visual Imitation Learning, [paper]

[arXiv] Vision-based Robot Manipulation Learning via Human Demonstrations, [paper]

[arXiv] Gaussian-Process-based Robot Learning from Demonstration, [paper]

2019:

[arXiv] Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations, [paper] [project]

[arXiv] Motion Reasoning for Goal-Based Imitation Learning, [paper]

[IROS] Robot Learning of Shifting Objects for Grasping in Cluttered Environments, [paper] [code]

[arXiv] Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control, [paper]

[arXiv] Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video, [paper]

[IROS] Learning Actions from Human Demonstration Video for Robotic Manipulation, [paper]

[RSSW] Generative grasp synthesis from demonstration using parametric mixtures, [paper]

2018:

[arXiv] Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation, [paper]

8. Reinforcement Learning

2020:

[arXiv] Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras, [paper]

[arXiv] The Surprising Effectiveness of Linear Models for Visual Foresight in Object Pile Manipulation, [paper]

[arXiv] Learning Pregrasp Manipulation of Objects from Ungraspable Poses, [paper]

[arXiv] Deep Reinforcement Learning for Autonomous Driving: A Survey, [paper]

[arXiv] Lyceum: An efficient and scalable ecosystem for robot learning, [paper]

[arXiv] Planning an Efficient and Robust Base Sequence for a Mobile Manipulator Performing Multiple Pick-and-place Tasks, [paper]

[arXiv] Reward Engineering for Object Pick and Place Training, [paper]

2019:

[arXiv] Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning, [paper] [project] [code]

[ROBIO] Efficient Robotic Task Generalization Using Deep Model Fusion Reinforcement Learning, [paper]

[arXiv] Contextual Reinforcement Learning of Visuo-tactile Multi-fingered Grasping Policies, [paper]

[IROS] Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity, [paper]

[arXiv] IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data, [paper]

[arXiv] Dynamic Cloth Manipulation with Deep Reinforcement Learning, [paper]

[CoRL] Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning, [paper] [project]

[CoRL] Asynchronous Methods for Model-Based Reinforcement Learning, [paper]

[CoRL] Entity Abstraction in Visual Model-Based Reinforcement Learning, [paper]

[CoRL] Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation, [paper] [project]

[arXiv] Contextual Imagined Goals for Self-Supervised Robotic Learning, [paper]

[arXiv] Learning to Manipulate Deformable Objects without Demonstrations, [paper] [project]

[arXiv] A Deep Learning Approach to Grasping the Invisible, [paper]

[arXiv] Knowledge Induced Deep Q-Network for a Slide-to-Wall Object Grasping, [paper]

[arXiv] Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping, [paper]

[arXiv] Adaptive Curriculum Generation from Demonstrations for Sim-to-Real Visuomotor Control, [paper]

[arXiv] Reinforcement Learning for Robotic Manipulation using Simulated Locomotion Demonstrations, [paper]

[arXiv] Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation, [paper]

[arXiv] Object Perception and Grasping in Open-Ended Domains, [paper]

[CoRL] ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, [paper] [code]

[RSS] End-to-End Robotic Reinforcement Learning without Reward Engineering, [paper]

[arXiv] Learning to combine primitive skills: A step towards versatile robotic manipulation, [paper]

[CoRL] A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots, [paper] [code]

[ICCAS] Deep Reinforcement Learning Based Robot Arm Manipulation with Efficient Training Data through Simulation, [paper]

[CVPR] CRAVES: Controlling Robotic Arm with a Vision-based Economic System, [paper] [code]

[Report] A Unified Framework for Manipulating Objects via Reinforcement Learning, [paper]

2018:

[IROS] Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning, [paper] [code]

[CoRL] QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, [paper]

[arXiv] Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods, [paper]

[arXiv] Pick and Place Without Geometric Object Models, [paper]

2017:

[arXiv] Deep Reinforcement Learning for Robotic Manipulation-The state of the art, [paper]

2016:

[IJRR] Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning, [paper]

2013:

[IJRR] Reinforcement learning in robotics: A survey, [paper]

9. Visual servoing

2020:

[arXiv] Predicting Target Feature Configuration of Non-stationary Objects for Grasping with Image-Based Visual Servoing, [paper]

[AAAI] That and There: Judging the Intent of Pointing Actions with Robotic Arms, [paper]

2019:

[arXiv] Camera-to-Robot Pose Estimation from a Single Image, [paper]

[ICRA] Learning Driven Coarse-to-Fine Articulated Robot Tracking, [paper]

[CVPR] Craves: controlling robotic arm with a vision-based, economic system, [paper] [code]

2018:

[arXiv] Point-to-Pose Voting based Hand Pose Estimation using Residual Permutation Equivariant Layer, [paper]

2016:

[ICRA] Robot Arm Pose Estimation by Pixel-wise Regression of Joint Angles, [paper]

2014:

[ICRA] Robot Arm Pose Estimation through Pixel-Wise Part Classification, [paper]

10. Path Planning

2020:

[arXiv] GOMP: Grasp-Optimized Motion Planning for Bin Picking, [paper]

[arXiv] Describing Physics For Physical Reasoning: Force-based Sequential Manipulation Planning, [paper]

[arXiv] Reaching, Grasping and Re-grasping: Learning Fine Coordinated Motor Skills, [paper]

2019:

[arXiv] Manipulation Trajectory Optimization with Online Grasp Synthesis and Selection, [paper]

[arXiv] Parareal with a Learned Coarse Model for Robotic Manipulation, [paper]

11. Experts:

Abhinav Gupta(CMU & FAIR): Robotics, machine learning

Andreas ten Pas(Northeastern University): Robotic Grasping, Deep Learning, Simulation-based Planning

Andy Zeng(Princeton University & Google Brain Robotics): 3D Deep Learning, Robotic Grasping

Animesh Garg(University of Toronto): Robotics, Reinforcement Learning

Cewu Lu(SJTU): Machine Vision

Charles Ruizhongtai Qi(Waymo(Google)): 3D Deep Learning

Danfei Xu(Stanford University): Robotics, Computer Vision

Dieter Fox(Nvidia & University of Washington): Robotics, Artificial intelligence, State Estimation

Fei-Fei Li(Stanford University): Computer Vision

Guofeng Zhang(ZJU): 3D Vision, SLAM

Hao Su(UC San Diego): 3D Deep Learning

Jeannette Bohg(Stanford University): Perception for autonomous robotic manipulation and grasping

Jianping Shi(SenseTime): Computer Vision

Juxi Leitner(Australian Centre of Excellence for Robotic Vision (ACRV)): Robotic grasping

Lerrel Pinto(UC Berkeley): Robotics

Lorenzo Jamone(Queen Mary University of London (QMUL)): Cognitive Robotics

Lorenzo Natale(Italian Institute of Technology): Humanoid robotic sensing and perception

Kaiming He(Facebook AI Research (FAIR)): Deep Learning

Kai Xu(NUDT): Graphics, Geometry

Ken Goldberg(UC Berkeley): Robotics

Marc Pollefeys(Microsoft & ETH): Computer Vision

Markus Vincze(Technical University Wien (TUW)): Robotic Vision

Oliver Brock(TU Berlin): Robotic manipulation

Pascal Fua(INRIA): Computer Vision

Peter K. Allen(Columbia University): Robotic Grasping, 3-D vision, Modeling, Medical robotics

Peter Corke(Queensland University of Technology): Robotic vision

Pieter Abbeel(UC Berkeley): Artificial Intelligence, Advanced Robotics

Raquel Urtasun(Uber ATG & University of Toronto): AI for self-driving cars, Computer Vision, Robotics

Robert Platt(Northeastern University): Robotic manipulation

Ruigang Yang(Baidu): Computer Vision, Robotics

Sergey Levine(UC Berkeley): Reinforcement Learning

Shuran Song(Columbia University): 3D Deep Learning, Robotics

Silvio Savarese(Stanford University): Computer Vision

Song-Chun Zhu(UCLA): Computer Vision

Tamim Asfour(Karlsruhe Institute of Technology (KIT)): Humanoid Robotics

Thomas Funkhouser(Princeton University): Geometry, Graphics, Shape

Valerio Ortenzi(University of Birmingham): Robotic vision

Vincent Lepetit(University of Bordeaux): Machine Learning, 3D Vision

Xiaogang Wang(Chinese University of Hong Kong): Deep Learning, Computer Vision

Xiaozhi Chen(DJI): Deep learning

Yan Xinchen(Uber ATG): Deep Representation Learning, Generative Modeling

Yu Xiang(Nvidia): Robotics, Computer Vision

Yue Wang(MIT): 3D Deep Learning