3D Object Recognition: Related Resources

Preface:

In object recognition and 3D vision research, RGB data contains texture, color, and appearance information, but RGB is not robust to factors such as illumination. Depth information, i.e., information about 3D spatial structure, is considerably more robust to such factors, which is why research on 3D objects is so active. Work in this area is commonly organized by representation: multi-view images, volumetric grids, point clouds, polygon meshes, and primitives.

Part 1:

This part is adapted from another author's summary, which is very good; see link [1].

 

3D Models

Princeton Shape Benchmark (2003) [Link]
1,814 models collected from the web in .OFF format. Used to evaluate shape-based retrieval and analysis algorithms.

Dataset for IKEA 3D models and aligned images (2013) [Link]
759 images and 219 models including Sketchup (skp) and Wavefront (obj) files, good for pose estimation.

Open Surfaces: A Richly Annotated Catalog of Surface Appearance (SIGGRAPH 2013) [Link]
OpenSurfaces is a large database of annotated surfaces created from real-world consumer photographs. The annotation framework draws on crowdsourcing to segment surfaces from photos, which are then annotated with rich surface properties, including material, texture, and contextual information.

PASCAL3D+ (2014) [Link]
12 categories, on average 3k+ objects per category, for 3D object detection and pose estimation.

SHREC 2014 - Large Scale Comprehensive 3D Shape Retrieval (2014) [Link][Paper]
8,987 models, categorized into 171 classes for shape retrieval.

ModelNet (2015) [Link]
127,915 3D CAD models from 662 categories.
ModelNet10: 4,899 models from 10 categories.
ModelNet40: 12,311 models from 40 categories, all uniformly oriented.

ShapeNet (2015) [Link]
3 million+ models and 4K+ categories. A dataset that is large in scale, well organized, and richly annotated.
ShapeNetCore [Link]: 51,300 models in 55 categories.

A Large Dataset of Object Scans (2016) [Link]
10K scans in RGBD + reconstructed 3D models in .PLY format.

ObjectNet3D: A Large Scale Database for 3D Object Recognition (2016) [Link]
100 categories, 90,127 images, 201,888 objects in these images and 44,147 3D shapes.
Tasks: region proposal generation, 2D object detection, joint 2D detection and 3D object pose estimation, and image-based 3D shape retrieval

Thingi10K: A Dataset of 10,000 3D-Printing Models (2016) [Link]
10,000 models from featured “things” on thingiverse.com, suitable for testing 3D printing techniques such as structural analysis, shape optimization, or solid geometry operations.

3D Scenes

NYU Depth Dataset V2 (2012) [Link]
1449 densely labeled pairs of aligned RGB and depth images from Kinect video sequences for a variety of indoor scenes.

SUNRGB-D 3D Object Detection Challenge [Link]
19 object categories for predicting a 3D bounding box in real-world dimensions.
Training set: 10,355 RGB-D scene images; testing set: 2,860 RGB-D images.

SceneNN (2016) [Link]
100+ indoor scene meshes with per-vertex and per-pixel annotation.

ScanNet (2017) [Link]
An RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations.

Matterport3D: Learning from RGB-D Data in Indoor Environments (2017) [Link]
10,800 panoramic views (in both RGB and depth) from 194,400 RGB-D images of 90 building-scale scenes of private rooms. Instance-level semantic segmentations are provided for region (living room, kitchen) and object (sofa, TV) categories.

SUNCG: A Large 3D Model Repository for Indoor Scenes (2017) [Link]
The dataset contains over 45K different scenes with manually created realistic room and furniture layouts. All of the scenes are semantically annotated at the object level.

MINOS: Multimodal Indoor Simulator (2017) [Link]
MINOS is a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. MINOS leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites. MINOS supports SUNCG and Matterport3D scenes.

Facebook House3D: A Rich and Realistic 3D Environment (2017) [Link]
House3D is a virtual 3D environment which consists of 45K indoor scenes equipped with a diverse set of scene types, layouts and objects sourced from the SUNCG dataset. All 3D objects are fully annotated with category labels. Agents in the environment have access to observations of multiple modalities, including RGB images, depth, segmentation masks and top-down 2D map views.

HoME: a Household Multimodal Environment (2017) [Link]
HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, and multi-agent learning.

AI2-THOR: Photorealistic Interactive Environments for AI Agents [Link]
AI2-THOR is a photo-realistic interactable framework for AI agents. There are a total 120 scenes in version 1.0 of the THOR environment covering four different room categories: kitchens, living rooms, bedrooms, and bathrooms. Each room has a number of actionable objects.

UnrealCV: Virtual Worlds for Computer Vision (2017) [Link][Paper]
An open source project to help computer vision researchers build virtual worlds using Unreal Engine 4.

Gibson Environment: Real-World Perception for Embodied Agents (2018 CVPR) [Link]
This platform provides RGB from 1000 point clouds, as well as multimodal sensor data: surface normal, depth, and, for a fraction of the spaces, semantic object annotations. The environment is also RL-ready with integrated physics. Using such datasets can further narrow the discrepancy between virtual environments and the real world.

InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset [Link]
System overview: an end-to-end pipeline to render an RGB-D-inertial benchmark for large-scale interior scene understanding and mapping. The dataset contains 20M images created by the pipeline: (A) around 1 million CAD models are collected from world-leading furniture manufacturers; these models have been used in real-world production. (B) Based on those models, around 1,100 professional designers create around 22 million interior layouts, most of which have been used in real-world decoration. (C) For each layout, a number of configurations are generated to represent different random lightings and to simulate scene changes over time in daily life. (D) An interactive simulator (ViSim) helps create ground-truth IMU and event data, as well as monocular or stereo camera trajectories, including hand-drawn, random-walk, and neural-network-based realistic trajectories. (E) All supported image sequences and ground truth are provided.

3D Pose Estimation

Category-Specific Object Reconstruction from a Single Image (2014) [Paper]

Viewpoints and Keypoints (2015) [Paper]

Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views (2015 ICCV) [Paper]

PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization (2015) [Paper]

Modeling Uncertainty in Deep Learning for Camera Relocalization (2016) [Paper]

Robust camera pose estimation by viewpoint classification using deep learning (2016) [Paper]

Geometric loss functions for camera pose regression with deep learning (2017 CVPR) [Paper]

Generic 3D Representation via Pose Estimation and Matching (2017) [Paper]

3D Bounding Box Estimation Using Deep Learning and Geometry (2017) [Paper]

6-DoF Object Pose from Semantic Keypoints (2017) [Paper]

Relative Camera Pose Estimation Using Convolutional Neural Networks (2017) [Paper]

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions (2017) [Paper]

Single Image 3D Interpreter Network (2016) [Paper] [Code]

Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction (2018 CVPR) [Paper]

PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes (2018) [Paper]

Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images (2018 CVPR) [Paper]

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling (2018 CVPR) [Paper]

3D Pose Estimation and 3D Model Retrieval for Objects in the Wild (2018 CVPR) [Paper]

Single Object Classification

3D ShapeNets: A Deep Representation for Volumetric Shapes (2015) [Paper]

VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition (2015) [Paper] [Code]

Multi-view Convolutional Neural Networks for 3D Shape Recognition (2015) [Paper]

DeepPano: Deep Panoramic Representation for 3-D Shape Recognition (2015) [Paper]

FusionNet: 3D Object Classification Using Multiple Data Representations (2016) [Paper]

Volumetric and Multi-View CNNs for Object Classification on 3D Data (2016) [Paper] [Code]

Generative and Discriminative Voxel Modeling with Convolutional Neural Networks (2016) [Paper] [Code]

Geometric deep learning on graphs and manifolds using mixture model CNNs (2016) [Link]

3D GAN: Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling (2016) [Paper] [Code]

FPNN: Field Probing Neural Networks for 3D Data (2016) [Paper] [Code]

OctNet: Learning Deep 3D Representations at High Resolutions (2017) [Paper] [Code]

O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis (2017) [Paper] [Code]

Orientation-boosted voxel nets for 3D object recognition (2017) [Paper] [Code]

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (2017) [Paper] [Code]

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (2017) [Paper] [Code]

Feedback Networks (2017) [Paper] [Code]

Escape from Cells: Deep Kd-Networks for The Recognition of 3D Point Cloud Models (2017) [Paper]

Dynamic Graph CNN for Learning on Point Clouds (2018) [Paper]

PointCNN (2018) [Paper]

A Network Architecture for Point Cloud Classification via Automatic Depth Images Generation (2018 CVPR) [Paper]

PointGrid: A Deep Network for 3D Shape Understanding (CVPR 2018) [Paper] [Code]

Multiple Objects Detection

Sliding Shapes for 3D Object Detection in Depth Images (2014) [Paper]

Object Detection in 3D Scenes Using CNNs in Multi-view Images (2016) [Paper]

Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images (2016) [Paper] [Code]

DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding (2016) [Paper]

SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite (2017) [Paper]

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection (2017) [Paper]

Frustum PointNets for 3D Object Detection from RGB-D Data (2017) [Paper]

Scene/Object Semantic Segmentation

Learning 3D Mesh Segmentation and Labeling (2010) [Paper]

Unsupervised Co-Segmentation of a Set of Shapes via Descriptor-Space Spectral Clustering (2011) [Paper]

Single-View Reconstruction via Joint Analysis of Image and Shape Collections (2015) [Paper] [Code]

3D Shape Segmentation with Projective Convolutional Networks (2017) [Paper] [Code]

Learning Hierarchical Shape Segmentation and Labeling from Online Repositories (2017) [Paper]

ScanNet (2017) [Paper] [Code]

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (2017) [Paper] [Code]

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (2017) [Paper] [Code]

3D Graph Neural Networks for RGBD Semantic Segmentation (2017) [Paper]

3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds (2017) [Paper]

Semantic Segmentation of Indoor Point Clouds using Convolutional Neural Networks (2017) [Paper]

SEGCloud: Semantic Segmentation of 3D Point Clouds (2017) [Paper]

Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55 (2017) [Paper]

Dynamic Graph CNN for Learning on Point Clouds (2018) [Paper]

PointCNN (2018) [Paper]

3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation (2018) [Paper]

ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans (2018) [Paper]

SPLATNet: Sparse Lattice Networks for Point Cloud Processing (2018) [Paper]

PointGrid: A Deep Network for 3D Shape Understanding (CVPR 2018) [Paper] [Code]

3D Model Synthesis/Reconstruction

Parametric Morphable Model-based methods

A Morphable Model For The Synthesis Of 3D Faces (1999) [Paper][Code]

The Space of Human Body Shapes: Reconstruction and Parameterization from Range Scans (2003) [Paper]

Category-Specific Object Reconstruction from a Single Image (2014) [Paper]

DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image (2017) [Paper]

Mesh-based Autoencoders for Localized Deformation Component Analysis (2017) [Paper]

Exploring Generative 3D Shapes Using Autoencoder Networks (Autodesk 2017) [Paper]

Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image (2017) [Paper]

Compact Model Representation for 3D Reconstruction (2017) [Paper]

Image2Mesh: A Learning Framework for Single Image 3D Reconstruction (2017) [Paper]

Learning free-form deformations for 3D object reconstruction (2018) [Paper]

Variational Autoencoders for Deforming 3D Mesh Models (2018 CVPR) [Paper]

Lions and Tigers and Bears: Capturing Non-Rigid, 3D, Articulated Shape from Images (2018 CVPR) [Paper]

Part-based Template Learning methods

Modeling by Example (2004) [Paper]

Model Composition from Interchangeable Components (2007) [Paper]

Data-Driven Suggestions for Creativity Support in 3D Modeling (2010) [Paper]

Photo-Inspired Model-Driven 3D Object Modeling (2011) [Paper]

Probabilistic Reasoning for Assembly-Based 3D Modeling (2011) [Paper]

A Probabilistic Model for Component-Based Shape Synthesis (2012) [Paper]

Structure Recovery by Part Assembly (2012) [Paper]

Fit and Diverse: Set Evolution for Inspiring 3D Shape Galleries (2012) [Paper]

AttribIt: Content Creation with Semantic Attributes (2013) [Paper]

Learning Part-based Templates from Large Collections of 3D Shapes (2013) [Paper]

Topology-Varying 3D Shape Creation via Structural Blending (2014) [Paper]

Estimating Image Depth using Shape Collections (2014) [Paper]

Single-View Reconstruction via Joint Analysis of Image and Shape Collections (2015) [Paper]

Interchangeable Components for Hands-On Assembly Based Modeling (2016) [Paper]

Shape Completion from a Single RGBD Image (2016) [Paper]

Deep Learning Methods

Learning to Generate Chairs, Tables and Cars with Convolutional Networks (2014) [Paper]

Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis (2015 NIPS) [Paper] [Code]

Analysis and synthesis of 3D shape families via deep-learned generative models of surfaces (2015) [Paper]

Multi-view 3D Models from Single Images with a Convolutional Network (2016) [Paper] [Code]

View Synthesis by Appearance Flow (2016) [Paper] [Code]

Voxlets: Structured Prediction of Unobserved Voxels From a Single Depth Image (2016) [Paper] [Code]

3D-R2N2: 3D Recurrent Reconstruction Neural Network (2016) [Paper] [Code]

Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision (2016) [Paper]

TL-Embedding Network: Learning a Predictable and Generative Vector Representation for Objects (2016) [Paper]

3D GAN: Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling (2016) [Paper]

3D Shape Induction from 2D Views of Multiple Objects (2016) [Paper]

Unsupervised Learning of 3D Structure from Images (2016) [Paper]

Generative and Discriminative Voxel Modeling with Convolutional Neural Networks (2016) [Paper] [Code]

Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency (2017) [Paper]

Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks (2017) [Paper] [Code]

Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis (2017) [Paper] [Code]

Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs (2017) [Paper] [Code]

Hierarchical Surface Prediction for 3D Object Reconstruction (2017) [Paper]

OctNetFusion: Learning Depth Fusion from Data (2017) [Paper] [Code]

A Point Set Generation Network for 3D Object Reconstruction from a Single Image (2017) [Paper] [Code]

Learning Representations and Generative Models for 3D Point Clouds (2017) [Paper] [Code]

Shape Generation using Spatially Partitioned Point Clouds (2017) [Paper]

PCPNET: Learning Local Shape Properties from Raw Point Clouds (2017) [Paper]

Transformation-Grounded Image Generation Network for Novel 3D View Synthesis (2017) [Paper] [Code]

Tag Disentangled Generative Adversarial Networks for Object Image Re-rendering (2017) [Paper]

3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks (2017) [Paper] [Code]

Interactive 3D Modeling with a Generative Adversarial Network (2017) [Paper]

Weakly supervised 3D Reconstruction with Adversarial Constraint (2017) [Paper] [Code]

SurfNet: Generating 3D shape surfaces using deep residual networks (2017) [Paper]

GRASS: Generative Recursive Autoencoders for Shape Structures (SIGGRAPH 2017) [Paper] [Code] [Code]

3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks (2017) [Paper] [Code]

Neural 3D Mesh Renderer (2017) [Paper] [Code]

Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55 (2017) [Paper]

Pix2vox: Sketch-Based 3D Exploration with Stacked Generative Adversarial Networks (2017) [Code]

What You Sketch Is What You Get: 3D Sketching using Multi-View Deep Volumetric Prediction (2017) [Paper]

MarrNet: 3D Shape Reconstruction via 2.5D Sketches (2017) [Paper]

Learning a Multi-View Stereo Machine (2017 NIPS) [Paper]

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions (2017) [Paper]

Scaling CNNs for High Resolution Volumetric Reconstruction from a Single Image (2017) [Paper]

PU-Net: Point Cloud Upsampling Network (2018) [Paper] [Code]

Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction (2018 CVPR) [Paper]

Object-Centric Photometric Bundle Adjustment with Deep Shape Prior (2018) [Paper]

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction (2018 AAAI) [Paper]

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images (2018) [Paper]

AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation (2018 CVPR) [Paper] [Code]

Deep Marching Cubes: Learning Explicit Surface Representations (2018 CVPR) [Paper]

Im2Avatar: Colorful 3D Reconstruction from a Single Image (2018) [Paper]

Learning Category-Specific Mesh Reconstruction from Image Collections (2018) [Paper]

CSGNet: Neural Shape Parser for Constructive Solid Geometry (2018) [Paper]

Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings (2018) [Paper]

Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation (2018) [Paper] [Code]

Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction (2018 CVPR) [Paper]

Neural scene representation and rendering (2018) [Paper]

Im2Struct: Recovering 3D Shape Structure from a Single RGB Image (2018 CVPR) [Paper]

FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation (2018 CVPR) [Paper]

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling (2018 CVPR) [Paper]

3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare (2018 CVPR) [Paper]

Matryoshka Networks: Predicting 3D Geometry via Nested Shape Layers (2018 CVPR) [Paper]

Global-to-Local Generative Model for 3D Shapes (SIGGRAPH Asia 2018) [Paper]

ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning (TOG 2018) [Paper] [Code]

PointGrid: A Deep Network for 3D Shape Understanding (CVPR 2018) [Paper] [Code]

GAL: Geometric Adversarial Loss for Single-View 3D-Object Reconstruction (2018) [Paper]

Texture/Material Synthesis

Texture Synthesis Using Convolutional Neural Networks (2015) [Paper]

Two-Shot SVBRDF Capture for Stationary Materials (SIGGRAPH 2015) [Paper]

Reflectance Modeling by Neural Texture Synthesis (2016) [Paper]

Modeling Surface Appearance from a Single Photograph using Self-augmented Convolutional Neural Networks (2017) [Paper]

High-Resolution Multi-Scale Neural Texture Synthesis (2017) [Paper]

Reflectance and Natural Illumination from Single Material Specular Objects Using Deep Learning (2017) [Paper]

Joint Material and Illumination Estimation from Photo Sets in the Wild (2017) [Paper]

What Is Around The Camera? (2017) [Paper]

TextureGAN: Controlling Deep Image Synthesis with Texture Patches (2018 CVPR) [Paper]

Gaussian Material Synthesis (2018 SIGGRAPH) [Paper]

Non-stationary Texture Synthesis by Adversarial Expansion (2018 SIGGRAPH) [Paper]

Synthesized Texture Quality Assessment via Multi-scale Spatial and Statistical Texture Attributes of Image and Gradient Magnitude Coefficients (2018 CVPR) [Paper]

LIME: Live Intrinsic Material Estimation (2018 CVPR) [Paper]

Single-Image SVBRDF Capture with a Rendering-Aware Deep Network (2018) [Paper]

PhotoShape: Photorealistic Materials for Large-Scale Shape Collections (2018) [Paper]

Style Transfer

Style-Content Separation by Anisotropic Part Scales (2010) [Paper]

Design Preserving Garment Transfer (2012) [Paper]

Analogy-Driven 3D Style Transfer (2014) [Paper]

Elements of Style: Learning Perceptual Shape Style Similarity (2015) [Paper] [Code]

Functionality Preserving Shape Style Transfer (2016) [Paper] [Code]

Unsupervised Texture Transfer from Images to Model Collections (2016) [Paper]

Learning Detail Transfer based on Geometric Features (2017) [Paper]

Neural 3D Mesh Renderer (2017) [Paper] [Code]

Appearance Modeling via Proxy-to-Image Alignment (2018) [Paper]

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images (2018) [Paper]

Scene Synthesis/Reconstruction

Make It Home: Automatic Optimization of Furniture Arrangement (2011, SIGGRAPH) [Paper]

Interactive Furniture Layout Using Interior Design Guidelines (2011) [Paper]

Synthesizing Open Worlds with Constraints using Locally Annealed Reversible Jump MCMC (2012) [Paper]

Example-based Synthesis of 3D Object Arrangements (2012 SIGGRAPH Asia) [Paper]

Sketch2Scene: Sketch-based Co-retrieval and Co-placement of 3D Models (2013) [Paper]

Action-Driven 3D Indoor Scene Evolution (2016) [Paper]

Relationship Templates for Creating Scene Variations (2016) [Paper]

IM2CAD (2017) [Paper]

Predicting Complete 3D Models of Indoor Scenes (2017) [Paper]

Complete 3D Scene Parsing from Single RGBD Image (2017) [Paper]

Raster-to-Vector: Revisiting Floorplan Transformation (2017, ICCV) [Paper] [Code]

Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi Object Scenes (2017) [Blog]

Adaptive Synthesis of Indoor Scenes via Activity-Associated Object Relation Graphs (2017 SIGGRAPH Asia) [Paper]

Automated Interior Design Using a Genetic Algorithm (2017) [Paper]

SceneSuggest: Context-driven 3D Scene Design (2017) [Paper]

Human-centric Indoor Scene Synthesis Using Stochastic Grammar (2018 CVPR) [Paper] [Supplementary] [Code]

FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans (2018) [Paper] [Code]

ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans (2018) [Paper]

Deep Convolutional Priors for Indoor Scene Synthesis (2018) [Paper]

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars (2018) [Paper]

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image (ECCV 2018) [Paper]

Language-Driven Synthesis of 3D Scenes from Scene Databases (SIGGRAPH Asia 2018) [Paper]

Scene Understanding

Characterizing Structural Relationships in Scenes Using Graph Kernels (2011 SIGGRAPH) [Paper]

Understanding Indoor Scenes Using 3D Geometric Phrases (2013) [Paper]

Organizing Heterogeneous Scene Collections through Contextual Focal Points (2014 SIGGRAPH) [Paper]

SceneGrok: Inferring Action Maps in 3D Environments (2014, SIGGRAPH) [Paper]

PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding (2014) [Paper]

Learning Informative Edge Maps for Indoor Scene Layout Prediction (2015) [Paper]

Rent3D: Floor-Plan Priors for Monocular Layout Estimation (2015) [Paper]

A Coarse-to-Fine Indoor Layout Estimation (CFILE) Method (2016) [Paper]

DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes (2016) [Paper]

3D Semantic Parsing of Large-Scale Indoor Spaces (2016) [Paper] [Code]

Single Image 3D Interpreter Network (2016) [Paper] [Code]

Deep Multi-Modal Image Correspondence Learning (2016) [Paper]

Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks (2017) [Paper] [Code] [Code] [Code] [Code]

RoomNet: End-to-End Room Layout Estimation (2017) [Paper]

SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite (2017) [Paper]

Semantic Scene Completion from a Single Depth Image (2017) [Paper] [Code]

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene (2018 CVPR) [Paper] [Code]

LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image (2018 CVPR) [Paper] [Code]

PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image (2018 CVPR) [Paper] [Code]

Cross-Domain Self-supervised Multi-task Feature Learning using Synthetic Imagery (2018 CVPR) [Paper]

 

Pano2CAD: Room Layout From A Single Panorama Image (2018 CVPR) [Paper]

Part 2:

This part summarizes a few of the works listed above.

FusionNet: fusing three convolutional networks to go from 2D to 3D

Objects are processed using a volumetric representation, in which a 3D object is discretized into a 30 x 30 x 30 voxel grid. A voxel takes the value 1 if some part of the object occupies that 1 x 1 x 1 cell, and 0 otherwise. Unlike previous work, the paper uses both the pixel representation and the voxel representation to learn object features and classify 3D CAD objects, and the combination performs better than either representation alone. Here are some examples:

The two representations. Left: 2D projections of bathtub, stool, toilet, and wardrobe 3D CAD objects; right: voxelized versions of the same bathtub, stool, toilet, and wardrobe.
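To make the occupancy rule above concrete, here is a minimal NumPy sketch of voxelization (not the paper's code; the 30 x 30 x 30 resolution follows the description above, while the function name and the normalization step are illustrative):

```python
import numpy as np

def voxelize(points, resolution=30):
    """Discretize an (N, 3) point cloud into a binary occupancy grid.

    A cell is 1 if any point falls inside it, 0 otherwise.
    """
    points = np.asarray(points, dtype=np.float64)
    # Normalize the cloud into the unit cube, preserving aspect ratio.
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    normalized = (points - mins) / (extent + 1e-9)
    # Map each point to its voxel index and mark the cell as occupied.
    idx = np.clip((normalized * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# Example: voxelize 1,000 random points.
grid = voxelize(np.random.rand(1000, 3))
print(grid.shape, grid.sum())  # (30, 30, 30) and the number of occupied voxels
```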

Two convolutional neural networks (V-CNN I and V-CNN II) are built to process the voxel data, and a third (MV-CNN) processes the pixel data. The figure below shows how these networks are combined to produce the final object-label decision. This differs from a standard CNN for 2D images, which can only learn spatially local features from the images.

FusionNet is a fusion of three different networks: V-CNN I, V-CNN II, and MV-CNN (the latter based on AlexNet and pre-trained on ImageNet). The three networks are fused at the score layers, which form a linear combination of the scores before the class prediction is made. The first two networks take voxelized CAD models as input; the third takes 2D projections. [2]
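The score-layer fusion can be sketched as follows. This is an illustrative reconstruction of the idea, not FusionNet's released code; the equal weights, class count, and random scores are assumptions:

```python
import numpy as np

def fuse_scores(score_list, weights):
    """Linearly combine per-network class scores, then pick the argmax class."""
    fused = sum(w * s for w, s in zip(weights, score_list))
    return fused.argmax(axis=-1)

# Hypothetical scores over 40 classes (e.g. ModelNet40) from the three networks.
n_classes = 40
scores_vcnn1 = np.random.rand(n_classes)   # V-CNN I  (voxel input)
scores_vcnn2 = np.random.rand(n_classes)   # V-CNN II (voxel input)
scores_mvcnn = np.random.rand(n_classes)   # MV-CNN   (2D projections)
label = fuse_scores([scores_vcnn1, scores_vcnn2, scores_mvcnn],
                    weights=[1/3, 1/3, 1/3])  # assumed equal weights
print(label)
```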

For details, see the paper: https://arxiv.org/abs/1607.05695

"Improving a Deep Learning based RGB-D Object Recognition Model by Ensemble Learning"
Ensemble Methods [4] (a code sketch of all three strategies follows this list)
1) Unweighted averaging:
The standard ensemble method for CNNs. The softmax probabilities of the individual models are averaged to obtain the final prediction.
2) Weighted averaging:
Similar to unweighted averaging, except that each individual model carries its own weight.

Determining the weights:
A grid search over the possible values; the final weights are chosen by how the individual candidate models perform on a validation set, with the weights constrained to sum to 1 (distance weighting).

3) Majority voting:
Majority voting can be used when the ensemble contains more than two models. Each model predicts a label, and the label with the most votes is the final prediction. Majority voting depends more heavily on the top-1 accuracy of each model.
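Below is a minimal NumPy sketch of the three strategies. The array shapes, the 0.1 grid-search step, and the toy data are my assumptions for illustration, not from the paper:

```python
import itertools
import numpy as np

def unweighted_average(probs):
    """1) Average the softmax probabilities of all models, then take argmax."""
    return np.mean(probs, axis=0).argmax(axis=1)

def weighted_average(probs, weights):
    """2) Same, but each model contributes with its own weight."""
    weights = np.asarray(weights).reshape(-1, 1, 1)
    return (weights * np.asarray(probs)).sum(axis=0).argmax(axis=1)

def grid_search_weights(probs, labels, step=0.1):
    """Pick the weight vector (summing to 1) with the best validation accuracy."""
    ticks = np.arange(0.0, 1.0 + 1e-9, step)
    best_acc, best_w = -1.0, None
    for w in itertools.product(ticks, repeat=len(probs)):
        if abs(sum(w) - 1.0) > 1e-9:   # keep only weights that sum to 1
            continue
        acc = (weighted_average(probs, w) == labels).mean()
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_w, best_acc

def majority_vote(probs):
    """3) Each model votes for its top-1 label; the most common label wins."""
    votes = np.asarray(probs).argmax(axis=2)          # (n_models, n_samples)
    return np.apply_along_axis(
        lambda v: np.bincount(v).argmax(), 0, votes)  # per-sample mode

# Toy example: 3 models, 5 validation samples, 4 classes.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(4), size=5) for _ in range(3)]
labels = rng.integers(0, 4, size=5)
w, acc = grid_search_weights(probs, labels)
print(unweighted_average(probs), majority_vote(probs), w, acc)
```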
 

 

Datasets:

3D point cloud datasets: see [3].

This paper surveys the RGB-D dataset landscape: RGBD Datasets: Past, Present and Future, https://arxiv.org/pdf/1604.00999.pdf

References:

[1] https://go.ctolib.com/timzhang642-3D-Machine-Learning.html#articleHeader11

[2] http://www.sohu.com/a/108932418_381309 or http://www.oreilly.com.cn/ideas/?p=555

[3] https://blog.csdn.net/u014636245/article/details/83269939

[4] https://blog.csdn.net/u013841196/article/details/82940385
