[Daily Paper 7.10] ECCV, IEEE, MICCAI: papers on 3D model fitting, object detection, pose estimation, and more

Follow 3D Daily to keep up with the latest research in 3D.

Browse the official account's history for daily arXiv paper updates.

Today: six papers in 3D vision and two in medical imaging.


[1] The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization
  • ECCV 2020
  • Microsoft Mixed Reality & AI Labs: Jingjing Shen, et al.
  • https://arxiv.org/pdf/2007.04940.pdf

Realtime perceptual and interaction capabilities in mixed reality require a range of 3D tracking problems to be solved at low latency on resource-constrained hardware such as head-mounted devices. Indeed, for devices such as HoloLens 2 where the CPU and GPU are left available for applications, multiple tracking subsystems are required to run on a continuous, real-time basis while sharing a single Digital Signal Processor. To solve model-fitting problems for HoloLens 2 hand tracking, where the computational budget is approximately 100 times smaller than an iPhone 7, we introduce a new surface model: the 'Phong surface'. Using ideas from computer graphics, the Phong surface describes the same 3D shape as a triangulated mesh model, but with continuous surface normals which enable the use of lifting-based optimization, providing significant efficiency gains over ICP-based methods. We show that Phong surfaces retain the convergence benefits of smoother surface models, while triangle meshes do not.
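As a rough illustration of the graphics idea the paper borrows, the sketch below (hypothetical names, simplified to a single triangle) interpolates per-vertex normals with barycentric coordinates and renormalizes, yielding a normal field that varies continuously over the surface:

```python
import numpy as np

def phong_normal(vertex_normals, bary):
    """Interpolate per-vertex unit normals with barycentric coordinates
    and renormalize, giving a normal that varies continuously over the
    triangle (the Phong-shading idea applied to geometry)."""
    n = bary @ vertex_normals        # (3,) blend of the three normals
    return n / np.linalg.norm(n)     # back to unit length

# Example: corner normals of one triangle and an interior point.
vn = np.array([[0.0, 0.0, 1.0],
               [0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0]])
print(phong_normal(vn, np.array([0.2, 0.3, 0.5])))  # smooth interior normal
```

On a plain triangle mesh the normal is piecewise constant, so it is this smoothness that lets a lifted optimizer treat the surface coordinates as free variables and differentiate through them jointly with the model pose, rather than alternating correspondence and pose updates as ICP does.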


[2] Cross-Modal Weighting Network for RGB-D Salient Object Detection
  • ECCV 2020
  • Shanghai University: Gongyang Li, et al.
  • https://arxiv.org/pdf/2007.04901.pdf
  • https://github.com/MathLee/CMWNet [Code]

Depth maps contain geometric clues for assisting Salient Object Detection (SOD). In this paper, we propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD. Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to handle low-, middle- and high-level cross-modal information fusion, respectively. These modules use Depth-to-RGB Weighting (DW) and RGB-to-RGB Weighting (RW) to allow rich cross-modal and cross-scale interactions among feature layers generated by different network blocks. To effectively train the proposed Cross-Modal Weighting Network (CMWNet), we design a composite loss function that summarizes the errors between intermediate predictions and ground truth over different scales. With all these novel components working together, CMWNet effectively fuses information from RGB and depth channels, and meanwhile explores object localization and details across scales. Thorough evaluations demonstrate that CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
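The abstract does not spell out the module internals, but the Depth-to-RGB Weighting idea can be sketched as depth features gating RGB features. The PyTorch snippet below is a minimal stand-in under that assumption; the layer sizes, the sigmoid gate, and the residual connection are illustrative choices, not the authors' architecture:

```python
import torch
import torch.nn as nn

class DepthToRGBWeighting(nn.Module):
    """Minimal sketch of a Depth-to-RGB Weighting (DW) step: derive a
    gating map from depth features and use it to reweight RGB features."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # per-pixel, per-channel weights in (0, 1)
        )

    def forward(self, rgb_feat, depth_feat):
        w = self.gate(depth_feat)        # depth-derived weights
        return rgb_feat * w + rgb_feat   # reweighted features plus residual

# Usage on dummy feature maps.
dw = DepthToRGBWeighting(64)
rgb = torch.randn(1, 64, 56, 56)
depth = torch.randn(1, 64, 56, 56)
print(dw(rgb, depth).shape)  # torch.Size([1, 64, 56, 56])
```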


[3] JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image
  • ECCV 2020
  • South China University of Technology: Linpu Fang, et al.
  • https://arxiv.org/pdf/2007.04646.pdf

State-of-the-art single depth image-based 3D hand pose estimation methods are based on dense predictions, including voxel-to-voxel predictions, point-to-point regression, and pixel-wise estimations. Despite the good performance, those methods have a few inherent issues, such as the poor trade-off between accuracy and efficiency, and plain feature representation learning with local convolutions. In this paper, a novel pixel-wise prediction-based method is proposed to address the above issues. The key ideas are two-fold: a) explicitly modeling the dependencies among joints and the relations between the pixels and the joints for better local feature representation learning; b) unifying the dense pixel-wise offset predictions and direct joint regression for end-to-end training. Specifically, we first propose a graph convolutional network (GCN) based joint graph reasoning module to model the complex dependencies among joints and augment the representation capability of each pixel. Then we densely estimate all pixels' offsets to joints in both image plane and depth space and calculate the joints' positions by a weighted average over all pixels' predictions, totally discarding the complex postprocessing operations. The proposed model is implemented with an efficient 2D fully convolutional network (FCN) backbone and has only about 1.4M parameters. Extensive experiments on multiple 3D hand pose estimation benchmarks demonstrate that the proposed method achieves new state-of-the-art accuracy while running very efficiently at around 110 fps on a single NVIDIA 1080Ti GPU.
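The aggregation step described above (dense offsets plus a weighted average, replacing post-processing) reduces to a few lines. The sketch below uses hypothetical shapes: each of P pixels votes for each of J joints, and a softmax over pixels supplies the weights:

```python
import numpy as np

def joints_from_offsets(coords, offsets, logits):
    """Aggregate dense pixel-wise predictions into joint positions:
    every pixel predicts an offset to each joint, and the joint is a
    confidence-weighted average of (pixel + offset) over all pixels.

    coords:  (P, 3) pixel coordinates lifted to (u, v, depth).
    offsets: (P, J, 3) predicted offsets from each pixel to each joint.
    logits:  (P, J) per-pixel confidence for each joint.
    """
    w = np.exp(logits - logits.max(axis=0))      # softmax over pixels
    w = w / w.sum(axis=0)                        # (P, J)
    votes = coords[:, None, :] + offsets         # (P, J, 3) per-pixel votes
    return (w[..., None] * votes).sum(axis=0)    # (J, 3) joint positions

P, J = 128, 14
joints = joints_from_offsets(np.random.rand(P, 3),
                             0.1 * np.random.randn(P, J, 3),
                             np.random.randn(P, J))
print(joints.shape)  # (14, 3)
```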


[4] Monocular Vision based Crowdsourced 3D Traffic Sign Positioning with Unknown Camera Intrinsics and Distortion Coefficients
  • IEEE 2020
  • Hemang Chawla, et al.
  • https://arxiv.org/pdf/2007.04592.pdf

Autonomous vehicles and driver assistance systems utilize maps of 3D semantic landmarks for improved decision making. However, scaling the mapping process as well as regularly updating such maps come with a huge cost. Crowdsourced mapping of these landmarks such as traffic sign positions provides an appealing alternative. The state-of-the-art approaches to crowdsourced mapping use ground truth camera parameters, which may not always be known or may change over time. In this work, we demonstrate an approach to computing 3D traffic sign positions without knowing the camera focal lengths, principal point, and distortion coefficients a priori. We validate our proposed approach on a public dataset of traffic signs in KITTI. Using only a monocular color camera and GPS, we achieve an average single journey relative and absolute positioning accuracy of 0.26 m and 1.38 m, respectively.
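Once camera poses (e.g. from GPS) and intrinsics estimates are available, the sign position itself comes from standard multi-view triangulation. The sketch below shows the generic linear (DLT) building block under assumed projection matrices; the paper's actual pipeline additionally recovers focal lengths, principal point, and distortion, which this snippet takes as given:

```python
import numpy as np

def triangulate_sign(projections, observations):
    """Linear (DLT) triangulation of one 3D sign position from its 2D
    detections in several frames. Each view contributes two linear
    constraints on the homogeneous point X."""
    rows = []
    for P, (u, v) in zip(projections, observations):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                       # null vector of the constraint matrix
    return X[:3] / X[3]              # dehomogenize

# Two synthetic views one meter apart, with assumed intrinsics K.
K = np.diag([700.0, 700.0, 1.0])
K[:2, 2] = [320.0, 240.0]
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])
X_true = np.array([2.0, 0.5, 10.0, 1.0])
obs = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
print(triangulate_sign([P1, P2], obs))  # ~ [2.0, 0.5, 10.0]
```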


[5] Modelling the Distribution of 3D Brain MRI using a 2D Slice VAE
  • MICCAI 2020
  • Computer Vision Lab: Anna Volokitin, et al.
  • https://arxiv.org/pdf/2007.04780.pdf
  • https://github.com/voanna/slices-to-3d-brain-vae/ [Code]

Probabilistic modelling has been an essential tool in medical image analysis, especially for analyzing brain Magnetic Resonance Images (MRI). Recent deep learning techniques for estimating high-dimensional distributions, in particular Variational Autoencoders (VAEs), opened up new avenues for probabilistic modelling. Modelling of volumetric data has remained a challenge, however, because constraints on available computation and training data make it difficult to effectively leverage VAEs, which are well-developed for 2D images. We propose a method to model the distribution of 3D MR brain volumes by combining a 2D slice VAE with a Gaussian model that captures the relationships between slices. We do so by estimating the sample mean and covariance in the latent space of the 2D model over the slice direction. This combined model lets us sample new coherent stacks of latent variables to decode into slices of a volume. We also introduce a novel evaluation method for generated volumes that quantifies how well their segmentations match those of true brain anatomy. We demonstrate that our proposed model is competitive in generating high-quality volumes at high resolutions according to both traditional metrics and our proposed evaluation.
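The slice-level Gaussian is simple enough to sketch directly from the abstract: concatenate each training volume's per-slice latent codes into one long vector, fit a sample mean and covariance, and draw new coherent stacks to decode slice by slice. Shapes and the toy data below are assumptions:

```python
import numpy as np

def fit_slice_gaussian(latents):
    """latents: (N, S, D) codes for N volumes, S slices, D latent dims.
    Flatten each volume's stack and fit mean and covariance, so the
    Gaussian captures correlations between neighboring slices."""
    N, S, D = latents.shape
    flat = latents.reshape(N, S * D)
    return flat.mean(axis=0), np.cov(flat, rowvar=False)

def sample_volume_latents(mu, cov, num_slices, latent_dim, rng):
    """Draw one coherent stack of slice latents for the 2D decoder."""
    z = rng.multivariate_normal(mu, cov)
    return z.reshape(num_slices, latent_dim)

rng = np.random.default_rng(0)
toy_latents = rng.normal(size=(200, 16, 4))     # stand-in encoder outputs
mu, cov = fit_slice_gaussian(toy_latents)
stack = sample_volume_latents(mu, cov, 16, 4, rng)
print(stack.shape)  # (16, 4): one latent code per slice, decoded in order
```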


[6] EPI-based Oriented Relation Networks for Light Field Depth Estimation
  • Kunyuan Li, et al.
  • https://arxiv.org/pdf/2007.04538.pdf

Light fields record not only the spatial information of observed scenes but also the directions of all incoming light rays. The spatial and angular information implicitly contain geometrical characteristics such as multi-view or epipolar geometry, which can be exploited to improve the performance of depth estimation. An Epipolar Plane Image (EPI), the unique 2D spatial-angular slice of the light field, contains patterns of oriented lines whose slopes are associated with disparity. Benefiting from this property of EPIs, some representative methods estimate depth maps by analyzing the disparity of each line in EPIs. However, these methods often extract the optimal slope of the lines from EPIs while ignoring the relationship between neighboring pixels, which leads to inaccurate depth map predictions. Based on the observation that oriented lines and their neighboring pixels share similar linear structures, we propose an end-to-end fully convolutional network (FCN) to estimate the depth value of the intersection point on the horizontal and vertical EPIs. Specifically, we present a new feature extraction module, called the Oriented Relation Module (ORM), that models the relationship between line orientations. To facilitate training, we also propose a refocusing-based data augmentation method to obtain different slopes from EPIs of the same scene point. Extensive experiments verify the efficacy of learning relations and show that our approach is competitive with other state-of-the-art methods.
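For context, the classical per-line analysis the paper improves on estimates the dominant orientation in an EPI patch, e.g. with a structure tensor, and reads disparity off the slope (depth then follows as focal length times baseline over disparity). The sketch below illustrates that baseline approach on a synthetic EPI; it is not the paper's relation network:

```python
import numpy as np

def epi_line_slope(epi):
    """Estimate the dominant line slope (pixels per view) in an EPI
    patch from the orientation of its structure tensor. The slope is
    proportional to disparity."""
    gy, gx = np.gradient(epi)                 # angular and spatial grads
    jxx = (gx * gx).mean()
    jyy = (gy * gy).mean()
    jxy = (gx * gy).mean()
    theta = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)  # gradient orientation
    return -np.tan(theta)                     # line slope ~ disparity

# Synthetic EPI: a Gaussian ridge shifting 2 px per view (disparity = 2).
views, width, d = 9, 64, 2.0
s = np.arange(views)[:, None]
u = np.arange(width)[None, :]
epi = np.exp(-((u - 10 - d * s) ** 2) / 8.0)
print(round(epi_line_slope(epi), 2))  # close to 2.0
```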


[7] Point Set Voting for Partial Point Cloud Analysis
  • University of Michigan: Junming Zhang, et al.
  • https://arxiv.org/pdf/2007.04537.pdf

The continual improvement of 3D sensors has driven the development of algorithms to perform point cloud analysis. In fact, techniques for point cloud classification and segmentation have in recent years achieved remarkable performance, driven in part by leveraging large synthetic datasets. Unfortunately, these same state-of-the-art approaches perform poorly when applied to incomplete point clouds. This limitation of existing algorithms is particularly concerning since point clouds generated by 3D sensors in the real world are usually incomplete due to perspective view or occlusion by other objects. This paper proposes a general model for partial point cloud analysis wherein the latent feature encoding a complete point cloud is inferred by applying a local point set voting strategy. In particular, each local point set constructs a vote that corresponds to a distribution in the latent space, and the optimal latent feature is the one with the highest probability. This approach ensures that any subsequent point cloud analysis is robust to partial observation while simultaneously guaranteeing that the proposed model is able to output multiple possible results. This paper illustrates that the proposed method achieves state-of-the-art performance on shape classification, part segmentation, and point cloud completion.
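If each local point set's vote is taken to be a diagonal Gaussian in latent space, the single most probable latent under the product of the votes has a closed form: the precision-weighted mean. The sketch below shows this simplified reading of the voting step; the distribution family and shapes are assumptions, not the paper's exact formulation:

```python
import numpy as np

def fuse_votes(means, variances):
    """Fuse per-local-patch Gaussian votes in latent space. Each local
    point set votes N(mu_i, var_i); the most probable latent under the
    product of the votes is the precision-weighted mean.

    means, variances: (V, D) arrays for V votes in a D-dim latent space.
    """
    precision = 1.0 / variances
    return (precision * means).sum(axis=0) / precision.sum(axis=0)

# Three votes: two confident and consistent, one diffuse outlier.
means = np.array([[1.0, 2.0], [1.1, 1.9], [5.0, -3.0]])
variances = np.array([[0.1, 0.1], [0.1, 0.1], [10.0, 10.0]])
print(fuse_votes(means, variances))  # ~ [1.07, 1.93]: outlier barely counts
```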


[8] Automated Chest CT Image Segmentation of COVID-19 Lung Infection based on 3D U-Net
  • 基于3D U-Net的COVID-19肺部感染的自动胸部CT图像分割
  • University of Augsburg: Dominik Muller, et al.
  • https://arxiv.org/ftp/arxiv/papers/2007/2007.04774.pdf
  • https://github.com/frankkramer-lab/covid19.MIScnn [Code]

The coronavirus disease 2019 (COVID-19) affects billions of lives around the world and has a significant impact on public healthcare. Due to rising skepticism towards the sensitivity of RT-PCR as a screening method, medical imaging like computed tomography offers great potential as an alternative. For this reason, automated image segmentation is highly desired as clinical decision support for quantitative assessment and disease monitoring. However, publicly available COVID-19 imaging data is limited, which leads to overfitting of traditional approaches. To address this problem, we propose an innovative automated segmentation pipeline for COVID-19 infected regions, which is able to handle small datasets by utilizing them as variant databases. Our method focuses on on-the-fly generation of unique and random image patches for training by performing several preprocessing methods and exploiting extensive data augmentation. For further reduction of the overfitting risk, we implemented a standard 3D U-Net architecture instead of novel or computationally complex neural network architectures. Through a 5-fold cross-validation on 20 CT scans of COVID-19 patients, we were able to develop a highly accurate as well as robust segmentation model for lungs and COVID-19 infected regions without overfitting on the limited data. Our method achieved Dice similarity coefficients of 0.956 for lungs and 0.761 for infection. We demonstrated that the proposed method outperforms related approaches, advances the state of the art for COVID-19 segmentation, and improves medical image analysis with limited data.
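Two ingredients mentioned above are easy to make concrete: the Dice similarity coefficient used for evaluation and on-the-fly random 3D patch cropping for training. The sketch below is a generic rendition with assumed shapes, not the authors' MIScnn-based pipeline:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary 3D masks, the
    metric reported in the paper (0.956 lungs, 0.761 infection)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

def random_patch(volume, mask, size, rng):
    """On-the-fly random 3D patch cropping, the kind of patchwise
    sampling used to train a 3D U-Net on few scans. Augmentation
    (flips, rotations, noise) would be applied on top of this."""
    corner = [rng.integers(0, s - p + 1) for s, p in zip(volume.shape, size)]
    sl = tuple(slice(c, c + p) for c, p in zip(corner, size))
    return volume[sl], mask[sl]

rng = np.random.default_rng(0)
vol = rng.normal(size=(80, 160, 160))           # stand-in CT volume
msk = (vol > 1.5).astype(np.uint8)              # stand-in segmentation
patch, patch_mask = random_patch(vol, msk, (32, 64, 64), rng)
print(patch.shape, dice_coefficient(patch_mask, patch_mask))  # ~ 1.0
```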


Follow 3D Daily, and let's build the frontier of 3D technology together.
If you have any questions or suggestions, follow the official account and leave us a message; we will reply as soon as possible.
