Metric3D:Towards Zero-shot Metric 3D Prediction from A Single Image

文章探讨了在深度学习中如何通过考虑焦距因素,实现scale-invariant的深度估计。Metric3D方法通过校准训练数据,针对不同设备和焦距的差异进行补偿,从而提高模型的泛化能力。实验结果展示了在KITTI和NYU数据集上的性能比较。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

参考代码:Metric3D

介绍

在如MiDasLeReS这些文章中对于来源不同的深度数据集使用归一化深度作为学习目标,则在网络学习的过程中就天然失去了对真实深度和物体尺寸的度量能力。而这篇文章比较明确地指出了影响深度估计尺度变化大的因素就是焦距 f f f,则对输入的图像或是GT做对应补偿之后就可以学习到具备scale表达能力的深度预测,这个跟车端视觉感知的泛化是一个道理。需要注意的是这里使用到的训练数据集需要预先知道相机的参数信息,且这里使用的相机模型为针孔模型。

在下图中首先比较了两种不同拍摄设备得到的图片在文章算法下测量物体的效果,可以说相差不大。
在这里插入图片描述

有了较为准确的深度估计结果之后,对应的单目slam、里程记这些都不是问题了。在配上大量的深度估计训练数据,那么泛化能力将会得到巨大提升,届时之前许多病态的问题都将得到解决。

方法设计

明确影响深度scale学习关键因子为焦距 f f f

对于针孔相机其内参主要参数为: f x δ x , f y δ y , u 0 , v 0 \frac{f_x}{\delta_x},\frac{f_y}{\delta_y},u_0,v_0 δxfx,δyfy,u0,v0,其中 f x , f y , δ x , δ y f_x,f_y,\delta_x,\delta_y fx,fy,δx,δy分别代表两个方向的焦距(一般情况下取两者相等)和像素大小,物理单位为微米。在相机中还有一个参数是成像传感器的尺寸,但是这个只影响成像的大小,就好比残画幅单反和全画幅单反的区别。

对于另外一个因素 δ \delta δ代表的是一个像素大小,在单孔成像原理中焦距、深度和成像大小的关系为(使用下图A图做相似三角形计算得到):
d a = S ^ [ f S ^ ′ ] = S ^ ⋅ α , α = [ f S ^ ′ ] d_a=\hat{S}[\frac{f}{\hat{S}^{'}}]=\hat{S}\cdot\alpha,\alpha=[\frac{f}{\hat{S}^{'}}] da=

### Few-Shot Learning Introduction Few-shot learning refers to a class of machine learning problems where the model is required to learn from very few examples, typically one or just a handful per category. This approach mimics human ability to generalize from limited data and has become an important area within deep learning research. The task layer's prior knowledge includes all methods that "learn how to learn," such as optimizing parameters for unseen tasks through meta-learning techniques which can provide good initialization for novel tasks[^1]. In this context: - **Meta-Learning**: Aims at designing models capable of fast adaptation with minimal training samples by leveraging previously acquired experience. - **Metric Learning**: Focuses on learning distance metrics between instances so similar ones are closer together while dissimilar remain apart in embedding space. #### Applications in Machine Learning One prominent application involves fine-grained classification using small datasets like Mini-ImageNet, demonstrating performance improvements when comparing different algorithms' embeddings propagation capabilities over time steps (Figure 7)[^2]. Another example comes from multi-label classification scenarios where combining MLP classifiers alongside KNN-based predictions enhances overall accuracy compared to traditional approaches relying solely upon prototype definitions derived directly from support sets during inference phases[^3]. Moreover, hybrid embedding strategies have been explored; these integrate both generalizable features learned across diverse domains along with specialized adjustments made specifically towards target-specific characteristics present only within given training distributions[Dtrain], thereby improving adaptability without sacrificing efficiency too much relative purely invariant alternatives[^4]. ```python def few_shot_classifier(embedding_model, classifier_type='mlp_knn'): """ Demonstrates a simple implementation outline for integrating Multi-layer Perceptron (MLP) and k-nearest neighbors (KNN). Args: embedding_model: Pre-trained neural network used to generate feature vectors. classifier_type: Type of final decision mechanism ('mlp', 'knn', or 'mlp_knn'). Returns: Combined prediction scores based on selected strategy. """ pass # Placeholder function body ``` --related questions-- 1. What specific challenges do few-shot learning systems face? 2. How does metric learning contribute to enhancing few-shot recognition abilities? 3. Can you explain more about the role of prototypes in few-shot classification schemes? 4. Are there any notable differences between MAML and other optimization-based meta-learning frameworks? 5. Which types of real-world problems benefit most significantly from applying few-shot learning methodologies?
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值