CVPR 2025: all papers with "3D" in the title, entries 1-374 (archived; to be organized later)

1. MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning

(Monarch refers to Monarch matrices, a family of structured matrices used for efficient parameterization, not a standalone optimization algorithm)

2. METASCENES: Towards Automated Replica Creation for Real-world 3D Scans

(Replica creation here means building a digital replica of a scanned real-world scene)

3. SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

(Affordance: the actions an object supports, e.g. whether it can be grasped)

4. 3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

(Embodied refers to embodied AI, e.g. a robot interacting physically with its environment)

5. Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes

(Cross-modal generation: sound is generated from physical hand interactions with a 3D scene)

6. Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

(An evaluation framework for 3D generative models, covering quantitative and qualitative aspects)

7. SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction

(Geometric reconstruction; 3D curves are represented with spherical Gaussians)

8. InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation

(Targets generation of diverse 3D human-object interactions at scale; "human-object", not "human-machine")

9. Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

(Concerns sampling for 3D shape VAEs, together with a benchmark for evaluating them)

10. RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

(A navigation model that combines video instructions with geometric understanding)

11. Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation

(Medical imaging; semi-supervised learning that combines labeled and unlabeled data)

12. Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model

(Uses a video diffusion model to improve 3D scene generation from a single image)

13. GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

(Autonomous driving; predicts which regions of 3D space are occupied, not only the drivable area)

14. CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

(Leverages 2D compositionality to improve compositional text-to-3D generation)

15. Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset

(A dataset of high-fidelity 3D object models intended as digital twins)

16. SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts

(Uses cross-modal semantic prompts to compensate for scarce 3D annotations)

17. DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds

(Gaussian Splatting is a recently popular 3D reconstruction and rendering technique; "Dash" emphasizes speed)

18. Steepest Descent Density Control for Compact 3D Gaussian Splatting

(Controls Gaussian density to reduce the number of Gaussians and improve rendering efficiency)

19. IDOL: Instant Photorealistic 3D Human Creation from a Single Image

(Fast, high-fidelity 3D human creation from one image; "instant" in the title refers to speed)

20. MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks

(Uses probabilistic masks over Gaussians to adapt the representation)

21. Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation

(Decouples semantic information in audio features to drive more natural facial animation)

22. LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors

(Novel view synthesis across lighting conditions without reference images)

23. GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting

(Copyright protection for 3D Gaussian Splatting assets)

24. Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection

(Represents class features with prototypes for settings with sparse annotations)

25. Cross-Modal 3D Representation with Multi-View Images and Point Clouds

(Feature learning that fuses 2D images and 3D point clouds)

26. DeformCL: Learning Deformable Centerline Representation for Vessel Extraction in 3D Medical Image

(Models vessel centerlines in 3D medical images with a deformable representation)

27. PrEditor3D: Fast and Precise 3D Shape Editing

(A method for editing 3D shapes quickly and precisely)

28. Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence

(Addresses point correspondence between different shapes through registration)

29. HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

(Hierarchical modeling to resolve pose ambiguity caused by occlusion)

30. MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

(Multi-object modeling with diffusion for complex scene generation)

31. World-consistent Video Diffusion with Explicit 3D Modeling

(Diffusion-based generation that keeps video frames consistent in 3D space)

32. GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

(Occupancy prediction over streaming sensor data)

33. Generative Gaussian Splatting for Unbounded 3D City Generation

(City-scale scene generation without a fixed spatial bound)

34. WildAvatar: Learning In-the-wild 3D Avatars from the Web

(Learns 3D human avatars from unstructured web data)

35. SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

(A multimodal model that drives social interaction with 3D characters)

36. MAGE : Single Image to Material-Aware 3D via the Multi-View G-Buffer Estimation Model

(Single-image 3D that recovers both geometry and material)

37. Volumetrically Consistent 3D Gaussian Rasterization

(A rasterization scheme kept consistent with volume rendering; "volumetric" does not refer to voxels)

38. FSHNet: Fully Sparse Hybrid Network for 3D Object Detection

(An efficient detector that combines sparse convolution with other components)

39. GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

(Self-supervised learning that combines foundation models with a Gaussian representation)

40. BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting

(Reconstructs two-handed manipulation without relying on object category labels)

41. Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency

(Online synthesis of 3D video with spatial and temporal consistency)

42. Dragin3D: Image Editing by Dragging in 3D Space

(Edits 2D images through drag interactions in 3D space)

43. GenAssets: Generating in-the-wild 3D Assets in Latent Space

(Generates usable 3D assets directly in latent space)

44. ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network

(Uses virtual key instances to enhance detection)

45. BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis

(Emphasizes the role of boundary features in segmentation)

46. MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation

(Medical imaging; translation between modalities, from OCT to OCTA)

47. Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting

(Geometric constraints improve editability and realism)

48. LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians

(Models 3D Gaussians without tying each one to a pixel or a point)

49. CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching

(A general-purpose depth estimation model intended for many scenarios)

50. SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

("Point-aware" refers to point clouds, not keypoints; the aim is more stable single-image reconstruction)

51. UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation

(CAD-based vehicle models with high-fidelity rendering)

52. Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction

(Recasts temporal fusion from an optimization, i.e. gradient descent, perspective)

53. LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

(Generates video from an image following specified 3D trajectories)

54. 3D Dental Model Segmentation with Geometrical Boundary Preserving

(Medical domain; segmentation that preserves geometric boundaries in dental models)

55. IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments

(Learns, through interaction, how articulated objects with movable parts can be used)

56. Structured 3D Latents for Scalable and Versatile 3D Generation

(A structured latent representation for 3D asset generation)

57. GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior

(High-fidelity 3D human generation that preserves the subject's identity)

58. MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

(Applies state space models across views for pose estimation)

59. FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

(Optimizes rendering efficiency for large-scale, high-resolution scenes)

60. ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary

(A two-stage text → image → 3D scene pipeline)

61. PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection

(Detects anomalous point cloud regions by predicting point offsets)

62. Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects

(vLLMs, i.e. vision large language models, are used to score generated 3D objects)

63. Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

(Plug-and-play deformation steps improve the quality of the generated mesh)

64. MagicArticulate: Make Your 3D Models Articulation-Ready

(Automatically adds joint structure to static 3D models)

65. ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion

(A diffusion model that combines point and voxel representations)

66. VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

(Transfers from human video to robots to address scarce robot training data)

67. DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation

(Inverse procedural content generation: infers procedural generator parameters from a given input)

68. Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception

(Extends static point cloud models to temporal point cloud video)

69. Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation

(Lifts 2D foundation models to 3D for robotic manipulation)

70. Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video

(Propagates information from earlier frames to speed up reconstruction)

71. iG-6DoF: Model-free 6DoF Pose Estimation for Unseen Object via Iterative 3D Gaussian Splatting

(Six-degree-of-freedom pose estimation for unseen objects without a prior object model)

72. FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video

(Recovers 3D fluid motion from a single video)

73. UniK3D: Universal Camera Monocular 3D Estimation

(Monocular 3D estimation intended to work across camera models)

74. Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

(Contrastive learning with masking to improve entity-level feature discrimination)

75. Revisiting MAE Pre-training for 3D Medical Image Segmentation

(Applies masked autoencoders, MAE, to 3D medical imaging)

76. Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

(Uses a video diffusion prior to compensate for reconstruction gaps under sparse inputs)

77. FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement

(Refines object placements from LLM common-sense reasoning so they are geometrically plausible)

78. MICAS: Multi-grained In-Context Adaptive Sampling for 3D Point Cloud Processing

(Adaptive sampling at multiple granularities for in-context point cloud processing)

79. Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos

(Lifts 2D motion segmentation results to 3D)

80. RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects

(Draws on anamorphic art, using shadows to guide how irregular objects are packed)

81. Reference-Based 3D-Aware Image Editing with Triplanes

(Triplanes are a three-plane 3D representation; they support view-consistent editing)

82. InsTaG: Learning Personalized 3D Talking Head from Few-Second Video

(Quickly builds a personalized talking head from a few seconds of video)

83. GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting

(Simplifies Gaussians while balancing model size against rendering quality)

84. Let’s Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation

(Song-driven head animation that accounts for a singing partner)

85. MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation

(Progressive masked autoregressive generation for high-resolution 3D)

86. EAP-GS: Efficient Augmentation of Pointcloud for 3D Gaussian Splatting in Few-shot Scene Reconstruction

(Augments the point cloud to improve reconstruction from few input views)

87. VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

(Distillation reduces video-diffusion-based 3D scene generation to a single step)

88. DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery

(Targets the difficulties of in-the-wild drone imagery)

89. Towards Realistic Example-based Modeling via 3D Gaussian Stitching

(Stitches multiple Gaussian models together for example-based modeling)

90. A Unified Approach to Interpreting Self-supervised Pre-training Methods for 3D Point Clouds via Interactions

(Analyzes what different self-supervised methods have in common through the lens of interactions)

91. Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes

(Combines gaze with affordance to predict human motion in 3D scenes)

92. SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

(Handles geometry reconstruction together with UV texture mapping and lighting disentanglement)

93. ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

("A new scale" refers to dataset size: large-scale indoor scene annotations built on ARKit captures)

94. Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection

(Revisits lane-level and point-level representations for 3D lane detection; "points" here are lane points, not point clouds)

95. Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking

(3D object tracking in monocular video driven by a text query)

96. Hierarchical Gaussian Mixture Model Splatting for Efficient and Part Controllable 3D Generation

(Hierarchical control over the generation of individual parts)

97. Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation

(Uses object symmetry as a prior to improve single-image 3D generation)

98. ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning

(LVLM: large vision-language model, used here to guide grounding)

99. EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering

(3D Gaussian Splatting reconstruction from event camera data)

100. Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos

(Mines stereo videos from the internet to learn 3D motion)

101. HyperGS: Hyperspectral 3D Gaussian Splatting

(Uses multi-band hyperspectral information in 3D reconstruction)

102. ArtFormer: Controllable Generation of Diverse 3D Articulated Objects

(Controllable generation of objects with movable parts, such as joints)

103. 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

(Primitives here are the building blocks of the paper's 3D representation, generated by a diffusion model)

104. 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

(LMM: large multimodal model; superpoints are clusters of points in a point cloud)

105. DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations

(Facial animation for two-speaker conversation)

106. Geometry in Style: 3D Stylization via Surface Normal Deformation

(Stylizes geometry by deforming surface normals)

107. EnliveningGS: Active Locomotion of 3DGS

(Adds active locomotion to a static 3DGS, i.e. 3D Gaussian Splatting, reconstruction)

108. ChatHuman: Chatting about 3D Humans with Tools

(An LLM-based system that uses tools to converse about 3D humans)

109. You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

(Learns 3D creation from large-scale videos without camera pose annotations)

110. GA3CE: Unconstrained 3D Gaze Estimation with Gaze-Aware 3D Context Encoding

(Gaze direction estimation in unconstrained settings)

111. PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

(Differentiable shape programs used to optimize 3D shape reconstruction)

112. De²Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation

(De² = Deformable + Decoupled representation learning)

113. Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection

(D³ = 2D + pseudo-3D + 3D; a multimodal industrial inspection dataset)

114. InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception

(Jointly models appearance and semantics for instance-level perception)

115. UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection

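(Spatial-channel, not spatio-temporal, representation learning; Mamba is a state space sequence model architecture, and "group-efficient" refers to grouped computation for efficiency)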

116. TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

(PBR: physically based rendering; the Gaussians are organized on an octree)

117. V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection

(V2X: vehicle-to-everything communication; 4D radar measures range, azimuth, elevation, and Doppler velocity)

118. Matrix3D: Large Photogrammetry Model All-in-One

(A single all-in-one model for photogrammetry tasks)

119. RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos

(Rigging: attaching a skeleton so that the Gaussian model can be articulated)

120. Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

(Makes static character models animation-ready, e.g. by adding a skeleton)

121. 3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations

(Compact tensorial representations for expressive, dynamic facial appearance)

122. ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding

(Uses proxy attention to reshape the point cloud manifold before grounding)

123. ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

(Parameterizes human-object interactions observed in everyday home activities)

124. Sketchy Bounding-box Supervision for 3D Instance Segmentation

(Uses rough, inaccurate bounding boxes as weak supervision for segmentation)

125. Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation

(Hand pose, not gesture, estimation; studies the performance gap between synthetic training data and real data)

126. SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction

(Aligns and regularizes with semantic priors to improve sparse-view reconstruction)

127. Wonderland: Navigating 3D Scenes from a Single Image

(Generates an explorable 3D scene from a single image)

128. PERSE: Personalized 3D Generative Avatars from A Single Portrait

(Builds a personalized generative 3D avatar from one portrait)

129. vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation

(A foundation model for blood vessel segmentation across imaging settings)

130. Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

(Uses a vision-language model to help segment novel classes from few examples)

131. 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

(Extends Gaussian splatting to distorted camera models and to secondary rays such as reflections)

132. Global-Local Tree Search in VLMs for 3D Indoor Scene Generation

(VLM: vision-language model; tree search plans the scene layout at global and local levels)

133. SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

(VQVAE: vector-quantized VAE, which discretizes 3D features into tokens)

134. 3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation

(Segmentation without a predefined class list; the vocabulary is generated automatically)

135. Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding

(Grounds language queries in a Gaussian Splatting scene through a chain of semantic steps)

136. SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation

(A part-aware diffusion model over latent points)

137. MVBoost: Boost 3D Reconstruction with Multi-View Refinement

(Refines 3D reconstruction using multiple views)

138. MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction

(Learns geometry at multiple levels for textured human reconstruction)

139. FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting

(A single trained model meant to adapt to varied deployment requirements)

140. Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture

(Personalized dynamic facial texture for a talking avatar)

141. Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

(Decomposes a panorama into layered 3D structure for immersive exploration)

142. GaPT-DAR: Category-level Garments Pose Tracking via Integrated 2D Deformation and 3D Reconstruction

(Tracks garment pose at the category level, across instances of the same garment type)

143. Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

(Prunes voxels irrelevant to the text query to speed up grounding)

144. Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories

(Edits 3D content by perturbing it and then revising it along a generative trajectory)

145. SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting

(Introduces second-order anchors into Gaussian splatting)

146. LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition

(Combines local features with body-size cues from LiDAR point clouds for gait recognition)

147. SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting

(Handles highly specular surfaces, such as metal or mirrors, by tracing reflected rays)

148. MUSt3R: Multi-view Network for Stereo 3D Reconstruction

(Extends a stereo 3D reconstruction network to multiple views)

149. iSegMan: Interactive Segment-and-Manipulate 3D Gaussians

(Lets a user interactively select and edit groups of Gaussians)

150. LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

(Combines vision-language models with differentiable optimization of object layout)

151. MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

(Decomposes motion into global and local components and adjusts temporal intervals)

152. Towards In-the-wild 3D Plane Reconstruction from a Single Image

(Extracts planar structures, such as walls and floors, from a single unconstrained image)

153. Material Anything: Generating Materials for Any 3D Object via Diffusion

(A general material generation framework; the name echoes Segment Anything)

154. Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation

(Takes multi-view images in and produces multi-view images out, keeping views consistent)

155. SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

(Injects 3D knowledge into large multimodal models to strengthen spatial reasoning)

156. HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics

(Represents panoramic scene information with spherical harmonics)

157. g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks

(Feature fields that link language and 3D for embodied agents)

158. DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing

(Uses score distillation from a pretrained diffusion model for fast 3D editing)

159. ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting

(Learns the articulation of movable objects without manual annotation)

160. ARM: Appearance Reconstruction Model for Relightable 3D Generation

(Reconstructs appearance so that generated objects can be relit under new lighting)

161. CraftsMan3D: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner

(3D-native diffusion generates geometry, followed by an interactive geometry refiner)

162. MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

(Uses 3D priors to improve the consistency of multi-view images generated from one image)

163. Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels

(Uses gaze-following labels as weak supervision for gaze estimation)

164. BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

(Bridges 2D image features and 3D perception for embodied agents)

165. Acc3D: Accelerating Single Image to 3D Diffusion Models via Edge Consistency Guided Score Distillation

(Accelerates single-image-to-3D diffusion with edge-consistency-guided score distillation)

166. LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging

(Generalizes across tumor types without type-specific annotation)

167. GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors

(Uses pretrained 3D generative priors to fill in missing regions of a point cloud, including unseen categories)

168. Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

(Extracts position-aware features from video for 3D scene understanding)

169. 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

(Uses smooth convex shapes as rendering primitives in place of Gaussians)

170. 3D Prior Is All You Need: Cross-Task Few-shot 2D Gaze Estimation

(Transfers a 3D prior to few-shot 2D gaze estimation)

171. IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement

(Fills in missing regions of a 3D scene using multi-view refinement under geometric guidance)

172. Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding

(Aggregates multiple modalities while modeling prediction uncertainty)

173. Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

(Repurposes 2D image diffusion models for 3D asset generation)

174. CrossOver: 3D Scene Cross-Modal Alignment

(Aligns scene modalities, such as point clouds, images, and text, in a shared space to support cross-modal retrieval)

175. MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image

(Extracts cues from image semantics, e.g. the labels "chair" or "table", to aid reconstruction)

176. Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

(A scene graph with open-vocabulary objects and functional relations between them)

177. Light Transport-aware Diffusion Posterior Sampling for Single-View Reconstruction of 3D Volumes

(Diffusion posterior sampling that accounts for light transport when reconstructing 3D volumes from one view)

178. LAL: Enhancing 3D Human Motion Prediction with Latency-aware Auxiliary Learning

(Auxiliary learning that accounts for latency in motion prediction)

179. CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation

(Generates dense, consistent multi-view images from text for downstream 3D creation)

180. MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models

(Builds generalizable Gaussian Splatting on monocular depth foundation models)

181. StickMotion: Generating 3D Human Motions by Drawing a Stickman

(Generates 3D human motion from a user-drawn stick figure)

182. GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking

(Dynamic 3D Gaussian fields, built with dense 3D point tracking, are used to guide video generation)

183. Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

(Feed-forward generation of a renderable 3D scene from text, without per-scene optimization)

184. SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

(Extends CLIP-style representations to spatial language, e.g. "left" and "right")

185. Escaping Plato’s Cave: Towards the Alignment of 3D and Text Latent Spaces

(A reference to Plato's allegory of the cave; the paper studies aligning 3D and text latent spaces)

186. STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction

(Joint spatial-temporal modeling with sparse computation for occupancy and scene flow)

187. Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

(Unifies aerial and ground-level imagery in one scene reconstruction)

188. Mitigating Ambiguities in 3D Classification with Gaussian Splatting

(Uses Gaussian Splatting as the input representation to reduce ambiguity in 3D classification)

189. Neuro-3D: Towards 3D Visual Decoding from EEG Signals

(Decodes 3D visual content viewed by a subject from EEG, i.e. electroencephalography, signals)

190. BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion

(B-rep: boundary representation, used for exact CAD geometry; GAT: graph attention network)

191. GazeGene: Large-scale Synthetic Gaze Dataset with 3D Eyeball Annotations

(Synthetic gaze data annotated with 3D eyeball structure, for training gaze estimators)

192. PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting

(Prunes redundant Gaussians using an uncertainty-based criterion)

193. FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity

(Learns physical dynamics from video by modeling the velocity of Gaussians)

194. VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

(A segmentation foundation model for 3D medical images covering many anatomical structures)

195. OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging

(Online segmentation: merges 2D masks from a visual foundation model into 3D segments as frames arrive)

196. Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB

(Diffuse LiDAR gives spatially blurred measurements; combining it with RGB makes handheld scanning more robust)

197. Textured Gaussians for Enhanced 3D Scene Appearance Modeling

(Adds texture to each Gaussian to increase appearance detail)

198. Activating Sparse Part Concepts for 3D Class Incremental Learning

(Class-incremental learning, which must avoid catastrophic forgetting, using sparse part concepts)

199. D³CTTA: Domain-Dependent Decorrelation for Continual Test-Time Adaption of 3D LiDAR Segmentation

(D³ = Domain-Dependent Decorrelation; continual test-time adaptation to shifting domains)

200. GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping

(Rendering for high-dynamic-range scenes, avoiding over- and under-exposure)

201. SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation

(Uses SAM2, the successor to Segment Anything Model, to keep instance masks consistent across views)

202. SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

(Needs neither camera poses nor 3D priors)

203. Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph

(Resolves the sign ambiguity of point cloud normals; the Delaunay graph supplies connectivity between points)

204. AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

(Studies how controllable the camera is in video diffusion transformers)

205. Dual Exposure Stereo for Extended Dynamic Range 3D Imaging

(Combines two exposures to improve 3D imaging in high-dynamic-range scenes, e.g. strong light with deep shadow)

206. SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

(Factors the latent representation so that scene generation can be controlled)

207. SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations

(Finds semantic correspondences between images by aligning them to shared 3D object-class representations)

208. 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency

(Uses depth to guide inpainting of Gaussian scenes while keeping views consistent)

209. Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

(An object-centric evaluation of 3D vision-language models)

210. 2V3D: View-to-View Denoised 3D Reconstruction for Light Field Microscopy

(Joint cross-view denoising and 3D reconstruction for light field microscopy data)

211. VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors

(Uses video generative priors to edit how 3D objects are composed in a video)

212. BOE-ViT: Boosting Orientation Estimation with Equivariance in Self-Supervised 3D Subtomogram Alignment

(ViT: Vision Transformer; equivariance means features transform predictably under rotation, which is distinct from rotation invariance)

213. PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

(Combines cross-scale multi-view diffusion with explicit remeshing to improve fidelity)

214. CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation

(Generates parametric CAD models, e.g. mechanical parts, with a large language model)

215. DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers

(Distills heterogeneous 2D and 3D teacher models into one encoder)

216. MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation

(40M+ refers to the scale of the dataset's text annotations, not a parameter count; descriptions are written at multiple levels of detail)

217. MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data

(Trains reconstruction models on large-scale synthesized data for better generalization)

218. WonderWorld: Interactive 3D Scene Generation from a Single Image

(Interactively generates 3D scenes starting from a single image)

219. DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

("Dual-vision" refers to combining two kinds of visual input for scene perception; it does not mean binocular stereo)

220. Turbo3D: Ultra-fast Text-to-3D Generation

(A text-to-3D pipeline focused on generation speed)

221. TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer

(3D structure guidance keeps the generated UV texture consistent across the garment)

222. Empowering Large Language Models with 3D Situation Awareness

(Adds 3D situational awareness, such as spatial relations like "the chair is left of the table", to LLMs)

223. Ges3ViG : Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding

(Combines pointing gestures with language to ground objects in a 3D scene, e.g. "the cup being pointed at")

224. Dense Dispersed Structured Light for Hyperspectral 3D Imaging of Dynamic Scenes

(Structured light with spectral dispersion for hyperspectral 3D imaging of moving scenes)

225. 3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation

(Prototype-aware view transformation compensates for low-resolution queries)

226. DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

(Simulates camera depth of field, with control over focus and blur)

227. Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction

(Learns to add Gaussians where detail is missing)

228. MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

(Paints, i.e. textures, 3D objects with synchronized multi-view diffusion)

229. Generating 3D-Consistent Videos from Unposed Internet Photos

(Produces 3D-consistent video from internet photos that lack camera poses)

230. TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

(Full-body talking avatars, not only heads, rendered in real time for AR)

231. Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

(Speeds up rendering by touching fewer pixels per Gaussian and by reducing the number of Gaussian primitives.)

232. HandOS: 3D Hand Reconstruction in One Stage

(An end-to-end model that reconstructs a 3D hand mesh directly from the image, without intermediate stages.)

233. Hash3D: Training-free Acceleration for 3D Generation

(Reuses features cached in a hash table to speed up diffusion-based 3D generation without any training.)

234. Lifting Motion to the 3D World via 2D Diffusion

(Estimates 3D human motion from 2D motion data using a 2D diffusion model.)

235. EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting

(Fuses multi-view information and uses attention to guide optimization, so that edits to a Gaussian splatting scene stay consistent across views.)

236. 3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping

(A compact, low-parameter neural representation for mapping large 3D environments.)

237. MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection

(Learns where and how to place synthetic objects in a scene realistically, as data augmentation for monocular 3D detectors.)

238. PICO: Reconstructing 3D People In Contact with Objects

(Reconstructs people and the objects they are in contact with in 3D, e.g. a hand holding a cup.)

239. GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

(Combines object geometry with the interaction intention, e.g. "can be sat on", to locate affordance regions.)

240. Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions

(Uses multimodal input, namely language instructions, visual observations and interactions, to locate affordance regions such as a graspable handle.)

241. Leveraging Temporal Cues for Semi-Supervised Multi-View 3D Object Detection

(Uses temporal cues across video frames to improve detection when learning from unlabeled data.)

242. SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

(A multi-view rectified flow model for synthesizing 3D Gaussian splatting scenes. Rectified flow here is a generative modeling technique, not optical flow.)

243. Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning

(A multimodal model that handles instance-level queries such as "point out the red car".)

244. DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting

(Uses event-camera data together with a diffusion prior to remove motion blur and sharpen the reconstruction.)

245. Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective

(Treats change detection and change captioning as a video modeling problem, producing natural-language descriptions of what changed.)

246. 3D Student Splatting and Scooping

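("Student" refers to Student's t distribution, which replaces the Gaussian as the primitive; components may have positive density, "splatting", or negative density, "scooping".)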

247. ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics

(Biomedical: an anatomy-aware graph network that imputes spatial gene-expression data across 3D tissue.)

248. SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion

(Fuses satellite imagery with ground-level observations to complete the semantics of large-scale scenes.)

249. PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding

(An open-vocabulary model that segments both object instances and background regions.)

250. SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection

(Adapts the decoder to the scene content and handles occluded regions explicitly during view projection.)

251. Learning Partonomic 3D Reconstruction from Image Collections

("Partonomic" refers to a part hierarchy, e.g. decomposing an object into head, body and limbs and reconstructing it level by level.)

252. High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model

(RGBN adds surface normal information to the RGB color channels, which improves geometric accuracy.)

253. Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation

(Generates in stages, for example global shape, then local detail, then material, to handle text prompts with many attributes.)

254. Functionality Understanding and Segmentation in 3D Scenes

(Identifies the functional parts of objects in a scene, e.g. "the seat of a chair can be sat on".)

255. MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction

(Multiple agents reach consensus over a Gaussian representation to jointly optimize camera poses and scene reconstruction.)

256. How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions

(Generates 3D hand motion trajectories and contact points for manipulating objects, e.g. holding a cup.)

257. ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos

(Processes a monocular video stream online, reconstructing the human and the surrounding scene together.)

258. GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking

(Models spatial relations between objects, such as occlusion and distance, to make tracking more robust.)

259. 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement

(Uses a multi-view diffusion model to enhance 3D results while keeping the views consistent with each other.)

260. SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes

(Dynamic 3D reconstruction of scenes with specular, strongly reflective surfaces such as metal or mirrors.)

261. ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts

(Adds 3D shape constraints to text-to-image prompts, e.g. "a robot shaped like a cube".)

262. Insightful Instance Features for 3D Instance Segmentation

(Designs discriminative instance-level features for point cloud instance segmentation.)

263. 3D-GSW: 3D Gaussian Splatting for Robust Watermarking

(Embeds an invisible watermark in a Gaussian splatting model for copyright protection.)

264. Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

(Distills a 2D image generator, Stable Diffusion, into a model that outputs 3D meshes directly, without 3D training data.)

265. Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization

(Restyles the shape and color of a Gaussian splat according to text prompts such as "cartoon style" or "cyberpunk palette".)

266. RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

(A Transformer that uses queries to fuse radar and camera features for more accurate detection.)

267. DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal

(Adjusts focus at render time to simulate camera aperture effects; supports post-capture refocusing, defocus rendering and blur removal.)

268. Coherent 3D Portrait Video Reconstruction via Triplane Fusion

(Fuses triplane representations over time so that the reconstructed portrait video stays coherent.)

269. UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

(Pre-trains point cloud models with a cross-modal objective based on Gaussian splatting, combining images and point clouds.)

270. G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

(Builds a pose-aware 3D semantic flow representation of objects, aimed at manipulation that generalizes across objects.)

271. Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking

(Derives class-agnostic 3D instance segmentation by tracking 2D masks across frames.)

272. GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector

(Uses a geometry-optimized neural radiance field (NeRF) to improve 3D object detection.)

273. VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis

(Medical imaging: a diffusion model over 3D vascular tree structure for synthesizing angiography.)

274. Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment

(Adjusts a tone curve per view to cope with challenging lighting such as over- and under-exposure.)

275. LT3SD: Latent Trees for 3D Scene Diffusion

(Represents a scene hierarchically with a latent tree structure, which the diffusion model operates on.)

276. DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

(Uses a single-step diffusion model to clean up defects in 3D reconstructions.)

277. LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds

(A superpoint is a cluster of nearby points; grouping superpoints locally and globally yields semantic segments without labels.)

278. Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion

(Distills motion cues from 2D video into models that discover multiple objects in 2D and 3D.)

279. GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency

(Learns object affordances such as "graspable", with consistency enforced across modalities.)

280. Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction

(Uses tactile sensor readings to condition a diffusion model that reconstructs an object's 3D shape.)

281. Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving

(A LiDAR dataset annotated with road anomalies such as unexpected obstacles.)

282. BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering

(Combines Bézier triangles with Gaussians as a primitive for vectorized 3D representation and rendering.)

283. Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space

(Learns morphable models of common objects such as chairs and cars in a self-supervised way.)

284. The Language of Motion: Unifying Verbal and Non-Verbal Language of 3D Human Motion

(Unifies verbal language, such as speech and text, with non-verbal language, such as gestures, in a single model of 3D human motion.)

285. StdGEN: Semantic-Decomposed 3D Character Generation from Single Images

(Decomposes a character into semantic parts such as body, clothing and hair, and generates each part.)

286. GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting

(Extracts an implicit geometry representation from the Gaussians, which can then be used to generate a mesh.)

287. EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation

(A lightweight decoder design for 3D medical image segmentation.)

288. SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens

(Adapts token scale in a Transformer to handle people at different distances from the camera.)

289. GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control

(Generates videos that keep a consistent 3D world and follow a specified camera path.)

290. ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points

(Produces structured, program-like models of buildings from sparse point clouds.)

291. CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-modal Prototypes

(Improves detection in bird's-eye view (BEV) through correlation learning with multi-modal prototypes.)

292. PhysGen3D: Crafting a Miniature Interactive World from a Single Image

(Turns a single image into a miniature 3D scene that can be interacted with.)

293. Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting

(Rethinks the end-to-end pipeline that lifts 2D segmentation to 3D in Gaussian splatting.)

294. Continuous 3D Perception Model with Persistent State

(Keeps a persistent scene state that is updated as new observations arrive, so perception is continuous over time.)

295. HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

(Represents static scene content with 3D Gaussians and transient content with 2D Gaussians.)

296. A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions

(UDF = unsigned distance function; modeling local shapes keeps the framework lightweight.)

297. DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

(Generates varied driving images, e.g. vehicles in heavy rain, to make 3D detectors more robust.)

298. SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

(Generates multi-view images and 3D models conditioned on a user-defined skeleton, e.g. the six legs of an insect.)

299. Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance

(Builds an expressive avatar that preserves the identity of the input portrait.)

300. SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

(Locates objects in a 3D scene from a text description such as "the red cylindrical object", without task-specific training.)

301. IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos

(A 3D-aware video diffusion model, learned from monocular videos, for generating photorealistic talking heads.)

302. Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

(Uses a 3D morphable model (3DMM) to guide diffusion in generating face data with varied lighting, pose and expression.)

303. DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models

(Uses a diffusion model to generate 3D hair from a single portrait image.)

304. DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction

(Predicts a pair of point maps, one in the posed frame and one in the canonical frame, to separate shape from pose.)

305. Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution

(Biomedical: improves the signal-to-noise ratio and resolution of 3D fluorescence microscopy data with cycle-consistent diffusion.)

306. GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection

(Models local point cloud geometry with Gaussian blobs to improve generalization across domains, e.g. different LiDAR sensors.)

307. Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition

(Multimodal pre-training that learns occlusion-robust features for recognizing objects in open-world settings.)

308. DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation

(Combines a triplane feature representation with Gaussian splatting for 3D generation.)

309. Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features

(Uses geometric 3D features to tell apart image pairs that look alike but show different surfaces, a known failure case in structure-from-motion.)

310. Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding

(A masked-prediction pre-training task intended to bring self-supervised 3D scene features closer to supervised ones.)

311. Recovering Dynamic 3D Sketches from Videos

(Recovers sketch-like 3D line representations of moving objects from video, for uses such as motion analysis and animation.)

312. WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild

(Detects hands in in-the-wild images and outputs their 3D meshes and positions, end to end.)

313. MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors

(Builds a real-time dense SLAM system on top of the MASt3R 3D reconstruction prior.)

314. BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects

(Synthesizes two-handed interaction with articulated objects, producing 3D motion sequences.)

315. Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields

(Tracks thin-shell surfaces such as cloth or paper from monocular video, capturing fine deformations like wrinkles and stretching.)

316. Seeing A 3D World in A Grain of Sand

(The title echoes William Blake's line "To see a World in a Grain of Sand"; the topic is 3D reconstruction at small scale.)

317. SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

(Models the motion trajectories of Gaussians with splines that adapt to the motion.)

318. LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

(Lets the language model focus on the visual regions relevant to the task when understanding large 3D scenes.)

319. One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency

(Brings an object into a canonical pose from a single example, using geometric and semantic consistency.)

320. Prior-free 3D Object Tracking

(Tracks objects without relying on prior knowledge of the object, so it also applies to unknown objects.)

321. 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

(A million-scale dataset pairing text with 3D scenes, intended to improve grounding and reduce hallucination in 3D-LLMs.)

322. PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models

(Decomposes an object into parts, e.g. a car's body, wheels and windows, and generates or reconstructs them.)

323. PerLA: Perceptive 3D Language Assistant

(An assistant that answers spatial queries about a 3D scene, e.g. "describe the red object on the coffee table".)

324. Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild

(Reconstructs the 3D shape of unfamiliar objects from in-the-wild images without category-specific training.)

325. Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene

(Generates driving-scene data as it would be seen from another chosen viewpoint.)

326. Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

(Registers language embeddings directly onto the Gaussians so that parts of the scene can be referred to by text.)

327. FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting

(Generates interior textures so that an object can be cut open and show plausible insides, e.g. fruit flesh.)

328. Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction

(Uses non-isotropic Gaussian noise in the diffusion process to reflect the structure of joint motion.)

329. Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality

(Aligns the point Transformer's computation with GPU hardware locality and geometric locality to scale to large point clouds.)

330. Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion

(A self-supervised framework that combines shape-from-shading, which infers shape from shading cues, with structure-from-motion, which infers structure from camera motion.)

331. SfM-Free 3D Gaussian Splatting via Hierarchical Training

(Trains Gaussian splatting directly from images with hierarchical training, without a separate structure-from-motion (SfM) step.)

332. RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection

(Predicts where radar hits are expected on detected objects and uses that prediction to improve camera-radar 3D detection.)

333. Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation

(Strengthens the modeling of relations, e.g. whether points belong to the same object, to improve instance segmentation.)

334. FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection

(Selects and scales focal tokens so that long input sequences can be used efficiently for 3D detection.)

335. SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

(Renders both LiDAR and camera data in real time from a 3D Gaussian splatting scene, for driving scenarios.)

336. Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection

(Uses 3D geometric priors to improve rotation symmetry detection in 2D images.)

337. DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image

(Reconstructs the body and the clothing as separate 3D components, which supports applications such as virtual try-on.)

338. Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation

(Uses generative relighting of multi-view images so that reconstruction copes with extreme lighting changes.)

339. InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

(Extends 2D vision-language foundation models to reason about interaction in 3D.)

340. POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality

(Selects the next view to capture using an optimality criterion, aiming to maximize the information gained for reconstruction.)

341. NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting

(Uses epipolar geometry to obtain depth priors that improve Gaussian splatting from only a few views.)

342. CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections

(Reconstructs a signed distance function (SDF) for thin structures such as blood vessels from cross-sections, e.g. CT or MRI slices.)

343. RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

(Trains models on spatial relations such as left, right and above, to support robot tasks.)

344. 3D-MVP: 3D Multiview Pretraining for Manipulation

(Pre-trains a multi-view 3D representation that supports manipulation tasks such as grasping.)

345. Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects

(Edits a 3D object quickly by inpainting across multiple views consistently.)

346. Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection

(Combines prediction uncertainty with sample diversity to choose which data to annotate.)

347. Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression

(Voxelizes the 3D scene and compresses it into a dynamic number of tokens for a language model.)

348. UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping

(Maps the unstructured set of Gaussians onto a structured 2D UV layout so that image-based models can process it.)

349. Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

(Takes camera and scene information, such as intrinsics, when available, as priors to improve reconstruction.)

350. FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts

(Models object relations as a graph and uses diffusion to synthesize a 3D scene from free-form prompts.)

351. Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

(Named after the ouroboros, the snake eating its own tail: diffusion is applied recursively with 3D-aware feedback to refine the result.)

352. S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting

(Reconstructs a detailed 3D scene from a small number of low-resolution input views.)

353. MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors

(Decouples the queries and uses geometry-error priors to improve monocular 3D detection.)

354. Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression

(Decouples features from the Gaussians and compresses them hierarchically to reduce cost.)

355. RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

(A neural point cloud compression method that runs in real time, e.g. for transmitting autonomous-driving data.)

356. COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting

(Splits Gaussians adaptively at object boundaries so that segmentation boundaries are clean.)

357. HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos

(A dataset for tracking hands and the objects they manipulate in 3D, recorded as egocentric multi-view video.)

358. GLane3D: Detecting Lanes with Graph of 3D Keypoints

(Represents lanes as a graph of 3D keypoints and models the connections between keypoints.)

359. AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction From Monocular Video

(Adapts sampling to image gradients, sampling more densely around edges and textured regions.)

360. UnCommon Objects in 3D

(A large object-centric 3D dataset, uCO3D; the name plays on the earlier Common Objects in 3D (CO3D) dataset.)

361. OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities

(Feed-forward 3D Gaussian splatting for omnidirectional (360°) images, with support for editing.)

362. MAD: Memory-Augmented Detection of 3D Objects

(Augments the detector with a memory of previously observed features.)

363. Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation

(A dataset and model for segmenting 3D scenes from arbitrary text descriptions.)

364. Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics

(Proposes perception-based definitions and metrics for lip synchronization, together with a speech-mesh representation.)

365. Cubify Anything: Scaling Indoor 3D Object Detection

(Scales indoor 3D bounding-box detection to a wide range of objects.)

367. CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion

(Uses a personalized 2D diffusion model to guide controllable editing of dynamic 3D scenes.)

368. Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras

(Uses the high temporal resolution of event cameras to detect fast-moving objects beyond what frame rate alone allows.)

369. Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

(Processes many images, over a thousand, in a single forward pass for large-scale reconstruction.)

370. RestorGS: Depth-aware Gaussian Splatting for Efficient 3D Scene Restoration

(Uses depth to guide Gaussian splatting when restoring degraded 3D scenes.)

371. Seeing is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks

(Optimizes natural-looking objects to attack 3D models in the hard-label setting, where only the predicted label is observed.)

372. SDGOCC: Semantic and Depth-Guided Bird’s-Eye View Transformation for 3D Multimodal Occupancy Prediction

(Uses semantic and depth cues to guide the transformation to bird's-eye view (BEV) for multimodal occupancy prediction.)

373. 3D-HGS: 3D Half-Gaussian Splatting

(Splits each Gaussian into two halves that can take different opacities, which helps represent sharp edges and discontinuities.)

374. MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

(Knowledge distillation for monocular detectors that adds an intermediate "teaching assistant" model between teacher and student.)
