CVPR 2023 Paper List (Chinese-English Bilingual)

Latest recommended article published on 2024-10-08 23:43:23

芷年若相依

Tags: deep learning, computer vision, artificial intelligence, neural networks, python, image processing

Article link: https://blog.csdn.net/dovings/article/details/137277425

(Rough translation, for reference only)
“Seeing” Electric Network Frequency From Events 从事件中“看”电网频率
(ML)$^2$P-Encoder: On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning (ML)$^2$P-Encoder：探索多标签零样本学习的通道-类别相关性
1% VS 100%: Parameter-Efficient Low Rank Adapter for Dense Predictions 1% VS 100%：用于密集预测的参数高效低秩适配器
1000 FPS HDR Video With a Spike-RGB Hybrid Camera 使用 Spike-RGB 混合相机拍摄 1000 FPS HDR 视频
2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection 2PCNet：用于日夜无监督域自适应目标检测的两阶段一致性训练
3D Cinemagraphy From a Single Image 单幅图像的 3D 电影摄影
3D Concept Learning and Reasoning From Multi-View Images 从多视图图像中学习和推理 3D 概念
3D GAN Inversion With Facial Symmetry Prior 具有面部对称先验的 3D GAN 反演
3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions 3D 荧光笔：通过文本描述在 3D 形状上定位区域
3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels 没有人类标签的野外点云的 3D 人类关键点估计
3D Human Mesh Estimation From Virtual Markers 从虚拟标记估计 3D 人体网格
3D Human Pose Estimation via Intuitive Physics 通过直观物理进行 3D 人体姿势估计
3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention 具有时空交叉注意力的 3D 人体姿势估计
3D Line Mapping Revisited 重新审视 3D 线映射
3D Neural Field Generation Using Triplane Diffusion 使用三平面扩散的 3D 神经场生成
3D Registration With Maximal Cliques 基于最大团的 3D 配准
3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds 野外 3D 语义分割：学习不利条件点云的广义模型
3D Shape Reconstruction of Semi-Transparent Worms 半透明蠕虫的 3D 形状重建
3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud 用于点云场景图预测的 3D 空间多模态知识积累
3D Video Loops From Asynchronous Input 来自异步输入的 3D 视频循环
3D Video Object Detection With Learnable Object-Centric Global Optimization 具有可学习的以对象为中心的全局优化的 3D 视频对象检测
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars 3DAvatarGAN：个性化可编辑头像的桥接域
3D-Aware Conditional Image Synthesis 3D 感知条件图像合成
3D-Aware Face Swapping 3D感知换脸
3D-Aware Facial Landmark Detection via Multi-View Consistent Training on Synthetic Data 通过合成数据的多视图一致训练进行 3D 感知面部地标检测
3D-Aware Multi-Class Image-to-Image Translation With NeRFs 使用 NeRF 进行 3D 感知多类图像到图像转换
3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification 通过同时探索和识别进行 3D 感知对象目标导航
3D-POP - An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds With Marker-Based Motion Capture 3D-POP - 一种自动注释方法，通过基于标记的运动捕捉促进自由移动鸟类的无标记 2D-3D 跟踪
3Mformer: Multi-Order Multi-Mode Transformer for Skeletal Action Recognition 3Mformer：用于骨骼动作识别的多阶多模转换器
A Bag-of-Prototypes Representation for Dataset-Level Applications 数据集级应用程序的原型袋表示
A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation 一种基于特征函数的自下而上人体姿态估计方法
A Data-Based Perspective on Transfer Learning 基于数据的迁移学习视角
A Dynamic Multi-Scale Voxel Flow Network for Video Prediction 用于视频预测的动态多尺度体素流网络
A General Regret Bound of Preconditioned Gradient Method for DNN Training 用于 DNN 训练的预条件梯度法的一般遗憾界
A Generalized Framework for Video Instance Segmentation 视频实例分割的通用框架
A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction From In-the-Wild Images 用于从野外图像中准确和详细地重建人脸的分层表示网络
A Large-Scale Homography Benchmark 大规模单应性基准
A Large-Scale Robustness Analysis of Video Action Recognition Models 视频动作识别模型的大规模鲁棒性分析
A Light Touch Approach to Teaching Transformers Multi-View Geometry 一种以轻量方式教 Transformer 学习多视图几何的方法
A Light Weight Model for Active Speaker Detection 一种用于主动说话人检测的轻量级模型
A Loopback Network for Explainable Microvascular Invasion Classification 用于可解释的微血管侵犯分类的环回网络
A Meta-Learning Approach to Predicting Performance and Data Requirements 一种预测性能和数据需求的元学习方法
A New Benchmark: On the Utility of Synthetic Data With Blender for Bare Supervised Learning and Downstream Domain Adaptation 一个新的基准：关于使用 Blender 进行裸监督学习和下游域适应的合成数据的效用
A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection and Anticipation 半监督视频异常检测和预测的新综合基准
A New Dataset Based on Images Taken by Blind People for Testing the Robustness of Image Classification Models Trained for ImageNet Categories 基于盲人拍摄图像的新数据集，用于测试针对 ImageNet 类别训练的图像分类模型的稳健性
A New Path: Scaling Vision-and-Language Navigation With Synthetic Instructions and Imitation Learning 一条新路径：通过合成指令和模仿学习扩展视觉和语言导航
A Practical Stereo Depth System for Smart Glasses 实用的智能眼镜立体深度系统
A Practical Upper Bound for the Worst-Case Attribution Deviations 最坏情况归因偏差的实际上限
A Probabilistic Attention Model With Occlusion-Aware Texture Regression for 3D Hand Reconstruction From a Single RGB Image 具有遮挡感知纹理回归的概率注意模型，用于从单个 RGB 图像重建 3D 手部
A Probabilistic Framework for Lifelong Test-Time Adaptation 终身测试时间适应的概率框架
A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization 用于稳健高效的视觉惯性初始化的旋转平移解耦解决方案
A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift 具有分组时空偏移的视频恢复的简单基线
A Simple Framework for Text-Supervised Semantic Segmentation 文本监督语义分割的简单框架
A Soma Segmentation Benchmark in Full Adult Fly Brain 完整成年果蝇大脑中的胞体（soma）分割基准
A Strong Baseline for Generalized Few-Shot Semantic Segmentation 广义小样本语义分割的强基线
A Unified HDR Imaging Method With Pixel and Patch Level 一种像素级和块级统一的HDR成像方法
A Unified Knowledge Distillation Framework for Deep Directed Graphical Models 深度有向图模型的统一知识蒸馏框架
A Unified Pyramid Recurrent Network for Video Frame Interpolation 用于视频帧插值的统一金字塔递归网络
A Unified Spatial-Angular Structured Light for Single-View Acquisition of Shape and Reflectance 用于形状和反射率单视图采集的统一空间角度结构光
A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others 打地鼠困境：捷径成倍增加，减轻其中一个会放大其他捷径
A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation From a Single RGB Image A2J-Transformer：用于从单个 RGB 图像进行 3D 交互手势估计的锚点到关节变换器网络
ABCD: Arbitrary Bitwise Coefficient for De-Quantization ABCD：用于反量化的任意位系数
ABLE-NeRF: Attention-Based Rendering With Learnable Embeddings for Neural Radiance Field ABLE-NeRF：基于注意力的神经辐射场可学习嵌入渲染
Abstract Visual Reasoning: An Algebraic Approach for Solving Raven’s Progressive Matrices 抽象视觉推理：求解 Raven 渐进矩阵的代数方法
A-Cap: Anticipation Captioning With Commonsense Knowledge A-Cap：具有常识知识的预期字幕
Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using RGB and Poses 加速坐标编码：学习使用 RGB 和姿势在几分钟内重新定位
Accelerating Dataset Distillation via Model Augmentation 通过模型增强加速数据集蒸馏
Accelerating Vision-Language Pretraining With Free Language Modeling 使用自由语言建模加速视觉语言预训练
AccelIR: Task-Aware Image Compression for Accelerating Neural Restoration AccelIR：用于加速神经恢复的任务感知图像压缩
Accidental Light Probes 偶然光探针
Achieving a Better Stability-Plasticity Trade-Off via Auxiliary Networks in Continual Learning 在持续学习中通过辅助网络实现更好的稳定性-可塑性权衡
ACL-SPC: Adaptive Closed-Loop System for Self-Supervised Point Cloud Completion ACL-SPC：用于自监督点云完成的自适应闭环系统
ACR: Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction ACR：用于任意双手重建的基于注意力协作的回归器
ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation ACSeg：无监督语义分割的自适应概念化
Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition 用于无监督的基于骨架的动作识别的 Actionlet 相关对比学习
Activating More Pixels in Image Super-Resolution Transformer 在图像超分辨率 Transformer 中激活更多像素
Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition 积极探索小样本动作识别的多模态互补性
Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm 主动微调：在预训练-微调范例中利用注释预算
ActMAD: Activation Matching To Align Distributions for Test-Time-Training ActMAD：激活匹配以对齐测试时间训练的分布
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders AdaMAE：使用掩码自动编码器进行高效时空学习的自适应掩码
AdamsFormer for Spatial Action Localization in the Future 用于未来空间动作定位的 AdamsFormer
Adapting Shortcut With Normalizing Flow: An Efficient Tuning Framework for Visual Recognition 使用归一化流适配捷径连接：一种高效的视觉识别调优框架
Adaptive Annealing for Robust Geometric Estimation 稳健几何估计的自适应退火
Adaptive Assignment for Geometry Aware Local Feature Matching 几何感知局部特征匹配的自适应分配
Adaptive Channel Sparsity for Federated Learning Under System Heterogeneity 系统异构下联邦学习的自适应通道稀疏性
Adaptive Data-Free Quantization 自适应无数据量化
Adaptive Global Decay Process for Event Cameras 事件相机的自适应全局衰减过程
Adaptive Graph Convolutional Subspace Clustering 自适应图卷积子空间聚类
Adaptive Human Matting for Dynamic Videos 动态视频的自适应人类抠图
Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo 面向抗无纹理多视图立体的自适应补丁变形
Adaptive Plasticity Improvement for Continual Learning 持续学习的自适应可塑性改进
Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images 具有全局上下文增强功能的自适应稀疏卷积网络可在无人机图像上更快地进行目标检测
Adaptive Sparse Pairwise Loss for Object Re-Identification 用于对象重新识别的自适应稀疏成对损失
Adaptive Spot-Guided Transformer for Consistent Local Feature Matching 用于一致局部特征匹配的自适应点引导变换器
Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation 用于视觉语言导航的自适应区域感知分层规划器
AdaptiveMix: Improving GAN Training via Feature Space Shrinkage AdaptiveMix：通过特征空间收缩改进 GAN 训练
Adjustment and Alignment for Unbiased Open Set Domain Adaptation 无偏开放集域自适应的调整和对齐
Advancing Visual Grounding With Scene Knowledge: Benchmark and Method 用场景知识推进视觉定位：基准和方法
Adversarial Counterfactual Visual Explanations 对抗性反事实视觉解释
Adversarial Normalization: I Can Visualize Everything (ICE) 对抗性归一化：我可以可视化一切 (ICE)
Adversarial Robustness via Random Projection Filters 通过随机投影滤波器的对抗鲁棒性
Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation 对抗性掩蔽合成模拟真实：点云分割自适应的自适应噪声注入
Adversarially Robust Neural Architecture Search for Graph Neural Networks 图神经网络的对抗性鲁棒神经结构搜索
AeDet: Azimuth-Invariant Multi-View 3D Object Detection AeDet：方位不变多视图 3D 对象检测
Affection: Learning Affective Explanations for Real-World Visual Data 情感：学习真实世界视觉数据的情感解释
Affordance Diffusion: Synthesizing Hand-Object Interactions 可供性扩散：合成手-对象交互
Affordance Grounding From Demonstration Video To Target Image 从演示视频到目标图像的可供性定位
Affordances From Human Videos as a Versatile Representation for Robotics 人类视频的可供性作为机器人技术的多功能表示
AGAIN: Adversarial Training With Attribution Span Enlargement and Hybrid Feature Fusion AGAIN：具有归因跨度扩大和混合特征融合的对抗训练
A-La-Carte Prompt Tuning (APT): Combining Distinct Data via Composable Prompting A-La-Carte 提示调优 (APT)：通过可组合提示组合不同的数据
Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations Alias-Free Convnets：通过多项式激活实现分数位移不变性
Align and Attend: Multimodal Summarization With Dual Contrastive Losses 对齐与关注：具有双重对比损失的多模态摘要
Align Your Latents: High-Resolution Video Synthesis With Latent Diffusion Models 对齐你的潜变量：使用潜在扩散模型进行高分辨率视频合成
AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training AligNeRF：通过对齐感知训练的高保真神经辐射场
Aligning Bag of Regions for Open-Vocabulary Object Detection 对齐区域包以进行开放词汇对象检测
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations 将分步教学图与视频演示对齐
All Are Worth Words: A ViT Backbone for Diffusion Models 一切皆可表示为词：用于扩散模型的 ViT 主干
All in One: Exploring Unified Video-Language Pre-Training 多合一：探索统一的视频语言预训练
All-in-Focus Imaging From Event Focal Stack 基于事件焦点堆栈的全焦成像
All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters for Specific Degradations 针对特定退化使用自适应判别滤波器对未知退化进行一体式图像恢复
ALOFT: A Lightweight MLP-Like Architecture With Dynamic Low-Frequency Transform for Domain Generalization ALOFT：具有用于域泛化的动态低频变换的轻量级类 MLP 架构
ALSO: Automotive Lidar Self-Supervision by Occupancy Estimation ALSO：通过占用估计进行汽车激光雷达自监督
AltFreezing for More General Video Face Forgery Detection AltFreezing 用于更通用的视频人脸伪造检测
ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction ALTO：用于隐式 3D 重建的交替潜在拓扑
Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection 用于密集对象检测的抗歧义半监督学习
Ambiguous Medical Image Segmentation Using Diffusion Models 使用扩散模型的模糊医学图像分割
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation AMT：用于高效帧插值的全对多场变换
An Actor-Centric Causality Graph for Asynchronous Temporal Inference in Group Activity 群体活动中异步时间推理的以演员为中心的因果关系图
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling 具有遮蔽视觉建模的端到端视频语言转换器的实证研究
An Erudite Fine-Grained Visual Classification Model 博学的细粒度视觉分类模型
An Image Quality Assessment Dataset for Portraits 人像图像质量评估数据集
An In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions 换衣条件下行人重识别和步态识别的深入探索
Analyzing and Diagnosing Pose Estimation With Attributions 使用归因分析和诊断姿势估计
Analyzing Physical Impacts Using Transient Surface Wave Imaging 使用瞬态表面波成像分析物理撞击
Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection Anchor3DLane：学习回归 3D 锚点以进行单目 3D 车道检测
AnchorFormer: Point Cloud Completion From Discriminative Nodes AnchorFormer：来自判别节点的点云补全
ANetQA: A Large-Scale Benchmark for Fine-Grained Compositional Reasoning Over Untrimmed Videos ANetQA：针对未修剪视频的细粒度组合推理的大规模基准
Angelic Patches for Improving Third-Party Object Detector Performance 用于提高第三方对象检测器性能的 Angelic 补丁
Annealing-Based Label-Transfer Learning for Open World Object Detection 用于开放世界对象检测的基于退火的标签迁移学习
AnyFlow: Arbitrary Scale Optical Flow With Implicit Neural Representation AnyFlow：具有隐式神经表示的任意尺度光流
Architectural Backdoors in Neural Networks 神经网络中的架构后门
Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning 架构、数据集和模型规模无关的无数据元学习
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation ARCTIC：灵巧的双手对象操作数据集
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning 二进制注释是否足够？通过基于分层不确定性的主动学习进行视频时刻检索
Are Data-Driven Explanations Robust Against Out-of-Distribution Data? 数据驱动的解释是否对分布外数据具有鲁棒性？
Are Deep Neural Networks SMARTer Than Second Graders? 深度神经网络比二年级学生更聪明吗？
Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark 我们准备好迎接以视觉为中心的驾驶流式感知了吗？ASAP 基准
ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data ARKitTrack：使用移动 RGB-D 数据进行跟踪的新型多样化数据集
ARO-Net: Learning Implicit Fields From Anchored Radial Observations ARO-Net：从锚定径向观测中学习隐式场
AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers AShapeFormer：语义引导的对象级主动形状编码，用于通过 Transformers 进行 3D 对象检测
ASPnet: Action Segmentation With Shared-Private Representation of Multiple Data Sources ASPnet：具有多个数据源的共享-私有表示的动作分割
AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation AssemblyHands：通过 3D 手姿势估计实现以自我为中心的活动理解
AstroNet: When Astrocyte Meets Artificial Neural Network AstroNet：当星形胶质细胞遇到人工神经网络
AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection AsyFOD：一种用于小样本域自适应对象检测的非对称自适应范式
Asymmetric Feature Fusion for Image Retrieval 用于图像检索的非对称特征融合
Attention-Based Point Cloud Edge Sampling 基于注意力的点云边缘采样
AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation AttentionShift：用于点监督实例分割的迭代估计的基于部分的注意力图
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization 通过潜在代码优化进行属性保留人脸数据集匿名化
AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning AttriCLIP：增量知识学习的非增量学习者
Audio-Visual Grouping Network for Sound Localization From Mixtures 用于混合声音定位的视听分组网络
Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation 增强很重要：一种简单而有效的半监督语义分割方法
AUNet: Learning Relations Between Action Units for Face Forgery Detection AUNet：学习人脸伪造检测动作单元之间的关系
AutoAD: Movie Description in Context AutoAD：上下文中的电影描述
Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-Time Mobile Telepresence Auto-CARD：用于实时移动远程呈现的高效且鲁棒的编解码器头像驱动
AutoFocusFormer: Image Segmentation off the Grid AutoFocusFormer：网格外的图像分割
AutoLabel: CLIP-Based Framework for Open-Set Video Domain Adaptation AutoLabel：基于 CLIP 的开放集视频域自适应框架
Automatic High Resolution Wire Segmentation and Removal 自动高分辨率电线分割和去除
Autonomous Manipulation Learning for Similar Deformable Objects via Only One Demonstration 仅通过一次演示对相似可变形物体进行自主操作学习
AutoRecon: Automated 3D Object Discovery and Reconstruction AutoRecon：自动 3D 对象发现和重建
Autoregressive Visual Tracking 自回归视觉追踪
Avatars Grow Legs: Generating Smooth Human Motion From Sparse Tracking Inputs With Diffusion Model 化身长腿：使用扩散模型从稀疏跟踪输入生成平滑的人体运动
AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction AVFace：走向详细的视听 4D 人脸重建
AVFormer: Injecting Vision Into Frozen Speech Models for Zero-Shot AV-ASR AVFormer：将视觉注入冻结语音模型以实现零样本 AV-ASR
Azimuth Super-Resolution for FMCW Radar in Autonomous Driving 自动驾驶中 FMCW 雷达的方位角超分辨率
BAAM: Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention Module and Attention-Guided Modeling BAAM：使用双上下文注意力模块和注意力引导建模的单目 3D 姿势和形状重建
Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption 回到源头：扩散驱动的测试时损坏适应
Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger 通过自适应频率触发对深度图像压缩进行后门攻击
Backdoor Cleansing With Unlabeled Data 使用未标记数据进行后门清理
Backdoor Defense via Adaptively Splitting Poisoned Dataset 通过自适应拆分中毒数据集进行后门防御
Backdoor Defense via Deconfounded Representation Learning 通过去混淆表示学习进行后门防御
BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields BAD-NeRF：束调整的去模糊神经辐射场
BAEFormer: Bi-Directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation BAEFormer：用于鸟瞰图语义分割的双向和早期交互转换器
Balanced Energy Regularization Loss for Out-of-Distribution Detection 分布外检测的平衡能量正则化损失
Balanced Product of Calibrated Experts for Long-Tailed Recognition 用于长尾识别的校准专家的平衡乘积
Balanced Spherical Grid for Egocentric View Synthesis 用于自我中心视图合成的平衡球形网格
Balancing Logit Variation for Long-Tailed Semantic Segmentation 平衡长尾语义分割的 Logit 变化
BASiS: Batch Aligned Spectral Embedding Space BASiS：批量对齐的谱嵌入空间
Batch Model Consolidation: A Multi-Task Model Consolidation Framework 批处理模型整合：一个多任务模型整合框架
Bayesian Posterior Approximation With Stochastic Ensembles 随机集合的贝叶斯后验逼近
BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models BBDM：使用布朗桥扩散模型的图像到图像转换
BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion BEDLAM：展示详细逼真动画运动的人体合成数据集
Behavioral Analysis of Vision-and-Language Navigation Agents 视觉和语言导航代理的行为分析
Behind the Scenes: Density Fields for Single View Reconstruction 幕后花絮：单视图重建的密度场
Being Comes From Not-Being: Open-Vocabulary Text-to-Motion Generation With Wordless Training 存在来自非存在：通过无言训练生成开放式词汇文本到动作
Benchmarking Robustness of 3D Object Detection to Common Corruptions 3D 对象检测对常见损坏的鲁棒性基准测试
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets 对不同病理数据集的自我监督学习进行基准测试
Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data 两全其美：表格和图像数据的多模态对比学习
Better “CMOS” Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution 更好的“CMOS”产生更清晰的图像：学习盲图像超分辨率的空间变化模糊估计
BEV@DC: Bird’s-Eye View Assisted Training for Depth Completion BEV@DC：深度完成的鸟瞰图辅助训练
BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision BEVFormer v2：通过透视监督使现代图像主干适应鸟瞰图识别
BEV-Guided Multi-Modality Fusion for Driving Perception 用于驾驶感知的 BEV 引导多模态融合
BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection BEVHeight：基于视觉的路边 3D 对象检测的鲁棒框架
BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points BEV-LaneDet：基于关键点虚拟相机的高效 3D 车道检测
BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks BEV-SAN：通过切片注意网络进行准确的 BEV 3D 对象检测
Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks 超越外观：用于以人为中心的视觉任务的语义可控自监督学习框架
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers 超越 Attentive Tokens：将 Token 的重要性和多样性结合到高效的视觉转换器中
Beyond mAP: Towards Better Evaluation of Instance Segmentation 超越 mAP：更好地评估实例分割
Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection Bi3D：用于跨域 3D 对象检测的双域主动学习
Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures 剪枝视觉模型中的偏差：深入分析与对策
Bias Mimicking: A Simple Sampling Approach for Bias Mitigation 偏差模拟：一种用于偏差缓解的简单采样方法
BiasAdv: Bias-Adversarial Augmentation for Model Debiasing BiasAdv：用于模型去偏的偏差对抗增强
BiasBed - Rigorous Texture Bias Evaluation BiasBed - 严格的纹理偏差评估
Bias-Eliminating Augmentation Learning for Debiased Federated Learning 去偏联邦学习的消除偏差增强学习
BiCro: Noisy Correspondence Rectification for Multi-Modality Data via Bi-Directional Cross-Modal Similarity Consistency BiCro：通过双向跨模态相似性一致性对多模态数据进行噪声对应校正
Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation 用于半监督医学图像分割的双向复制粘贴
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models 使用预训练视觉语言模型进行视频识别的双向跨模态知识探索
Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning 用于转导式零样本学习的双向分布对齐
Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining 用于超高分辨率病理图像虚拟重染色的双向特征融合生成对抗网络
BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation BiFormer：通过用于 4K 视频帧插值的双边变换器学习双边运动估计
BiFormer: Vision Transformer With Bi-Level Routing Attention BiFormer：具有双层路由注意力的视觉转换器
Bilateral Memory Consolidation for Continual Learning 持续学习的双边记忆巩固
Bi-Level Meta-Learning for Few-Shot Domain Generalization 用于小样本域泛化的双层元学习
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection Bi-LRFusion：用于 3D 动态目标检测的双向 LiDAR-雷达融合
Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis 用于高效点云分析的稀疏卷积网络二值化
Binary Latent Diffusion 二元潜扩散
Biomechanics-Guided Facial Action Unit Detection Through Force Modeling 通过力建模进行生物力学引导的面部动作单元检测
BioNet: A Biologically-Inspired Network for Face Recognition BioNet：一种用于人脸识别的仿生网络
BITE: Beyond Priors for Improved Three-D Dog Pose Estimation BITE：超越先验以改进三维狗姿势估计
Bit-Shrinking: Limiting Instantaneous Sharpness for Improving Post-Training Quantization 位收缩：限制瞬时锐度以改善训练后量化
Bitstream-Corrupted JPEG Images Are Restorable: Two-Stage Compensation and Alignment Framework for Image Restoration 比特流损坏的 JPEG 图像是可恢复的：用于图像恢复的两阶段补偿和对齐框架
BKinD-3D: Self-Supervised 3D Keypoint Discovery From Multi-View Videos BKinD-3D：来自多视图视频的自监督 3D 关键点发现
Black-Box Sparse Adversarial Attack via Multi-Objective Optimisation 通过多目标优化的黑盒稀疏对抗攻击
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning BlackVIP：用于稳健迁移学习的黑盒视觉提示
Blemish-Aware and Progressive Face Retouching With Limited Paired Data 使用有限的配对数据进行瑕疵感知和渐进式面部修饰
BlendFields: Few-Shot Example-Driven Facial Modeling BlendFields：少样本示例驱动的面部建模
Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective 通过视觉语言对应的盲图像质量评估：多任务学习视角
Blind Video Deflickering by Neural Filtering With a Flawed Atlas 使用有缺陷的图集进行神经过滤的盲视频去闪烁
Block Selection Method for Using Feature Norm in Out-of-Distribution Detection 在分布外检测中使用特征范数的块选择方法
Blowing in the Wind: CycleNet for Human Cinemagraphs From Still Images 风中飘动：用于从静态图像生成人物动态照片（cinemagraph）的 CycleNet
Blur Interpolation Transformer for Real-World Motion From Blur 用于从模糊中恢复真实世界运动的模糊插值 Transformer
Boost Vision Transformer With GPU-Friendly Sparsity and Quantization 利用 GPU 友好的稀疏性和量化加速 Vision Transformer
Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation 通过自适应对抗蒸馏提高学生模型的准确性和鲁棒性
Boosting Detection in Crowd Analysis via Underutilized Output Features 通过未充分利用的输出特征促进人群分析中的检测
Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt 通过显着性提示的无监督预训练促进低数据实例分割
Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data 通过利用所有未标记的数据来促进半监督学习
Boosting Transductive Few-Shot Fine-Tuning With Margin-Based Uncertainty Weighting and Probability Regularization 通过基于边际的不确定性加权和概率正则化促进传导性小样本微调
Boosting Verified Training for Robust Image Classifications via Abstraction 通过抽象促进稳健图像分类的验证训练
Boosting Video Object Segmentation via Space-Time Correspondence Learning 通过时空对应学习促进视频对象分割
Boosting Weakly-Supervised Temporal Action Localization With Text Information 用文本信息促进弱监督时间动作定位
Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery Bootstrap Your Own Prior：走向与分布无关的新类发现
Bootstrapping Objectness From Videos by Relaxed Common Fate and Visual Grouping 通过松弛的共同命运和视觉分组从视频中自举物体性
Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation 风格和失真都很重要：用于全景语义分割的双路径无监督域自适应
Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision Boundary 边界遗忘：通过移动决策边界实现深度网络的快速遗忘
Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval 图像检索中通过对抗性学习的边界感知向后兼容表示
Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation 用于弱监督语义分割的边界增强联合训练
Box-Level Active Detection 盒级主动检测
BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation BoxTeacher：探索用于弱监督实例分割的高质量伪标签
Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack 突破 FedMD：通过 Paired-Logits 反转攻击恢复图像
Breaking the “Object” in Video Object Segmentation 打破视频对象分割中的“对象”
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection 桥接精度和置信度：校准目标检测的训练时间损失
Bridging Search Region Interaction With Template for RGB-T Tracking 桥接搜索区域与 RGB-T 跟踪模板的交互
Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label Classification 在部分注释的多标签分类中弥合模型解释之间的差距
Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild 将输入带到共享域以在野外进行 3D 交互手恢复
B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution 用于屏幕内容图像超分辨率的 B 样条纹理系数估计器
Building Rearticulable Models for Arbitrary 3D Objects From 4D Point Clouds 从 4D 点云为任意 3D 对象构建可重构模型
BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects BundleSDF：未知对象的神经 6-DoF 跟踪和 3D 重建
BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image BUOL：一个自下而上的框架，具有占用感知提升，用于从单个图像重建全景 3D 场景
Burstormer: Burst Image Restoration and Enhancement Transformer Burstormer：突发图像恢复和增强 Transformer
CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network With Large Input CABM：具有大输入的单图像超分辨率网络的内容感知位映射
CafeBoost: Causal Feature Boost To Eliminate Task-Induced Bias for Class Incremental Learning CafeBoost：消除类增量学习的任务诱导偏差的因果特征提升
Camouflaged Instance Segmentation via Explicit De-Camouflaging 通过显式去伪装进行伪装实例分割
Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction 具有特征分解和边缘重建的伪装目标检测
CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis CAMS：用于类别级功能性手部对象操作合成的标准化操作空间
Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields 规范场：姿势规范化神经场的自监督学习
Can’t Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders 偷不了？Cont-Steal！针对图像编码器的对比窃取攻击
CAP: Robust Point Cloud Classification via Semantic and Structural Modeling CAP：通过语义和结构建模的稳健点云分类
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? Cap4Video：辅助字幕可以为文本视频检索做什么？
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining CapDet：统一密集字幕和开放世界检测预训练
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection CAPE：用于多视图 3D 对象检测的相机视图位置嵌入
CaPriDe Learning: Confidential and Private Decentralized Learning Based on Encryption-Friendly Distillation Loss CaPriDe Learning：基于加密友好蒸馏损失的机密私密去中心化学习
CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer CAP-VSTNet：内容亲和力保留的多功能风格迁移
CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects CARTO：关节物体的类别和关节不可知重建
Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization 开放世界弱监督时间动作定位的级联证据学习
Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution 用于任意尺度超分辨率的级联局部隐式变换器
CASP-Net: Rethinking Video Saliency Prediction From an Audio-Visual Consistency Perceptual Perspective CASP-Net：从视听一致性感知角度重新思考视频显着性预测
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference Castling-ViT：在 Vision Transformer 推理中通过切换到线性角度注意力来压缩自我注意力
CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection CAT：用于开放世界对象检测的定位和识别级联检测转换器
Catch Missing Details: Image Reconstruction With Frequency Augmented Variational Autoencoder 捕捉缺失的细节：使用频率增强变分自动编码器进行图像重建
Category Query Learning for Human-Object Interaction Classification 用于人-物交互分类的类别查询学习
Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction 用于总生存时间预测的因果感知术中插补
CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes CCuantuMM：多种形状的循环一致量子混合匹配
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion CDDFuse：用于多模态图像融合的相关驱动双分支特征分解
CelebV-Text: A Large-Scale Facial Text-Video Dataset CelebV-Text：大规模面部文本视频数据集
Center Focusing Network for Real-Time LiDAR Panoptic Segmentation 用于实时 LiDAR 全景分割的中心聚焦网络
CFA: Class-Wise Calibrated Fair Adversarial Training CFA：按类别校准的公平对抗训练
CF-Font: Content Fusion for Few-Shot Font Generation CF-Font：用于少样本字体生成的内容融合
Change-Aware Sampling and Contrastive Learning for Satellite Images 卫星图像的变化感知采样和对比学习
Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations Chat2Map：多自我对话的高效场景映射
CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning CHMATCH：对比分层匹配和稳健的自适应阈值提升半监督学习
CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution CiaoSR：用于任意尺度图像超分辨率的连续隐式注意力网络
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning CiCo：通过跨语言对比学习进行域感知手语检索
CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection CIGAR：域自适应对象检测的跨模态图推理
CIMI4D: A Large Multimodal Climbing Motion Dataset Under Human-Scene Interactions CIMI4D：人景交互下的大型多模态攀爬运动数据集
CIRCLE: Capture in Rich Contextual Environments CIRCLE：在丰富的上下文环境中捕获
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose CLAMP：连接语言和动物姿势的基于提示的对比学习
Class Adaptive Network Calibration 类自适应网络校准
Class Attention Transfer Based Knowledge Distillation 基于类注意力转移的知识蒸馏
Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning 联邦半监督学习的类平衡自适应伪标签
Class Prototypes Based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos 基于类原型的对比学习对多标签和细粒度教育视频进行分类
Class Relationship Embedded Learning for Source-Free Unsupervised Domain Adaptation 用于无源无监督域自适应的类关系嵌入式学习
Class-Balancing Diffusion Models 类平衡扩散模型
Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition 用于深度长尾识别的类条件锐度感知最小化
Class-Incremental Exemplar Compression for Class-Incremental Learning 用于类增量学习的类增量示例压缩
CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not CLIP 适用于所有零样本草图图像检索，无论是否细粒度
CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation CLIP 也是一种高效的分割器：一种用于弱监督语义分割的文本驱动方法
CLIP the Gap: A Single Domain Generalization Approach for Object Detection CLIP the Gap：一种用于对象检测的单域泛化方法
CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data CLIP2：来自真实世界点云数据的对比语言-图像-点预训练
CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search CLIP2Protect：通过对抗性潜在搜索使用文本引导化妆保护面部隐私
CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP CLIP2Scene：通过 CLIP 实现标签高效的 3D 场景理解
CLIPPING: Distilling CLIP-Based Models With a Student Base for Video-Language Retrieval CLIPPING：以学生模型为基础蒸馏基于 CLIP 的模型，用于视频-语言检索
CLIPPO: Image-and-Language Understanding From Pixels Only CLIPPO：仅从像素理解图像和语言
CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation CLIP-S4：语言引导的自监督语义分割
CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes From Natural Language CLIP-Sculptor：从自然语言零样本生成高保真和多样的形状
CloSET: Modeling Clothed Humans on Continuous Surface With Explicit Template Decomposition CloSET：使用显式模板分解在连续表面上模拟穿着衣服的人
CLOTH4D: A Dataset for Clothed Human Reconstruction CLOTH4D：穿衣人体重建数据集
Clothed Human Performance Capture With a Double-Layer Neural Radiance Fields 具有双层神经辐射场的穿着人体表现捕捉
Clothing-Change Feature Augmentation for Person Re-Identification 用于行人再识别的换衣特征增强
Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World 云设备协同适应现实世界中不断变化的环境
Clover: Towards a Unified Video-Language Alignment and Fusion Model Clover：迈向统一的视频语言对齐和融合模型
Coaching a Teachable Student 指导可教学生
CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning CODA-Prompt：持续分解的基于注意力的提示，用于免排练的持续学习
CodeTalker: Speech-Driven 3D Facial Animation With Discrete Motion Prior CodeTalker：具有离散运动先验的语音驱动 3D 面部动画
Collaboration Helps Camera Overtake LiDAR in 3D Detection 协作帮助相机在 3D 检测中超越 LiDAR
Collaborative Diffusion for Multi-Modal Face Generation and Editing 多模态人脸生成和编辑的协同扩散
Collaborative Noisy Label Cleaner: Learning Scene-Aware Trailers for Multi-Modal Highlight Detection in Movies 协作噪声标签清理器：学习场景感知预告片以进行电影中的多模态亮点检测
Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding 用于时空视频定位的协作静态和动态视觉语言流
Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception 为弱监督视听事件感知收集跨模态存在-不存在证据
Color Backdoor: A Robust Poisoning Attack in Color Space 颜色后门：颜色空间中的鲁棒投毒攻击
Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation 结合隐式-显式视图相关性进行光场语义分割
CoMFormer: Continual Learning in Semantic and Panoptic Segmentation CoMFormer：语义和全景分割的持续学习
Command-Driven Articulated Object Understanding and Manipulation 命令驱动的铰接对象理解和操作
Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories 3D常见宠物：现实生活中可变形类别的动态新视角合成
Compacting Binary Neural Networks by Sparse Kernel Selection 通过稀疏核选择压缩二值神经网络
Complementary Intrinsics From Neural Radiance Fields and CNNs for Outdoor Scene Relighting 用于室外场景重新照明的神经辐射场和 CNN 的互补内在函数
Complete 3D Human Reconstruction From a Single Incomplete Image 从单个不完整图像完成 3D 人体重建
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning 自监督点云序列表示学习的完整到部分 4D 蒸馏
CompletionFormer: Depth Completion With Convolutions and Vision Transformers CompletionFormer：使用卷积和视觉转换器的深度补全
Complexity-Guided Slimmable Decoder for Efficient Deep Video Compression 用于高效深度视频压缩的复杂性引导可精简解码器
Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation 合成器：用于稳健部件和对象分割的自下而上聚类和合成
Comprehensive and Delicate: An Efficient Transformer for Image Restoration 全面而精致：一种用于图像修复的高效 Transformer
Compressing Volumetric Radiance Fields to 1 MB 将体积辐射场压缩到 1 MB
Compression-Aware Video Super-Resolution 压缩感知视频超分辨率
Computational Flash Photography Through Intrinsics 通过内在计算的闪光摄影
Computationally Budgeted Continual Learning: What Does Matter? 计算预算的持续学习：重要的是什么？
Conditional Generation of Audio From Video via Foley Analogies 通过拟音类比从视频中有条件地生成音频
Conditional Image-to-Video Generation With Latent Flow Diffusion Models 使用潜流扩散模型的条件图像到视频生成
Conditional Text Image Generation With Diffusion Models 使用扩散模型生成条件文本图像
Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization 通过变分期望最大化的置信度感知个性化联邦学习
Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation 半监督语义分割的基于冲突的交叉视图一致性
Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching 共轭乘积图以实现全局最优 2D-3D 形状匹配
Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries 连接点：使用两级查询的平面图重建
Connecting Vision and Language With Video Localized Narratives 将视觉和语言与视频本地化叙述联系起来
ConQueR: Query Contrast Voxel-DETR for 3D Object Detection ConQueR：用于 3D 对象检测的查询对比度体素-DETR
Consistent Direct Time-of-Flight Video Depth Super-Resolution 一致的直接飞行时间视频深度超分辨率
Consistent View Synthesis With Pose-Guided Diffusion Models 使用姿势引导扩散模型的一致视图合成
Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection Consistent-Teacher：减少半监督目标检测中不一致的伪目标
Constrained Evolutionary Diffusion Filter for Monocular Endoscope Tracking 用于单目内窥镜跟踪的约束进化扩散滤波器
Constructing Deep Spiking Neural Networks From Artificial Neural Networks With Knowledge Distillation 通过知识蒸馏从人工神经网络构建深度脉冲神经网络
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning ConStruct-VL：无数据连续结构化 VL 概念学习
Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers 使用 Vision Transformers 进行高效语义分割的内容感知令牌共享
Context De-Confounded Emotion Recognition 上下文去混淆情绪识别
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training 用于 3D 语言预训练的上下文感知对齐和相互掩蔽
Context-Aware Pretraining for Efficient Blind Image Decomposition 用于高效盲图像分解的上下文感知预训练
Context-Aware Relative Object Queries To Unify Video Instance and Panoptic Segmentation 用于统一视频实例和全景分割的上下文感知相对对象查询
Context-Based Trit-Plane Coding for Progressive Image Compression 用于渐进式图像压缩的基于上下文的三进制位平面（trit-plane）编码
Continual Detection Transformer for Incremental Object Detection 用于增量对象检测的连续检测转换器
Continual Semantic Segmentation With Automatic Memory Sample Selection 具有自动记忆样本选择的连续语义分割
Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyframe Based Motion Interpolation 用于基于关键帧的运动插值的隐式运动流形的连续中间令牌学习
Continuous Landmark Detection With 3D Queries 使用 3D 查询进行连续地标检测
Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation With Implicit Neural Representations 基于隐式神经表示的连续伪标签校正域自适应语义分割
Continuous Sign Language Recognition With Correlation Network 关联网络的连续手语识别
ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis via Contrastive Learning ContraNeRF：通过对比学习从合成到真实的新视图合成的可推广神经辐射场
Contrastive Grouping With Transformer for Referring Image Segmentation 用于参考图像分割的 Transformer 对比分组
Contrastive Mean Teacher for Domain Adaptive Object Detectors 域自适应对象检测器的对比均值教师
Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank 通过 Reliable Bank 进行水下图像恢复的对比半监督学习
Controllable Light Diffusion for Portraits 肖像的可控光扩散
Controllable Mesh Generation Through Sparse Latent Point Diffusion Models 通过稀疏潜在点扩散模型生成可控网格
ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders ConvNeXt V2：使用 Masked Autoencoders 共同设计和缩放 ConvNet
ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing ConZIC：通过基于采样的抛光实现可控零样本图像说明
Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets 合作还是竞争：通过自适应预算避免多目标鲁棒性中的玩家主导
CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching CORA：将 CLIP 用于具有区域提示和锚点预匹配的开放词汇检测
CoralStyleCLIP: Co-Optimized Region and Layer Selection for Image Editing CoralStyleCLIP：图像编辑的共同优化区域和图层选择
Coreset Sampling From Open-Set for Fine-Grained Self-Supervised Learning 用于细粒度自监督学习的开放集核心集采样
Correlational Image Modeling for Self-Supervised Visual Pre-Training 自监督视觉预训练的相关图像建模
Correspondence Transformers With Asymmetric Feature Learning and Matching Flow Super-Resolution 具有非对称特征学习和匹配流超分辨率的对应变换器
Co-Salient Object Detection With Uncertainty-Aware Group Exchange-Masking 具有不确定性感知组交换掩码的共同显著目标检测
Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM Co-SLAM：神经实时 SLAM 的联合坐标和稀疏参数编码
Co-Speech Gesture Synthesis by Reinforcement Learning With Contrastive Pre-Trained Rewards 通过对比预训练奖励强化学习合成语音手势
COT: Unsupervised Domain Adaptation With Clustering and Optimal Transport COT：具有聚类和最优传输的无监督域适应
Co-Training 2L Submodels for Visual Recognition 为视觉识别共同训练 2L 子模型
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation CoWs on Pasture：语言驱动的零样本对象导航的基线和基准
CP3: Channel Pruning Plug-In for Point-Based Networks CP3：基于点的网络的通道剪枝插件
CRAFT: Concept Recursive Activation FacTorization for Explainability CRAFT：可解释性的概念递归激活因子化
CREPE: Can Vision-Language Foundation Models Reason Compositionally? CREPE：视觉语言基础模型能否进行组合推理？
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability CR-FIQA：通过学习样本相对分类能力评估人脸图像质量
Critical Learning Periods for Multisensory Integration in Deep Networks 深度网络中多感官整合的关键学习期
CrOC: Cross-View Online Clustering for Dense Visual Representation Learning CrOC：用于密集视觉表示学习的跨视图在线聚类
Cross-Domain 3D Hand Pose Estimation With Dual Modalities 双模态跨域 3D 手势估计
Cross-Domain Image Captioning With Discriminative Finetuning 具有判别微调的跨域图像字幕
Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences Between Pretrained Generative Models 跨 GAN 审计：无监督识别预训练生成模型之间的属性级异同
Cross-Guided Optimization of Radiance Fields With Multi-View Image Super-Resolution for High-Resolution Novel View Synthesis 用于高分辨率新视图合成的多视图图像超分辨率辐射场的交叉引导优化
Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning 深度度量学习中条件嵌入的跨图像注意
Crossing the Gap: Domain Generalization for Image Captioning 跨越鸿沟：图像字幕的域泛化
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval 用于文本到图像人物检索的跨模态隐式关系推理和对齐
Crowd3D: Towards Hundreds of People Reconstruction From a Single Image Crowd3D：从单个图像重建数百人
CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model CrowdCLIP：通过视觉语言模型进行无监督人群计数
C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation C-SFDA：一种用于高效无源域自适应的课程学习辅助自训练框架
CUDA: Convolution-Based Unlearnable Datasets CUDA：基于卷积的不可学习数据集
CUF: Continuous Upsampling Filters CUF：连续上采样滤波器
Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing 物理感知单图像去雾的课程对比正则化
Curricular Object Manipulation in LiDAR-Based Object Detection 基于 LiDAR 的目标检测中的课程目标操作
Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification 用于长尾分类的曲率平衡特征流形学习
Cut and Learn for Unsupervised Object Detection and Instance Segmentation 用于无监督对象检测和实例分割的剪切和学习
CutMIB: Boosting Light Field Super-Resolution via Multi-View Image Blending CutMIB：通过多视图图像混合提升光场超分辨率
CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment CVT-SLR：用于具有变分对齐的手语识别的对比视觉文本转换
CXTrack: Improving 3D Point Cloud Tracking With Contextual Information CXTrack：使用上下文信息改进 3D 点云跟踪
D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers D2Former：通过基于代理的转换器联合学习分层检测器和上下文描述符
DA Wand: Distortion-Aware Selection Using Neural Mesh Parameterization DA Wand：使用神经网格参数化的失真感知选择
DAA: A Delta Age AdaIN Operation for Age Estimation via Binary Code Transformer DAA：通过二进制代码转换器进行年龄估计的 Delta Age AdaIN 操作
DA-DETR: Domain Adaptive Detection Transformer With Information Fusion DA-DETR：具有信息融合的域自适应检测 Transformer
DaFKD: Domain-Aware Federated Knowledge Distillation DaFKD：域感知联邦知识蒸馏
DARE-GRAM: Unsupervised Domain Adaptation Regression by Aligning Inverse Gram Matrices DARE-GRAM：通过对齐逆 Gram 矩阵进行无监督域适应回归
DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks DART：多样化聚合重复训练提高了神经网络的泛化能力
DartBlur: Privacy Preservation With Detection Artifact Suppression DartBlur：通过检测伪影抑制实现隐私保护
Data-Driven Feature Tracking for Event Cameras 事件相机的数据驱动特征跟踪
Data-Efficient Large Scale Place Recognition With Graded Similarity Supervision 具有分级相似性监督的数据高效的大规模地点识别
Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint 通过特征交换和激活区域约束的无数据知识蒸馏
Data-Free Sketch-Based Image Retrieval 基于草图的无数据图像检索
DATE: Domain Adaptive Product Seeker for E-Commerce DATE：面向电子商务的域自适应商品搜索器
DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model DATID-3D：使用文本到图像扩散的 3D 生成模型的多样性保留域自适应
DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields DBARF：深度束调整泛化神经辐射场
DC2: Dual-Camera Defocus Control by Learning To Refocus DC2：通过学习重新对焦来控制双摄像头散焦
DCFace: Synthetic Face Generation With Dual Condition Diffusion Model DCFace：具有双重条件扩散模型的合成人脸生成
Dealing With Cross-Task Class Discrimination in Online Continual Learning 处理在线持续学习中的跨任务类别判别
DeAR: Debiasing Vision-Language Models With Additive Residuals DeAR：使用附加残差对视觉语言模型进行去偏
Decentralized Learning With Multi-Headed Distillation 多头蒸馏的分散式学习
DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-To-Fine Contrastive Ranking DeCo：通过从粗到细的对比排序对组合时间基础进行分解和重构
Decompose More and Aggregate Better: Two Closer Looks at Frequency Representation Learning for Human Motion Prediction 分解更多并更好地聚合：两次仔细观察人体运动预测的频率表示学习
Decomposed Cross-Modal Distillation for RGB-Based Temporal Action Detection 用于基于 RGB 的时间动作检测的分解交叉模态蒸馏
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning 用于组合零样本学习的分解软提示引导融合增强
Decoupled Multimodal Distilling for Emotion Recognition 用于情感识别的解耦多模态蒸馏
Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for Semi-Weakly Segmentation in Expert-Driven Domains 解耦的语义原型能够从专家驱动领域中的半弱分割的不同注释类型中学习
Decoupling Human and Camera Motion From Videos in the Wild 从野外视频中解耦人类和相机运动
Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning 解耦学习和记忆：用于任务增量学习的具有知识投射的双层记忆框架
Decoupling MaxLogit for Out-of-Distribution Detection 解耦 MaxLogit 以进行分布外检测
Decoupling-and-Aggregating for Image Exposure Correction 用于图像曝光校正的解耦和聚合
Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit 通过追求尺度等变性的深度任意尺度图像超分辨率
Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model 深度曲线编辑：预训练深度生成模型的可交换和非线性图像处理
Deep Depth Estimation From Thermal Image 热图像的深度估计
Deep Deterministic Uncertainty: A New Simple Baseline 深度确定性不确定性：一个新的简单基线
Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring 用于高效视频去模糊的深度判别空间和时间网络
Deep Dive Into Gradients: Better Optimization for 3D Object Detection With Gradient-Corrected IoU Supervision 深入研究梯度：使用梯度校正的 IoU 监督更好地优化 3D 对象检测
Deep Factorized Metric Learning 深度分解度量学习
Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric 通过最大化和最小化互信息的深度公平聚类：理论、算法和度量
Deep Frequency Filtering for Domain Generalization 域泛化的深度频率过滤
Deep Graph Reprogramming 深度图重编程
Deep Graph-Based Spatial Consistency for Robust Non-Rigid Point Cloud Registration 稳健的非刚性点云配准的基于深度图的空间一致性
Deep Hashing With Minimal-Distance-Separated Hash Centers 使用最小距离分隔的哈希中心进行深度哈希
Deep Incomplete Multi-View Clustering With Cross-View Partial Sample and Prototype Alignment 具有跨视图部分样本和原型对齐的深度不完整多视图聚类
Deep Learning of Partial Graph Matching via Differentiable Top-K 基于可微 Top-K 的部分图匹配深度学习
Deep Polarization Reconstruction With PDAVIS Events 使用 PDAVIS 事件进行深度偏振重建
Deep Random Projector: Accelerated Deep Image Prior 深度随机投影仪：加速深度图像先验
Deep Semi-Supervised Metric Learning With Mixed Label Propagation 混合标签传播的深度半监督度量学习
Deep Stereo Video Inpainting 深度立体视频修复
DeepLSD: Line Segment Detection and Refinement With Deep Image Gradients DeepLSD：使用深度图像梯度进行线段检测和细化
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network DeepMAD：深度卷积神经网络的数学架构设计
DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization DeepMapping2：自监督大规模激光雷达地图优化
DeepSolo: Let Transformer Decoder With Explicit Points Solo for Text Spotting DeepSolo：让具有显式点的 Transformer 解码器单独用于文本识别
DeepVecFont-v2: Exploiting Transformers To Synthesize Vector Fonts With Higher Quality DeepVecFont-v2：利用转换器合成更高质量的矢量字体
DeFeeNet: Consecutive 3D Human Motion Prediction With Deviation Feedback DeFeeNet：具有偏差反馈的连续 3D 人体运动预测
Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning 防御基于补丁的自监督学习后门攻击
Defining and Quantifying the Emergence of Sparse Concepts in DNNs 定义和量化 DNN 中稀疏概念的出现
Deformable Mesh Transformer for 3D Human Mesh Recovery 用于 3D 人体网格恢复的可变形网格转换器
DegAE: A New Pretraining Paradigm for Low-Level Vision DegAE：一种新的低级视觉预训练范例
DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting DeGPR：用于多类细胞检测和计数的深度引导后验正则化
DejaVu: Conditional Regenerative Learning To Enhance Dense Prediction DejaVu：增强密集预测的条件再生学习
Delivering Arbitrary-Modal Semantic Segmentation 提供任意模态语义分割
DeltaEdit: Exploring Text-Free Training for Text-Driven Image Manipulation DeltaEdit：探索文本驱动图像处理的无文本训练
Delving Into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling 深入研究 SO(3) 流形上的离散归一化流以进行概率旋转建模
Delving Into Shape-Aware Zero-Shot Semantic Segmentation 深入研究形状感知零样本语义分割
Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint 深入研究用于图像编辑的 StyleGAN 反演：基础潜在空间视角
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression 通过对抗性工具变量回归揭秘对抗性例子的因果特征和稳健网络的因果接种
Dense Distinct Query for End-to-End Object Detection 用于端到端对象检测的密集不同查询
Dense Network Expansion for Class Incremental Learning 用于类增量学习的密集网络扩展
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline 未修剪视频中视听事件的密集定位：大规模基准和基线
Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection 3D 对象检测的密度不敏感无监督域自适应
DepGraph: Towards Any Structural Pruning DepGraph：迈向任意结构剪枝
Depth Estimation From Camera Image and mmWave Radar Point Cloud 相机图像和毫米波雷达点云的深度估计
Depth Estimation From Indoor Panoramas With Neural Scene Representation 具有神经场景表示的室内全景图的深度估计
DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection DeSTSeg：用于异常检测的分割引导去噪师生网络
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment DetCLIPv2：通过词区域对齐进行可扩展的开放式词汇对象检测预训练
Detecting and Grounding Multi-Modal Media Manipulation 检测和定位多模态媒体篡改
Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency 基于损坏鲁棒性一致性的推理阶段后门检测
Detecting Backdoors in Pre-Trained Encoders 检测预训练编码器中的后门
Detecting Everything in the Open World: Towards Universal Object Detection 检测开放世界中的一切：迈向通用对象检测
Detecting Human-Object Contact in Images 检测图像中的人与物体接触
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding 检测中心：通过语言嵌入的查询自适应统一对象检测数据集
Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns 使用二进制神经元激活模式检测分布外样本
DETR With Additional Global Aggregation for Cross-Domain Weakly Supervised Object Detection 具有额外全局聚合的 DETR 用于跨域弱监督对象检测
DETRs With Hybrid Matching 具有混合匹配的 DETR
Devil Is in the Queries: Advancing Mask Transformers for Real-World Medical Image Segmentation and Out-of-Distribution Localization Devil Is in the Queries：用于真实世界医学图像分割和分布外定位的改进掩码 Transformer
Devil’s on the Edges: Selective Quad Attention for Scene Graph Generation Devil’s on the Edges：场景图生成的选择性四重注意力
DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects DexArt：对铰接对象进行通用灵巧操作的基准测试
DF-Platter: Multi-Face Heterogeneous Deepfake Dataset DF-Platter：多面异构 Deepfake 数据集
DiffCollage: Parallel Generation of Large Content With Diffusion Models DiffCollage：使用扩散模型并行生成大内容
Differentiable Architecture Search With Random Features 具有随机特征的可微分架构搜索
Differentiable Shadow Mapping for Efficient Inverse Graphics 高效逆向图形的可微分阴影映射
Difficulty-Based Sampling for Debiased Contrastive Representation Learning 用于去偏对比表示学习的基于难度的抽样
DiffPose: Toward More Reliable 3D Pose Estimation DiffPose：实现更可靠的 3D 姿态估计
DiffRF: Rendering-Guided 3D Radiance Field Diffusion DiffRF：渲染引导的 3D 辐射场扩散
DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion DiffSwap：通过 3D-Aware Masked Diffusion 进行高保真和可控的面部交换
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation DiffTalk：为广义音频驱动的肖像动画制作扩散模型
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models 扩散艺术还是数字伪造？研究扩散模型中的数据复制
Diffusion Probabilistic Model Made Slim 扩散概率模型变得苗条
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding 扩散视频自动编码器：通过分离视频编码实现时间一致的人脸视频编辑
Diffusion-Based Signed Distance Fields for 3D Shape Generation 用于 3D 形状生成的基于扩散的有符号距离场
DiffusioNeRF: Regularizing Neural Radiance Fields With Denoising Diffusion Models DiffusioNeRF：使用去噪扩散模型对神经辐射场进行正则化
DiffusionRig: Learning Personalized Priors for Facial Appearance Editing DiffusionRig：学习面部外观编辑的个性化先验
Diffusion-SDF: Text-To-Shape via Voxelized Diffusion Diffusion-SDF：通过体素化扩散实现文本到形状
DIFu: Depth-Guided Implicit Function for Clothed Human Reconstruction DIFu：用于穿衣人体重建的深度引导隐函数
DiGA: Distil To Generalize and Then Adapt for Domain Adaptive Semantic Segmentation DiGA：提炼泛化然后适应域自适应语义分割
DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection DiGeo：用于广义少样本目标检测的判别式几何感知学习
Dimensionality-Varying Diffusion Process 变维扩散过程
DINER: Depth-Aware Image-Based NEural Radiance Fields DINER：深度感知的基于图像的神经辐射场
DINER: Disorder-Invariant Implicit Neural Representation DINER：无序不变的隐式神经表征
DINN360: Deformable Invertible Neural Network for Latitude-Aware 360deg Image Rescaling DINN360：用于纬度感知 360 度图像重新缩放的可变形可逆神经网络
Dionysus: Recovering Scene Structures by Dividing Into Semantic Pieces Dionysus：通过划分成语义片段来恢复场景结构
DIP: Dual Incongruity Perceiving Network for Sarcasm Detection DIP：用于讽刺检测的双重不协调感知网络
Directional Connectivity-Based Segmentation of Medical Images 基于定向连通性的医学图像分割
DISC: Learning From Noisy Labels via Dynamic Instance-Specific Selection and Correction DISC：通过动态实例特定选择和校正从噪声标签中学习
DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training DisCo-CLIP：用于内存高效 CLIP 训练的分布式对比损失
DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis DisCoScene：用于可控 3D 感知场景合成的空间分离生成辐射场
Discovering the Real Association: Multimodal Causal Reasoning in Video Question Answering 发现真实关联：视频问答中的多模态因果推理
Discrete Point-Wise Attack Is Not Enough: Generalized Manifold Adversarial Attack for Face Recognition 离散逐点攻击是不够的：用于人脸识别的广义流形对抗攻击
Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder 通过结构增强的递归变分自编码器区分已知对象和未知对象
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection 用于共同显著目标检测的判别性共同显著性和背景挖掘 Transformer
Discriminator-Cooperated Feature Map Distillation for GAN Compression 用于 GAN 压缩的鉴别器协同特征图蒸馏
Disentangled Representation Learning for Unsupervised Neural Quantization 无监督神经量化的分离表示学习
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation With Cross-Scale Distortion Awareness 具有跨尺度失真意识的室内全景房间布局估计的解耦正交平面
Disentangling Writer and Character Styles for Handwriting Generation 为手写生成解耦书写者风格和字符风格
Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition 为连续手语识别提取跨时态上下文
Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection 从 3D 对象检测的不完美专家中提取焦点知识
Distilling Neural Fields for Real-Time Articulated Shape Reconstruction 为实时铰接形状重建提取神经场
Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation 为弱监督的小样本分类和分割提取自监督视觉变换器
Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization 蒸馏视觉语言预训练以与弱监督时序动作定位协作
DistilPose: Tokenized Pose Regression With Heatmap Distillation DistilPose：使用热图蒸馏进行标记化姿势回归
DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling DistractFlow：通过真实干扰和伪标记改进光流估计
Distribution Shift Inversion for Out-of-Distribution Prediction 分布外预测的分布偏移反演
DisWOT: Student Architecture Search for Distillation WithOut Training DisWOT：无需训练的面向蒸馏的学生架构搜索
DivClust: Controlling Diversity in Deep Clustering DivClust：控制深度聚类中的多样性
Diverse 3D Hand Gesture Prediction From Body Dynamics by Bilateral Hand Disentanglement 通过双边手解缠从身体动力学预测不同的 3D 手势
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-Identification 用于可见红外行人重新识别的多样化嵌入扩展网络和低光交叉模态基准
Diversity-Aware Meta Visual Prompting 多样性感知元视觉提示
Diversity-Measurable Anomaly Detection 多样性可测量异常检测
Divide and Adapt: Active Domain Adaptation via Customized Learning 划分和适应：通过定制学习进行主动领域适应
Divide and Conquer: Answering Questions With Object Factorization and Compositional Reasoning 分而治之：用对象分解和组合推理回答问题
DKM: Dense Kernelized Feature Matching for Geometry Estimation DKM：用于几何估计的密集核化特征匹配
DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning DKT：用于类增量学习的多样化知识迁移 Transformer
DLBD: A Self-Supervised Direct-Learned Binary Descriptor DLBD：一种自监督直接学习二进制描述符
DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos DNeRV：通过视频的差异神经表示建模固有动力学
DNF: Decouple and Feedback Network for Seeing in the Dark DNF：用于在黑暗中看见的解耦和反馈网络
Document Image Shadow Removal Guided by Color-Aware Background 以颜色感知背景为指导的文档图像阴影去除
Domain Expansion of Image Generators 图像生成器的域扩展
Domain Generalized Stereo Matching via Hierarchical Visual Transformation 通过层次视觉转换的领域广义立体匹配
DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation DoNet：用于细胞学实例分割的深度去重叠网络
Don’t Lie to Me! Robust and Efficient Explainability With Verified Perturbation Analysis 别骗我！通过验证的扰动分析实现稳健且高效的可解释性
Doubly Right Object Recognition: A Why Prompt for Visual Rationales 双重正确的对象识别：面向视觉依据的 Why 提示
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing DPE：解开一般视频人像编辑的姿势和表情
DPF: Learning Dense Prediction Fields With Weak Supervision DPF：在弱监督下学习密集预测场
DP-NeRF: Deblurred Neural Radiance Field With Physical Scene Priors DP-NeRF：具有物理场景先验的去模糊神经辐射场
DR2: Diffusion-Based Robust Degradation Remover for Blind Face Restoration DR2：用于盲人脸修复的基于扩散的鲁棒退化去除器
DrapeNet: Garment Generation and Self-Supervised Draping DrapeNet：服装生成和自我监督的立体裁剪
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models Dream3D：使用 3D 形状先验和文本到图像扩散模型的零样本文本到 3D 合成
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation DreamBooth：为主题驱动生成微调文本到图像扩散模型
DropKey for Vision Transformer Vision Transformer 的 DropKey
DropMAE: Masked Autoencoders With Spatial-Attention Dropout for Tracking Tasks DropMAE：用于跟踪任务的具有空间注意力 Dropout 的掩码自动编码器
DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment DSFNet：用于遮挡稳健 3D 密集面部对齐的双空间融合网络
DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets DSVT：带旋转集的动态稀疏体素变换器
Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval 用于视频文本检索的双对齐无监督域自适应
Dual-Bridging With Adversarial Noise Generation for Domain Adaptive rPPG Estimation 具有对抗性噪声生成的双桥接域自适应 rPPG 估计
Dual-Path Adaptation From Image to Video Transformers 从图像到视频转换器的双路径自适应
DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium DualRefine：通过迭代对极采样和细化达到平衡的自监督深度和姿态估计
DualRel: Semi-Supervised Mitochondria Segmentation From a Prototype Perspective DualRel：从原型角度看的半监督线粒体分割
DualVector: Unsupervised Vector Font Synthesis With Dual-Part Representation DualVector：具有双部分表示的无监督矢量字体合成
DyLiN: Making Light Field Networks Dynamic DyLiN：使光场网络动态化
DynaFed: Tackling Client Data Heterogeneity With Global Dynamics DynaFed：使用全局动态处理客户端数据异构性
DynaMask: Dynamic Mask Selection for Instance Segmentation DynaMask：用于实例分割的动态掩码选择
Dynamic Aggregated Network for Gait Recognition 用于步态识别的动态聚合网络
Dynamic Coarse-To-Fine Learning for Oriented Tiny Object Detection 用于定向微小物体检测的动态粗到精学习
Dynamic Conceptional Contrastive Learning for Generalized Category Discovery 广义类别发现的动态概念对比学习
Dynamic Focus-Aware Positional Queries for Semantic Segmentation 用于语义分割的动态焦点感知位置查询
Dynamic Generative Targeted Attacks With Pattern Injection 具有模式注入的动态生成目标攻击
Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation 胸部 X 光报告生成的动态图增强对比学习
Dynamic Graph Learning With Content-Guided Spatial-Frequency Relation Reasoning for Deepfake Detection 基于内容引导的空间频率关系推理的动态图学习用于 Deepfake 检测
Dynamic Inference With Grounding Based Vision and Language Models 基于定位（grounding）的视觉和语言模型的动态推理
Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network Topologies 用于跨不同网络拓扑搜索的多任务学习的动态神经网络
Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation 动态实例引导自适应：一种用于测试时域自适应语义分割的无后向方法
DynamicDet: A Unified Dynamic Architecture for Object Detection DynamicDet：用于对象检测的统一动态架构
DynamicStereo: Consistent Dynamic Depth From Stereo Videos DynamicStereo：立体视频的一致动态深度
DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata DyNCA：使用神经元胞自动机的实时动态纹理合成
DynIBaR: Neural Dynamic Image-Based Rendering DynIBaR：基于神经动态图像的渲染
E2PN: Efficient SE(3)-Equivariant Point Network E2PN：高效SE(3)-等变点网络
EC2: Emergent Communication for Embodied Control EC2：用于具身控制的涌现通信
ECON: Explicit Clothed Humans Optimized via Normal Integration ECON：通过法线积分优化的显式穿衣人体
EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization EcoTTA：通过自蒸馏正则化实现内存高效的持续测试时自适应
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding EDA：用于 3D 视觉定位的显式文本解耦和密集对齐
EDGE: Editable Dance Generation From Music EDGE：从音乐生成可编辑的舞蹈
Edge-Aware Regional Message Passing Controller for Image Forgery Localization 用于图像伪造定位的边缘感知区域消息传递控制器
Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision 从边缘到形状再到概念：面向鲁棒视觉的对抗性增强
EDICT: Exact Diffusion Inversion via Coupled Transformations EDICT：通过耦合变换进行精确扩散反演
EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points EditableNeRF：按关键点编辑拓扑变化的神经辐射场
EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision EFEM：无场景监督的 3D 对象分割的等变神经场期望最大化
Effective Ambiguity Attack Against Passport-Based DNN Intellectual Property Protection Schemes Through Fully Connected Layer Substitution 通过全连接层替换对基于护照的DNN知识产权保护方案进行有效的歧义攻击
Efficient and Explicit Modelling of Image Hierarchies for Image Restoration 用于图像恢复的图像层次结构的高效和显式建模
Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring 用于高质量图像去模糊的高效频域 Transformer
Efficient Hierarchical Entropy Model for Learned Point Cloud Compression 用于学习点云压缩的高效分层熵模型
Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks 通过最小化浮点错误对基于梯度的攻击的不利影响的高效损失函数
Efficient Map Sparsification Based on 2D and 3D Discretized Grids 基于2D和3D离散化网格的高效地图稀疏化
Efficient Mask Correction for Click-Based Interactive Image Segmentation 基于点击的交互式图像分割的高效蒙版校正
Efficient Movie Scene Detection Using State-Space Transformers 使用状态空间变换器进行高效的电影场景检测
Efficient Multimodal Fusion via Interactive Prompting 通过交互式提示进行高效的多模态融合
Efficient On-Device Training via Gradient Filtering 通过梯度过滤进行高效的设备端训练
Efficient RGB-T Tracking via Cross-Modality Distillation 通过跨模态蒸馏实现高效的 RGB-T 跟踪
Efficient Robust Principal Component Analysis via Block Krylov Iteration and CUR Decomposition 通过块 Krylov 迭代和 CUR 分解进行高效稳健的主成分分析
Efficient Scale-Invariant Generator With Column-Row Entangled Pixel Synthesis 具有列-行纠缠像素合成的高效尺度不变生成器
Efficient Second-Order Plane Adjustment 高效的二阶平面调整
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos 通过改变压缩视频的分辨率进行高效语义分割
Efficient Verification of Neural Networks Against LVM-Based Specifications 根据基于 LVM 的规范对神经网络进行有效验证
Efficient View Synthesis and 3D-Based Multi-Frame Denoising With Multiplane Feature Representations 使用多平面特征表示的高效视图合成和基于 3D 的多帧去噪
EfficientSCI: Densely Connected Network With Space-Time Factorization for Large-Scale Video Snapshot Compressive Imaging EfficientSCI：用于大规模视频快照压缩成像的时空分解密集连接网络
EfficientViT: Memory Efficient Vision Transformer With Cascaded Group Attention EfficientViT：具有级联组注意力的内存高效视觉转换器
Ego-Body Pose Estimation via Ego-Head Pose Estimation 通过自我头部姿势估计进行自我身体姿势估计
Egocentric Audio-Visual Object Localization 以自我为中心的视听对象定位
Egocentric Auditory Attention Localization in Conversations 对话中的自我中心听觉注意力定位
Egocentric Video Task Translation 以自我为中心的视频任务翻译
Elastic Aggregation for Federated Optimization 用于联邦优化的弹性聚合
EMT-NAS: Transferring Architectural Knowledge Between Tasks From Different Datasets EMT-NAS：在来自不同数据集的任务之间迁移架构知识
Endpoints Weight Fusion for Class Incremental Semantic Segmentation 类增量语义分割的端点权重融合
End-to-End 3D Dense Captioning With Vote2Cap-DETR 使用 Vote2Cap-DETR 的端到端 3D 密集字幕
End-to-End Vectorized HD-Map Construction With Piecewise Bezier Curve 使用分段贝塞尔曲线的端到端矢量化高精地图构建
End-to-End Video Matting With Trimap Propagation 使用 Trimap 传播的端到端视频抠图
Energy-Efficient Adaptive 3D Sensing 高能效自适应 3D 传感
Enhanced Multimodal Representation Learning With Cross-Modal KD 使用跨模态 KD 增强多模态表示学习
Enhanced Stable View Synthesis 增强的稳定视图合成
Enhanced Training of Query-Based Object Detection via Selective Query Recollection 通过选择性查询回收增强基于查询的对象检测训练
Enhancing Deformable Local Features by Jointly Learning To Detect and Describe Keypoints 通过联合学习检测和描述关键点来增强可变形局部特征
Enhancing Multiple Reliability Measures via Nuisance-Extended Information Bottleneck 通过干扰因素扩展的信息瓶颈增强多种可靠性度量
Enhancing the Self-Universality for Transferable Targeted Attacks 增强可转移目标攻击的自我普遍性
Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition 扩大开放集动作识别的特定于实例和特定于类的信息
Ensemble-Based Blackbox Attacks on Dense Prediction 基于集成的黑盒攻击密集预测
EqMotion: Equivariant Multi-Agent Motion Prediction With Invariant Interaction Reasoning EqMotion：具有不变交互推理的等变多智能体运动预测
Equiangular Basis Vectors 等角基向量
Equivalent Transformation and Dual Stream Network Construction for Mobile Image Super-Resolution 移动图像超分辨率的等效变换与双流网络构建
ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer ERM-KTP：通过知识迁移进行知识级机器遗忘
ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts ERNIE-ViLG 2.0：使用知识增强的去噪专家组合改进文本到图像的扩散模型
ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields ESLAM：基于符号距离场混合表示的高效密集 SLAM 系统
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale EVA：探索大规模掩码视觉表征学习的极限
Evading DeepFake Detectors via Adversarial Statistical Consistency 通过对抗性统计一致性规避 DeepFake 检测器
Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces 使用属性条件对抗人脸规避取证分类器
EVAL: Explainable Video Anomaly Localization EVAL：可解释的视频异常定位
Event-Based Blurry Frame Interpolation Under Blind Exposure 盲曝光下基于事件的模糊帧插值
Event-Based Frame Interpolation With Ad-Hoc Deblurring 具有 Ad-Hoc 去模糊的基于事件的帧插值
Event-Based Shape From Polarization 基于事件的偏振形状恢复
Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields 具有跨模态非对称双向运动场的基于事件的视频帧插值
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning 通过稀疏-密集互补学习的事件引导行人再识别
EventNeRF: Neural Radiance Fields From a Single Colour Event Camera EventNeRF：来自单个彩色事件相机的神经辐射场
Evolved Part Masking for Self-Supervised Learning 用于自监督学习的进化部分掩蔽
EvShutter: Transforming Events for Unconstrained Rolling Shutter Correction EvShutter：转换事件以进行无约束卷帘快门校正
Exact-NeRF: An Exploration of a Precise Volumetric Parameterization for Neural Radiance Fields Exact-NeRF：神经辐射场精确体积参数化的探索
EXCALIBUR: Encouraging and Evaluating Embodied Exploration EXCALIBUR：鼓励和评估具身探索
Executing Your Commands via Motion Diffusion in Latent Space 通过潜在空间中的运动扩散执行命令
Exemplar-FreeSOLO: Enhancing Unsupervised Instance Segmentation With Exemplars Exemplar-FreeSOLO：使用示例增强无监督实例分割
EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata EXIF 作为语言：学习图像和相机元数据之间的跨模态关联
Explaining Image Classifiers With Multiscale Directional Image Representation 用多尺度方向图像表示解释图像分类器
Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection 用于监督异常检测的显式边界引导半推拉对比学习
Explicit Visual Prompting for Low-Level Structure Segmentations 低级结构分割的显式视觉提示
Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection 利用伪标签的完整性和不确定性进行弱监督视频异常检测
Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR 利用未标记的照片获得更强的细粒度 SBIR
Exploring and Exploiting Uncertainty for Incomplete Multi-View Classification 探索和利用不完整多视图分类的不确定性
Exploring and Utilizing Pattern Imbalance 探索和利用模式失衡
Exploring Data Geometry for Continual Learning 探索数据几何以持续学习
Exploring Discontinuity for Video Frame Interpolation 探索视频帧插值的不连续性
Exploring Incompatible Knowledge Transfer in Few-Shot Image Generation 探索小样本图像生成中的不兼容知识转移
Exploring Intra-Class Variation Factors With Learnable Cluster Prompts for Semi-Supervised Image Synthesis 使用可学习的聚类提示探索类内变异因子以进行半监督图像合成
Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation 探索高质量视频帧插值的运动歧义和对齐
Exploring Structured Semantic Prior for Multi Label Recognition With Incomplete Labels 探索具有不完整标签的多标签识别的结构化语义先验
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language 探索原语对视觉和语言中的组合泛化的影响
Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization 探索架构设计与对抗性鲁棒泛化之间的关系
expOSE: Accurate Initialization-Free Projective Factorization Using Exponential Regularization expOSE：使用指数正则化的精确无初始化投影分解
Extracting Class Activation Maps From Non-Discriminative Features As Well 也从非判别特征中提取类激活图
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation 通过帧间注意提取运动和外观以进行高效的视频帧插值
F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories F2-NeRF：使用自由相机轨迹进行快速神经辐射场训练
FAC: 3D Representation Learning via Foreground Aware Feature Contrast FAC：通过前景感知特征对比进行 3D 表征学习
FaceLit: Neural 3D Relightable Faces FaceLit：神经 3D 可重光照人脸
Fair Federated Medical Image Segmentation via Client Contribution Estimation 通过客户端贡献估计进行公平联邦医学图像分割
Fair Scratch Tickets: Finding Fair Sparse Networks Without Weight Training 公平刮刮乐：无需权重训练即可找到公平的稀疏网络
Fake It Till You Make It: Learning Transferable Representations From Synthetic ImageNet Clones 假装直到成功：从合成 ImageNet 克隆中学习可迁移表示
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks FAME-ViL：用于异构时尚任务的多任务视觉语言模型
Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts Fantastic Breaks：真实世界破碎物体及其完整对应物的成对 3D 扫描数据集
FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training FashionSAP：细粒度时尚视觉-语言预训练的符号和属性提示
Fast Contextual Scene Graph Generation With Unbiased Context Augmentation 具有无偏上下文增强的快速上下文场景图生成
Fast Monocular Scene Reconstruction With Global-Sparse Local-Dense Grids 使用全局稀疏局部密集网格的快速单目场景重建
Fast Point Cloud Generation With Straight Flows 使用直线流快速生成点云
FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation FastInst：一个简单的基于查询的实时实例分割模型
FCC: Feature Clusters Compression for Long-Tailed Visual Recognition FCC：用于长尾视觉识别的特征簇压缩
FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER FeatER：通过基于特征图的 TransformER 进行人体重建的高效网络
Feature Aggregated Queries for Transformer-Based Video Object Detectors 基于 Transformer 的视频对象检测器的特征聚合查询
Feature Alignment and Uniformity for Test Time Adaptation 测试时间适应的特征对齐和均匀性
Feature Representation Learning With Adaptive Displacement Generation and Transformer Fusion for Micro-Expression Recognition 用于微表情识别的具有自适应位移生成和 Transformer 融合的特征表示学习
Feature Separation and Recalibration for Adversarial Robustness 对抗鲁棒性的特征分离和重新校准
Feature Shrinkage Pyramid for Camouflaged Object Detection With Transformers 用于使用 Transformer 进行伪装物体检测的特征收缩金字塔
FeatureBooster: Boosting Feature Descriptors With a Lightweight Neural Network FeatureBooster：使用轻量级神经网络提升特征描述符
FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning FedDM：高效通信联邦学习的迭代分布匹配
Federated Domain Generalization With Generalization Adjustment 具有泛化调整的联邦域泛化
Federated Incremental Semantic Segmentation 联邦增量语义分割
Federated Learning With Data-Agnostic Distribution Fusion 与数据无关的分布融合的联邦学习
FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation FedSeg：用于语义分割的类异构联邦学习
FEND: A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-Tail Trajectory Prediction FEND：用于长尾轨迹预测的未来增强分布感知对比学习框架
Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation 通过类感知双边蒸馏进行小样本类增量学习
Few-Shot Geometry-Aware Keypoint Localization 小样本几何感知关键点定位
Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment 具有视觉分布校准和跨模态分布对齐的小样本学习
Few-Shot Non-Line-of-Sight Imaging With Signal-Surface Collaborative Regularization 具有信号-表面协同正则化的小样本非视距成像
Few-Shot Referring Relationships in Videos 视频中的小样本指代关系
Few-Shot Semantic Image Synthesis With Class Affinity Transfer 具有类亲和力转移的小样本语义图像合成
FFCV: Accelerating Training by Removing Data Bottlenecks FFCV：通过消除数据瓶颈加速训练
FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures FFF：用于构建完整蛋白质结构的片段引导柔性拟合
FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction FFHQ-UV：用于 3D 人脸重建的归一化面部 UV 纹理数据集
FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits FIANCEE：通过有条件的提前退出更快地推断对抗网络
Finding Geometric Models by Clustering in the Consensus Space 通过在共识空间中聚类寻找几何模型
Fine-Grained Audible Video Description 细粒度的音频视频描述
Fine-Grained Classification With Noisy Labels 带有噪声标签的细粒度分类
Fine-Grained Face Swapping via Regional GAN Inversion 通过区域 GAN 反转进行细粒度面部交换
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network 基于跨模态硬对齐网络的细粒度图文匹配
Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models 像预训练一样微调：改进零样本视觉模型的微调
Fine-Tuned CLIP Models Are Efficient Video Learners 微调的 CLIP 模型是高效的视频学习者
FitMe: Deep Photorealistic 3D Morphable Model Avatars FitMe：深度真实感 3D 可变形模型头像
Fix the Noise: Disentangling Source Feature for Controllable Domain Translation 固定噪声：解耦源特征以实现可控域转换
FJMP: Factorized Joint Multi-Agent Motion Prediction Over Learned Directed Acyclic Interaction Graphs FJMP：基于学习的有向非循环交互图的分解联合多智能体运动预测
FLAG3D: A 3D Fitness Activity Dataset With Language Instruction FLAG3D：具有语言指导的 3D 健身活动数据集
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer FlatFormer：用于高效点云 Transformer 的扁平化窗口注意力
FLEX: Full-Body Grasping Without Full-Body Grasps FLEX：没有全身抓握的全身抓握
Flexible-Cm GAN: Towards Precise 3D Dose Prediction in Radiotherapy Flexible-Cm GAN：实现放射治疗中的精确 3D 剂量预测
FlexiViT: One Model for All Patch Sizes FlexiViT：适用于所有图像块大小的单一模型
FlexNeRF: Photorealistic Free-Viewpoint Rendering of Moving Humans From Sparse Views FlexNeRF：从稀疏视图中移动人体的逼真自由视点渲染
Flow Supervision for Deformable NeRF 可变形 NeRF 的光流监督
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation FlowFormer++：用于预训练光流估计的掩蔽成本体积自动编码
FlowGrad: Controlling the Output of Generative ODEs With Gradients FlowGrad：使用梯度控制生成 ODE 的输出
Focus on Details: Online Multi-Object Tracking With Diverse Fine-Grained Representation 关注细节：具有多种细粒度表示的在线多目标跟踪
Focused and Collaborative Feedback Integration for Interactive Image Segmentation 用于交互式图像分割的集中协作反馈集成
Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation 基础模型驱动语义分割的弱增量学习
Four-View Geometry With Unknown Radial Distortion 具有未知径向畸变的四视图几何
Frame Flexible Network 框架灵活网络
Frame Interpolation Transformer and Uncertainty Guidance 帧插值 Transformer 和不确定性引导
Frame-Event Alignment and Fusion Network for High Frame Rate Tracking 用于高帧率跟踪的帧事件对齐和融合网络
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding FREDOM：语义场景理解的公平域适应方法
FreeNeRF: Improving Few-Shot Neural Rendering With Free Frequency Regularization FreeNeRF：通过自由频率正则化改进少样本神经渲染
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation FreeSeg：统一、通用和开放词汇的图像分割
Freestyle Layout-to-Image Synthesis 自由式布局到图像合成
Frequency-Modulated Point Cloud Rendering With Easy Editing 易于编辑的调频点云渲染
Fresnel Microfacet BRDF: Unification of Polari-Radiometric Surface-Body Reflection Fresnel Microfacet BRDF：偏振辐射表面体反射的统一
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models 从图像到文本提示：使用冻结的大型语言模型进行零样本视觉问答
From Node Interaction To Hop Interaction: New Effective and Scalable Graph Learning Paradigm 从节点交互到跳交互：新的有效和可扩展的图学习范例
Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning 极其简单的表示正则化可以提升深度强化学习
FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection FrustumFormer：用于多视图 3D 检测的自适应实例感知重采样
Full or Weak Annotations? An Adaptive Strategy for Budget-Constrained Annotation Campaigns 完整注释还是弱注释？预算受限注释活动的自适应策略
Fully Self-Supervised Depth Estimation From Defocus Clue 基于散焦线索的完全自监督深度估计
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning 通过强化学习将预训练语言模型与多模态提示融合
Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation 半监督语义分割的模糊积极学习
GaitGCI: Generative Counterfactual Intervention for Gait Recognition GaitGCI：步态识别的生成反事实干预
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-per-Second Galactic：以每秒 10 万步的速度扩展端到端强化学习以进行重排
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis GALIP：用于文本到图像合成的生成对抗性 CLIP
GamutMLP: A Lightweight MLP for Color Loss Recovery GamutMLP：用于颜色丢失恢复的轻量级 MLP
GANHead: Towards Generative Animatable Neural Head Avatars GANHead：走向生成动画神经头部头像
GANmouflage: 3D Object Nondetection With Texture Fields GANmouflage：利用纹理场实现 3D 物体不可检测
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts GAPartNet：通过可泛化和可操作部件实现跨类别、域可泛化的物体感知与操作
GarmentTracking: Category-Level Garment Pose Tracking GarmentTracking：类别级服装姿势跟踪
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement 用于连拍图像复原和增强的门控多分辨率迁移网络
Gated Stereo: Joint Depth Estimation From Gated and Wide-Baseline Active Stereo Cues 门控立体视觉：根据门控和宽基线主动立体线索进行联合深度估计
Gaussian Label Distribution Learning for Spherical Image Object Detection 用于球面图像目标检测的高斯标签分布学习
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention Gazeformer：以目标为导向的人类注意力的可扩展、有效和快速预测
GazeNeRF: 3D-Aware Gaze Redirection With Neural Radiance Fields GazeNeRF：具有神经辐射场的 3D 感知注视重定向
GCFAgg: Global and Cross-View Feature Aggregation for Multi-View Clustering GCFAgg：用于多视图聚类的全局和跨视图特征聚合
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds GD-MAE：用于激光雷达点云上 MAE 预训练的生成解码器
GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection GEN：突破基于 Softmax 的分布外检测的极限
GeneCIS: A Benchmark for General Conditional Image Similarity GeneCIS：一般条件图像相似性的基准
Generalist: Decoupling Natural and Robust Generalization 通才：解耦自然和稳健的泛化
Generalizable Implicit Neural Representations via Instance Pattern Composers 通过实例模式组合器的可泛化隐式神经表示
Generalizable Local Feature Pre-Training for Deformable Shape Analysis 可变形形状分析的通用局部特征预训练
Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation 泛化问题：通过参数混合进行损失最小值扁平化以实现高效的在线知识蒸馏
Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process 通过部分离散扩散过程的广义深度 3D 形状先验
Generalized Relation Modeling for Transformer Tracking 用于 Transformer 跟踪的广义关系建模
Generalized UAV Object Detection via Frequency Domain Disentanglement 通过频域解耦的广义无人机目标检测
Generalizing Dataset Distillation via Deep Generative Prior 通过深度生成先验泛化数据集蒸馏
Generating Aligned Pseudo-Supervision From Non-Aligned Data for Image Restoration in Under-Display Camera 从非对齐数据生成对齐伪监督，用于屏下摄像头中的图像恢复
Generating Anomalies for Video Anomaly Detection With Prompt-Based Feature Mapping 使用基于提示的特征映射为视频异常检测生成异常
Generating Features With Increased Crop-Related Diversity for Few-Shot Object Detection 生成具有增加的裁剪相关多样性的特征，用于小样本目标检测
Generating Holistic 3D Human Motion From Speech 从语音生成整体 3D 人体运动
Generating Human Motion From Textual Descriptions With Discrete Representations 从具有离散表示的文本描述生成人体运动
Generating Part-Aware Editable 3D Shapes Without 3D Supervision 在没有 3D 监督的情况下生成零件感知可编辑 3D 形状
Generative Bias for Robust Visual Question Answering 用于鲁棒视觉问答的生成式偏差
Generative Diffusion Prior for Unified Image Restoration and Enhancement 用于统一图像恢复和增强的生成扩散先验
Generative Semantic Segmentation 生成语义分割
Generic-to-Specific Distillation of Masked Autoencoders 掩码自编码器的通用到特定蒸馏
Genie: Show Me the Data for Quantization Genie：给我看用于量化的数据
GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction GeoLayoutLM：视觉信息提取的几何预训练
GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training GeoMAE：自监督点云预训练的掩蔽几何目标预测
Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-Training 3D医学图像自监督预训练中的几何视觉相似性学习
Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation 几何和不确定性感知 3D 点云类-增量语义分割
GeoMVSNet: Learning Multi-View Stereo With Geometry Perception GeoMVSNet：通过几何感知学习多视图立体
GeoNet: Benchmarking Unsupervised Adaptation Across Geographies GeoNet：对跨地域的无监督适应进行基准测试
GeoVLN: Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Language Navigation GeoVLN：通过用于视觉和语言导航的槽注意学习几何增强视觉表示
GFIE: A Dataset and Baseline for Gaze-Following From 2D to 3D in Indoor Environments GFIE：室内环境中从 2D 到 3D 的注视跟踪数据集和基线
GFPose: Learning 3D Human Pose Prior With Gradient Fields GFPose：使用梯度场学习 3D 人体姿势
GINA-3D: Learning To Generate Implicit Neural Assets in the Wild GINA-3D：学习在野外生成隐式神经资产
GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods GIVL：通过预训练方法提高视觉语言模型的地理包容性
GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task GKEAL：用于 Few-Shot 类增量任务的高斯内核嵌入式分析学习
GlassesGAN: Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace Modeling GlassesGAN：使用合成外观发现和目标子空间建模的眼镜个性化
GLeaD: Improving GANs With a Generator-Leading Task GLeaD：通过生成器引导任务改进 GAN
GLIGEN: Open-Set Grounded Text-to-Image Generation GLIGEN：开放集可定位的文本到图像生成
Global and Local Mixture Consistency Cumulative Learning for Long-Tailed Visual Recognitions 用于长尾视觉识别的全局和局部混合一致性累积学习
Global Vision Transformer Pruning With Hessian-Aware Saliency 基于 Hessian 感知显著性的全局 Vision Transformer 剪枝
Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation 基于视频的 3D 人体姿势和形状估计的全局到局部建模
Glocal Energy-Based Learning for Few-Shot Open-Set Recognition 用于小样本开放集识别的基于全局局部能量的学习
Gloss Attention for Gloss-Free Sign Language Translation 用于无 gloss（手语词目标注）手语翻译的 gloss 注意力
GM-NeRF: Learning Generalizable Model-Based Neural Radiance Fields From Multi-View Images GM-NeRF：从多视图图像中学习可泛化的基于模型的神经辐射场
G-MSM: Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors G-MSM：基于图的亲和先验的无监督多形状匹配
Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification 好即是坏：受因果关系启发的衣物去偏，用于换装行人重识别
GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning GP-VTON：通过协作本地流全局解析学习实现通用虚拟试穿
GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency GradICON：通过梯度逆一致性近似微分同胚
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization 梯度范数感知最小化寻求一阶平坦度并提高泛化能力
Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning 可解释贝叶斯深度学习的基于梯度的不确定性归因
GradMA: A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting GradMA：一种基于梯度记忆的加速联邦学习，可减轻灾难性遗忘
Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent With Learned Distance Functions Grad-PU：通过具有学习距离函数的梯度下降的任意尺度点云上采样
Graph Representation for Order-Aware Visual Transformation 用于顺序感知视觉变换的图表示
Graph Transformer GANs for Graph-Constrained House Generation 用于图形约束房屋生成的图形转换器 GAN
Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images 图形胶囊：从 2D 图像学习分层 3D 人脸表示
GraVoS: Voxel Selection for 3D Point-Cloud Detection GraVoS：用于 3D 点云检测的体素选择
GRES: Generalized Referring Expression Segmentation GRES：广义指称表达分割
Grid-Guided Neural Radiance Fields for Large Urban Scenes 用于大型城市场景的网格引导神经辐射场
Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Space 图像分类器对文本概念空间的反事实解释
Ground-Truth Free Meta-Learning for Deep Compressive Sampling 用于深度压缩采样的无真值元学习
GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds GrowSP：3D 点云的无监督语义分割
gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction gSDF：用于 3D 手部对象重建的几何驱动符号距离函数
Guided Depth Super-Resolution by Deep Anisotropic Diffusion 深度各向异性扩散的引导深度超分辨率
Guided Recommendation for Model Fine-Tuning 模型微调的指导性推荐
Guiding Pseudo-Labels With Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation 使用不确定性估计指导伪标签以实现无源无监督域适应
H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction H2ONet：用于实时 3D 手部网格重建的手部遮挡和方向感知网络
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning HAAV：用于图像字幕的增强视图的分层聚合
Habitat-Matterport 3D Semantics Dataset Habitat-Matterport 3D 语义数据集
HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling HairStep：利用发丝图和深度图实现从合成到真实的迁移，用于单视图 3D 头发建模
HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions HaLP：为基于骨架的自监督动作学习生成虚拟的潜在正样本
Ham2Pose: Animating Sign Language Notation Into Pose Sequences Ham2Pose：将手语符号动画化为姿势序列
Hand Avatar: Free-Pose Hand Animation and Rendering From Monocular Video 手部化身：基于单目视频的自由姿态手部动画与渲染
HandNeRF: Neural Radiance Fields for Animatable Interacting Hands HandNeRF：可动画交互手的神经辐射场
HandsOff: Labeled Dataset Generation With No Additional Human Annotations HandsOff：没有额外人工注释的标记数据集生成
Handwritten Text Generation From Visual Archetypes 从视觉原型生成手写文本
Handy: Towards a High Fidelity 3D Hand Shape and Appearance Model Handy：迈向高保真 3D 手形和外观模型
Hard Patches Mining for Masked Image Modeling 用于掩码图像建模的困难图像块挖掘
Hard Sample Matters a Lot in Zero-Shot Quantization 硬样本在零样本量化中很重要
Harmonious Feature Learning for Interactive Hand-Object Pose Estimation 用于交互式手-物体姿态估计的和谐特征学习
Harmonious Teacher for Cross-Domain Object Detection 用于跨域目标检测的和谐教师
HARP: Personalized Hand Reconstruction From a Monocular RGB Video HARP：根据单眼 RGB 视频重建个性化手部
HDR Imaging With Spatially Varying Signal-to-Noise Ratios 具有随空间变化的信噪比的 HDR 成像
Heat Diffusion Based Multi-Scale and Geometric Structure-Aware Transformer for Mesh Segmentation 用于网格分割的基于热扩散的多尺度和几何结构感知变换器
HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes With Iterative Intertwined Regularization HelixSurf：具有迭代交织正则化的室内场景稳健高效的神经隐式表面学习
Heterogeneous Continual Learning 异构持续学习
HexPlane: A Fast Representation for Dynamic Scenes HexPlane：动态场景的快速表示
HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation HGFormer：用于域广义语义分割的分层分组转换器
Hi4D: 4D Instance Segmentation of Close Human Interaction Hi4D：近距离人类交互的 4D 实例分割
Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision 隐藏的宝石：使用跨模态监督的 4D 雷达场景流学习
HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization HIER：通过分层正则化超越类标签的度量学习
Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding 使用没有运动编码的双层 CANF 的分层 B 帧视频编码
Hierarchical Dense Correlation Distillation for Few-Shot Segmentation 用于小样本分割的分层密集相关蒸馏
Hierarchical Discriminative Learning Improves Visual Representations of Biomedical Microscopy 分层判别学习改善生物医学显微镜的视觉表现
Hierarchical Fine-Grained Image Forgery Detection and Localization 分层细粒度图像伪造检测和定位
Hierarchical Neural Memory Network for Low Latency Event Processing 用于低延迟事件处理的分层神经记忆网络
Hierarchical Prompt Learning for Multi-Task Learning 多任务学习的分层提示学习
Hierarchical Semantic Contrast for Scene-Aware Video Anomaly Detection 用于场景感知视频异常检测的分层语义对比
Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding 用于视频段落定位的分层语义对应网络
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection 用于 3D 半监督对象检测的分层监督和混洗数据增强
Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition From Egocentric RGB Videos 用于从自我中心 RGB 视频进行 3D 手势估计和动作识别的分层时间变换器
Hierarchical Video-Moment Retrieval and Step-Captioning 分层视频时刻检索和步骤字幕
HierVL: Learning Hierarchical Video-Language Embeddings HierVL：学习分层视频语言嵌入
High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition 通过可扩展图频分解重建高保真 3D 手形
High-Fidelity 3D Face Generation From Natural Language Descriptions 根据自然语言描述生成高保真 3D 人脸
High-Fidelity 3D GAN Inversion by Pseudo-Multi-View Optimization 通过伪多视图优化实现高保真 3D GAN 反演
High-Fidelity 3D Human Digitization From Single 2K Resolution Images 从单个 2K 分辨率图像进行高保真 3D 人体数字化
High-Fidelity and Freely Controllable Talking Head Video Generation 高保真和可自由控制的会说话的头部视频生成
High-Fidelity Clothed Avatar Reconstruction From a Single Image 从单个图像重建高保真穿着衣服的头像
High-Fidelity Event-Radiance Recovery via Transient Event Frequency 通过瞬态事件频率实现高保真事件辐射恢复
High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors 具有生成先验的单目视频的高保真面部化身重建
High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Space Learning 具有多模态情感空间学习的高保真广义情感谈话人脸生成
High-Fidelity Guided Image Synthesis With Latent Diffusion Models 具有潜在扩散模型的高保真引导图像合成
High-Frequency Stereo Matching Network 高频立体匹配网络
Highly Confident Local Structure Based Consensus Graph Learning for Incomplete Multi-View Clustering 用于不完全多视图聚类的基于高度置信度局部结构的共识图学习
High-Res Facial Appearance Capture From Polarized Smartphone Images 从偏光智能手机图像中捕获高分辨率面部外观
High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain Activity 利用潜在扩散模型从人脑活动重建高分辨率图像
Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Image Ensemble Hi-LASSIE：从稀疏图像集合中发现高保真铰接形状和骨架
Hint-Aug: Drawing Hints From Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning Hint-Aug：从基础 Vision Transformer 中汲取提示，以提升小样本参数高效调优
Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning 基于异质图表示学习的组织病理学全切片图像分析
HNeRV: A Hybrid Neural Representation for Videos HNeRV：视频的混合神经表示
HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models HOICLIP：使用视觉语言模型进行 HOI 检测的高效知识转移
HOLODIFFUSION: Training a 3D Diffusion Model Using 2D Images HOLODIFFUSION：使用 2D 图像训练 3D 扩散模型
HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics HOOD：服装动力学广义建模的层次图
HOTNAS: Hierarchical Optimal Transport for Neural Architecture Search HOTNAS：神经结构搜索的分层最优传输
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model With Discrete and Continuous Denoising HouseDiffusion：通过具有离散和连续去噪的扩散模型生成矢量平面图
How Can Objects Help Action Recognition? 对象如何帮助动作识别？
How to Backdoor Diffusion Models? 如何后门扩散模型？
How To Prevent the Continuous Damage of Noises To Model Training? 如何防止噪声对模型训练的持续伤害？
How To Prevent the Poor Performance Clients for Personalized Federated Learning? 如何防止性能不佳的客户端进行个性化联邦学习？
How You Feelin’? Learning Emotions and Mental States in Movie Scenes 你感觉如何？在电影场景中学习情绪和心理状态
HRDFuse: Monocular 360deg Depth Estimation by Collaboratively Learning Holistic-With-Regional Depth Distributions HRDFuse：通过协作学习整体区域深度分布进行单眼 360 度深度估计
HS-Pose: Hybrid Scope Feature Extraction for Category-Level Object Pose Estimation HS-Pose：用于类别级目标姿态估计的混合范围特征提取
Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-Shot Learning With Hyperspherical Embeddings 枢纽与超球面：使用超球面嵌入降低枢纽性并改进直推式小样本学习
Human Body Shape Completion With Implicit Shape and Flow Learning 通过隐式形状和流学习进行人体形状补全
Human Guided Ground-Truth Generation for Realistic Image Super-Resolution 用于真实图像超分辨率的人工引导真值生成
Human Pose As Compositional Tokens 人体姿势作为组合标记
Human Pose Estimation in Extremely Low-Light Conditions 极低光照条件下的人体姿势估计
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes Human-Art：一个连接自然场景和人工场景的多功能以人为中心的数据集
HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining HumanBench：通过投影仪辅助预训练实现以人为中心的一般感知
HumanGen: Generating Human Radiance Fields With Explicit Priors HumanGen：使用显式先验生成人类辐射场
HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation HuManiFlow：用于人体姿态和形状分布估计的 SO(3) 流形上的祖先条件归一化流
Humans As Light Bulbs: 3D Human Reconstruction From Thermal Reflection 人类作为灯泡：热反射的 3D 人体重建
Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation 狩猎稀疏性：用于半监督语义分割的密度引导对比学习
Hybrid Active Learning via Deep Clustering for Video Action Detection 通过深度聚类进行视频动作检测的混合主动学习
Hybrid Neural Rendering for Large-Scale Scenes With Motion Blur 具有运动模糊的大型场景的混合神经渲染
Hyperbolic Contrastive Learning for Visual Representations Beyond Objects 超越对象的视觉表示的双曲对比学习
HyperCUT: Video Sequence From a Single Blurry Image Using Unsupervised Ordering HyperCUT：使用无监督排序的单个模糊图像的视频序列
HyperMatch: Noise-Tolerant Semi-Supervised Learning via Relaxed Contrastive Constraint HyperMatch：通过松弛对比约束的噪声容忍半监督学习
HyperReel: High-Fidelity 6-DoF Video With Ray-Conditioned Sampling HyperReel：具有光线条件采样的高保真 6 自由度视频
Hyperspherical Embedding for Point Cloud Completion 用于点云补全的超球面嵌入
HypLiLoc: Towards Effective LiDAR Pose Regression With Hyperbolic Fusion HypLiLoc：通过双曲融合实现有效的 LiDAR 姿态回归
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification I2MVFormer：用于零样本图像分类的大型语言模型生成的多视图文档监督
I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs I2-SDF：通过神经 SDF 中的光线追踪进行本征室内场景重建和编辑
iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition iCLIP：桥接图像分类和对比语言图像预训练以进行视觉识别
Identity-Preserving Talking Face Generation With Landmark and Appearance Priors 具有地标和外观先验的身份保持说话人脸生成
IDGI: A Framework To Eliminate Explanation Noise From Integrated Gradients IDGI：消除集成梯度中的解释噪声的框架
iDisc: Internal Discretization for Monocular Depth Estimation iDisc：单眼深度估计的内部离散化
IFSeg: Image-Free Semantic Segmentation via Vision-Language Model IFSeg：通过视觉语言模型进行无图像语义分割
Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes Im2Hands：学习交互双手形状的注意力内隐表示
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks 图像作为外语：BEiT 视觉和视觉语言任务的预训练
Image Cropping With Spatial-Aware Feature and Rank Consistency 具有空间感知特征和等级一致性的图像裁剪
Image Quality-Aware Diagnosis via Meta-Knowledge Co-Embedding 基于元知识共嵌入的图像质量感知诊断
Image Super-Resolution Using T-Tetromino Pixels 使用 T-Tetromino 像素的图像超分辨率
ImageBind: One Embedding Space To Bind Them All ImageBind：用一个嵌入空间绑定一切
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting Imagen Editor 和 EditBench：推进和评估文本引导图像修复
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing ImageNet-E：通过属性编辑对神经网络鲁棒性进行基准测试
Images Speak in Images: A Generalist Painter for In-Context Visual Learning 图像用图像说话：上下文视觉学习的通才画家
Imagic: Text-Based Real Image Editing With Diffusion Models Imagic：使用扩散模型进行基于文本的真实图像编辑
Imitation Learning As State Matching via Differentiable Physics 模仿学习作为状态匹配通过微分物理
IMP: Iterative Matching and Pose Estimation With Adaptive Pooling IMP：使用自适应池化的迭代匹配和姿态估计
Implicit 3D Human Mesh Recovery Using Consistency With Pose and Shape From Unseen-View 利用与未见视角下姿态和形状的一致性进行隐式 3D 人体网格恢复
Implicit Diffusion Models for Continuous Super-Resolution 连续超分辨率的隐式扩散模型
Implicit Identity Driven Deepfake Face Swapping Detection 隐式身份驱动的 Deepfake 换脸检测
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization 隐性身份泄露：提高 Deepfake 检测泛化能力的绊脚石
Implicit Neural Head Synthesis via Controllable Local Deformation Fields 通过可控局部变形场的隐式神经头合成
Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving 自动驾驶中用于感知和预测的隐式占用流场
Implicit Surface Contrastive Clustering for LiDAR Point Clouds LiDAR 点云的隐式表面对比聚类
Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities and Non-Uniform Coordinates 使用多平面视差和非均匀坐标的立体视频的隐式观看时间插值
Improved Distribution Matching for Dataset Condensation 改进数据集压缩的分布匹配
Improved Test-Time Adaptation for Domain Generalization 改进了域泛化的测试时间适应
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles 通过知识图谜语提高视觉语言模型的常识
Improving Cross-Modal Retrieval With Set of Diverse Embeddings 使用一组不同的嵌入改进跨模态检索
Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues 通过视觉文本提示提高面部反照率估计的公平性
Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level 通过内层反向正则化改进元学习的泛化
Improving Generalization With Domain Convex Game 使用域凸博弈改进泛化
Improving Graph Representation for Point Cloud Segmentation via Attentive Filtering 通过注意过滤改进点云分割的图形表示
Improving Image Recognition by Retrieving From Web-Scale Image-Text Data 通过从网络规模的图像文本数据中检索来改进图像识别
Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization 通过直接 PAC-贝叶斯边界最小化改进鲁棒泛化
Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation 使用以类为中心的增强提高语义分割对运动模糊的鲁棒性
Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions 通过降低对图像块损坏的敏感性来提高 Vision Transformer 的鲁棒性
Improving Selective Visual Question Answering by Learning From Your Peers 通过向同龄人学习改进选择性视觉问答
Improving Table Structure Recognition With Visual-Alignment Sequential Coordinate Modeling 使用视觉对齐顺序坐标建模改进表结构识别
Improving the Transferability of Adversarial Samples by Path-Augmented Method 通过路径增强方法提高对抗样本的可迁移性
Improving Vision-and-Language Navigation by Generating Future-View Image Semantics 通过生成未来视图图像语义改进视觉和语言导航
Improving Visual Grounding by Encouraging Consistent Gradient-Based Explanations 通过鼓励一致的基于梯度的解释来改善视觉基础
Improving Visual Representation Learning Through Perceptual Understanding 通过感知理解改进视觉表征学习
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels 通过弥合伪标签中的训练-测试差距来改进弱监督时间动作定位
Improving Zero-Shot Generalization and Robustness of Multi-Modal Models 提高多模态模型的零样本泛化和鲁棒性
Incremental 3D Semantic Scene Graph Prediction From RGB Sequences 基于 RGB 序列的增量 3D 语义场景图预测
Incrementer: Transformer for Class-Incremental Semantic Segmentation With Knowledge Distillation Focusing on Old Class Incrementer：用于以旧类为中心的知识蒸馏的类增量语义分割转换器
Independent Component Alignment for Multi-Task Learning 多任务学习的独立组件对齐
Indescribable Multi-Modal Spatial Evaluator 难以描述的多模态空间评估器
Indiscernible Object Counting in Underwater Scenes 水下场景中不可辨别的物体计数
Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis 从物体形状中推断和利用部分来改进语义图像合成
Infinite Photorealistic Worlds Using Procedural Generation 使用程序生成的无限逼真世界
Ingredient-Oriented Multi-Degradation Learning for Image Restoration 用于图像恢复的面向成分的多退化学习
In-Hand 3D Object Scanning From an RGB Sequence 从 RGB 序列扫描手中的 3D 对象
Initialization Noise in Image Gradients and Saliency Maps 图像梯度和显著图中的初始化噪声
Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection 实例关系图引导的无源域自适应对象检测
Instance-Aware Domain Generalization for Face Anti-Spoofing 用于人脸反欺骗的实例感知域泛化
Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation 半监督语义分割的实例特定和模型自适应监督
Instant Domain Augmentation for LiDAR Semantic Segmentation LiDAR 语义分割的即时域增强
Instant Multi-View Head Capture Through Learnable Registration 通过可学习配准进行即时多视角头部捕捉
Instant Volumetric Head Avatars 即时立体头像
InstantAvatar: Learning Avatars From Monocular Video in 60 Seconds InstantAvatar：在 60 秒内从单目视频中学习头像
Instant-NVR: Instant Neural Volumetric Rendering for Human-Object Interactions From Monocular RGBD Stream Instant-NVR：基于单目 RGBD 流的人-物交互即时神经体积渲染
InstMove: Instance Motion for Object-Centric Video Segmentation InstMove：用于以对象为中心的视频分割的实例运动
InstructPix2Pix: Learning To Follow Image Editing Instructions InstructPix2Pix：学习遵循图像编辑说明
Integral Neural Networks 积分神经网络
Integrally Pre-Trained Transformer Pyramid Networks 整体预训练的 Transformer 金字塔网络
Interactive and Explainable Region-Guided Radiology Report Generation 交互式和可解释的区域引导放射学报告生成
Interactive Cartoonization With Controllable Perceptual Factors 感知因素可控的交互式卡通化
Interactive Segmentation As Gaussian Process Classification 将交互式分割视为高斯过程分类
Interactive Segmentation of Radiance Fields 辐射场的交互式分割
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions InternImage：探索具有可变形卷积的大规模视觉基础模型
Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images 全切片病理图像的介入包多实例学习
Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models 使用以对象为中心的预测模型发现内在物理概念
Introducing Competition To Boost the Transferability of Targeted Adversarial Examples Through Clean Feature Mixup 引入竞争以通过干净的特征混合提高目标对抗样本的可迁移性
Inverse Rendering of Translucent Objects Using Physical and Neural Renderers 使用物理和神经渲染器对半透明物体进行逆向渲染
Inversion-Based Style Transfer With Diffusion Models 使用扩散模型进行基于反转的风格迁移
Invertible Neural Skinning 可逆神经蒙皮
Inverting the Imaging Process by Learning an Implicit Camera Model 通过学习隐式相机模型来反转成像过程
IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction IPCC-TP：利用增量皮尔逊相关系数进行联合多智能体轨迹预测
iQuery: Instruments As Queries for Audio-Visual Sound Separation iQuery：将乐器作为查询的视听声音分离
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding BERT是盲人吗？探索视觉和语言预训练对视觉语言理解的影响
ISBNet: A 3D Point Cloud Instance Segmentation Network With Instance-Aware Sampling and Box-Aware Dynamic Convolution ISBNet：具有实例感知采样和框感知动态卷积的 3D 点云实例分割网络
IS-GGT: Iterative Scene Graph Generation With Generative Transformers IS-GGT：使用生成变换器生成迭代场景图
Iterative Geometry Encoding Volume for Stereo Matching 用于立体匹配的迭代几何编码体积
Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections 灌木横截面显微图像中树木年轮实例分割的迭代下一个边界检测
Iterative Proposal Refinement for Weakly-Supervised Video Grounding 弱监督视频定位的迭代提议细化
Iterative Vision-and-Language Navigation 迭代视觉和语言导航
IterativePFN: True Iterative Point Cloud Filtering IterativePFN：真正的迭代点云过滤
itKD: Interchange Transfer-Based Knowledge Distillation for 3D Object Detection itKD：用于 3D 对象检测的基于交换传输的知识蒸馏
JacobiNeRF: NeRF Shaping With Mutual Information Gradients JacobiNeRF：具有互信息梯度的 NeRF 整形
JAWS: Just a Wild Shot for Cinematic Transfer in Neural Radiance Fields JAWS：神经辐射场中用于电影风格迁移的随意一拍
Jedi: Entropy-Based Localization and Removal of Adversarial Patches Jedi：基于熵的定位和对抗补丁的移除
Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction 用于高效卷帘快门校正的联合外观和运动学习
Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset 联合 HDR 降噪和融合：真实世界移动 HDR 图像数据集
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers 联合令牌修剪和挤压以实现更积极的视觉转换器压缩
Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time 未知曝光时间下的联合视频多帧插值和去模糊
Joint Visual Grounding and Tracking With Natural Language Specification 基于自然语言描述的联合视觉定位和跟踪
JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking JRDB-Pose：用于多人姿态估计和跟踪的大规模数据集
K3DN: Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring K3DN：双像素散焦去模糊的视差感知内核估计
KD-DLGAN: Data Limited Image Generation via Knowledge Distillation KD-DLGAN：通过知识蒸馏生成数据有限的图像
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation KERM：视觉和语言导航的知识增强推理
Kernel Aware Resampler 内核感知重采样器
KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation KiUT：用于放射学报告生成的知识注入 U-Transformer
Knowledge Combination To Learn Rotated Detection Without Rotated Annotation 无旋转标注学习旋转检测的知识组合
Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions 通过对齐局部预测分布进行 6D 姿态估计的知识蒸馏
Label Information Bottleneck for Label Enhancement 标签增强的标签信息瓶颈
Label-Free Liver Tumor Segmentation 无标记肝肿瘤分割
LANA: A Language-Capable Navigator for Instruction Following and Generation LANA：一种用于指令跟随和生成的具有语言能力的导航器
Language Adaptive Weight Generation for Multi-Task Visual Grounding 用于多任务视觉定位的语言自适应权重生成
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification 瓶中语言：语言模型引导可解释图像分类的概念瓶颈
Language-Guided Audio-Visual Source Separation via Trimodal Consistency 通过三模态一致性进行语言引导的视听源分离
Language-Guided Music Recommendation for Video via Prompt Analogies 通过提示类比为视频推荐语言指导的音乐
LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data LANIT：未标记数据的语言驱动图像到图像翻译
Large-Capacity and Flexible Video Steganography via Invertible Neural Network 基于可逆神经网络的大容量和灵活的视频隐写术
LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs LargeKernel3D：在 3D 稀疏 CNN 中放大内核
Large-Scale Training Data Search for Object Re-Identification 用于对象重新识别的大规模训练数据搜索
LaserMix for Semi-Supervised LiDAR Semantic Segmentation 用于半监督 LiDAR 语义分割的 LaserMix
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models LASP：视觉和语言模型的语言感知软提示的文本到文本优化
Latency Matters: Real-Time Action Forecasting Transformer 延迟很重要：实时行动预测转换器
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures 用于形状引导生成 3D 形状和纹理的 Latent-NeRF
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling LAVENDER：统一视频语言理解作为掩码语言建模
Layout-Based Causal Inference for Object Navigation 用于对象导航的基于布局的因果推理
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation LayoutDiffusion：用于布局到图像生成的可控扩散模型
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation LayoutDM：用于可控布局生成的离散扩散模型
LayoutDM: Transformer-Based Diffusion Model for Layout Generation LayoutDM：用于布局生成的基于 Transformer 的扩散模型
LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction LayoutFormer++：通过约束序列化和解码空间限制生成条件图形布局
L-CoIns: Language-Based Colorization With Instance Awareness L-CoIns：具有实例意识的基于语言的着色
Leapfrog Diffusion Model for Stochastic Trajectory Prediction 用于随机轨迹预测的 Leapfrog 扩散模型
Learnable Skeleton-Aware 3D Point Cloud Sampling 可学习的骨架感知 3D 点云采样
Learned Image Compression With Mixed Transformer-CNN Architectures 使用混合 Transformer-CNN 架构学习图像压缩
Learned Two-Plane Perspective Prior Based Image Resampling for Efficient Object Detection 学习基于双平面透视先验的图像重采样以实现高效目标检测
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders 通过图像到点的掩码自编码器从 2D 预训练模型学习 3D 表示
Learning 3D Scene Priors With 2D Supervision 通过 2D 监督学习 3D 场景先验
Learning 3D-Aware Image Synthesis With Unknown Pose Distribution 学习具有未知姿态分布的 3D 感知图像合成
Learning a 3D Morphable Face Reflectance Model From Low-Cost Data 从低成本数据中学习 3D 可变形面部反射率模型
Learning a Deep Color Difference Metric for Photographic Images 学习摄影图像的深度色差度量
Learning a Depth Covariance Function 学习深度协方差函数
Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation Models 使用新数据集和退化模型学习实用的 SDR 到 HDRTV 上转换
Learning a Simple Low-Light Image Enhancer From Paired Low-Light Instances 从配对的低光实例中学习简单的低光图像增强器
Learning a Sparse Transformer Network for Effective Image Deraining 学习用于有效图像去雨的稀疏变换器网络
Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging 基于立体偏振成像学习准确的 3D 形状
Learning Action Changes by Measuring Verb-Adverb Textual Relationships 通过测量动副词文本关系来学习动作变化
Learning Adaptive Dense Event Stereo From the Image Domain 从图像域学习自适应密集事件立体
Learning Analytical Posterior Probability for Human Mesh Recovery 学习人类网格恢复的分析后验概率
Learning Anchor Transformations for 3D Garment Animation 学习 3D 服装动画的锚转换
Learning and Aggregating Lane Graphs for Urban Automated Driving 学习和聚合城市自动驾驶的车道图
Learning Articulated Shape With Keypoint Pseudo-Labels From Web Images 从网络图像中使用关键点伪标签学习铰接形状
Learning Attention As Disentangler for Compositional Zero-Shot Learning 学习注意力作为组合零样本学习的解耦器
Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis 用于细粒度时尚分析的学习属性和类特定表示二重奏
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning 通过假负样本感知的对比学习来学习视听源定位
Learning Bottleneck Concepts in Image Classification 学习图像分类中的瓶颈概念
Learning Common Rationale To Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems 学习共同的基本原理以改进细粒度视觉识别问题的自监督表示
Learning Compact Representations for LiDAR Completion and Generation 学习 LiDAR 完成和生成的紧凑表示
Learning Conditional Attributes for Compositional Zero-Shot Learning 组合零样本学习的学习条件属性
Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares 通过可微非线性最小二乘法学习对应不确定性
Learning Customized Visual Models With Retrieval-Augmented Knowledge 使用检索增强知识学习定制的视觉模型
Learning Debiased Representations via Conditional Attribute Interpolation 通过条件属性插值学习去偏表示
Learning Decorrelated Representations Efficiently Using Fast Fourier Transform 使用快速傅里叶变换有效地学习去相关表示
Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis From Monocular Image 从单目图像中学习详细的辐射流形以实现高保真和 3D 一致的人像合成
Learning Discriminative Representations for Skeleton Based Action Recognition 学习基于骨架的动作识别的判别表示
Learning Distortion Invariant Representation for Image Restoration From a Causality Perspective 从因果关系的角度学习图像恢复的畸变不变表示
Learning Dynamic Style Kernels for Artistic Style Transfer 学习艺术风格迁移的动态风格内核
Learning Emotion Representations From Verbal and Nonverbal Communication 从语言和非语言交流中学习情绪表征
Learning Event Guided High Dynamic Range Video Reconstruction 学习事件引导的高动态范围视频重建
Learning Expressive Prompting With Residuals for Vision Transformers 学习视觉转换器的残差表达提示
Learning Federated Visual Prompt in Null Space for MRI Reconstruction 在零空间中学习联邦视觉提示以进行 MRI 重建
Learning From Noisy Labels With Decoupled Meta Label Purifier 使用解耦元标签净化器从噪声标签中学习
Learning From Unique Perspectives: User-Aware Saliency Modeling 从独特的视角学习：用户感知显著性建模
Learning Generative Structure Prior for Blind Text Image Super-Resolution 学习盲文本图像超分辨率的生成结构先验
Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs 使用轻量级 CAD 模型或零真实 3D 对学习 2D 表示中的几何感知属性
Learning Geometry-Aware Representations by Sketching 通过素描学习几何感知表示
Learning Human Mesh Recovery in 3D Scenes 学习 3D 场景中的人体网格恢复
Learning Human-to-Robot Handovers From Point Clouds 从点云中学习人到机器人的物体交接
Learning Imbalanced Data With Vision Transformers 使用 Vision Transformers 学习不平衡数据
Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-Commerce 电子商务中大规模多模态预训练的学习实例级表示
Learning Joint Latent Space EBM Prior Model for Multi-Layer Generator 学习多层生成器的联合潜在空间 EBM 先验模型
Learning Locally Editable Virtual Humans 学习局部可编辑的虚拟人
Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization 学习用于弱监督密集对象定位的多模式类特定标记
Learning Neural Duplex Radiance Fields for Real-Time View Synthesis 学习用于实时视图合成的神经双工辐射场
Learning Neural Parametric Head Models 学习神经参数头部模型
Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild 在野外学习神经原人脸场以进行解缠结的 3D 人脸建模
Learning Neural Volumetric Representations of Dynamic Humans in Minutes 在几分钟内学习动态人体的神经体积表示
Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection 梯度学习：GAN 生成图像检测的广义伪像表示
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision 从自然语言监督中学习开放词汇语义分割模型
Learning Optical Expansion From Scale Matching 从尺度匹配中学习光学膨胀
Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation 学习用于广义小样本语义分割的正交原型
Learning Partial Correlation Based Deep Visual Representation for Image Classification 学习基于偏相关的图像分类深度视觉表示
Learning Personalized High Quality Volumetric Head Avatars From Monocular RGB Videos 从单目 RGB 视频中学习个性化的高质量体积头像
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations 从教学视频及其旁白学习过程感知视频表示
Learning Rotation-Equivariant Features for Visual Correspondence 学习视觉对应的旋转等变特征
Learning Sample Relationship for Exposure Correction 曝光校正的学习样本关系
Learning Semantic Relationship Among Instances for Image-Text Matching 学习图像文本匹配实例间的语义关系
Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing 为灵活的 3D 人体编辑学习语义感知分离表示
Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement 学习低光图像增强的语义感知知识指导
Learning Situation Hyper-Graphs for Video Question Answering 用于视频问答的学习情境超图
Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution 学习事件引导视频超分辨率的时空隐式神经表征
Learning Steerable Function for Efficient Image Resampling 学习可控函数以实现高效图像重采样
Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation 学习联合视差和不确定性估计的立体匹配误差分布
Learning To Detect and Segment for Open Vocabulary Object Detection 学习检测和分割以实现开放词汇目标检测
Learning To Detect Mirrors From Videos via Dual Correspondences 通过双重对应学习从视频中检测镜像
Learning To Dub Movies via Hierarchical Prosody Models 通过分层韵律模型学习为电影配音
Learning To Exploit Temporal Structure for Biomedical Vision-Language Processing 学习利用时间结构进行生物医学视觉语言处理
Learning To Exploit the Sequence-Specific Prior Knowledge for Image Processing Pipelines Optimization 学习利用序列特定的先验知识进行图像处理管道优化
Learning To Fuse Monocular and Multi-View Cues for Multi-Frame Depth Estimation in Dynamic Scenes 学习融合单目和多视图线索以在动态场景中进行多帧深度估计
Learning To Generate Image Embeddings With User-Level Differential Privacy 学习生成具有用户级差分隐私的图像嵌入
Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space 学习使用预训练的视觉语义空间生成语言监督和开放词汇场景图
Learning To Generate Text-Grounded Mask for Open-World Semantic Segmentation From Only Image-Text Pairs 学习仅从图像-文本对生成用于开放世界语义分割的基于文本的掩码
Learning To Measure the Point Cloud Reconstruction Loss in a Representation Space 学习测量表示空间中的点云重建损失
Learning To Name Classes for Vision and Language Models 学习为视觉和语言模型命名类
Learning To Predict Scene-Level Implicit 3D From Posed RGBD Data 学习根据 RGBD 数据预测场景级隐式 3D
Learning To Render Novel Views From Wide-Baseline Stereo Pairs 学习从宽基线立体图像对中渲染新视图
Learning To Retain While Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation 在获取的同时学习保留：对抗无数据知识蒸馏中的分布转移
Learning To Segment Every Referring Object Point by Point 学习逐点分割每个参考对象
Learning To Zoom and Unzoom 学习缩放和取消缩放
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge 从自然文字知识中学习可迁移时空表征
Learning Transformation-Predictive Representations for Detection and Description of Local Features 用于检测和描述局部特征的学习转换预测表示
Learning Transformations To Reduce the Geometric Shift in Object Detection 学习转换以减少对象检测中的几何偏移
Learning Video Representations From Large Language Models 从大型语言模型中学习视频表示
Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting 学习可见性场以进行详细的 3D 人体重建和重新照明
Learning Visual Representations via Language-Guided Sampling 通过语言引导采样学习视觉表示
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions 学习在多种恶劣天气条件下进行图像恢复的天气一般特征和天气特定特征
Learning With Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning 借助幻想学习：用于小样本类增量学习的语义感知虚拟对比约束
Learning With Noisy Labels via Self-Supervised Adversarial Noisy Masking 通过自监督对抗噪声掩蔽学习噪声标签
LEGO-Net: Learning Regular Rearrangements of Objects in Rooms LEGO-Net：学习房间中物体的规则重排
LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization LEMaRT：用于图像协调的标签高效掩蔽区域变换
Less Is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation 少即是多：降低 3D 点云语义分割的任务和模型复杂性
Level-S $^2$ fM: Structure From Motion on Neural Level Set of Implicit Surfaces Level-S $^2$ fM：隐式曲面神经水平集上的运动恢复结构
Leverage Interactive Affinity for Affordance Learning 利用交互亲和力进行可供性学习
Leveraging Hidden Positives for Unsupervised Semantic Segmentation 利用隐藏正样本进行无监督语义分割
Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy Labels 在存在噪声标签的情况下利用评估者间一致性进行分类
Leveraging per Image-Token Consistency for Vision-Language Pre-Training 利用每个图像标记的一致性进行视觉语言预训练
Leveraging Temporal Context in Low Representational Power Regimes 在低表征能力条件下利用时间上下文
LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising LG-BPN：用于自监督真实世界去噪的局部和全局盲块网络
LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation LiDAR2Map：借助在线相机蒸馏为基于 LiDAR 的语义地图构建辩护
LidarGait: Benchmarking 3D Gait Recognition With Point Clouds LidarGait：使用点云对 3D 步态识别进行基准测试
LiDAR-in-the-Loop Hyperparameter Optimization 激光雷达在环超参数优化
Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field Lift3D：通过将 2D GAN 提升到 3D 生成辐射场来合成 3D 训练数据
Light Source Separation and Intrinsic Image Decomposition Under AC Illumination 交流照明下的光源分离和本征图像分解
LightedDepth: Video Depth Estimation in Light of Limited Inference View Angles LightedDepth：基于有限推理视角的视频深度估计
LightPainter: Interactive Portrait Relighting With Freehand Scribble LightPainter：使用徒手涂鸦进行交互式人像重新打光
LINe: Out-of-Distribution Detection by Leveraging Important Neurons LINe：利用重要神经元进行分布外检测
LinK: Linear Kernel for LiDAR-Based 3D Perception LinK：用于基于 LiDAR 的 3D 感知的线性内核
Linking Garment With Person via Semantically Associated Landmarks for Virtual Try-On 通过语义关联地标将服装与人联系起来进行虚拟试穿
LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook LipFormer：使用预先学习的面部码本生成高保真、可泛化的说话人脸
Listening Human Behavior: 3D Human Pose Estimation With Acoustic Signals 聆听人类行为：利用声学信号估计 3D 人体姿势
Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR Lite DETR：用于高效 DETR 的交错式多尺度编码器
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation Lite-Mono：用于自监督单目深度估计的轻量级 CNN 和 Transformer 架构
Local 3D Editing via 3D Distillation of CLIP Knowledge 通过 CLIP 知识的 3D 蒸馏进行局部 3D 编辑
Local Connectivity-Based Density Estimation for Face Clustering 基于局部连通性的人脸聚类密度估计
Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution 用于任意尺度图像超分辨率的局部隐式归一化流
Local Implicit Ray Function for Generalizable Radiance Field Representation 可推广辐射场表示的局部隐式射线函数
Local-Guided Global: Paired Similarity Representation for Visual Reinforcement Learning Local-Guided Global：视觉强化学习的配对相似性表示
Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous Driving 用于自动驾驶中高效行人检测的局部语义特征混合器
Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields 束调整神经辐射场的局部到全局配准
LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding LOCATE：定位并迁移对象部件以实现弱监督可供性定位
Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning 面部毛发属性学习的逻辑一致性和更强的描述力
Logical Implications for Visual Question Answering Consistency 视觉问答一致性的逻辑含义
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment LOGO：用于群体行动质量评估的长格式视频数据集
LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion LoGoNet：通过局部到全局的交叉模态融合实现准确的 3D 对象检测
Long Range Pooling for 3D Large-Scale Scene Understanding 用于 3D 大规模场景理解的远程池化
Long-Tailed Visual Recognition via Self-Heterogeneous Integration With Knowledge Excavation 通过自异构集成与知识挖掘的长尾视觉识别
Long-Term Visual Localization With Mobile Sensors 使用移动传感器进行长期视觉定位
Look Around for Anomalies: Weakly-Supervised Anomaly Detection via Context-Motion Relational Learning 四处寻找异常：通过上下文运动关系学习进行弱监督异常检测
Look Before You Match: Instance Understanding Matters in Video Object Segmentation 匹配前先看：实例理解在视频对象分割中很重要
Lookahead Diffusion Probabilistic Models for Refining Mean Estimation 用于细化均值估计的前瞻扩散概率模型
Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections 透过玻璃看：针对高镜面反射的神经表面重建
Low-Light Image Enhancement via Structure Modeling and Guidance 通过结构建模和指导增强低光图像
LP-DIF: Learning Local Pattern-Specific Deep Implicit Function for 3D Objects and Scenes LP-DIF：学习 3D 对象和场景的局部模式特定深度隐式函数
LSTFE-Net: Long Short-Term Feature Enhancement Network for Video Small Object Detection LSTFE-Net：用于视频小目标检测的长短期特征增强网络
LVQAC: Lattice Vector Quantization Coupled With Spatially Adaptive Companding for Efficient Learned Image Compression LVQAC：格子矢量量化与空间自适应压扩相结合，实现高效的学习图像压缩
MACARONS: Mapping and Coverage Anticipation With RGB Online Self-Supervision MACARONS：使用 RGB 在线自监督进行映射和覆盖预测
MAESTER: Masked Autoencoder Guided Segmentation at Pixel Resolution for Accurate, Self-Supervised Subcellular Structure Recognition MAESTER：掩码自编码器引导的像素分辨率分割，用于准确的自监督亚细胞结构识别
MAGE: MAsked Generative Encoder To Unify Representation Learning and Image Synthesis MAGE：统一表示学习和图像合成的掩码生成编码器
Magic3D: High-Resolution Text-to-3D Content Creation Magic3D：高分辨率文本到 3D 内容创建
MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery MagicNet：通过魔方划分与恢复进行半监督多器官分割
MagicPony: Learning Articulated 3D Animals in the Wild MagicPony：在野外学习铰接式 3D 动物
MAGVIT: Masked Generative Video Transformer MAGVIT：掩码生成视频 Transformer
MAGVLT: Masked Generative Vision-and-Language Transformer MAGVLT：掩码生成视觉和语言 Transformer
MAIR: Multi-View Attention Inverse Rendering With 3D Spatially-Varying Lighting Estimation MAIR：具有 3D 空间变化照明估计的多视图注意逆渲染
Make Landscape Flatter in Differentially Private Federated Learning 在差分隐私联邦学习中使损失景观更平坦
Make-a-Story: Visual Memory Conditioned Consistent Story Generation Make-a-Story：以视觉记忆为条件的一致性故事生成
Making Vision Transformers Efficient From a Token Sparsification View 从令牌稀疏化的角度使视觉转换器高效
MaLP: Manipulation Localization Using a Proactive Scheme MaLP：使用主动方案进行篡改定位
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding MammalNet：哺乳动物识别和行为理解的大规模视频基准
Manipulating Transfer Learning for Property Inference 为属性推理操纵迁移学习
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model MAP：多模态不确定性感知视觉语言预训练模型
MaPLe: Multi-Modal Prompt Learning MaPLe：多模态提示学习
Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision 映射退化遇到标签进化：通过单点监督学习红外小目标检测
Marching-Primitives: Shape Abstraction From Signed Distance Function Marching-Primitives：有符号距离函数的形状抽象
MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins MarginMatch：使用伪边距改进半监督学习
Markerless Camera-to-Robot Pose Estimation via Self-Supervised Sim-to-Real Transfer 通过自监督仿真到真实迁移的无标记相机到机器人姿态估计
MARLIN: Masked Autoencoder for Facial Video Representation LearnINg MARLIN：用于面部视频表示学习的掩码自编码器
MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds MarS3D：用于多扫描 3D 点云语义分割的即插即用运动感知模型
Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation Mask DINO：面向对象检测和分割的基于 Transformer 的统一框架
Mask3D: Pre-Training 2D Vision Transformers by Learning Masked 3D Priors Mask3D：通过学习掩码 3D 先验预训练 2D 视觉 Transformer
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining MaskCLIP：掩码自蒸馏推进对比语言图像预训练
MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset MaskCon：粗标记数据集的掩码对比学习
Masked and Adaptive Transformer for Exemplar Based Image Translation 用于基于样本的图像翻译的掩蔽和自适应变换器
Masked Autoencoders Enable Efficient Knowledge Distillers 掩码自编码器可成为高效的知识蒸馏器
Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond 掩码自编码器遇到生成对抗网络及其他
Masked Autoencoding Does Not Help Natural Language Supervision at Scale 掩码自动编码无助于大规模自然语言监督
Masked Image Modeling With Local Multi-Scale Reconstruction 具有局部多尺度重建的蒙版图像建模
Masked Image Training for Generalizable Deep Image Denoising 用于通用深度图像去噪的蒙版图像训练
Masked Images Are Counterfactual Samples for Robust Fine-Tuning 蒙版图像是用于稳健微调的反事实样本
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers Masked Jigsaw Puzzle：视觉转换器的多功能位置嵌入
Masked Motion Encoding for Self-Supervised Video Representation Learning 用于自监督视频表示学习的掩蔽运动编码
Masked Representation Learning for Domain Generalized Stereo Matching 域泛化立体匹配的掩码表示学习
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning Masked Scene Contrast：无监督 3D 表示学习的可扩展框架
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning 掩码视频蒸馏：重新思考自监督视频表示学习的掩码特征建模
Masked Wavelet Representation for Compact Neural Radiance Fields 紧凑型神经辐射场的掩蔽小波表示
Mask-Free OVIS: Open-Vocabulary Instance Segmentation Without Manual Mask Annotations Mask-Free OVIS：没有手动掩码注释的开放式词汇实例分割
Mask-Free Video Instance Segmentation 无掩码视频实例分割
Mask-Guided Matting in the Wild 野外场景下的掩码引导抠图
MaskSketch: Unpaired Structure-Guided Masked Image Generation MaskSketch：未配对的结构引导蒙版图像生成
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer Master：Meta Style Transformer，用于可控的零样本和少样本艺术风格迁移
Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation 匹配是不够的：类别不可知姿态估计的两阶段框架
MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation MCF：半监督医学图像分割的互校正框架
MDL-NAS: A Joint Multi-Domain Learning Framework for Vision Transformer MDL-NAS：Vision Transformer 的联合多领域学习框架
MDQE: Mining Discriminative Query Embeddings To Segment Occluded Instances on Challenging Videos MDQE：挖掘判别查询嵌入以在具有挑战性的视频上分割被遮挡的实例
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos MD-VQA：UGC 直播视频的多维质量评估
MEDIC: Remove Model Backdoors via Importance Driven Cloning MEDIC：通过重要性驱动克隆移除模型后门
MED-VT: Multiscale Encoder-Decoder Video Transformer With Application To Object Segmentation MED-VT：应用于对象分割的多尺度编码器-解码器视频转换器
Megahertz Light Steering Without Moving Parts 没有移动部件的兆赫光转向
MEGANE: Morphable Eyeglass and Avatar Network MEGANE：可变形眼镜和头像网络
MELTR: Meta Loss Transformer for Learning To Fine-Tune Video Foundation Models MELTR：用于学习微调视频基础模型的元损失转换器
MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction MeMaHand：利用 Mesh-Mano 交互进行单幅图像双手重建
Memory-Friendly Scalable Super-Resolution via Rewinding Lottery Ticket Hypothesis 通过倒带彩票假设实现内存友好的可扩展超分辨率
Meta Architecture for Point Cloud Analysis 点云分析的元架构
Meta Compositional Referring Expression Segmentation 元成分指称表达分割
Meta Omnium: A Benchmark for General-Purpose Learning-To-Learn Meta Omnium：通用“学会学习”的基准
Meta-Causal Learning for Single Domain Generalization 单域泛化的元因果学习
MetaCLUE: Towards Comprehensive Visual Metaphors Research MetaCLUE：迈向全面的视觉隐喻研究
Metadata-Based RAW Reconstruction via Implicit Neural Functions 通过隐式神经功能进行基于元数据的 RAW 重建
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding Meta-Explore：使用场景对象频谱定位的探索性分层视觉和语言导航
MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection MetaFusion：通过对象检测的元特征嵌入实现红外和可见光图像融合
Meta-Learning With a Geometry-Adaptive Preconditioner 使用几何自适应预处理器进行元学习
MetaMix: Towards Corruption-Robust Continual Learning With Temporally Self-Adaptive Data Transformation MetaMix：通过时间自适应数据变换实现对损坏鲁棒的持续学习
Meta-Personalizing Vision-Language Models To Find Named Instances in Video 用于在视频中查找命名实例的元个性化视觉语言模型
MetaPortrait: Identity-Preserving Talking Head Generation With Fast Personalized Adaptation MetaPortrait：具有快速个性化适应的身份保持谈话头像生成
Meta-Tuning Loss Functions and Data Augmentation for Few-Shot Object Detection 用于小样本目标检测的元调整损失函数和数据增强
MetaViewer: Towards a Unified Multi-View Representation MetaViewer：走向统一的多视图表示
MethaneMapper: Spectral Absorption Aware Hyperspectral Transformer for Methane Detection MethaneMapper：用于甲烷检测的光谱吸收感知高光谱变换器
METransformer: Radiology Report Generation by Transformer With Multiple Learnable Expert Tokens METransformer：使用多个可学习专家令牌的 Transformer 生成放射学报告
MHPL: Minimum Happy Points Learning for Active Source Free Domain Adaptation MHPL：用于主动无源域适应的最小快乐点学习
MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation MIANet：为小样本语义分割聚合无偏实例和一般信息
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation MIC：上下文增强域适应的蒙版图像一致性
Micron-BERT: BERT-Based Facial Micro-Expression Recognition Micron-BERT：基于BERT的面部微表情识别
MIME: Human-Aware 3D Scene Generation MIME：人类感知 3D 场景生成
Mind the Label Shift of Augmentation-Based Graph OOD Generalization 注意基于增强的图 OOD 泛化的标签移位
Minimizing Maximum Model Discrepancy for Transferable Black-Box Targeted Attacks 最小化可转移黑盒目标攻击的最大模型差异
Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation 最小化累积轨迹误差以改进数据集蒸馏
MISC210K: A Large-Scale Dataset for Multi-Instance Semantic Correspondence MISC210K：用于多实例语义对应的大规模数据集
MIST: Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering MIST：用于长格式视频问答的多模式迭代时空转换器
Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing With Non-Learnable Primitives 通过具有不可学习原语的显式任务路由减轻多任务学习中的任务干扰
Mixed Autoencoder for Self-Supervised Visual Representation Learning 用于自监督视觉表示学习的混合自动编码器
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers MixMAE：用于高效预训练分层视觉转换器的混合和掩码自动编码器
MixNeRF: Modeling a Ray With Mixture Density for Novel View Synthesis From Sparse Inputs MixNeRF：使用混合密度对光线进行建模，以从稀疏输入中合成新的视图
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering MixPHM：低资源视觉问答的冗余感知参数高效调优
MixSim: A Hierarchical Framework for Mixed Reality Traffic Simulation MixSim：混合现实交通模拟的分层框架
MixTeacher: Mining Promising Labels With Mixed Scale Teacher for Semi-Supervised Object Detection MixTeacher：使用混合尺度教师挖掘有前途的标签以进行半监督对象检测
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency MM-3DScene：通过使用信息保存重建和自蒸馏一致性定制蒙版建模来理解 3D 场景
MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning MMANet：不完全多模态学习的边距感知蒸馏和模态感知正则化
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation MM-Diffusion：学习用于联合音频和视频生成的多模态扩散模型
MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition MMG-Ego4D：自我中心动作识别中的多模态泛化
MMVC: Learned Multi-Mode Video Compression With Block-Based Prediction Mode Selection and Density-Adaptive Entropy Coding MMVC：通过基于块的预测模式选择和密度自适应熵编码学习多模式视频压缩
Mobile User Interface Element Detection via Adaptively Prompt Tuning 通过自适应提示调整进行移动用户界面元素检测
MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices MobileBrick：为移动设备上的 3D 重建搭建乐高积木
MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures MobileNeRF：利用多边形光栅化管道在移动架构上进行高效的神经场渲染
MobileOne: An Improved One Millisecond Mobile Backbone MobileOne：改进的一毫秒移动骨干网
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning Meets Knowledge Distillation MobileVOS：实时视频对象分割对比学习遇见知识蒸馏
Modality-Agnostic Debiasing for Single Domain Generalization 单域泛化的模态不可知去偏
Modality-Invariant Visual Odometry for Embodied Vision 具身视觉的模态不变视觉里程计
MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences MoDAR：使用运动预测进行点云序列中的 3D 对象检测
Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection 模型屏障：用于模型知识产权保护的紧凑型不可转移隔离域
Model-Agnostic Gender Debiased Image Captioning 与模型无关的性别去偏见图像说明
Modeling Entities As Semantic Points for Visual Information Extraction in the Wild 将实体建模为野外视觉信息提取的语义点
Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery 在新类发现中对类间和类内约束进行建模
Modeling the Distributional Uncertainty for Salient Object Detection Models 为显著目标检测模型的分布不确定性建模
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning 将视频建模为随机过程以进行细粒度视频表示学习
Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer 通过真实感风格转换使用多个参考对旧照片进行现代化改造
MoDi: Unconditional Motion Synthesis From Diverse Data MoDi：来自不同数据的无条件运动合成
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners Mod-Squad：将专家组合设计为模块化多任务学习者
Modular Memorability: Tiered Representations for Video Memorability Prediction 模块化记忆性：视频记忆性预测的分层表示
Mofusion: A Framework for Denoising-Diffusion-Based Motion Synthesis Mofusion：基于去噪扩散的运动合成框架
MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition MoLo：用于小样本动作识别的运动增强长短对比学习
MonoATT: Online Monocular 3D Object Detection With Adaptive Token Transformer MonoATT：使用自适应令牌转换器的在线单目 3D 对象检测
MonoHuman: Animatable Human Neural Field From Monocular Video MonoHuman：来自单目视频的可动画人体神经场
MOSO: Decomposing MOtion, Scene and Object for Video Prediction MOSO：为视频预测分解运动、场景和对象
MoStGAN-V: Video Generation With Temporal Motion Styles MoStGAN-V：具有时间运动风格的视频生成
MOT: Masked Optimal Transport for Partial Domain Adaptation MOT：部分域适应的掩蔽最优传输
Motion Information Propagation for Neural Video Compression 神经视频压缩的运动信息传播
MotionDiffuser: Controllable Multi-Agent Motion Prediction Using Diffusion MotionDiffuser：使用扩散的可控多智能体运动预测
MotionTrack: Learning Robust Short-Term and Long-Term Motions for Multi-Object Tracking MotionTrack：为多目标跟踪学习稳健的短期和长期运动
MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors MOTRv2：通过预训练目标检测器引导端到端多目标跟踪
MOVES: Manipulated Objects in Video Enable Segmentation MOVES：视频中的操纵对象启用分割
Movies2Scenes: Using Movie Metadata To Learn Scene Representation Movies2Scenes：使用电影元数据学习场景表示
MP-Former: Mask-Piloted Transformer for Image Segmentation MP-Former：用于图像分割的 Mask-Piloted Transformer
MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving MSeg3D：用于自动驾驶的多模态 3D 语义分割
MSF: Motion-Guided Sequential Fusion for Efficient 3D Object Detection From Point Cloud Sequences MSF：用于从点云序列进行高效 3D 对象检测的运动引导顺序融合
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID MSINet：对象重识别的多尺度交互的孪生对比搜索
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection MSMDFusion：利用多深度种子在多个尺度上融合激光雷达和相机以进行 3D 对象检测
Multi Domain Learning for Motion Magnification 运动放大的多域学习
Multi-Agent Automated Machine Learning 多智能体自动机器学习
Multi-Centroid Task Descriptor for Dynamic Class Incremental Inference 用于动态类增量推理的多质心任务描述符
Multiclass Confidence and Localization Calibration for Object Detection 目标检测的多类置信度和定位校准
Multi-Concept Customization of Text-to-Image Diffusion 文本到图像扩散的多概念定制
Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph 基于知识引导关系图的中国青铜鼎多粒度考古测年
Multi-Label Compound Expression Recognition: C-EXPR Database & Network 多标签复合表情识别：C-EXPR 数据库和网络
Multilateral Semantic Relations Modeling for Image Text Retrieval 图像文本检索的多边语义关系建模
Multi-Level Logit Distillation 多级 Logit 蒸馏
Multi-Modal Gait Recognition via Effective Spatial-Temporal Feature Fusion 通过有效的时空特征融合进行多模态步态识别
Multimodal Industrial Anomaly Detection via Hybrid Fusion 通过混合融合进行多模态工业异常检测
Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling 通过共享特定特征建模的缺失模态多模态学习
Multimodal Prompting With Missing Modalities for Visual Recognition 视觉识别缺失模态的多模态提示
Multi-Modal Representation Learning With Text-Driven Soft Masks 使用文本驱动的软掩码进行多模态表示学习
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning With Multimodal Models 多模态有助于单模态：使用多模态模型进行跨模态小样本学习
Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning 用于自监督视觉表示学习的多模式在线知识蒸馏
Multi-Object Manipulation via Object-Centric Neural Scattering Functions 通过以对象为中心的神经散射函数进行多对象操作
Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning 通过迭代自定进度的监督对比学习进行多实例学习
Multiplicative Fourier Level of Detail 乘法傅立叶细节层次
Multi-Realism Image Compression With a Conditional Generator 使用条件生成器的多现实主义图像压缩
Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis 用于视图合成的多尺度张量分解和渲染方程编码
Multi-Sensor Large-Scale Dataset for Multi-View 3D Reconstruction 用于多视图 3D 重建的多传感器大规模数据集
Multi-Space Neural Radiance Fields 多空间神经辐射场
Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline 多光谱视频语义分割：基准数据集和基线
Multivariate, Multi-Frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation 多变量、多频率和多模态：重新思考用于对话中情绪识别的图神经网络
Multi-View Adversarial Discriminator: Mine the Non-Causal Factors for Object Detection in Unseen Domains 多视图对抗鉴别器：挖掘不可见域中目标检测的非因果因素
Multi-View Azimuth Stereo via Tangent Space Consistency 通过切线空间一致性实现多视角方位立体
Multiview Compressive Coding for 3D Reconstruction 用于 3D 重建的多视图压缩编码
Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes 大型真实世界室内场景的多视图逆向渲染
Multi-View Reconstruction Using Signed Ray Distance Functions (SRDF) 使用符号射线距离函数 (SRDF) 的多视图重建
Multi-View Stereo Representation Revist: Region-Aware MVSNet 重新审视多视图立体表示：区域感知 MVSNet
Music-Driven Group Choreography 音乐驱动的团体编舞
Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video 基于互信息的时间差异学习用于视频中人体姿态估计
MVImgNet: A Large-Scale Dataset of Multi-View Images MVImgNet：多视图图像的大规模数据集
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training MV-JAR：用于基于 LiDAR 的自监督预训练的掩码体素拼图和重建
NaQ: Leveraging Narrations As Queries To Supervise Episodic Memory NaQ：利用旁白作为查询来监督情景记忆
NAR-Former: Neural Architecture Representation Learning Towards Holistic Attributes Prediction NAR-Former：面向整体属性预测的神经架构表示学习
Natural Language-Assisted Sign Language Recognition 自然语言辅助手语识别
NeAT: Learning Neural Implicit Surfaces With Arbitrary Topologies From Multi-View Images NeAT：从多视图图像中学习具有任意拓扑结构的神经隐式表面
NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction From Multi-View Images NEF：用于从多视图图像重建 3D 参数曲线的神经边缘场
NeFII: Inverse Rendering for Reflectance Decomposition With Near-Field Indirect Illumination NeFII：近场间接照明反射分解的逆向渲染
Neighborhood Attention Transformer 邻域注意力 Transformer
NeMo: Learning 3D Neural Motion Fields From Multiple Video Instances of the Same Action NeMo：从同一动作的多个视频实例中学习 3D 神经运动场
NeRDi: Single-View NeRF Synthesis With Language-Guided Diffusion As General Image Priors NeRDi：以语言引导扩散作为一般图像先验的单视图 NeRF 合成
NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis 掌中的 NeRF：通过新视图合成对机器人进行矫正增强
NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects NeRF-DS：动态镜面物体的神经辐射场
NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation NeRFInvertor：用于单次真实图像动画的高保真 NeRF-GAN 反演
Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation From 2D Supervision Nerflets：用于从 2D 监督进行高效结构感知 3D 场景表示的局部辐射场
NeRFLight: Fast and Light Neural Radiance Fields Using a Shared Feature Grid NeRFLight：使用共享特征网格的快速轻神经辐射场
NeRFLix: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-Viewpoint MiXer NeRFLix：通过学习退化驱动的视点间混合器来合成高质量的神经视图
NeRF-RPN: A General Framework for Object Detection in NeRFs NeRF-RPN：NeRF 中对象检测的通用框架
NeRF-Supervised Deep Stereo NeRF 监督的深度立体匹配
NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds NeRFVS：通过几何支架进行自由视图合成的神经辐射场
NerVE: Neural Volumetric Edges for Parametric Curve Extraction From Point Cloud NerVE：用于从点云中提取参数化曲线的神经体积边缘
Network Expansion for Practical Training Acceleration 用于实用训练加速的网络扩展
Network-Free, Unsupervised Semantic Segmentation With Synthetic Images 无网络、无监督的合成图像语义分割
NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction NeuDA：用于高保真隐式表面重建的神经可变形锚
NeUDF: Leaning Neural Unsigned Distance Fields With Volume Rendering NeUDF：通过体渲染学习神经无符号距离场
NeuFace: Realistic 3D Neural Face Rendering From Multi-View Images NeuFace：来自多视图图像的逼真 3D 神经人脸渲染
Neumann Network With Recursive Kernels for Single Image Defocus Deblurring 具有递归内核的 Neumann 网络用于单图像散焦去模糊
NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization NeuMap：通过自动转码器进行相机定位的神经坐标映射
Neural Congealing: Aligning Images to a Joint Semantic Atlas 神经凝固：将图像与联合语义图谱对齐
Neural Dependencies Emerging From Learning Massive Categories 学习大量类别中出现的神经依赖性
Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes 神经场遇上显式几何表示：用于城市场景的逆向渲染
Neural Fourier Filter Bank 神经傅立叶滤波器组
Neural Intrinsic Embedding for Non-Rigid Point Cloud Matching 用于非刚性点云匹配的神经内在嵌入
Neural Kaleidoscopic Space Sculpting 神经万花筒空间雕刻
Neural Kernel Surface Reconstruction 神经核表面重建
Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition Neural Koopman Pooling：用于基于骨架的动作识别的受控制启发的时间动态编码
Neural Lens Modeling 神经透镜建模
Neural Map Prior for Autonomous Driving 用于自动驾驶的神经地图先验
Neural Part Priors: Learning To Optimize Part-Based Object Completion in RGB-D Scans 神经部分先验：学习优化 RGB-D 扫描中基于部分的对象完成
Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views 从多视图合成 3D-4D 视图的神经像素合成
Neural Preset for Color Style Transfer 颜色风格迁移的神经预设
Neural Rate Estimator and Unsupervised Learning for Efficient Distributed Image Analytics in Split-DNN Models Split-DNN 模型中高效分布式图像分析的神经速率估计器和无监督学习
Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos 流式自由视点视频的神经残余辐射场
Neural Scene Chronology 神经场景年表
Neural Texture Synthesis With Guided Correspondence 具有引导对应的神经纹理合成
Neural Transformation Fields for Arbitrary-Styled Font Generation 用于任意样式字体生成的神经转换场
Neural Vector Fields: Implicit Representation by Explicit Learning 神经向量场：通过显式学习进行隐式表示
Neural Video Compression With Diverse Contexts 具有不同上下文的神经视频压缩
Neural Volumetric Memory for Visual Locomotion Control 用于视觉运动控制的神经体积记忆
Neural Voting Field for Camera-Space 3D Hand Pose Estimation 相机空间 3D 手势估计的神经投票场
Neuralangelo: High-Fidelity Neural Surface Reconstruction Neuralangelo：高保真神经表面重建
NeuralDome: A Neural Modeling Pipeline on Multi-View Human-Object Interactions NeuralDome：多视图人-物交互的神经建模管道
NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds NeuralEditor：通过操纵点云编辑神经辐射场
NeuralField-LDM: Scene Generation With Hierarchical Latent Diffusion Models NeuralField-LDM：使用分层潜在扩散模型生成场景
Neuralizer: General Neuroimage Analysis Without Re-Training Neuralizer：无需重新训练的一般神经图像分析
NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360deg Views NeuralLift-360：将野外 2D 照片提升为具有 360 度视图的 3D 对象
NeuralPCI: Spatio-Temporal Neural Field for 3D Point Cloud Multi-Frame Non-Linear Interpolation NeuralPCI：用于 3D 点云多帧非线性插值的时空神经场
NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces With Arbitrary Topologies NeuralUDF：学习无符号距离场用于任意拓扑表面的多视图重建
NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization NeurOCS：用于单目 3D 对象定位的神经 NOCS 监督
Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation 用于完全测试时间适应的神经调制赫布学习
Neuron Structure Modeling for Generalizable Remote Physiological Measurement 用于通用远程生理测量的神经元结构建模
NeuWigs: A Neural Dynamic Model for Volumetric Hair Capture and Animation NeuWigs：用于体积头发捕捉和动画的神经动态模型
NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation NewsNet：一种用于分层时间分割的新型数据集
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars Next3D：用于 3D 感知头部头像的生成神经纹理光栅化
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution 用于高效轻量级图像超分辨率的 Swin Transformers 中的 N-Gram
NICO++: Towards Better Benchmarking for Domain Generalization NICO++：为域泛化建立更好的基准
NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging NIFF：通过神经实例特征锻造减轻广义少样本目标检测中的遗忘
Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prior 使用光学中心对称先验去除夜间智能手机反射光斑
NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation NIKI：用于 3D 人体姿态和形状估计的可逆神经网络的神经逆运动学
NIPQ: Noise Proxy-Based Integrated Pseudo-Quantization NIPQ：基于噪声代理的集成伪量化
NIRVANA: Neural Implicit Representations of Videos With Adaptive Networks and Autoregressive Patch-Wise Modeling NIRVANA：具有自适应网络和自回归贴片建模的视频的神经隐式表示
NLOST: Non-Line-of-Sight Imaging With Transformer NLOST：使用 Transformer 的非视距成像
No One Left Behind: Improving the Worst Categories in Long-Tailed Learning 没有人掉队：改进长尾学习中最差的类别
Noisy Correspondence Learning With Meta Similarity Correction 带元相似度校正的噪声对应学习
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers NoisyQuant：噪声偏差增强的视觉变换器训练后激活量化
NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs NoisyTwins：通过 StyleGAN 生成类别一致且多样化的图像
Non-Contrastive Learning Meets Language-Image Pre-Training 非对比学习遇上语言图像预训练
Non-Contrastive Unsupervised Learning of Physiological Signals From Video 从视频中非对比无监督地学习生理信号
Non-Line-of-Sight Imaging With Signal Superresolution Network 信号超分辨率网络的非视距成像
NoPe-NeRF: Optimising Neural Radiance Field With No Pose Prior NoPe-NeRF：在没有姿势先验的情况下优化神经辐射场
Normal-Guided Garment UV Prediction for Human Re-Texturing 用于人体重新贴图的法线引导服装 UV 预测
Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection 用于异常值感知对象检测的基于归一化流的特征合成
Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation 并非所有图像区域都很重要：用于自回归图像生成的掩蔽矢量量化
Novel Class Discovery for 3D Point Cloud Semantic Segmentation 用于 3D 点云语义分割的新型类发现
Novel-View Acoustic Synthesis 新视角声学合成
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations NS3D：3D 对象和关系的神经符号基础
NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models 使用引导扩散模型编辑真实图像的空文本反演
NUWA-LIP: Language-Guided Image Inpainting With Defect-Free VQGAN NUWA-LIP：使用无缺陷 VQGAN 进行语言引导图像修复
NVTC: Nonlinear Vector Transform Coding NVTC：非线性矢量变换编码
Objaverse: A Universe of Annotated 3D Objects Objaverse：带注释的 3D 对象的宇宙
Object Detection With Self-Supervised Scene Adaptation 具有自监督场景适应的目标检测
Object Discovery From Motion-Guided Tokens 从运动引导令牌中发现对象
Object Pop-Up: Can We Infer 3D Objects and Their Poses From Human Interactions Alone? 对象弹出窗口：我们可以仅从人类交互中推断出 3D 对象及其姿势吗？
Object Pose Estimation With Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation 具有统计保证的物体姿态估计：共形关键点检测和几何不确定性传播
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection 用于开放词汇对象检测的对象感知蒸馏金字塔
Object-Goal Visual Navigation via Effective Exploration of Relations Among Historical Navigation States 通过有效探索历史导航状态之间的关系实现对象-目标视觉导航
ObjectMatch: Robust Registration Using Canonical Object Correspondences ObjectMatch：使用规范对象对应的稳健配准
ObjectStitch: Object Compositing With Diffusion Model ObjectStitch：使用扩散模型进行对象合成
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking 以观察为中心的 SORT：重新思考用于稳健多目标跟踪的 SORT
Occlusion-Free Scene Recovery via Neural Radiance Fields 通过神经辐射场恢复无遮挡场景
OCELOT: Overlapped Cell on Tissue Dataset for Histopathology OCELOT：用于组织病理学的组织数据集上的重叠细胞
OCTET: Object-Aware Counterfactual Explanations OCTET：对象感知反事实解释
OcTr: Octree-Based Transformer for 3D Object Detection OcTr：用于 3D 对象检测的基于八叉树的转换器
Octree Guided Unoriented Surface Reconstruction 八叉树引导的无向曲面重建
Omni Aggregation Networks for Lightweight Image Super-Resolution 用于轻量级图像超分辨率的 Omni 聚合网络
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild Omni3D：用于野外 3D 对象检测的大型基准和模型
OmniAL: A Unified CNN Framework for Unsupervised Anomaly Localization OmniAL：用于无监督异常定位的统一 CNN 框架
OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis OmniAvatar：几何引导可控 3D 头部合成
OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images OmniCity：多层次多视图图像的万能城市理解
OmniMAE: Single Model Masked Pretraining on Images and Videos OmniMAE：图像和视频的单模型掩码预训练
Omnimatte3D: Associating Objects and Their Effects in Unconstrained Monocular Video Omnimatte3D：在无约束单目视频中关联对象及其效果
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation OmniObject3D：用于真实感知、重建和生成的大词汇量 3D 对象数据集
OmniVidar: Omnidirectional Depth Estimation From Multi-Fisheye Images OmniVidar：多鱼眼图像的全向深度估计
On Calibrating Semantic Segmentation Models: Analyses and an Algorithm 关于校准语义分割模型：分析和算法
On Data Scaling in Masked Image Modeling 蒙版图像建模中的数据缩放
On Distillation of Guided Diffusion Models 关于引导扩散模型的蒸馏
On the Benefits of 3D Pose and Tracking for Human Action Recognition 关于 3D 姿势和跟踪对人类动作识别的好处
On the Convergence of IRLS and Its Variants in Outlier-Robust Estimation IRLS 及其变体在异常值鲁棒估计中的收敛性
On the Difficulty of Unpaired Infrared-to-Visible Video Translation: Fine-Grained Content-Rich Patches Transfer 关于不成对的红外到可见视频翻译的困难：细粒度内容丰富的补丁转移
On the Effectiveness of Partial Variance Reduction in Federated Learning With Heterogeneous Data 异构数据联邦学习中偏方差约简的有效性
On the Effects of Self-Supervision and Contrastive Alignment in Deep Multi-View Clustering 关于自监督和对比对齐在深度多视图聚类中的影响
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks 关于精确几何数据对密集 3D 视觉任务的重要性
On the Pitfall of Mixup for Uncertainty Calibration 关于 Mixup 在不确定性校准中的陷阱
On the Stability-Plasticity Dilemma of Class-Incremental Learning 关于类增量学习的稳定性-可塑性困境
OneFormer: One Transformer To Rule Universal Image Segmentation OneFormer：一个统御通用图像分割的 Transformer
One-Shot High-Fidelity Talking-Head Synthesis With Deformable Neural Radiance Field 具有可变形神经辐射场的单样本高保真说话头合成
One-Shot Model for Mixed-Precision Quantization 混合精度量化的一次性模型
One-Stage 3D Whole-Body Mesh Recovery With Component Aware Transformer 使用组件感知转换器的单阶段 3D 全身网格恢复
One-to-Few Label Assignment for End-to-End Dense Detection 用于端到端密集检测的一对少标签分配
On-the-Fly Category Discovery 动态类别发现
Open Set Action Recognition via Multi-Label Evidential Learning 通过多标签证据学习进行开放集动作识别
Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning 使用补丁对齐对比学习的开放式词汇语义分割
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework 通过语言建模框架进行开放类别人-物交互预训练
OpenGait: Revisiting Gait Recognition Towards Better Practicality OpenGait：重新审视步态识别以获得更好的实用性
OpenMix: Exploring Outlier Samples for Misclassification Detection OpenMix：探索异常样本以进行错误分类检测
OpenScene: 3D Scene Understanding With Open Vocabularies OpenScene：使用开放词汇表理解 3D 场景
Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator 通过提示视觉语言评估器进行开放集细粒度检索
Open-Set Likelihood Maximization for Few-Shot Learning 小样本学习的开放集似然最大化
Open-Set Representation Learning Through Combinatorial Embedding 通过组合嵌入进行开放集表示学习
Open-Set Semantic Segmentation for Point Clouds via Adversarial Prototype Framework 通过对抗性原型框架对点云进行开放式语义分割
Open-Vocabulary Attribute Detection 开放词汇属性检测
Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models 使用文本到图像扩散模型的开放词汇全景分割
Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation 没有 3D 注释的开放词汇点云对象检测
Open-Vocabulary Semantic Segmentation With Mask-Adapted CLIP 使用 Mask-Adapted CLIP 的开放式词汇语义分割
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction 通过目标感知表示学习和自适应视野预测的开放世界多任务控制
OPE-SR: Orthogonal Position Encoding for Designing a Parameter-Free Upsampling Module in Arbitrary-Scale Image Super-Resolution OPE-SR：用于设计任意尺度图像超分辨率中无参数上采样模块的正交位置编码
Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection 可部署端到端行人检测的最优建议学习
Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting 最佳传输最小化：用于半监督计数的密度图上的人群定位
Optimization-Inspired Cross-Attention Transformer for Compressive Sensing 用于压缩传感的受优化启发的交叉注意变换器
ORCa: Glossy Objects As Radiance-Field Cameras ORCa：作为辐射场相机的光滑物体
OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields OReX：使用神经场从平面横截面重建对象
OrienterNet: Visual Localization in 2D Public Maps With Neural Matching OrienterNet：使用神经匹配在 2D 公共地图中进行视觉定位
Orthogonal Annotation Benefits Barely-Supervised Medical Image Segmentation 正交注释有利于勉强监督的医学图像分割
OSAN: A One-Stage Alignment Network To Unify Multimodal Alignment and Unsupervised Domain Adaptation OSAN：统一多模态对齐和无监督域适应的单阶段对齐网络
OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer OSRT：具有失真感知转换器的全向图像超分辨率
OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering OTAvatar：具有可控三平面渲染的一次性说话人脸头像
OT-Filter: An Optimal Transport Filter for Learning With Noisy Labels OT-Filter：用于学习噪声标签的最佳传输滤波器
Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation 弱监督语义分割的候选外纠正
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning 用于稳健半监督学习的分布外语义剪枝
OvarNet: Towards Open-Vocabulary Object Attribute Recognition OvarNet：走向开放词汇对象属性识别
Overcoming the Trade-Off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction 克服 3D 手形重建中准确性和合理性之间的权衡
OVTrack: Open-Vocabulary Multiple Object Tracking OVTrack：开放词汇多对象跟踪
PA&DA: Jointly Sampling Path and Data for Consistent NAS PA&DA：为一致的 NAS 联合采样路径和数据
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers PaCa-ViT：在视觉转换器中学习补丁到集群的注意力
PACO: Parts and Attributes of Common Objects PACO：常见对象的部分和属性
Paint by Example: Exemplar-Based Image Editing With Diffusion Models 实例绘画：使用扩散模型进行基于范例的图像编辑
Painting 3D Nature in 2D: View Synthesis of Natural Scenes From a Single Semantic Mask 在 2D 中绘制 3D 自然：从单个语义掩码进行自然场景的视图合成
Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization 用于增强隐私保护视觉定位的配对点提升
PaletteNeRF: Palette-Based Appearance Editing of Neural Radiance Fields PaletteNeRF：基于调色板的神经辐射场外观编辑
PanelNet: Understanding 360 Indoor Environment via Panel Representation PanelNet：通过面板表示了解 360 度室内环境
PAniC-3D: Stylized Single-View 3D Reconstruction From Portraits of Anime Characters PAniC-3D：动漫人物肖像的程式化单视图 3D 重建
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360deg PanoHead：360 度的几何感知 3D 全头合成
Panoptic Compositional Feature Field for Editable Scene Rendering With Network-Inferred Labels via Metric Learning 通过度量学习使用网络推断标签进行可编辑场景渲染的全景合成特征场
Panoptic Lifting for 3D Scene Understanding With Neural Fields 神经场 3D 场景理解的全景提升
Panoptic Video Scene Graph Generation 全景视频场景图生成
PanoSwin: A Pano-Style Swin Transformer for Panorama Understanding PanoSwin：用于全景理解的 Pano-Style Swin Transformer
Parallel Diffusion Models of Operator and Image for Blind Inverse Problems 盲反问题算子和图像的并行扩散模型
Parameter Efficient Local Implicit Image Function Network for Face Segmentation 用于人脸分割的参数高效局部隐式图像函数网络
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment 用于音频驱动面部重现的参数化隐式面部表示
PartDistillation: Learning Parts From Instance Segmentation PartDistillation：从实例分割中学习零件
Partial Network Cloning 部分网络克隆
PartManip: Learning Cross-Category Generalizable Part Manipulation Policy From Point Cloud Observations PartManip：从点云观察中学习跨类别可泛化零件操作策略
PartMix: Regularization Strategy To Learn Part Discovery for Visible-Infrared Person Re-Identification PartMix：用于学习可见红外人员重新识别的零件发现的正则化策略
Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching Between Parts and Words Parts2Words：通过部分和单词之间的双向匹配来学习点云和文本的联合嵌入
PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models PartSLIP：通过预训练图像语言模型对 3D 点云进行少样本部件分割
Passive Micron-Scale Time-of-Flight With Sunlight Interferometry 利用太阳光干涉测量实现被动式微米级飞行时间成像
Patch-Based 3D Natural Scene Generation From a Single Example 从单个示例生成基于补丁的 3D 自然场景
Patch-Craft Self-Supervised Training for Correlated Image Denoising 用于相关图像去噪的 Patch-Craft 自监督训练
Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective 用于无监督域适应的 Patch-Mix Transformer：博弈视角
PATS: Patch Area Transportation With Subdivision for Local Feature Matching PATS：用于局部特征匹配的带细分的图像块区域传输
PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction PC2：用于单图像 3D 重建的投影条件点云扩散
pCON: Polarimetric Coordinate Networks for Neural Scene Representations pCON：用于神经场景表示的极化坐标网络
PCR: Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning PCR：在线类增量持续学习的基于代理的对比重放
PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations PCT-Net：使用像素级颜色转换的全分辨率图像协调
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos PDPP：用于教学视频中程序规划的投影扩散
PD-Quant: Post-Training Quantization Based on Prediction Difference Metric PD-Quant：基于预测差异度量的训练后量化
PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation PeakConv：学习雷达语义分割的峰值感受野
PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration PEAL：用于低重叠点云配准的先验嵌入式显式注意学习
PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training PEFAT：通过伪损失估计和特征对抗训练促进半监督医学图像分类
Perception and Semantic Aware Regularization for Sequential Confidence Calibration 顺序置信度校准的感知和语义感知正则化
Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation 使用最优目标估计的面向感知的单图像超分辨率
PermutoSDF: Fast Multi-View Reconstruction With Implicit Surfaces Using Permutohedral Lattices PermutoSDF：使用 Permutohedral Lattices 的隐式曲面快速多视图重建
Persistent Nature: A Generative Model of Unbounded 3D Worlds 持久的自然：无界 3D 世界的生成模型
Person Image Synthesis via Denoising Diffusion Model 通过去噪扩散模型合成人物图像
PersonNeRF: Personalized Reconstruction From Photo Collections PersonNeRF：照片集的个性化重建
Perspective Fields for Single Image Camera Calibration 单图像相机校准的视角场
PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces PET-NeuS：神经表面的位置编码三平面
PHA: Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification PHA：用于基于 Transformer 的行人重新识别的逐块高频增强
Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection 相移编码器：预测定向物体检测中的准确方向
Phone2Proc: Bringing Robust Robots Into Our Chaotic World Phone2Proc：将强大的机器人带入我们混乱的世界
Photo Pre-Training, but for Sketch 照片预训练，但用于 Sketch
Physically Adversarial Infrared Patches With Learnable Shapes and Locations 具有可学习形状和位置的物理对抗红外补丁
Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling 物理上可实现的自然服装纹理通过 3D 建模躲避人体检测器
Physical-World Optical Adversarial Attacks on 3D Face Recognition 物理世界对 3D 人脸识别的光学对抗攻击
Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos 用于从视频合成冲击声的物理驱动扩散模型
Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography 用于极端低光摄影的物理引导的 ISO 相关传感器噪声建模
Pic2Word: Mapping Pictures to Words for Zero-Shot Composed Image Retrieval Pic2Word：将图片映射到单词以进行零样本合成图像检索
Picture That Sketch: Photorealistic Image Generation From Abstract Sketches Picture That Sketch：从抽象草图生成逼真的图像
PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers PIDNet：受 PID 控制器启发的实时语义分割网络
PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds PillarNeXt：重新思考 LiDAR 点云中 3D 对象检测的网络设计
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection PiMAE：用于 3D 对象检测的点云和图像交互式蒙版自动编码器
PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification PIP-Net：用于可解释图像分类的基于补丁的直观原型
PIRLNav: Pretraining With Imitation and RL Finetuning for ObjectNav PIRLNav：ObjectNav 的模仿预训练和 RL 微调
PIVOT: Prompting for Video Continual Learning PIVOT：提示视频持续学习
PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization PivoTAL：弱监督时间动作定位的先验驱动监督
Pix2map: Cross-Modal Retrieval for Inferring Street Maps From Images Pix2map：从图像推断街道地图的跨模态检索
PixHt-Lab: Pixel Height Based Light Effect Generation for Image Compositing PixHt-Lab：基于像素高度的图像合成光效生成
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding PLA：语言驱动的开放式词汇 3D 场景理解
PlaneDepth: Self-Supervised Depth Estimation via Orthogonal Planes PlaneDepth：通过正交平面进行自监督深度估计
Planning-Oriented Autonomous Driving 面向规划的自动驾驶
Plateau-Reduced Differentiable Path Tracing 减少平台区的可微分路径追踪
PlenVDB: Memory Efficient VDB-Based Radiance Fields for Fast Training and Rendering PlenVDB：用于快速训练和渲染的内存高效的基于 VDB 的辐射场
PLIKS: A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body Estimation PLIKS：用于 3D 人体估计的伪线性逆运动学求解器
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation 用于文本驱动的图像到图像转换的即插即用扩散特征
PMatch: Paired Masked Image Modeling for Dense Geometric Matching PMatch：用于密集几何匹配的成对蒙版图像建模
PMR: Prototypical Modal Rebalance for Multimodal Learning PMR：多模态学习的原型模态再平衡
POEM: Reconstructing Hand in a Point Embedded Multi-View Stereo POEM：在点嵌入的多视图立体中重建手部
Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting 点云预测作为 4D 占用预测的代理
Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields Point2Pix：通过神经辐射场进行逼真的点云渲染
PointAvatar: Deformable Point-Based Head Avatars From Videos PointAvatar：视频中基于点的可变形头像
PointCert: Point Cloud Classification With Deterministic Certified Robustness Guarantees PointCert：具有确定性认证稳健性保证的点云分类
PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering PointClustering：在聚类中使用变换不变性的无监督点云预训练
PointCMP: Contrastive Mask Prediction for Self-Supervised Learning on Point Cloud Videos PointCMP：点云视频自监督学习的对比蒙版预测
PointConvFormer: Revenge of the Point-Based Convolution PointConvFormer：基于点的卷积的报复
PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection PointDistiller：结构化知识蒸馏以实现高效紧凑的 3D 检测
Pointersect: Neural Rendering With Cloud-Ray Intersection Pointersect：具有云射线相交的神经渲染
PointListNet: Deep Learning on 3D Point Lists PointListNet：3D 点列表的深度学习
PointVector: A Vector Representation in Point Cloud Analysis PointVector：点云分析中的矢量表示
Polarimetric iToF: Measuring High-Fidelity Depth Through Scattering Media 偏振 iToF：通过散射介质测量高保真深度
Polarized Color Image Denoising 偏振彩色图像去噪
Policy Adaptation From Foundation Model Feedback 根据基础模型反馈进行策略自适应
PolyFormer: Referring Image Segmentation As Sequential Polygon Generation PolyFormer：将指代图像分割视为序列多边形生成
Polynomial Implicit Neural Representations for Large Diverse Datasets 大型不同数据集的多项式隐式神经表示
Poly-PC: A Polyhedral Network for Multiple Point Cloud Tasks at Once Poly-PC：一次用于多个点云任务的多面体网络
Pose Synchronization Under Multiple Pair-Wise Relative Poses 多对相对姿势下的姿势同步
Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation 用于自监督面部表征的姿态解耦对比学习
PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation PoseExaminer：自动测试人体姿势和形状估计中的分布外鲁棒性
PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation PoseFormerV2：探索频域以实现高效稳健的 3D 人体姿态估计
Position-Guided Text Prompt for Vision-Language Pre-Training 用于视觉语言预训练的位置引导文本提示
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation 用于图像和视频字幕评估的正增强对比学习
PosterLayout: A New Benchmark and Approach for Content-Aware Visual-Textual Presentation Layout PosterLayout：内容感知视觉文本演示布局的新基准和方法
Post-Processing Temporal Action Detection 后处理时间动作检测
Post-Training Quantization on Diffusion Models 扩散模型的训练后量化
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery POTTER：用于高效人体网格恢复的池化注意力转换器
Power Bundle Adjustment for Large-Scale 3D Reconstruction 用于大规模 3D 重建的幂级数光束法平差（Power Bundle Adjustment）
Practical Network Acceleration With Tiny Sets 使用极小数据集的实用网络加速
Prefix Conditioning Unifies Language and Label Supervision 前缀条件统一语言和标签监督
PREIM3D: 3D Consistent Precise Image Attribute Editing From a Single Image PREIM3D：从单个图像编辑 3D 一致的精确图像属性
Preserving Linear Separability in Continual Learning by Backward Feature Projection 通过向后特征投影在持续学习中保持线性可分性
Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation 通用零样本分割的基元生成和语义相关对齐
Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions 恶劣天气条件下域增量语义分割的遗忘原理
PRISE: Demystifying Deep Lucas-Kanade With Strongly Star-Convex Constraints for Multimodel Image Alignment PRISE：用强星凸约束揭开 Deep Lucas-Kanade 的神秘面纱以进行多模型图像对齐
Privacy-Preserving Adversarial Facial Features 保护隐私的对抗性面部特征
Privacy-Preserving Representations Are Not Enough: Recovering Scene Content From Camera Poses 保护隐私的表示是不够的：从相机姿势恢复场景内容
Private Image Generation With Dual-Purpose Auxiliary Classifier 具有两用辅助分类器的私有图像生成
PROB: Probabilistic Objectness for Open World Object Detection PROB：开放世界对象检测的概率对象性
Probabilistic Debiasing of Scene Graphs 场景图的概率去偏
Probabilistic Knowledge Distillation of Face Ensembles 人脸集合的概率知识蒸馏
Probabilistic Prompt Learning for Dense Prediction 密集预测的概率提示学习
Probability-Based Global Cross-Modal Upsampling for Pansharpening 用于全色锐化的基于概率的全局跨模态上采样
Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task Using Artificial Neural Networks 使用人工神经网络探索海马相关任务中场景感知的神经表征
Probing Sentiment-Oriented Pre-Training Inspired by Human Sentiment Perception Mechanism 探索受人类情感感知机制启发的面向情感的预训练
Procedure-Aware Pretraining for Instructional Video Understanding 用于教学视频理解的程序感知预训练
ProD: Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification ProD：通过提示解耦域知识以进行跨域少样本图像分类
Progressive Backdoor Erasing via Connecting Backdoor and Adversarial Attacks 通过连接后门和对抗性攻击进行渐进式后门擦除
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis 用于细粒度可控说话头合成的渐进解缠表示学习
Progressive Neighbor Consistency Mining for Correspondence Pruning 用于对应修剪的渐进式邻居一致性挖掘
Progressive Open Space Expansion for Open-Set Model Attribution 开放集模型归因的渐进式开放空间扩展
Progressive Random Convolutions for Single Domain Generalization 单域泛化的渐进随机卷积
Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning 广义零样本学习的渐进式语义-视觉相互适应
Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation 用于高效的基于事件的运动估计的渐进式时空对齐
Progressive Transformation Learning for Leveraging Virtual Images in Training 在训练中利用虚拟图像的渐进式转换学习
Progressively Optimized Local Radiance Fields for Robust View Synthesis 渐进优化的局部辐射场，用于稳健的视图合成
Promoting Semantic Connectivity: Dual Nearest Neighbors Contrastive Learning for Unsupervised Domain Generalization 促进语义连通性：无监督域泛化的双最近邻对比学习
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery PromptCAL：通过辅助提示进行对比亲和学习以实现广义新类别发现
Prompt-Guided Zero-Shot Anomaly Action Recognition Using Pretrained Deep Skeleton Features 使用预训练深度骨架特征的提示引导零样本异常动作识别
Prompting Large Language Models With Answer Heuristics for Knowledge-Based Visual Question Answering 用答案启发式提示大型语言模型以进行基于知识的视觉问答
Propagate and Calibrate: Real-Time Passive Non-Line-of-Sight Tracking 传播和校准：实时被动非视距跟踪
ProphNet: Efficient Agent-Centric Motion Forecasting With Anchor-Informed Proposals ProphNet：基于锚点引导提议的高效以智能体为中心的运动预测
Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization 用于弱监督时间动作定位的基于建议的多实例学习
ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding ProTeGe：通过视频时序定位进行面向视频时序定位的未修剪预训练
ProtoCon: Pseudo-Label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-Supervised Learning ProtoCon：通过在线聚类和原型一致性改进伪标签以实现高效的半监督学习
Prototype-Based Embedding Network for Scene Graph Generation 用于场景图生成的基于原型的嵌入网络
Prototypical Residual Networks for Anomaly Detection and Localization 用于异常检测和定位的原型残差网络
Proximal Splitting Adversarial Attack for Semantic Segmentation 用于语义分割的近端分裂对抗攻击
ProxyFormer: Proxy Alignment Assisted Point Cloud Completion With Missing Part Sensitive Transformer ProxyFormer：使用缺失部分敏感 Transformer 的代理对齐辅助点云补全
Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge 通过双层优化修剪参数化以实现边缘上的高效语义分割
Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation 用于半监督医学图像分割的伪标签引导对比学习
PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation With Progressive Video Transformers PSVT：使用渐进式视频转换器进行端到端多人 3D 姿势和形状估计
Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes 把人放在合适的位置：具有可供性感知的场景人物插入
PVO: Panoptic Visual Odometry PVO：全景视觉里程计
PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer PVT-SSD：带点体素变换器的单级 3D 物体检测器
PyPose: A Library for Robot Learning With Physics-Based Optimization PyPose：基于物理优化的机器人学习库
PyramidFlow: High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow PyramidFlow：使用金字塔归一化流的高分辨率缺陷对比定位
Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! 问：如何将大型视觉语言模型专门用于数据稀缺的 VQA 任务？ A：在未标记图像上进行自我训练！
Q-DETR: An Efficient Low-Bit Quantized Detection Transformer Q-DETR：一种高效的低比特量化检测 Transformer
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation QPGesture：用于自然语音驱动手势生成的基于量化和相位引导的运动匹配
Quality-Aware Pre-Trained Models for Blind Image Quality Assessment 用于盲图像质量评估的质量感知预训练模型
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity QuantArt：量化图像风格迁移以实现高视觉保真度
Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis 3D 感知图像合成中自定义属性的定量操控
Quantum Multi-Model Fitting 量子多模型拟合
Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification 用于高光谱图像分类的受量子启发的光谱空间金字塔网络
Query-Centric Trajectory Prediction 以查询为中心的轨迹预测
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection 用于时刻检索和亮点检测的查询相关视频表示
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition R2Former：用于地点识别的统一检索和重新排序转换器
RaBit: Parametric Modeling of 3D Biped Cartoon Characters With a Topological-Consistent Dataset RaBit：使用拓扑一致数据集对 3D 双足卡通人物进行参数化建模
RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training RA-CLIP：检索增强对比语言图像预训练
Randomized Adversarial Training via Taylor Expansion 通过泰勒展开进行随机对抗训练
Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation 具有聚焦运动估计的值域-零空间视频帧插值
RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving RangeViT：面向自动驾驶 3D 语义分割的视觉转换器
Ranking Regularization for Critical Rare Classes: Minimizing False Positives at a High True Positive Rate 关键稀有类的排名正则化：以高真阳性率最小化误报
RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images With Diverse Sizes and Imbalanced Categories RankMix：用于对尺寸各异且类别不平衡的全切片图像进行分类的弱监督学习的数据增强
Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks 速率梯度近似攻击威胁深度脉冲神经网络
Raw Image Reconstruction With Learned Compact Metadata 使用学习的紧凑元数据重建原始图像
Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments Rawgment：噪声计入 RAW 增强可在各种环境中进行识别
Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization Re2TAL：重新连接预训练视频主干以实现可逆时间动作定位
RealFusion: 360° Reconstruction of Any Object From a Single Image RealFusion：根据单个图像对任何对象进行 360° 重建
RealImpact: A Dataset of Impact Sound Fields for Real Objects RealImpact：真实物体撞击声场数据集
Realistic Saliency Guided Image Enhancement 逼真的显著性引导图像增强
Real-Time 6K Image Rescaling With Rate-Distortion Optimization 具有率失真优化的实时 6K 图像重新缩放
Real-Time Controllable Denoising for Image and Video 图像和视频的实时可控去噪
Real-Time Evaluation in Online Continual Learning: A New Hope 在线持续学习中的实时评估：新希望
Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video 针对未修剪视频的野外实时多人眨眼检测
Real-Time Neural Light Field on Mobile Devices 移动设备上的实时神经光场
ReasonNet: End-to-End Driving With Temporal and Global Reasoning ReasonNet：使用时序和全局推理进行端到端驾驶
Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning 为基于样本的类增量学习重新平衡批归一化
Re-Basin via Implicit Sinkhorn Differentiation 通过隐式 Sinkhorn 微分实现 Re-Basin
REC-MV: REconstructing 3D Dynamic Cloth From Monocular Videos REC-MV：从单目视频重建 3D 动态布料
ReCo: Region-Controlled Text-to-Image Generation ReCo：区域控制的文本到图像生成
Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation 用于极低分辨率人脸识别和质量估计的可识别性嵌入增强
Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants With No False Negatives and No False Positives 通过没有假阴性和假阳性的完整连续等距不变量识别未标记点云的刚性模式
Reconstructing Animatable Categories From Videos 从视频中重建可动画化的类别
Reconstructing Signing Avatars From Video Using Linguistic Priors 使用语言学先验从视频重建手语虚拟形象
Recovering 3D Hand Mesh Sequence From a Single Blurry Image: A New Dataset and Temporal Unfolding 从单个模糊图像中恢复 3D 手部网格序列：一个新的数据集和时间展开
Recurrence Without Recurrence: Stable Video Landmark Detection With Deep Equilibrium Models 无循环的循环：使用深度平衡模型进行稳定的视频关键点检测
Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transformer 使用单应性引导图像变形和聚焦变换器的循环单应性估计
Recurrent Vision Transformers for Object Detection With Event Cameras 用于使用事件相机进行对象检测的递归视觉变换器
ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection ReDirTrans：注视和头部重定向的潜在到潜在翻译
Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation 减少时间戳监督时间动作分割的标签偏差
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension RefCLIP：弱监督指称表达理解的通用教师
Referring Image Matting 参考图像抠图
Referring Multi-Object Tracking 参考多目标跟踪
Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization Ref-NPR：用于可控场景风格化的基于参考的非真实感辐射场
RefSR-NeRF: Towards High Fidelity and Super Resolution View Synthesis RefSR-NeRF：迈向高保真和超分辨率视图合成
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension RefTeacher：半监督参考表达理解的强大基线
Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration Re-GAN：通过架构重构进行数据高效的 GAN 训练
Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers 使用 Vision Transformers 进行开放词汇对象检测的区域感知预训练
Regularization of Polynomial Networks for Image Recognition 用于图像识别的多项式网络正则化
Regularize Implicit Neural Representation by Itself 隐式神经表征的自正则化
Regularized Vector Quantization for Tokenized Image Synthesis 用于标记化图像合成的正则化矢量量化
Regularizing Second-Order Influences for Continual Learning 正则化二阶影响以用于持续学习
Reinforcement Learning-Based Black-Box Model Inversion Attacks 基于强化学习的黑盒模型反转攻击
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild Re-IQA：野外图像质量评估的无监督学习
Relational Context Learning for Human-Object Interaction Detection 用于人-物交互检测的关系上下文学习
Relational Space-Time Query in Long-Form Videos 长视频中的关系时空查询
Reliability in Semantic Segmentation: Are We on the Right Track? 语义分割的可靠性：我们走在正确的轨道上吗？
Reliable and Interpretable Personalized Federated Learning 可靠且可解释的个性化联邦学习
ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects ReLight My NeRF：用于现实世界对象的新颖视图合成和重新照明的数据集
Relightable Neural Human Assets From Multi-View Gradient Illuminations 来自多视图渐变照明的可照明神经人类资产
RelightableHands: Efficient Neural Relighting of Articulated Hand Models RelightableHands：铰接式手部模型的高效神经重新照明
Removing Objects From Neural Radiance Fields 从神经辐射场中移除对象
Renderable Neural Radiance Map for Visual Navigation 用于视觉导航的可渲染神经辐射图
RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation RenderDiffusion：用于 3D 重建、修复和生成的图像扩散
RepMode: Learning to Re-Parameterize Diverse Experts for Subcellular Structure Prediction RepMode：学习重新参数化亚细胞结构预测的不同专家
Representation Learning for Visual Object Tracking by Masked Appearance Transfer 通过掩码外观迁移进行视觉目标跟踪的表征学习
Representing Volumetric Videos As Dynamic MLP Maps 将体积视频表示为动态 MLP 映射
Reproducible Scaling Laws for Contrastive Language-Image Learning 对比语言图像学习的可复现缩放定律
ResFormer: Scaling ViTs With Multi-Resolution Training ResFormer：通过多分辨率训练扩展 ViT
Residual Degradation Learning Unfolding Framework With Mixing Priors Across Spectral and Spatial for Compressive Spectral Imaging 用于压缩光谱成像的跨光谱和空间混合先验的残余退化学习展开框架
Resource-Efficient RGBD Aerial Tracking 资源高效的 RGBD 航拍跟踪
Restoration of Hand-Drawn Architectural Drawings Using Latent Space Mapping With Degradation Generator 使用带退化生成器的潜在空间映射修复手绘建筑图
Rethinking Domain Generalization for Face Anti-Spoofing: Separability and Alignment 重新思考人脸反欺骗的域泛化：可分离性和对齐
Rethinking Feature-Based Knowledge Distillation for Face Recognition 重新思考基于特征的人脸识别知识蒸馏
Re-Thinking Federated Active Learning Based on Inter-Class Diversity 重新思考基于类间多样性的联邦主动学习
Rethinking Federated Learning With Domain Shift: A Prototype View 用域转移重新思考联邦学习：原型视图
Rethinking Few-Shot Medical Segmentation: A Vector Quantization View 重新思考小样本医学分割：矢量量化视图
Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling 重新思考梯度投影持续学习：稳定性/可塑性特征空间解耦
Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspective 从长尾分布学习的角度重新思考图像超分辨率
Re-Thinking Model Inversion Attacks Against Deep Neural Networks 重新思考针对深度神经网络的模型反转攻击
Rethinking Optical Flow From Geometric Matching Consistent Perspective 从几何匹配一致的角度重新思考光流
Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need 重新思考分布外 (OOD) 检测：蒙版图像建模就是您所需要的
Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation 重新思考点云法线估计的 3D 曲面拟合中的近似误差
Rethinking the Correlation in Few-Shot Segmentation: A Buoys View 重新思考小样本分割中的相关性：浮标视图
Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition 重新思考动态面部表情识别的学习范式
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning 重新思考视频 ViT：用于联合图像和视频学习的稀疏视频管
REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory REVEAL：具有多源多模态知识记忆的检索增强视觉语言预训练
Revealing the Dark Secrets of Masked Image Modeling 揭示蒙版图像建模的黑暗秘密
ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration ReVISE：具有视觉输入的自监督语音再合成，用于通用和广义语音再生
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens 重温对比学习中的多模态表示：从补丁和令牌嵌入到有限离散令牌
Revisiting Prototypical Network for Cross Domain Few-Shot Learning 重温跨域小样本学习的原型网络
Revisiting Residual Networks for Adversarial Robustness 重新审视残差网络的对抗鲁棒性
Revisiting Reverse Distillation for Anomaly Detection 重新审视异常检测的逆蒸馏
Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution 重新审视滚动快门束调整：走向准确和快速的解决方案
Revisiting Rotation Averaging: Uncertainties and Robust Losses 重新审视旋转平均：不确定性和鲁棒损失
Revisiting Self-Similarity: Structural Embedding for Image Retrieval 重温自相似性：图像检索的结构嵌入
Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring 重新审视基于 CLIP 的图像到视频知识传输的时间建模
Revisiting the P3P Problem 重温 P3P 问题
Revisiting the Stack-Based Inverse Tone Mapping 重温基于堆栈的反向色调映射
Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation 重温半监督语义分割中的弱到强一致性
RGB No More: Minimally-Decoded JPEG Vision Transformers RGB 不再：最小解码的 JPEG 视觉转换器
RGBD2: Generative Scene Synthesis via Incremental View Inpainting Using RGBD Diffusion Models RGBD2：使用 RGBD 扩散模型通过增量视图修复生成场景合成
RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts RIATIG：具有自然提示的可靠且不易察觉的对抗性文本到图像生成
RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo RIAV-MVS：循环索引多视图立体的不对称体积
RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors RIDCP：通过高质量码本先验使真实图像去雾焕发活力
RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor RiDDLE：使用 Latent Encryptor 进行可逆和多样化的去标识化
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer RIFormer：保持视觉骨干网络有效但移除 token 混合器
Rigidity-Aware Detection for 6D Object Pose Estimation 用于 6D 物体姿态估计的刚性感知检测
RILS: Masked Visual Reconstruction in Language Semantic Space RILS：语言语义空间中的掩蔽视觉重构
RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases RMLVQA：一种用于带有语言偏差的视觉问答的间隔损失方法
Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation From Image Sequence 机器人结构先验引导时间注意，用于从图像序列估计相机到机器人的姿态
Robust 3D Shape Classification via Non-Local Graph Attention Network 通过非局部图注意力网络进行稳健的 3D 形状分类
Robust and Scalable Gaussian Process Regression and Its Applications 稳健可扩展的高斯过程回归及其应用
Robust Dynamic Radiance Fields 鲁棒的动态辐射场
Robust Generalization Against Photon-Limited Corruptions via Worst-Case Sharpness Minimization 通过最坏情况锐度最小化对光子受限损坏进行稳健泛化
Robust Mean Teacher for Continual and Gradual Test-Time Adaptation 用于持续和渐进测试时自适应的鲁棒均值教师
Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation 通过弱监督异常值分割进行基于模型的稳健人脸重建
Robust Multiview Point Cloud Registration With Reliable Pose Graph Initialization and History Reweighting 具有可靠位姿图初始化和历史重新加权的鲁棒多视图点云配准
Robust Outlier Rejection for 3D Registration With Variational Bayes 使用变分贝叶斯进行 3D 配准的稳健异常值拒绝
Robust Single Image Reflection Removal Against Adversarial Attacks 针对对抗性攻击的鲁棒单图像反射去除
Robust Test-Time Adaptation in Dynamic Scenarios 动态场景中的鲁棒测试时间适应
Robust Unsupervised StyleGAN Image Restoration 鲁棒的无监督 StyleGAN 图像恢复
RobustNeRF: Ignoring Distractors With Robust Losses RobustNeRF：用鲁棒损失忽略干扰物
RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion RODIN：使用扩散雕刻 3D 数字化身的生成模型
Role of Transients in Two-Bounce Non-Line-of-Sight Imaging 瞬态信号在二次反射非视距成像中的作用
RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval RONO：用于 2D-3D 跨模态检索的带有噪声标签的鲁棒判别学习
Rotation-Invariant Transformer for Point Cloud Matching 用于点云匹配的旋转不变变换器
Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks 跑步，不要走路：为更快的神经网络追求更高的 FLOPS
RUST: Latent Neural Scene Representations From Unposed Imagery RUST：来自无位姿图像的潜在神经场景表示
RWSC-Fusion: Region-Wise Style-Controlled Fusion Network for the Prohibited X-Ray Security Image Synthesis RWSC-Fusion：用于违禁品 X 射线安检图像合成的逐区域风格控制融合网络
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning S3C：通过自我批判学习的半监督 VQA 自然语言解释
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation SadTalker：为程式化音频驱动的单图像说话人脸动画学习逼真的 3D 运动系数
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models 安全潜扩散：减轻扩散模型中的不适当退化
Sample-Level Multi-View Graph Clustering 样本级多视图图聚类
Samples With Low Loss Curvature Improve Data Efficiency 具有低损失曲率的样本可提高数据效率
Sampling Is Matter: Point-Guided 3D Human Mesh Reconstruction 采样很重要：点引导 3D 人体网格重建
SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency SAP-DETR：弥合显着点和基于查询的 Transformer 检测器之间的差距以实现快速模型收敛
SCADE: NeRFs from Space Carving With Ambiguity-Aware Depth Estimates SCADE：来自具有歧义感知深度估计的空间雕刻的 NeRF
Scalable, Detailed and Mask-Free Universal Photometric Stereo 可缩放、细节丰富且无遮罩的通用光度立体
ScaleDet: A Scalable Multi-Dataset Object Detector ScaleDet：可扩展的多数据集对象检测器
ScaleFL: Resource-Adaptive Federated Learning With Heterogeneous Clients ScaleFL：异构客户端的资源自适应联邦学习
ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector ScaleKD：在小物体检测器中提炼尺度感知知识
Scaling Language-Image Pre-Training via Masking 通过掩蔽扩展语言图像预训练
Scaling Up GANs for Text-to-Image Synthesis 放大 GAN 以进行文本到图像的合成
ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images ScanDMM：360° 图像扫描路径预测的深度马尔可夫模型
ScarceNet: Animal Pose Estimation With Scarce Annotations ScarceNet：带有稀缺注释的动物姿势估计
SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy SCConv：特征冗余的空间和通道重建卷积
Scene-Aware Egocentric 3D Human Pose Estimation 场景感知自我中心 3D 人体姿态估计
SceneComposer: Any-Level Semantic Image Synthesis SceneComposer：任意级别的语义图像合成
SceneTrilogy: On Human Scene-Sketch and Its Complementarity With Photo and Text SceneTrilogy：论人类场景草图及其与照片和文本的互补性
SCoDA: Domain Adaptive Shape Completion for Real Scans SCoDA：真实扫描的域自适应形状补全
SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow SCOOP：自监督对应和基于优化的场景流
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation Score Jacobian Chaining：为 3D 生成提升预训练的 2D 扩散模型
SCOTCH and SODA: A Transformer Video Shadow Detection Framework SCOTCH 和 SODA：Transformer 视频阴影检测框架
SCPNet: Semantic Scene Completion on Point Cloud SCPNet：点云上的语义场景补全
SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation SDC-UDA：用于切片方向连续交叉模态医学图像分割的体积无监督域自适应框架
Search-Map-Search: A Frame Selection Paradigm for Action Recognition Search-Map-Search：动作识别的帧选择范式
Seasoning Model Soups for Robustness to Adversarial and Natural Distribution Shifts 对对抗性和自然分布变化具有稳健性的调味模型汤
SeaThru-NeRF: Neural Radiance Fields in Scattering Media SeaThru-NeRF：散射介质中的神经辐射场
SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations SECAD-Net：通过学习草图拉伸操作进行自我监督的 CAD 重建
Seeing a Rose in Five Thousand Ways 五千种方式看玫瑰
Seeing Beyond the Brain: Conditional Diffusion Model With Sparse Masked Modeling for Vision Decoding 超越大脑：用于视觉解码的具有稀疏掩蔽建模的条件扩散模型
Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container 透过玻璃看：透明容器内物体的神经 3D 重建
Seeing What You Miss: Vision-Language Pre-Training With Semantic Completion Learning 看见你遗漏的：基于语义补全学习的视觉-语言预训练
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert 看见你所说的：由唇读专家指导的说话人脸生成
Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding 用声音看：用于多模态场景理解的远程声学波束形成
SegLoc: Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization SegLoc：学习基于分割的表示以实现隐私保护的视觉定位
Selective Structured State-Spaces for Long-Form Video Understanding 用于长视频理解的选择性结构化状态空间
Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation 用于通用人体姿态估计的可自校正和适应性推理
Self-Guided Diffusion Models 自导扩散模型
SelfME: Self-Supervised Motion Learning for Micro-Expression Recognition SelfME：用于微表情识别的自监督运动学习
Self-Positioning Point-Based Transformer for Point Cloud Understanding 用于点云理解的自定位基于点的转换器
Self-Supervised 3D Scene Flow Estimation Guided by Superpoints Superpoints 引导的自监督 3D 场景流估计
Self-Supervised AutoFlow 自监督自动流
Self-Supervised Blind Motion Deblurring With Deep Expectation Maximization 具有深度期望最大化的自监督盲运动去模糊
Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion 用于基于样式的 3D GAN 反演的自监督几何感知编码器
Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss 通过语义容忍对比损失进行自监督图像到点蒸馏
Self-Supervised Implicit Glyph Attention for Text Recognition 用于文本识别的自监督隐式字形注意
Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching 多模态非刚性 3D 形状匹配的自监督学习
Self-Supervised Learning From Images With a Joint-Embedding Predictive Architecture 使用联合嵌入预测架构从图像中进行自我监督学习
Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blind Image Deblurring 用于盲图像去模糊的基于流的运动先验自监督非均匀核估计
Self-Supervised Pre-Training With Masked Shape Prediction for 3D Scene Understanding 用于 3D 场景理解的带掩码形状预测的自监督预训练
Self-Supervised Representation Learning for CAD CAD 的自监督表示学习
Self-Supervised Super-Plane for Neural 3D Reconstruction 用于神经 3D 重建的自监督超平面
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection 通过视听异常检测进行自监督视频取证
Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label Domains 通过多个标签域上的可扩展语义传输进行语义人体解析
Semantic Prompt for Few-Shot Image Recognition 小样本图像识别的语义提示
Semantic Ray: Learning a Generalizable Semantic Field With Cross-Reprojection Attention 语义射线：通过交叉重投影注意力学习可泛化的语义场
Semantic Scene Completion With Cleaner Self 基于更干净自我的语义场景补全
Semantic-Conditional Diffusion Networks for Image Captioning 用于图像描述的语义条件扩散网络
Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation 零样本实例分割的语义促进去偏和背景消歧
SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation SemiCVT：用于语义分割的半监督卷积视觉转换器
Semidefinite Relaxations for Robust Multiview Triangulation 鲁棒多视图三角化的半定松弛
Semi-DETR: Semi-Supervised Object Detection With Detection Transformers Semi-DETR：使用检测 Transformer 的半监督目标检测
Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module 位置不一致伪标签校正模块驱动的半监督二维人体姿态估计
Semi-Supervised Domain Adaptation With Source Label Adaptation 具有源标签自适应的半监督域自适应
Semi-Supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination 通过结构解耦和双重对抗判别进行半监督手部外观恢复
Semi-Supervised Learning Made Simple With Self-Supervised Clustering 自监督聚类使半监督学习变得简单
Semi-Supervised Parametric Real-World Image Harmonization 半监督参数化真实世界图像协调
Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus 基于跨视图共识的半监督立体 3D 对象检测
Semi-Supervised Video Inpainting With Cycle Consistency Constraints 具有循环一致性约束的半监督视频修复
Semi-Weakly Supervised Object Kinematic Motion Prediction 半弱监督对象运动学运动预测
SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence SE-ORNet：用于无监督点云形状对应的自集成方向感知网络
SeqTrack: Sequence to Sequence Learning for Visual Object Tracking SeqTrack：用于视觉对象跟踪的序列到序列学习
Sequential Training of GANs Against GAN-Classifiers Reveals Correlated “Knowledge Gaps” Present Among Independently Trained GAN Instances 针对 GAN 分类器的 GAN 序列训练揭示了独立训练的 GAN 实例中存在的相关“知识差距”
SeSDF: Self-Evolved Signed Distance Field for Implicit 3D Clothed Human Reconstruction SeSDF：用于隐式 3D 穿衣人体重建的自进化符号距离场
SFD2: Semantic-Guided Feature Detection and Description SFD2：语义引导特征检测和描述
SfM-TTR: Using Structure From Motion for Test-Time Refinement of Single-View Depth Networks SfM-TTR：使用运动结构进行单视图深度网络的测试时间细化
SGLoc: Scene Geometry Encoding for Outdoor LiDAR Localization SGLoc：用于室外 LiDAR 定位的场景几何编码
ShadowDiffusion: When Degradation Prior Meets Diffusion Model for Shadow Removal ShadowDiffusion：当退化先验遇到阴影去除的扩散模型时
ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision ShadowNeuS：通过阴影光线监督进行神经 SDF 重建
Shakes on a Plane: Unsupervised Depth Estimation From Unstabilized Photography 平面上的抖动：来自不稳定摄影的无监督深度估计
Shape-Aware Text-Driven Layered Video Editing 形状感知文本驱动的分层视频编辑
ShapeClipper: Scalable 3D Shape Learning From Single-View Images via Geometric and CLIP-Based Consistency ShapeClipper：通过基于几何和 CLIP 的一致性从单视图图像中学习可扩展的 3D 形状
Shape-Constraint Recurrent Flow for 6D Object Pose Estimation 用于 6D 物体姿态估计的形状约束循环流
Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification 用于可见红外行人再识别的形状擦除特征学习
ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations ShapeTalk：用于 3D 形状编辑和变形的语言数据集和框架
Sharpness-Aware Gradient Matching for Domain Generalization 用于域泛化的锐度感知梯度匹配
Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning 引导槽到对象：走向稳定和强大的以对象为中心的学习
Shifted Diffusion for Text-to-Image Generation 用于文本到图像生成的移位扩散
Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations 深度神经网络解释评估中基于自上而下随机化完整性检查的缺点
SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds SHS-Net：学习用于点云定向法线估计的带符号超曲面
Siamese DETR 孪生 DETR
Siamese Image Modeling for Self-Supervised Vision Representation Learning 用于自监督视觉表示学习的孪生图像建模
Sibling-Attack: Rethinking Transferable Adversarial Attacks Against Face Recognition 兄弟攻击：重新思考针对人脸识别的可转移对抗攻击
Side Adapter Network for Open-Vocabulary Semantic Segmentation 用于开放词汇语义分割的侧适配器网络
SIEDOB: Semantic Image Editing by Disentangling Object and Background SIEDOB：通过分离对象和背景进行语义图像编辑
SIM: Semantic-Aware Instance Mask Generation for Box-Supervised Instance Segmentation SIM：用于框监督实例分割的语义感知实例掩码生成
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding 用于自训练弱监督短语定位的相似性图
Similarity Metric Learning for RGB-Infrared Group Re-Identification 用于 RGB 红外组重新识别的相似性度量学习
Simple Cues Lead to a Strong Multi-Object Tracker 简单的线索导致强大的多目标跟踪器
SimpleNet: A Simple Network for Image Anomaly Detection and Localization SimpleNet：用于图像异常检测和定位的简单网络
SimpSON: Simplifying Photo Cleanup With Single-Click Distracting Object Segmentation Network SimpSON：通过单击式干扰物分割网络简化照片清理
Simulated Annealing in Early Layers Leads to Better Generalization 早期层的模拟退火导致更好的泛化
Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic Segmentation 半监督视频语义分割的同时短期和长期时间建模
SINE: Semantic-Driven Image-Based NeRF Editing With Prior-Guided Editing Field SINE：具有先验引导编辑字段的语义驱动的基于图像的 NeRF 编辑
SINE: SINgle Image Editing With Text-to-Image Diffusion Models SINE：使用文本到图像扩散模型的单图像编辑
Single Domain Generalization for LiDAR Semantic Segmentation LiDAR 语义分割的单域泛化
Single Image Backdoor Inversion via Robust Smoothed Classifiers 通过稳健的平滑分类器进行单图像后门反演
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take 让单幅图像深度预测变得更好：多元高斯视角
Single View Scene Scale Estimation Using Scale Field 使用比例场的单视图场景比例估计
SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene SinGRAF：学习单个场景的 3D 生成辐射场
Sketch2Saliency: Learning To Detect Salient Objects From Human Drawings Sketch2Saliency：学习从人类手绘图中检测显著对象
SketchXAI: A First Look at Explainability for Human Sketches SketchXAI：初步了解人类草图的可解释性
Skinned Motion Retargeting With Residual Perception of Motion Semantics & Geometry 蒙皮运动重定向与运动语义和几何的残余感知
SkyEye: Self-Supervised Bird’s-Eye-View Semantic Mapping Using Monocular Frontal View Images SkyEye：使用单目正面视图图像的自监督鸟瞰图语义映射
SLACK: Stable Learning of Augmentations With Cold-Start and KL Regularization SLACK：通过冷启动和 KL 正则化稳定学习增强
Sliced Optimal Partial Transport 切片最优部分传输
SliceMatch: Geometry-Guided Aggregation for Cross-View Pose Estimation SliceMatch：用于交叉视图姿态估计的几何引导聚合
Slide-Transformer: Hierarchical Vision Transformer With Local Self-Attention Slide-Transformer：具有局部自注意力的分层视觉转换器
Slimmable Dataset Condensation 可精简的数据集压缩
SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments SLOPER4D：用于城市环境中全局 4D 人体姿态估计的场景感知数据集
SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples SlowLiDAR：使用对抗样本增加基于 LiDAR 的检测的延迟
SMAE: Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders SMAE：使用饱和度感知蒙版自动编码器进行 HDR 去鬼影的小样本学习
SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation SmallCap：通过检索增强提示的轻量级图像字幕
SmartAssign: Learning a Smart Knowledge Assignment Strategy for Deraining and Desnowing SmartAssign：学习用于除雨和除雪的智能知识分配策略
SmartBrush: Text and Shape Guided Object Inpainting With Diffusion Model SmartBrush：使用扩散模型的文本和形状引导对象修复
SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation SMOC-Net：利用相机姿态进行自监督单目物体姿态估计
SMPConv: Self-Moving Point Representations for Continuous Convolution SMPConv：连续卷积的自移动点表示
Soft Augmentation for Image Classification 图像分类的软增强
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks 缓解时间动作定位任务中任务差异问题的软着陆策略
Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models 使用预训练的 2D 扩散模型解决 3D 逆问题
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective 从理论角度解决训练后量化中的振荡问题
Solving Relaxations of MAP-MRF Problems: Combinatorial In-Face Frank-Wolfe Directions 求解 MAP-MRF 问题的松弛：组合式 In-Face Frank-Wolfe 方向
SOOD: Towards Semi-Supervised Oriented Object Detection SOOD：走向半监督的定向目标检测
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment 通过音频到视觉潜在对齐生成声音到视觉场景
Source-Free Adaptive Gaze Estimation by Uncertainty Reduction 通过减少不确定性进行无源自适应注视估计
Source-Free Video Domain Adaptation With Spatial-Temporal-Historical Consistency Learning 具有时空历史一致性学习的无源视频域自适应
SPARF: Neural Radiance Fields From Sparse and Noisy Poses SPARF：来自稀疏和嘈杂姿势的神经辐射场
Sparse Multi-Modal Graph Transformer With Shared-Context Processing for Representation Learning of Giga-Pixel Images 具有共享上下文处理的稀疏多模态图 Transformer，用于十亿像素图像的表示学习
SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction SparseFusion：为 3D 重建蒸馏视图条件扩散
Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures 使用自适应高斯混合的稀疏注释语义分割
SparsePose: Sparse-View Camera Pose Regression and Refinement SparsePose：稀疏视角相机姿势回归和细化
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer SparseViT：重新审视高效高分辨率视觉转换器的激活稀疏性
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers Sparsifiner：为高效视觉转换器学习稀疏实例相关注意力
SpaText: Spatio-Textual Representation for Controllable Image Generation SpaText：用于可控图像生成的空间文本表示
Spatial-Frequency Mutual Learning for Face Super-Resolution 人脸超分辨率的空间频率互学习
Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising 真实世界图像去噪的空间自适应自监督学习
Spatial-Temporal Concept Based Explanation of 3D ConvNets 基于时空概念的 3D ConvNets 解释
Spatial-Then-Temporal Self-Supervised Learning for Video Correspondence 用于视频对应的先空间后时间自监督学习
Spatio-Focal Bidirectional Disparity Estimation From a Dual-Pixel Image 双像素图像的空间焦点双向视差估计
Spatio-Temporal Pixel-Level Contrastive Learning-Based Source-Free Domain Adaptation for Video Semantic Segmentation 基于时空像素级对比学习的无源域自适应视频语义分割
Spatiotemporal Self-Supervised Learning for Point Clouds in the Wild 野外点云的时空自监督学习
Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style Specialist Diffusion：文本到图像扩散模型的即插即用、样本高效微调，以学习任何未见过的风格
Spectral Bayesian Uncertainty for Image Super-Resolution 图像超分辨率的光谱贝叶斯不确定性
Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising 用于高光谱图像去噪的光谱增强矩形变换器
Sphere-Guided Training of Neural Implicit Surfaces 神经隐式曲面的球面引导训练
Spherical Transformer for LiDAR-Based 3D Recognition 用于基于 LiDAR 的 3D 识别的球面 Transformer
Spider GAN: Leveraging Friendly Neighbors To Accelerate GAN Training Spider GAN：利用友好邻居加速 GAN 训练
SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting With Neural Radiance Fields SPIn-NeRF：多视图分割和神经辐射场的感知修复
SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries SplineCam：深度网络几何和决策边界的精确可视化和表征
Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo Spring：用于场景流、光流和立体匹配的高分辨率高细节数据集和基准
SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection SQUID：用于无监督异常检测的深层特征修复
sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model sRGB 真实噪声与相邻相关感知噪声模型的合成
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking 站在过去和未来之间：多相机 3D 多目标跟踪的时空建模
STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection STAR Loss：减少人脸关键点检测中的语义歧义
StarCraftImage: A Dataset for Prototyping Spatial Reasoning Methods for Multi-Agent Environments StarCraftImage：多智能体环境空间推理方法原型数据集
Stare at What You See: Masked Image Modeling Without Reconstruction 凝视所见：无需重建的蒙版图像建模
Starting From Non-Parametric Networks for 3D Point Cloud Analysis 从非参数网络开始进行 3D 点云分析
STDLens: Model Hijacking-Resilient Federated Learning for Object Detection STDLens：用于目标检测的抗模型劫持联邦学习
SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory SteerNeRF：通过平滑视点轨迹加速 NeRF 渲染
StepFormer: Self-Supervised Step Discovery and Localization in Instructional Videos StepFormer：教学视频中的自我监督步骤发现和定位
Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction 刺激验证是多模态人体轨迹预测中通用且有效的采样器
Stitchable Neural Networks 可缝合神经网络
STMixer: A One-Stage Sparse Action Detector STMixer：单级稀疏动作检测器
STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition STMT：用于基于动作捕捉的动作识别的时空网格变换器
Streaming Video Model 流媒体视频模型
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction 结构多平面图像：桥接神经视图合成和 3D 重建
Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising 交叉光谱立体图像引导去噪的结构聚合
Structured 3D Features for Reconstructing Controllable Avatars 用于重建可控化身的结构化 3D 特征
Structured Kernel Estimation for Photon-Limited Deconvolution 光子受限反卷积的结构化内核估计
Structured Sparsity Learning for Efficient Video Super-Resolution 用于高效视频超分辨率的结构化稀疏学习
StructVPR: Distill Structural Knowledge With Weighting Samples for Visual Place Recognition StructVPR：使用加权样本提取结构知识以进行视觉位置识别
Style Projected Clustering for Domain Generalized Semantic Segmentation 用于域广义语义分割的样式投影聚类
StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning StyleAdv：用于跨域小样本学习的元样式对抗训练
StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer StyleGAN Salon：姿势不变发型迁移的多视图潜在优化
StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis StyleGene：用于亲属面部合成的区域级面部基因的交叉和突变
StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping StyleIPSB：用于高保真人脸交换的 StyleGAN 的身份保持语义基础
StyleRes: Transforming the Residuals for Real Image Editing With StyleGAN StyleRes：使用 StyleGAN 转换真实图像编辑的残差
StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields StyleRF：神经辐射场的零样本 3D 风格迁移
StyLess: Boosting the Transferability of Adversarial Examples StyLess：提升对抗样本的可迁移性
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator StyleSync：基于风格的生成器中的高保真通用和个性化口型同步
SUDS: Scalable Urban Dynamic Scenes SUDS：可扩展的城市动态场景
SunStage: Portrait Reconstruction and Relighting Using the Sun as a Light Stage SunStage：使用太阳作为光舞台的人像重建和重新打光
Superclass Learning With Representation Enhancement 具有表示增强的超类学习
Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning Super-CLEVR：诊断视觉推理领域稳健性的虚拟基准
SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail SuperDisco：超类发现改善长尾视觉识别
Super-Resolution Neural Operator 超分辨率神经算子
Supervised Masked Knowledge Distillation for Few-Shot Transformers 用于小样本 Transformer 的有监督掩码知识蒸馏
SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes SurfelNeRF：用于室内场景在线逼真重建的神经 Surfel 辐射场
SVFormer: Semi-Supervised Video Transformer for Action Recognition SVFormer：用于动作识别的半监督视频转换器
SVGformer: Representation Learning for Continuous Vector Graphics Using Transformers SVGformer：使用 Transformer 进行连续矢量图形的表示学习
SViTT: Temporal Learning of Sparse Video-Text Transformers SViTT：稀疏视频文本转换器的时间学习
Swept-Angle Synthetic Wavelength Interferometry 扫掠角合成波长干涉仪
Switchable Representation Learning Framework With Self-Compatibility 具有自兼容性的可切换表示学习框架
Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion 用于无监督真实场景点云补全的对称保形自动编码器
Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement 通过跨模态分离合成逼真的虚拟人
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision SynthVSR：利用合成监督扩展视觉语音识别
System-Status-Aware Adaptive Network for Online Streaming Video Understanding 用于在线流视频理解的系统状态感知自适应网络
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation 驯服扩散模型以生成音频驱动的伴随语音手势
Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow Estimation 用于基于事件的增量光流估计的切向伸长高斯置信度传播
TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision TAPS3D：基于伪监督的文本引导 3D 纹理形状生成
Target-Referenced Reactive Grasping for Dynamic Objects 动态对象的目标参考反应抓取
TarViS: A Unified Approach for Target-Based Video Segmentation TarViS：一种基于目标的视频分割的统一方法
Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning 终身学习的任务难度感知参数分配和正则化
Task Residual for Tuning Vision-Language Models 调整视觉语言模型的任务残差
Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification 通过变分信息瓶颈进行任务特定的微调，用于弱监督病理学全切片图像分类
TBP-Former: Learning Temporal Bird’s-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving TBP-Former：学习时间鸟瞰图金字塔，用于以视觉为中心的自动驾驶中的联合感知和预测
Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models 教师生成的空间注意标签提高了对比模型的鲁棒性和准确性
Teaching Matters: Investigating the Role of Supervision in Vision Transformers 教学很重要：探究监督在 Vision Transformer 中的作用
Teaching Structured Vision & Language Concepts to Vision & Language Models 向视觉和语言模型教授结构化视觉和语言概念
Teleidoscopic Imaging System for Microscale 3D Shape Reconstruction 用于微尺度 3D 形状重建的万花筒式（teleidoscopic）成像系统
Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation 告诉我发生了什么：通过多模态掩码视频生成统一文本引导的视频补全
Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning 时间注意单元：迈向高效的时空预测学习
Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in Autonomous Driving 用于自动驾驶语义感知的时间一致 3D LiDAR 表示学习
Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields 时间插值是动态神经辐射场所需的一切
Temporally Consistent Online Depth Estimation Using Point-Based Fusion 使用基于点的融合进行时间一致的在线深度估计
TempSAL - Uncovering Temporal Information for Deep Saliency Prediction TempSAL - 揭示深度显著性预测的时间信息
TensoIR: Tensorial Inverse Rendering TensoIR：张量逆渲染
Tensor4D: Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction and Rendering Tensor4D：用于高保真动态重建和渲染的高效神经 4D 分解
TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation TeSLA：具有自动对抗性增强的测试时间自学习
Test of Time: Instilling Video-Language Models With a Sense of Time 时间的考验：给视频语言模型灌输时间感
Test Time Adaptation With Regularized Loss for Weakly Supervised Salient Object Detection 具有正则化损失的测试时间自适应弱监督显著目标检测
TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation TexPose：用于自监督 6D 对象姿态估计的神经纹理学习
Text With Knowledge Graph Augmented Transformer for Video Captioning 用于视频字幕的带有知识图增强转换器的文本
Text2Scene: Text-Driven Indoor Scene Stylization With Part-Aware Details Text2Scene：具有部分感知细节的文本驱动室内场景风格化
Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation 用于多属性图像处理的文本引导无监督潜在变换
Texts as Images in Prompt Tuning for Multi-Label Image Recognition 在多标签图像识别的提示调优中将文本作为图像
Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection 用于无监督显著目标检测的纹理引导显著性蒸馏
Text-Visual Prompting for Efficient 2D Temporal Video Grounding 用于高效 2D 时间视频定位的文本视觉提示
The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks 最好的防御是好的进攻：针对对抗性攻击的对抗性增强
The Dark Side of Dynamic Routing Neural Networks: Towards Efficiency Backdoor Injection 动态路由神经网络的阴暗面：迈向效率后门注入
The Devil Is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation 魔鬼在点中：通过点引导掩码表示的弱半监督实例分割
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training 对话必须继续：通过生成式自我训练改进视觉对话
The Differentiable Lens: Compound Lens Search Over Glass Surfaces and Materials for Object Detection 可微透镜：复合透镜搜索玻璃表面和材料以进行物体检测
The Enemy of My Enemy Is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training 敌人的敌人就是我的朋友：探索逆向对手以改进对抗训练
The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects ObjectFolder 基准测试：使用神经和真实对象进行多感官学习
The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning 联邦学习中使用线性层泄漏攻击的资源问题
The Treasure Beneath Multiple Annotations: An Uncertainty-Aware Edge Detector 多重注释下的宝藏：不确定性感知边缘检测器
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction 人群的智慧：早期行动预测的时间渐进注意
Therbligs in Action: Video Understanding Through Motion Primitives Therbligs 在行动：通过运动原语理解视频
Thermal Spread Functions (TSF): Physics-Guided Material Classification 热扩散函数 (TSF)：物理指导的材料分类
Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving 驾驶前三思：面向端到端自动驾驶的可扩展解码器
Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning 普遍可精简的自监督学习应了解的三个准则
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition TimeBalance：用于半监督动作识别的时间不变和时间不同的视频表示
TINC: Tree-Structured Implicit Neural Compression TINC：树结构隐式神经压缩
TinyMIM: An Empirical Study of Distilling MIM Pre-Trained Models TinyMIM：蒸馏 MIM 预训练模型的实证研究
TIPI: Test Time Adaptation With Transformation Invariance TIPI：具有变换不变性的测试时间自适应
TMO: Textured Mesh Acquisition of Objects With a Mobile Device by Using Differentiable Rendering TMO：使用可微分渲染通过移动设备获取对象的纹理网格
Token Boosting for Robust Self-Supervised Visual Transformer Pre-Training 用于稳健自监督视觉转换器预训练的令牌提升
Token Contrast for Weakly-Supervised Semantic Segmentation 弱监督语义分割的令牌对比
Token Turing Machines Token 图灵机
TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers TokenHPE：通过 Transformer 学习方向 token 以实现高效头部姿态估计
TopDiG: Class-Agnostic Topological Directional Graph Extraction From Remote Sensing Images TopDiG：从遥感图像中提取类别不可知的拓扑有向图
Top-Down Visual Attention From Analysis by Synthesis 综合分析的自上而下的视觉注意力
TOPLight: Lightweight Neural Networks With Task-Oriented Pretraining for Visible-Infrared Recognition TOPLight：具有面向任务的可见红外识别预训练的轻型神经网络
TopNet: Transformer-Based Object Placement Network for Image Compositing TopNet：用于图像合成的基于 Transformer 的对象放置网络
Topology-Guided Multi-Class Cell Context Generation for Digital Pathology 用于数字病理学的拓扑引导的多类细胞上下文生成
ToThePoint: Efficient Contrastive Learning of 3D Point Clouds via Recycling ToThePoint：通过回收对 3D 点云进行高效对比学习
Toward Accurate Post-Training Quantization for Image Super Resolution 实现图像超分辨率的准确训练后量化
Toward RAW Object Detection: A New Benchmark and a New Model 迈向 RAW 对象检测：新基准和新模型
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation 实现文本到图像生成的可验证和可重现的人类评估
Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval 面向较小的学生：用于高效图像检索的容量动态蒸馏
Towards Accurate Image Coding: Improved Autoregressive Image Generation With Dynamic Vector Quantization 实现准确的图像编码：使用动态矢量量化改进自回归图像生成
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information 通过最大化多模态互信息实现一体化预训练
Towards Artistic Image Aesthetics Assessment: A Large-Scale Dataset and a New Method 迈向艺术图像美学评估：大规模数据集和新方法
Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks 走向基准和评估物理世界对抗性攻击的视觉自然性
Towards Better Decision Forests: Forest Alternating Optimization 走向更好的决策森林：森林交替优化
Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment 通过水平集对齐实现神经符号距离函数更好的梯度一致性
Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation 提高稳定性和适应性：改进语义分割模型自适应的在线自训练
Towards Bridging the Performance Gaps of Joint Energy-Based Models 弥合基于能量的联合模型的性能差距
Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration 通过可靠的不确定性量化和校准构建具有自我意识的物体检测器
Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations 走向组合对抗鲁棒性：将对抗训练推广到复合语义扰动
Towards Domain Generalization for Multi-View 3D Object Detection in Bird-Eye-View 面向鸟瞰多视图 3D 对象检测的域泛化
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition 在物理人脸识别上实现有效的对抗性纹理 3D 网格
Towards Effective Visual Representations for Partial-Label Learning 实现部分标签学习的有效视觉表示
Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors 在基于 Transformer 的目标检测器中有效使用多尺度特征
Towards End-to-End Generative Modeling of Long Videos With Memory-Efficient Bidirectional Transformers 使用内存高效的双向转换器实现长视频的端到端生成建模
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval 快速适应多通道视频语言检索的预训练对比模型
Towards Flexible Multi-Modal Document Models 迈向灵活的多模式文档模型
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training 迈向通用视频时刻检索：视觉动态注入图像文本预训练
Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting 通过时空数据过度拟合实现高质量和高效的视频超分辨率
Towards Modality-Agnostic Person Re-Identification With Descriptive Query 通过描述性查询实现与模态无关的行人重识别
Towards Open-World Segmentation of Parts 走向零件的开放世界分割
Towards Practical Plug-and-Play Diffusion Models 迈向实用的即插即用扩散模型
Towards Professional Level Crowd Annotation of Expert Domain Data 迈向专家领域数据的专业级人群注释
Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need 走向现实的长尾半监督学习：一致性就是你所需要的
Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution 文档图像中的稳健篡改文本检测：新数据集和新解决方案
Towards Scalable Neural Representation for Diverse Videos 面向不同视频的可扩展神经表示
Towards Stable Human Pose Estimation via Cross-View Fusion and Foot Stabilization 通过交叉视图融合和足部稳定实现稳定的人体姿态估计
Towards Transferable Targeted Adversarial Examples 迈向可迁移的有目标对抗样本
Towards Trustable Skin Cancer Diagnosis via Rewriting Model’s Decision 通过重写模型的决策实现可信的皮肤癌诊断
Towards Unbiased Volume Rendering of Neural Implicit Surfaces With Geometry Priors 使用几何先验对神经隐式曲面进行无偏体积绘制
Towards Unified Scene Text Spotting Based on Sequence Generation 基于序列生成的统一场景文本识别
Towards Universal Fake Image Detectors That Generalize Across Generative Models 迈向跨生成模型的通用假图像检测器
Towards Unsupervised Object Detection From LiDAR Point Clouds 从 LiDAR 点云进行无监督目标检测
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion Trace and Pace：通过引导轨迹扩散的可控行人动画
TRACE: 5D Temporal Regression of Avatars With Dynamic Cameras in 3D Environments TRACE：在 3D 环境中使用动态相机对化身进行 5D 时间回归
Tracking Multiple Deformable Objects in Egocentric Videos 跟踪以自我为中心的视频中的多个可变形对象
Tracking Through Containers and Occluders in the Wild 在野外通过容器和遮挡物进行跟踪
Trade-Off Between Robustness and Accuracy of Vision Transformers 视觉转换器的稳健性和准确性之间的权衡
Train/Test-Time Adaptation With Retrieval 带检索的训练/测试时间适应
Trainable Projected Gradient Method for Robust Fine-Tuning 用于稳健微调的可训练投影梯度法
Training Debiased Subnetworks With Contrastive Weight Pruning 使用对比权重修剪训练去偏子网络
Train-Once-for-All Personalization 一次训练、适用所有的个性化
Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting 用于多人姿势预测的轨迹感知身体交互转换器
Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph Refinement 通过迭代图细化进行基于原型的标签传播的转导式小样本学习
Transfer Knowledge From Head to Tail: Uncertainty Calibration Under Long-Tailed Distribution 从头到尾传递知识：长尾分布下的不确定性校准
Transfer4D: A Framework for Frugal Motion Capture and Deformation Transfer Transfer4D：节俭运动捕捉和变形传递的框架
Transferable Adversarial Attacks on Vision Transformers With Token Gradient Regularization 具有令牌梯度正则化的视觉变换器的可转移对抗性攻击
TransFlow: Transformer As Flow Learner TransFlow：作为光流学习器的 Transformer
Transformer Scale Gate for Semantic Segmentation 用于语义分割的 Transformer 尺度门
Transformer-Based Learned Optimization 基于 Transformer 的学习优化
Transformer-Based Unified Recognition of Two Hands Manipulating Objects 基于Transformer的两手操作物体统一识别
Transforming Radiance Field With Lipschitz Network for Photorealistic 3D Scene Stylization 使用 Lipschitz 网络转换辐射场以实现逼真的 3D 场景风格化
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning With Structure-Trajectory Prompted Reconstruction for Person Re-Identification TranSG：基于 Transformer 的骨架图原型对比学习与结构轨迹提示重建以进行行人重新识别
Trap Attention: Monocular Depth Estimation With Manual Traps 陷阱注意力：使用手动陷阱的单目深度估计
Tree Instance Segmentation With Temporal Contour Graph 使用时间等高线图的树实例分割
TriDet: Temporal Action Detection With Relative Boundary Modeling TriDet：使用相对边界建模的时间动作检测
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction 基于视觉的 3D 语义占用预测的三视角视图
TriVol: Point Cloud Rendering via Triple Volumes TriVol：通过三重体积进行点云渲染
TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets TrojDiff：针对具有不同目标的扩散模型的特洛伊木马攻击
TrojViT: Trojan Insertion in Vision Transformers TrojViT：在 Vision Transformers 中插入木马
TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization TruFor：利用全方位线索进行可信的图像伪造检测和定位
TryOnDiffusion: A Tale of Two UNets TryOnDiffusion：两个 UNet 的故事
T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection T-SEA：基于迁移的目标检测自集成攻击
TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation TTA-COPE：类别级目标姿态估计的测试时间适应
Tunable Convolutions With Parametric Multi-Loss Optimization 具有参数多损失优化的可调卷积
Turning a CLIP Model Into a Scene Text Detector 将 CLIP 模型变成场景文本检测器
Turning Strengths Into Weaknesses: A Certified Robustness Inspired Attack Framework Against Graph Neural Networks 将优势转化为劣势：针对图神经网络的经过认证的鲁棒性启发式攻击框架
Twin Contrastive Learning With Noisy Labels 带有噪声标签的双对比学习
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization TWINS：改进对抗稳健性和泛化可转移性的微调框架
Two-Shot Video Object Segmentation 双样本视频目标分割
Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human Mesh From Videos 基于判别表示的两阶段协同分割网络从视频中恢复人体网格
Two-Stream Networks for Weakly-Supervised Temporal Action Localization With Semantic-Aware Mechanisms 具有语义感知机制的弱监督时间动作定位的双流网络
Two-View Geometry Scoring Without Correspondences 没有对应关系的双视图几何评分
Two-Way Multi-Label Loss 双向多标签损失
UDE: A Unified Driving Engine for Human Motion Generation UDE：用于人体运动生成的统一驱动引擎
Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity 具有时空稀疏性的超高分辨率图像/视频抠图
Ultra-High Resolution Segmentation With Ultra-Rich Context: A Novel Benchmark 具有超丰富上下文的超高分辨率分割：一种新的基准
UMat: Uncertainty-Aware Single Image High Resolution Material Capture UMat：不确定性感知单图像高分辨率材料捕获
Unbalanced Optimal Transport: A Unified Framework for Object Detection 非平衡最优传输：目标检测的统一框架
Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection 用于弱监督视频异常检测的无偏多实例学习
Unbiased Scene Graph Generation in Videos 视频中的无偏场景图生成
Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection 用于语义相干分布外检测的不确定性感知最优传输
Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior 具有深度残差先验的不确定性感知无监督图像去模糊
Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization 基于不确定性感知视觉的度量交叉视图地理定位
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models 揭示文本到图像扩散模型中的解耦能力
Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction 发现缺失的模式：轨迹插补和预测的统一框架
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias 未经整理的图像文本数据集：揭示人口偏见
Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning 理解和构建多模态表示学习中的潜在模态结构
Understanding and Improving Features Learned in Deep Functional Maps 理解和改进在深度功能图中学到的特征
Understanding and Improving Visual Prompting: A Label-Mapping Perspective 理解和改进视觉提示：标签映射视角
Understanding Deep Generative Models With Generalized Empirical Likelihoods 理解具有广义经验似然的深度生成模型
Understanding Imbalanced Semantic Segmentation Through Neural Collapse 通过神经崩溃理解不平衡的语义分割
Understanding Masked Autoencoders via Hierarchical Latent Variable Models 通过分层潜在变量模型了解屏蔽自动编码器
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature 通过学习遮挡不变特征了解蒙版图像建模
Understanding the Robustness of 3D Object Detection With Bird’s-Eye-View Representations in Autonomous Driving 在自动驾驶中使用鸟瞰图表示了解 3D 对象检测的鲁棒性
Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection Uni3D：多数据集 3D 对象检测的统一基线
Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge Unicode 类比：反客观主义视觉推理挑战
UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration UniDAformer：通过分层掩模校准的统一域自适应全景分割变换器
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy UniDexGrasp：通过学习多样化的提案生成和目标条件策略实现通用机器人灵巧抓取
UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird’s-Eye View UniDistill：用于鸟瞰 3D 对象检测的通用跨模态知识蒸馏框架
Unified Keypoint-Based Action Recognition Framework via Structured Keypoint Pooling 通过结构化关键点池统一基于关键点的动作识别框架
Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation 用于自监督视频分割的统一掩码嵌入和对应学习
Unified Pose Sequence Modeling 统一姿势序列建模
Unifying Layout Generation With a Decoupled Diffusion Model 使用解耦扩散模型统一布局生成
Unifying Short and Long-Term Tracking With Graph Hierarchies 使用图形层次结构统一短期和长期跟踪
UniHCP: A Unified Model for Human-Centric Perceptions UniHCP：以人为本的感知的统一模型
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks Uni-Perceiver v2：用于大规模视觉和视觉语言任务的通才模型
UniSim: A Neural Closed-Loop Sensor Simulator UniSim：神经闭环传感器模拟器
Unite and Conquer: Plug & Play Multi-Modal Synthesis Using Diffusion Models 团结一致：使用扩散模型进行即插即用多模态合成
Universal Instance Perception As Object Discovery and Retrieval 作为对象发现和检索的通用实例感知
Unknown Sniffer for Object Detection: Don’t Turn a Blind Eye to Unknown Objects 用于对象检测的未知嗅探器：不要对未知对象视而不见
Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples 无法学习的集群：走向与标签无关的无法学习的例子
Unpaired Image-to-Image Translation With Shortest Path Regularization 具有最短路径正则化的不成对图像到图像转换
Unsupervised 3D Point Cloud Representation Learning by Triangle Constrained Contrast for Autonomous Driving 用于自动驾驶的三角形约束对比的无监督 3D 点云表示学习
Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly 通过零件检索和组装进行无监督 3D 形状重建
Unsupervised Continual Semantic Adaptation Through Neural Rendering 通过神经渲染的无监督连续语义适应
Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses 通过机械和循环一致性损失对活细胞进行无监督轮廓跟踪
Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow 雾景光流的无监督累积域自适应
Unsupervised Deep Asymmetric Stereo Matching With Spatially-Adaptive Self-Similarity 具有空间自适应自相似性的无监督深度非对称立体匹配
Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration 部分点云配准的无监督深度概率方法
Unsupervised Domain Adaption With Pixel-Level Discriminator for Image-Aware Layout Generation 用于图像感知布局生成的像素级鉴别器的无监督域自适应
Unsupervised Inference of Signed Distance Functions From Single Sparse Point Clouds Without Learning Priors 从单个稀疏点云无学习先验的符号距离函数的无监督推理
Unsupervised Intrinsic Image Decomposition With LiDAR Intensity 具有 LiDAR 强度的无监督本征图像分解
Unsupervised Object Localization: Observing the Background To Discover Objects 无监督对象定位：观察背景以发现对象
Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction 无监督抽样促进随机人体轨迹预测
Unsupervised Space-Time Network for Temporally-Consistent Segmentation of Multiple Motions 用于多个运动的时间一致分割的无监督时空网络
Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning 通过渐进图匹配和交替学习进行无监督可见红外行人再识别
Unsupervised Volumetric Animation 无监督体积动画
Upcycling Models Under Domain and Category Shift 领域和类别偏移下的模型升级再造
Use Your Head: Improving Long-Tail Video Recognition 使用你的头脑：改进长尾视频识别
UTM: A Unified Multiple Object Tracking Model With Identity-Aware Feature Enhancement UTM：具有身份感知功能增强的统一多目标跟踪模型
UV Volumes for Real-Time Rendering of Editable Free-View Human Performance 用于实时渲染可编辑的自由视图人体表现的 UV 体积
V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception V2V4Real：用于车对车协同感知的真实世界大规模数据集
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting V2X-Seq：用于车辆-基础设施协同感知和预测的大规模时序数据集
Variational Distribution Learning for Unsupervised Text-to-Image Generation 无监督文本到图像生成的变分分布学习
VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization VDN-NeRF：通过视图依赖归一化解决形状辐射模糊
VecFontSDF: Learning To Reconstruct and Synthesize High-Quality Vector Fonts via Signed Distance Functions VecFontSDF：学习通过符号距离函数重构和合成高质量矢量字体
Vector Quantization With Self-Attention for Quality-Independent Representation Learning 用于与质量无关的表示学习的自注意矢量量化
VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation VectorFloorSeg：用于矢量化粗略平面图分割的双流图注意网络
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models VectorFusion：通过抽象基于像素的扩散模型将文本转为 SVG
VGFlow: Visibility Guided Flow Network for Human Reposing VGFlow：用于人体姿态变换的可见性引导流网络
Vid2Avatar: 3D Avatar Reconstruction From Videos in the Wild via Self-Supervised Scene Decomposition Vid2Avatar：通过自监督场景分解从野外视频重建 3D 头像
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning Vid2Seq：用于密集视频字幕的视觉语言模型的大规模预训练
Video Compression With Entropy-Constrained Neural Representations 具有熵约束神经表示的视频压缩
Video Dehazing via a Multi-Range Temporal Alignment Network With Physical Prior 通过具有物理先验的多范围时间对齐网络进行视频去雾
Video Event Restoration Based on Keyframes for Video Anomaly Detection 基于关键帧的视频事件恢复用于视频异常检测
Video Probabilistic Diffusion Models in Projected Latent Space 投影潜在空间中的视频概率扩散模型
Video Test-Time Adaptation for Action Recognition 动作识别的视频测试时间自适应
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation VideoFusion：用于生成高质量视频的分解扩散模型
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking VideoMAE V2：使用双重掩蔽缩放视频掩蔽自动编码器
Video-Text As Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning 作为游戏玩家的视频文本：用于跨模态表示学习的分层 Banzhaf 交互
VideoTrack: Learning To Track Objects via Video Transformer VideoTrack：通过视频转换器学习跟踪对象
ViewNet: A Novel Projection-Based Backbone With View Pooling for Few-Shot Point Cloud Classification ViewNet：一种基于投影的新型主干，具有视图池化，用于小样本点云分类
Viewpoint Equivariance for Multi-View 3D Object Detection 多视图 3D 对象检测的视点等变性
VILA: Learning Image Aesthetics From User Comments With Vision-Language Pretraining VILA：通过视觉语言预训练从用户评论中学习图像美学
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval ViLEM：图像文本检索的视觉语言错误建模
VindLU: A Recipe for Effective Video-and-Language Pretraining VindLU：有效视频和语言预训练的秘诀
ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries ViP3D：通过 3D 代理查询进行端到端视觉轨迹预测
ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection ViPLO：用于人-物交互检测的基于视觉变换器的姿势条件自环图
Virtual Occlusions Through Implicit Depth 通过隐式深度的虚拟遮挡
Virtual Sparse Convolution for Multimodal 3D Object Detection 用于多模式 3D 对象检测的虚拟稀疏卷积
VisFusion: Visibility-Aware Online 3D Scene Reconstruction From Videos VisFusion：基于视频的可见性感知在线 3D 场景重建
Visibility Aware Human-Object Interaction Tracking From Single RGB Camera 基于单个 RGB 摄像机的可见性感知人-物交互跟踪
Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark 用于黑暗中视的能见度约束宽带照明光谱设计
Vision Transformer With Super Token Sampling 具有超级令牌采样的视觉转换器
Vision Transformers Are Good Mask Auto-Labelers Vision Transformers 是很好的掩码自动标注器
Vision Transformers Are Parameter-Efficient Audio-Visual Learners Vision Transformers 是参数有效的视听学习器
Visual Atoms: Pre-Training Vision Transformers With Sinusoidal Waves 视觉原子：使用正弦波预训练视觉转换器
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention Visual Dependency Transformers：依赖树从反向注意中出现
Visual DNA: Representing and Comparing Images Using Distributions of Neuron Activations 视觉 DNA：使用神经元激活分布表示和比较图像
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving 视觉范例驱动的自动驾驶统一感知任务提示
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images 用于组织病理学图像的视觉语言预训练多实例零样本迁移
Visual Localization Using Imperfect 3D Models From the Internet 使用来自互联网的不完美 3D 模型进行视觉定位
Visual Programming: Compositional Visual Reasoning Without Training 可视化编程：无需训练的组合视觉推理
Visual Prompt Multi-Modal Tracking 视觉提示多模式跟踪
Visual Prompt Tuning for Generative Transfer Learning 生成迁移学习的视觉提示调整
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning 视觉查询调优：有效使用中间表示进行参数和内存高效迁移学习
Visual Recognition by Request 按要求进行视觉识别
Visual Recognition-Driven Image Restoration for Multiple Degradation With Intrinsic Semantics Recovery 具有内在语义恢复的多重退化的视觉识别驱动图像恢复
Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization 以知识为导向的上下文优化的视觉语言提示调整
Visual-Tactile Sensing for In-Hand Object Reconstruction 用于手中物体重建的视觉触觉传感
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting Vita-CLIP：通过多模式提示的视频和文本自适应 CLIP
ViTs for SITS: Vision Transformers for Satellite Image Time Series SITS 的 ViT：卫星图像时间序列的视觉转换器
VIVE3D: Viewpoint-Independent Video Editing Using 3D-Aware GANs VIVE3D：使用 3D 感知 GAN 进行独立于视点的视频编辑
VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision VLPD：通过视觉语言语义自我监督的情境感知行人检测
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud VL-SAT：点云中 3D 语义场景图预测的视觉语言语义辅助训练
vMAP: Vectorised Object Mapping for Neural Field SLAM vMAP：用于神经场 SLAM 的矢量化对象映射
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution VNE：一种通过操纵特征值分布来改善深度表示的有效方法
VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction VolRecon：用于通用多视图重建的符号射线距离函数的体积绘制
VoP: Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval VoP：用于跨模态检索的文本视频合作提示调整
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking VoxelNeXt：用于 3D 对象检测和跟踪的完全稀疏 VoxelNet
VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion VoxFormer：用于基于相机的 3D 语义场景补全的稀疏体素 Transformer
VQACL: A Novel Visual Question Answering Continual Learning Setting VQACL：一种新颖的视觉问答持续学习设置
Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring 观看或收听：具有视觉损坏建模和可靠性评分的鲁棒视听语音识别
Wavelet Diffusion Models Are Fast and Scalable Image Generators 小波扩散模型是快速且可扩展的图像生成器
Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving 自动驾驶的弱监督类不可知运动预测
Weakly Supervised Monocular 3D Object Detection Using Multi-View Projection and Direction Consistency 使用多视图投影和方向一致性的弱监督单目 3D 对象检测
Weakly Supervised Posture Mining for Fine-Grained Classification 用于细粒度分类的弱监督姿态挖掘
Weakly Supervised Segmentation With Point Annotations for Histopathology Images via Contrast-Based Variational Model 通过基于对比的变分模型对组织病理学图像进行点注释的弱监督分割
Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor 通过分类器和重构器的对抗学习进行弱监督语义分割
Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training 具有不确定性引导自训练的弱监督时间句子定位
Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network 基于跨模态时间擦除网络的弱监督视频情感检测和预测
Weakly Supervised Video Representation Learning With Unaligned Text for Sequential Videos 序列视频未对齐文本的弱监督视频表示学习
Weakly-Supervised Domain Adaptive Semantic Segmentation With Prototypical Contrastive Learning 具有原型对比学习的弱监督域自适应语义分割
Weakly-Supervised Single-View Image Relighting 弱监督单视图图像重新照明
Weak-Shot Object Detection Through Mutual Knowledge Transfer 通过相互知识迁移进行弱样本目标检测
WeatherStream: Light Transport Automation of Single Image Deweathering WeatherStream：单幅图像去天气影响的光传输自动化
What Can Human Sketches Do for Object Detection? 人类草图能为目标检测做什么？
What Happened 3 Seconds Ago? Inferring the Past With Thermal Imaging 3 秒前发生了什么？用热成像推断过去
What You Can Reconstruct From a Shadow 你可以从阴影中重建什么
Where Is My Spot? Few-Shot Image Generation via Latent Subspace Optimization 我的位置在哪里？通过潜在子空间优化生成小样本图像
Where Is My Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization 我的钱包在哪里？为自我中心视觉查询定位建模对象建议集
Where We Are and What We’re Looking At: Query Based Worldwide Image Geo-Localization Using Hierarchies and Scenes 我们在哪里和我们在看什么：使用层次结构和场景的基于查询的全球图像地理定位
Why Is the Winner the Best? 为什么赢家是最好的？
Wide-Angle Rectification via Content-Aware Conformal Mapping 通过内容感知共形映射进行广角校正
WildLight: In-the-Wild Inverse Rendering With a Flashlight WildLight：使用手电筒进行野外逆向渲染
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation WinCLIP：零/少样本异常分类和分割
WINNER: Weakly-Supervised hIerarchical decompositioN and aligNment for Spatio-tEmporal Video gRounding WINNER：用于时空视频定位的弱监督层次分解和对齐
WIRE: Wavelet Implicit Neural Representations WIRE：小波隐式神经表示
X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection X3KD：面向多相机 3D 目标检测的跨模态、跨任务和跨阶段知识蒸馏
X-Avatar: Expressive Human Avatars X-Avatar：富有表现力的人类化身
Xiongbiao Luo 罗雄标
X-Pruner: eXplainable Pruning for Vision Transformers X-Pruner：Vision Transformers 的可解释剪枝
Yiming Cui 崔一鸣
Ying Zhao 赵颖
YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors YOLOv7：可训练的 Bag-of-Freebies 刷新了实时目标检测器的最先进水平
You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks? 你引起了我的注意：Vision Transformers 在后门攻击下是糟糕的学习者吗？
You Can Ground Earlier Than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos 你可以在看到之前就完成定位：压缩视频中时序语句定位的有效且高效的流程
You Do Not Need Additional Priors or Regularizers in Retinex-Based Low-Light Image Enhancement 在基于 Retinex 的低光图像增强中，您不需要额外的先验或正则化器
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model 您需要多次退出：用于加速统一视觉语言模型的动态提前退出
You Only Segment Once: Towards Real-Time Panoptic Segmentation 你只分割一次：迈向实时全景分割
Yusuke Yoshiyasu 义保佑介
ZBS: Zero-Shot Background Subtraction via Instance-Level Background Modeling and Foreground Selection ZBS：通过实例级背景建模和前景选择进行零样本背景减除
ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation ZegCLIP：为零样本语义分割调整 CLIP
Zero-Shot Dual-Lens Super-Resolution 零样本双镜头超分辨率
Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style 零样本一切基于草图的图像检索，并以可解释的方式实现
Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning 通过特定于图像的提示学习进行零样本生成模型自适应
Zero-Shot Model Diagnosis 零样本模型诊断
Zero-Shot Noise2Noise: Efficient Image Denoising Without Any Data Zero-Shot Noise2Noise：无需任何数据的高效图像去噪
Zero-Shot Object Counting 零样本目标计数
Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters 未绑定骨骼的风格化 3D 角色的零样本姿势迁移
Zero-Shot Referring Image Segmentation With Global-Local Context Features 具有全局-局部上下文特征的零样本指代图像分割
Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation 用于游戏角色自动创建的零样本文本到参数转换