[CVPR 2022] Paper List and Downloads: Part Four

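CVPR 2022 will open on June 22 🎉🎉🎉, and the conference has accepted 2,067 papers in total. Because there are so many, this list is split across four posts; click a paper title to open the document.
📃 Part One, 📃 Part Two, 📃 Part Three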

4. Part Four

Sparse Fuse Dense: Towards High Quality 3D Detection With Depth Completion [supp]
GIRAFFE HD: A High-Resolution 3D-Aware Generative Model [supp]
InOut: Diverse Image Outpainting via GAN Inversion
PNP: Robust Learning From Noisy Labels by Probabilistic Noise Prediction
Estimating Structural Disparities for Face Models
Revisiting the Transferability of Supervised Pretraining: An MLP Perspective [supp]
Plenoxels: Radiance Fields Without Neural Networks [supp]
What Matters for Meta-Learning Vision Regression Tasks? [supp]
Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition
Selective-Supervised Contrastive Learning With Noisy Labels [supp]
Learning Second Order Local Anomaly for General Face Forgery Detection
ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation [supp]
The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation [supp]
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation [supp]
SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation [supp]
Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs [supp]
PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions [supp]
PTTR: Relational 3D Point Cloud Object Tracking With Transformer
Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity [supp]
ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds [supp]
Video Demoireing With Relation-Based Temporal Consistency
Co-Domain Symmetry for Complex-Valued Deep Learning
Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation [supp]
Modeling Image Composition for Complex Scene Generation [supp]
SS3D: Sparsely-Supervised 3D Object Detection From Point Cloud
Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer [supp]
GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation [supp]
UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training
GraFormer: Graph-Oriented Transformer for 3D Pose Estimation
Decoupling Zero-Shot Semantic Segmentation [supp]
Neural Collaborative Graph Machines for Table Structure Recognition [supp]
Towards Robust Vision Transformer [supp]
DeepCurrents: Learning Implicit Representations of Shapes With Boundaries
Learning Affordance Grounding From Exocentric Images [supp]
Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions [supp]
Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability [supp]
Unknown-Aware Object Detection: Learning What You Don't Know From Videos in the Wild [supp]
Multi-Modal Extreme Classification
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement [supp]
Training-Free Transformer Architecture Search [supp]
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation [supp]
Non-Isotropy Regularization for Proxy-Based Deep Metric Learning
C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation [supp]
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation [supp]
3DAC: Learning Attribute Compression for Point Clouds [supp]
Learning a Structured Latent Space for Unsupervised Point Cloud Completion
The Wanderings of Odysseus in 3D Scenes [supp]
Few-Shot Learning With Noisy Labels [supp]
Understanding 3D Object Articulation in Internet Videos
Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention [supp]
Interactive Image Synthesis With Panoptic Layout Generation [supp]
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving [supp]
All-in-One Image Restoration for Unknown Corruption [supp]
Syntax-Aware Network for Handwritten Mathematical Expression Recognition [supp]
Sketching Without Worrying: Noise-Tolerant Sketch-Based Image Retrieval [supp]
PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors [supp]
PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos [supp]
Deep Equilibrium Optical Flow Estimation [supp]
Optimizing Video Prediction via Video Frame Interpolation
Motron: Multimodal Probabilistic Human Motion Forecasting [supp]
Episodic Memory Question Answering [supp]
Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture [supp]
Few-Shot Backdoor Defense Using Shapley Estimation
Cycle-Consistent Counterfactuals by Latent Transformations [supp]
ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation [supp]
Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos [supp]
Blind Face Restoration via Integrating Face Shape and Generative Priors [supp]
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [supp]
Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data [supp]
Learning To Zoom Inside Camera Imaging Pipeline [supp]
High-Fidelity GAN Inversion for Image Attribute Editing [supp]
RCP: Recurrent Closest Point for Point Cloud
gDNA: Towards Generative Detailed Neural Avatars [supp]
A Dual Weighting Label Assignment Scheme for Object Detection
FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks [supp]
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning [supp]
MaskGIT: Masked Generative Image Transformer [supp]
Revisiting the "Video" in Video-Language Understanding
Local Texture Estimator for Implicit Representation Function
Instance-Aware Dynamic Neural Network Quantization [supp]
When To Prune? A Policy Towards Early Structural Pruning [supp]
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [supp]
Degree-of-Linear-Polarization-Based Color Constancy [supp]
A Voxel Graph CNN for Object Classification With Event Cameras [supp]
On the Importance of Asymmetry for Siamese Representation Learning [supp]
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning [supp]
ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting [supp]
Exploring Effective Data for Surrogate Training Towards Black-Box Attack [supp]
JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection [supp]
AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields [supp]
Likert Scoring With Grade Decoupling for Long-Term Action Assessment [supp]
Many-to-Many Splatting for Efficient Video Frame Interpolation
Investigating Top-k White-Box and Transferable Black-Box Attack [supp]
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition [supp]
Learning To Learn by Jointly Optimizing Neural Architecture and Weights [supp]
Attributable Visual Similarity Learning [supp]
A Self-Supervised Descriptor for Image Copy Detection [supp]
DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion [supp]
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [supp]
Manifold Learning Benefits GANs [supp]
A Keypoint-Based Global Association Network for Lane Detection
Negative-Aware Attention Framework for Image-Text Matching [supp]
Semantic-Aligned Fusion Transformer for One-Shot Object Detection [supp]
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning [supp]
Few-Shot Incremental Learning for Label-to-Image Translation [supp]
Discrete Time Convolution for Fast Event-Based Stereo [supp]
An Image Patch Is a Wave: Phase-Aware Vision MLP [supp]
Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination [supp]
Visual Acoustic Matching [supp]
Shunted Self-Attention via Multi-Scale Token Aggregation
Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon
ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging [supp]
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [supp]
3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image [supp]
Improving Visual Grounding With Visual-Linguistic Verification and Iterative Reasoning
Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency
Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion [supp]
Scale-Equivalent Distillation for Semi-Supervised Object Detection [supp]
Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to the Task of Accelerated MRI Reconstruction [supp]
SelfD: Self-Learning Large-Scale Driving Policies From the Web
"The Pedestrian Next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping [supp]
Attribute Group Editing for Reliable Few-Shot Image Generation [supp]
Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning [supp]
CroMo: Cross-Modal Learning for Monocular Depth Estimation [supp]
Self-Supervised Object Detection From Audio-Visual Correspondence
Autofocus for Event Cameras [supp]
Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model
Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps [supp]
Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond [supp]
Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3) [supp]
TrackFormer: Multi-Object Tracking With Transformers [supp]
L-Verse: Bidirectional Generation Between Image and Text [supp]
PanopticDepth: A Unified Framework for Depth-Aware Panoptic Segmentation
3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow
Feature Statistics Mixing Regularization for Generative Adversarial Networks [supp]
Learning To Learn and Remember Super Long Multi-Domain Task Sequence [supp]
OpenTAL: Towards Open Set Temporal Action Localization [supp]
Urban Radiance Fields [supp]
Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection [supp]
Domain-Agnostic Prior for Transfer Semantic Segmentation
Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-Learning [supp]
Ego4D: Around the World in 3,000 Hours of Egocentric Video [supp]
Differentially Private Federated Learning With Local Regularization and Sparsification [supp]
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis [supp]
Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification
Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data [supp]
Point-Level Region Contrast for Object Detection Pre-Training [supp]
Upright-Net: Learning Upright Orientation for 3D Point Cloud
Learning Semantic Associations for Mirror Detection
Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation [supp]
Failure Modes of Domain Generalization Algorithms [supp]
Geometric and Textural Augmentation for Domain Gap Reduction [supp]
Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation
DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image [supp]
Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors [supp]
HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization [supp]
Fine-Tuning Image Transformers Using Learnable Memory [supp]
Contrastive Conditional Neural Processes [supp]
vCLIMB: A Novel Video Class Incremental Learning Benchmark [supp]
Bending Reality: Distortion-Aware Transformers for Adapting to Panoramic Semantic Segmentation [supp]
Sparse and Complete Latent Organization for Geospatial Semantic Segmentation
Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements [supp]
Not All Relations Are Equal: Mining Informative Labels for Scene Graph Generation [supp]
Learning To Detect Scene Landmarks for Camera Localization
INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation [supp]
ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation [supp]
Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video [supp]
Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels [supp]
Interacting Attention Graph for Single Image Two-Hand Reconstruction [supp]
Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [supp]
Noisy Boundaries: Lemon or Lemonade for Semi-Supervised Instance Segmentation? [supp]
Boosting View Synthesis With Residual Transfer
Input-Level Inductive Biases for 3D Reconstruction [supp]
Exploring and Evaluating Image Restoration Potential in Dynamic Scenes [supp]
FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback
Cross-Image Relational Knowledge Distillation for Semantic Segmentation [supp]
A-ViT: Adaptive Tokens for Efficient Vision Transformer [supp]
Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation [supp]
Towards Layer-Wise Image Vectorization [supp]
Scenic: A JAX Library for Computer Vision Research and Beyond
CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters [supp]
ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning [supp]
Calibrating Deep Neural Networks by Pairwise Constraints [supp]
Deep Saliency Prior for Reducing Visual Distraction
Efficient Large-Scale Localization by Global Instance Recognition [supp]
Sign Language Video Retrieval With Free-Form Textual Queries [supp]
Real-Time Object Detection for Streaming Perception
Simulated Adversarial Testing of Face Recognition Models [supp] [arXiv]
VisualHow: Multimodal Problem Solving [supp]
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [supp]
Spatial Commonsense Graph for Object Localisation in Partial Scenes [supp]
CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection [supp]
OSSGAN: Open-Set Semi-Supervised Image Generation [supp]
Lite Vision Transformer With Enhanced Self-Attention
Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection
NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning [supp]
Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling [supp]
M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining [supp]
Bi-Level Alignment for Cross-Domain Crowd Counting [supp]
ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation [supp]
Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites [supp]
Efficient Multi-View Stereo by Iterative Dynamic Cost Volume [supp]
Learning To Generate Line Drawings That Convey Geometry and Semantics
On Guiding Visual Attention With Language Specification [supp]
ReSTR: Convolution-Free Referring Image Segmentation Using Transformers [supp]
A Graph Matching Perspective With Transformers on Video Instance Segmentation [supp]
TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing [supp]
FLAG: Flow-Based 3D Avatar Generation From Sparse Observations [supp]
Stability-Driven Contact Reconstruction From Monocular Color Images [supp]
Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework [supp]
SGTR: End-to-End Scene Graph Generation With Transformer [supp]
Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
Texture-Based Error Analysis for Image Super-Resolution
PILC: Practical Image Lossless Compression With an End-to-End GPU Oriented Neural Framework [supp]
Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency [supp]
Learning To Align Sequential Actions in the Wild [supp]
Decoupled Knowledge Distillation [supp]
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [supp]
Neural Volumetric Object Selection
GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning [supp]
PointCLIP: Point Cloud Understanding by CLIP [supp]
NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction
DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover's Distance Improves Out-of-Distribution Face Identification [supp]
A Sampling-Based Approach for Efficient Clustering in Large Datasets
General Facial Representation Learning in a Visual-Linguistic Manner [supp]
Deep Color Consistent Network for Low-Light Image Enhancement
AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks [supp]
Reusing the Task-Specific Classifier as a Discriminator: Discriminator-Free Adversarial Domain Adaptation [supp]
Pooling Revisited: Your Receptive Field Is Suboptimal [supp]
Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches [supp]
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [supp]
Patch Slimming for Efficient Vision Transformers
Bijective Mapping Network for Shadow Removal
End-to-End Semi-Supervised Learning for Video Action Detection [supp]
Causal Transportability for Visual Recognition
Local Attention Pyramid for Scene Image Generation [supp]
Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation [supp]
GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking [supp]
Confidence Propagation Cluster: Unleash Full Potential of Object Detectors
Cluster-Guided Image Synthesis With Unconditional Models [supp]
ISNet: Shape Matters for Infrared Small Target Detection
Robust Region Feature Synthesizer for Zero-Shot Object Detection [supp]
Virtual Correspondence: Humans as a Cue for Extreme-View Geometry
Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way
SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks [supp]
Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements
Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss [supp]
HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR [supp]
CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings [supp]
IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization [supp]
M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers [supp]
I M Avatar: Implicit Morphable Head Avatars From Videos [supp]
BodyMap: Learning Full-Body Dense Correspondence Map [supp]
Weakly-Supervised Metric Learning With Cross-Module Communications for the Classification of Anterior Chamber Angle Images [supp]
A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting [supp]
It's All in the Teacher: Zero-Shot Quantization Brought Closer to the Teacher [supp]
Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution [supp]
Multi-Modal Dynamic Graph Transformer for Visual Grounding [supp]
OSOP: A Multi-Stage One Shot Object Pose Estimation Framework [supp]
Generative Cooperative Learning for Unsupervised Video Anomaly Detection [supp]
Rethinking Semantic Segmentation: A Prototype View [supp]
Geometric Transformer for Fast and Robust Point Cloud Registration [supp]
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [supp]
UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection [supp]
Dual-Shutter Optical Vibration Sensing [supp]
Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training? [supp]
Learning To Find Good Models in RANSAC [supp]
Interactiveness Field in Human-Object Interactions [supp]
BodyGAN: General-Purpose Controllable Neural Human Body Generation [supp]
Image Disentanglement Autoencoder for Steganography Without Embedding
Self-Supervised Dense Consistency Regularization for Image-to-Image Translation [supp]
The Devil Is in the Details: Window-Based Attention for Image Compression [supp]
Category-Aware Transformer Network for Better Human-Object Interaction Detection [supp]
Deep Depth From Focus With Differential Focus Volume [supp]
DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation [supp]
Robust Fine-Tuning of Zero-Shot Models [supp]
Towards Data-Free Model Stealing in a Hard Label Setting [supp]
PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images [supp]
GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings [supp]
Multi-Granularity Alignment Domain Adaptation for Object Detection [supp]
LARGE: Latent-Based Regression Through GAN Semantics [supp]
Are Multimodal Transformers Robust to Missing Modality?
Degradation-Agnostic Correspondence From Resolution-Asymmetric Stereo [ supp ] Fisher Information Guidance for Learned Time-of-Flight Imaging [ supp ] VRDFormer: End-to-End Video Visual Relation Detection With Transformers [ supp ] Robust Federated Learning With Noisy and Heterogeneous Clients [ supp ] Enabling Equivariance for Arbitrary Lie Groups [ supp ] Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors [ supp ] GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision [ supp ] Learning Pixel-Level Distinctions for Video Highlight Detection [ supp ] Noise Distribution Adaptive Self-Supervised Image Denoising Using Tweedie Distribution and Score Matching [ supp ] Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution [ supp ] CLIPstyler: Image Style Transfer With a Single Text Condition [ supp ] Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation Spatio-Temporal Relation Modeling for Few-Shot Action Recognition [ supp ] Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian [ supp ] Volumetric Bundle Adjustment for Online Photorealistic Scene Capture [ supp ] Multi-Person Extreme Motion Prediction [ supp ] Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network [ supp ] Channel Balancing for Accurate Quantization of Winograd Convolutions [ supp ] RegNeRF: Regularizing Neural Radiance Fields for View Synthesis From Sparse Inputs Structured Local Radiance Fields for Human Avatar Modeling [ supp ] Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation [ supp ] Ranking-Based Siamese Visual Tracking Learnable Lookup Table for Neural Network Quantization [ supp ] SEEG: Semantic Energized Co-Speech Gesture Generation [ supp ] AdaViT: Adaptive Vision Transformers for Efficient Image 
Recognition [ supp ] Compound Domain Generalization via Meta-Knowledge Encoding NAN: Noise-Aware NeRFs for Burst-Denoising Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors [ supp ] b-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [ supp ] Vector Quantized Diffusion Model for Text-to-Image Synthesis CMT: Convolutional Neural Networks Meet Vision Transformers [ supp ] Hyperspherical Consistency Regularization Unsupervised Image-to-Image Translation With Generative Prior [ supp ] KNN Local Attention for Image Restoration [ supp ] Face Relighting With Geometrically Consistent Shadows [ supp ] Open-Set Text Recognition via Character-Context Decoupling [ supp ] Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization [ supp ] Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences [ supp ] Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model [ supp ] Optimizing Elimination Templates by Greedy Parameter Search [ supp ] TransMix: Attend To Mix for Vision Transformers [ supp ] HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation [ supp ] Inertia-Guided Flow Completion and Style Fusion for Video Inpainting [ supp ] RU-Net: Regularized Unrolling Network for Scene Graph Generation [ supp ] Long-Tailed Visual Recognition via Gaussian Clouded Logit Adjustment Image Animation With Perturbed Masks [ supp ] Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework [ supp ] Point Density-Aware Voxels for LiDAR 3D Object Detection [ supp ] Integrating Language Guidance Into Vision-Based Deep Metric Learning [ supp ] PartGlot: Learning Shape Part Segmentation From Language Reference Games [ supp ] Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing [ supp ] A Simple Episodic Linear Probe Improves Visual 
Recognition in the Wild [ supp ] Matching Feature Sets for Few-Shot Image Classification [ supp ] DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering [ supp ] Enhancing Classifier Conservativeness and Robustness by Polynomiality Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization [ supp ] OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction [ supp ] ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics [ supp ] Revisiting Domain Generalized Stereo Matching Networks From a Feature Consistency Perspective [ supp ] MonoScene: Monocular 3D Semantic Scene Completion [ supp ] TubeFormer-DeepLab: Video Mask Transformer [ supp ] XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation [ supp ] Disentangling Visual and Written Concepts in CLIP Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction [ supp ] Bilateral Video Magnification Filter [ supp ] AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition [ supp ] Localization Distillation for Dense Object Detection [ supp ] What's in Your Hands? 
3D Reconstruction of Generic Objects in Hands [ supp ] Continuous Scene Representations for Embodied AI [ supp ] Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds [ supp ] Neural Mean Discrepancy for Efficient Out-of-Distribution Detection [ supp ] Non-Probability Sampling Network for Stochastic Human Trajectory Prediction Marginal Contrastive Correspondence for Guided Image Generation Complex Backdoor Detection by Symmetric Feature Differencing [ supp ] Time Lens++: Event-Based Frame Interpolation With Parametric Non-Linear Flow and Multi-Scale Fusion [ supp ] ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning [ supp ] RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks Human-Aware Object Placement for Visual Environment Reconstruction [ supp ] X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval [ supp ] Learning of Global Objective for Network Flow in Multi-Object Tracking Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer [ supp ] Gated2Gated: Self-Supervised Depth Estimation From Gated Images [ supp ] RAMA: A Rapid Multicut Algorithm on GPU [ supp ] Adversarial Parametric Pose Prior [ supp ] DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning [ supp ] Mask Transfiner for High-Quality Instance Segmentation [ supp ] End-to-End Reconstruction-Classification Learning for Face Forgery Detection [ supp ] It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection [ supp ] Transferability Metrics for Selecting Source Model Ensembles [ supp ] Neural Global Shutter: Learn To Restore Video From a Rolling Shutter Camera With Global Reset Feature DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis [ supp ] Open Challenges in Deep Stereo: The Booster Dataset 
[ supp ] Location-Free Human Pose Estimation Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects [ supp ] PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking [ supp ] Event-Based Video Reconstruction via Potential-Assisted Spiking Neural Network [ supp ] Efficient Maximal Coding Rate Reduction by Variational Forms [ supp ] Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions [ supp ] AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification [ supp ] YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [ supp ] Sound-Guided Semantic Image Manipulation [ supp ] Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification [ supp ] Proper Reuse of Image Classification Features Improves Object Detection [ supp ] MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision [ supp ] End-to-End Human-Gaze-Target Detection With Transformers The Devil Is in the Pose: Ambiguity-Free 3D Rotation-Invariant Learning via Pose-Aware Convolution [ supp ] Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning [ supp ] Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [ supp ] Future Transformer for Long-Term Action Anticipation [ supp ] Optimal LED Spectral Multiplexing for NIR2RGB Translation Rethinking Spatial Invariance of Convolutional Networks for Object Counting Self-Supervised Video Transformer [ supp ] AutoRF: Learning 3D Object Radiance Fields From Single View Observations [ supp ] Expanding Large Pre-Trained Unimodal Models With Multimodal Information Injection for Image-Text Multimodal Classification Neural 
RGB-D Surface Reconstruction [ supp ] ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation [ supp ] ClothFormer: Taming Video Virtual Try-On in All Module [ supp ] Cross-Domain Adaptive Teacher for Object Detection Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation [ supp ] Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation [ supp ] Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles [ supp ] Condensing CNNs With Partial Differential Equations [ supp ] Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species [ supp ] Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching [ supp ] Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection TubeR: Tubelet Transformer for Video Action Detection [ supp ] LASER: LAtent SpacE Rendering for 2D Visual Localization [ supp ] MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection [ supp ] On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles [ supp ] Kubric: A Scalable Dataset Generator
] [ supp ]
Unpaired Deep Image Deraining Using Dual Contrastive Learning
Learning Multiple Dense Prediction Tasks From Partially Annotated Data [supp]
Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation [supp]
Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [supp]
Towards Low-Cost and Efficient Malaria Detection
Learning Neural Light Fields With Ray-Space Embedding [supp]
Exposure Normalization and Compensation for Multiple-Exposure Correction [supp]
UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation
Learning Non-Target Knowledge for Few-Shot Semantic Segmentation
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers [supp]
Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders [supp]
Clean Implicit 3D Structure From Noisy 2D STEM Images [supp]
UKPGAN: A General Self-Supervised Keypoint Detector [supp]
Learning Optimal K-Space Acquisition and Reconstruction Using Physics-Informed Neural Networks [supp]
Leveraging Adversarial Examples To Quantify Membership Information Leakage [supp]
Raw High-Definition Radar for Multi-Task Learning [supp]
Point-NeRF: Point-Based Neural Radiance Fields [supp]
Contextual Debiasing for Visual Recognition With Causal Mechanisms
Complex Video Action Reasoning via Learnable Markov Logic Network [supp]
Per-Clip Video Object Segmentation [supp]
Exploring Set Similarity for Dense Self-Supervised Representation Learning
Coarse-To-Fine Feature Mining for Video Semantic Segmentation
ONCE-3DLanes: Building Monocular 3D Lane Detection [supp]
Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts [supp]
Compressing Models With Few Samples: Mimicking Then Replacing [supp]
FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning [supp]
Modulated Contrast for Versatile Image Synthesis
PokeBNN: A Binary Pursuit of Lightweight Accuracy [supp]
HumanNeRF: Efficiently Generated Human Radiance Field From Sparse Inputs [supp]
Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection [supp]
Identifying Ambiguous Similarity Conditions via Semantic Matching [supp]
MISF: Multi-Level Interactive Siamese Filtering for High-Fidelity Image Inpainting
Cascade Transformers for End-to-End Person Search
MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection [supp]
LSVC: A Learning-Based Stereo Video Compression Framework
How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs [supp]
InsetGAN for Full-Body Image Generation [supp]
DetectorDetective: Investigating the Effects of Adversarial Examples on Object Detectors
SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images [supp]
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching [supp]
SNR-Aware Low-Light Image Enhancement
3D Common Corruptions and Data Augmentation
PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision [supp]
Injecting Semantic Concepts Into End-to-End Image Captioning [supp]
An Efficient Training Approach for Very Large Scale Face Recognition
Long-Term Video Frame Interpolation via Feature Propagation
Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation
Event-Aided Direct Sparse Odometry [supp]
Group Contextualization for Video Recognition [supp]
Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation [supp]
Visual Abductive Reasoning [supp]
L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation [supp]
Continual Learning With Lifelong Vision Transformer [supp]
MPViT: Multi-Path Vision Transformer for Dense Prediction [supp]
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models [supp]
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [supp]
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing [supp]
Accurate 3D Body Shape Regression Using Metric and Semantic Attributes [supp]
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
Label-Only Model Inversion Attacks via Boundary Repulsion [supp]
Privacy-Preserving Online AutoML for Domain-Specific Face Detection [supp]
Self-Augmented Unpaired Image Dehazing via Density and Depth Decomposition
Neural 3D Video Synthesis From Multi-View Video [supp]
LiDAR Snowfall Simulation for Robust 3D Object Detection [supp]
Learning Where To Learn in Cross-View Self-Supervised Learning [supp]
SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [supp]
Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings [supp]
How Much More Data Do I Need? Estimating Requirements for Downstream Tasks [supp]
Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation
Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search [supp]
The Implicit Values of a Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement [supp]
Learning What Not To Segment: A New Perspective on Few-Shot Segmentation [supp]
Blended Diffusion for Text-Driven Editing of Natural Images [supp]
Towards Unsupervised Domain Generalization
HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening [supp]
Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [supp]
Robust Invertible Image Steganography
Entropy-Based Active Learning for Object Detection With Progressive Diversity Constraint [supp]
BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement [supp]
A Structured Dictionary Perspective on Implicit Neural Representations [supp]
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Vision-Language Pre-Training With Triple Contrastive Learning [supp]
Structure-Aware Flow Generation for Human Body Reshaping [supp]
Practical Learned Lossless JPEG Recompression With Multi-Level Cross-Channel Entropy Model in the DCT Domain [supp]
Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time [supp]
Learning To Answer Questions in Dynamic Audio-Visual Scenarios [supp]
Leveraging Equivariant Features for Absolute Pose Regression
Synthetic Aperture Imaging With Events and Frames [supp]
CLIP-Event: Connecting Text and Images With Event Structures [supp]
MonoGround: Detecting Monocular 3D Objects From the Ground
Deep Visual Geo-Localization Benchmark [supp]
Scaling Up Vision-Language Pre-Training for Image Captioning
Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning [supp]
StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2 [supp]
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks [supp]
Scaling Vision Transformers [supp]
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering [supp]
Pin the Memory: Learning To Generalize Semantic Segmentation [supp]
LISA: Learning Implicit Shape and Appearance of Hands [supp]
DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds [supp]
Iterative Deep Homography Estimation [supp]
Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels [supp]
Learned Queries for Efficient Local Attention [supp]
Stereoscopic Universal Perturbations Across Different Architectures and Datasets [supp]
Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation [supp]
DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos [supp]
HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging [supp]
Leveraging Self-Supervision for Cross-Domain Crowd Counting
MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution
Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders [supp]
PlaneMVS: 3D Plane Reconstruction From Multi-View Stereo [supp]
Scene Graph Expansion for Semantics-Guided Image Outpainting [supp]
SoftGroup for 3D Instance Segmentation on Point Clouds
SharpContour: A Contour-Based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation
MVS2D: Efficient Multi-View Stereo via Attention-Driven 2D Convolutions [supp]
FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis [supp]
Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement [supp]
Bridged Transformer for Vision and Point Cloud 3D Object Detection [supp]
Deep Constrained Least Squares for Blind Image Super-Resolution [supp]
EDTER: Edge Detection With Transformer [supp]
Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning [supp]
JIFF: Jointly-Aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction [supp]
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings [supp]
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning [supp]
Symmetry-Aware Neural Architecture for Embodied Visual Exploration [supp]
AirObject: A Temporally Evolving Graph Embedding for Object Identification [supp]
From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering [supp]
Semantic-Aware Domain Generalized Segmentation [supp]
TransVPR: Transformer-Based Place Recognition With Multi-Level Attention Aggregation [supp]
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Unsupervised Learning of Debiased Representations With Pseudo-Attributes [supp]
Protecting Celebrities From DeepFake With Identity Consistency Transformer [supp]
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness [supp]
TubeDETR: Spatio-Temporal Video Grounding With Transformers [supp]
KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning [supp]
SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos [supp]
CD2-pFed: Cyclic Distillation-Guided Channel Decoupling for Model Personalization in Federated Learning
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection [supp]
Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [supp]
Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation [supp]
AKB-48: A Real-World Articulated Object Knowledge Base [supp]
Style-ERD: Responsive and Coherent Online Motion Style Transfer [supp]
Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy [supp]
Stratified Transformer for 3D Point Cloud Segmentation [supp]
NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images [supp]
DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations
Task Decoupled Framework for Reference-Based Super-Resolution [supp]
Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations [supp]
RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation [supp]
Id-Free Person Similarity Learning [supp]
Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification
Globetrotter: Connecting Languages by Connecting Images [supp]
Fairness-Aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis [supp]
Egocentric Scene Understanding via Multimodal Spatial Rectifier [supp]
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels [supp]
Day-to-Night Image Synthesis for Training Nighttime Neural ISPs [supp]
Commonality in Natural Images Rescues GANs: Pretraining GANs With Generic and Privacy-Free Synthetic Data [supp]