【CVPR2022】Paper List and Downloads: Part Two

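CVPR 2022 opens on June 22 🎉🎉🎉, and the conference accepted 2,067 papers. Because there are so many, the list is split across four posts; click a paper title to get the document.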
📃 Part One, 📃 Part Three, 📃 Part Four

2. Part Two

Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos [supp]
Learning Canonical F-Correlation Projection for Compact Multiview Representation [supp]
DIFNet: Boosting Visual Information Flow for Image Captioning
Weakly Supervised Object Localization As Domain Adaption [supp]
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation [supp]
Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching [supp]
Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation [supp]
Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error [supp]
MatteFormer: Transformer-Based Image Matting via Prior-Tokens [supp]
Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training [supp]
Ranking Distance Calibration for Cross-Domain Few-Shot Learning [supp]
Robust and Accurate Superquadric Recovery: A Probabilistic Approach [supp]
Zero-Shot Text-Guided Object Generation With Dream Fields [supp]
Learning Pixel Trajectories With Multiscale Contrastive Random Walks
Self-Supervised Correlation Mining Network for Person Image Generation
Grounding Answers for Visual Questions Asked by Visually Impaired People [supp]
Task Adaptive Parameter Sharing for Multi-Task Learning [supp]
Sparse Instance Activation for Real-Time Instance Segmentation
Automatic Color Image Stitching Using Quaternion Rank-1 Alignment [supp]
VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning [supp]
ESCNet: Gaze Target Detection With the Understanding of 3D Scenes [supp]
Can You Spot the Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection
Finding Badly Drawn Bunnies [supp]
Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders [supp]
All-Photon Polarimetric Time-of-Flight Imaging [supp]
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation [supp]
Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis [supp]
Learning From Temporal Gradient for Semi-Supervised Action Recognition [supp]
Towards Implicit Text-Guided 3D Shape Generation [supp]
Audio-Driven Neural Gesture Reenactment With Video Motion Graphs [supp]
SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage [supp]
Transforming Model Prediction for Tracking [supp]
A Unified Framework for Implicit Sinkhorn Differentiation [supp]
DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation [supp]
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling [supp]
Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning [supp]
A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation [supp]
Query and Attention Augmentation for Knowledge-Based Explainable Reasoning [supp]
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality [supp]
RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion [supp]
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection [supp]
Interactron: Embodied Adaptive Object Detection [supp]
3D Scene Painting via Semantic Image Synthesis [supp]
MeMOT: Multi-Object Tracking With Memory
Revisiting Weakly Supervised Pre-Training of Visual Perception Models [supp]
Semi-Supervised Semantic Segmentation With Error Localization Network
Meta Convolutional Neural Networks for Single Domain Generalization [supp]
Generalizing Gaze Estimation With Rotation Consistency
Anomaly Detection via Reverse Distillation From One-Class Embedding [supp]
Fine-Grained Object Classification via Self-Supervised Pose Alignment [supp]
Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction [supp]
CellTypeGraph: A New Geometric Computer Vision Benchmark [supp]
Clustering Plotted Data by Image Segmentation
Accelerating Neural Network Optimization Through an Automated Control Theory Lens [supp]
Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding [supp]
Learning To Learn Across Diverse Data Biases in Deep Face Recognition [supp]
Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement [supp]
Long-Tail Recognition via Compositional Knowledge Transfer [supp]
EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval [supp]
Multi-Dimensional, Nuanced and Subjective - Measuring the Perception of Facial Expressions [supp]
PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments
Self-Taught Metric Learning Without Labels [supp]
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition [supp]
Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization
Embracing Single Stride 3D Object Detector With Sparse Transformer [supp]
Multidimensional Belief Quantification for Label-Efficient Meta-Learning [supp]
UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog
Relieving Long-Tailed Instance Segmentation via Pairwise Class Balance [supp]
Online Convolutional Re-Parameterization [supp]
Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning [supp]
RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding [supp]
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition [supp]
HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks
RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior [supp]
Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique [supp]
Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography [supp]
Personalized Image Aesthetics Assessment With Rich Attributes
Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data [supp]
Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification [supp]
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation [supp]
HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging
OW-DETR: Open-World Detection Transformer [supp]
Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds [supp]
Reversible Vision Transformers [supp]
Amodal Panoptic Segmentation [supp]
Gravitationally Lensed Black Hole Emission Tomography [supp]
3D-Aware Image Synthesis via Learning Structural and Textural Representations [supp]
Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer [supp]
Correlation Verification for Image Retrieval [supp]
Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment [supp]
Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer [supp]
PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning [supp]
Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning
Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation
Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut [supp]
Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection [supp]
Towards Robust Adaptive Object Detection Under Noisy Annotations [supp]
Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing
Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer [supp]
Learning To Memorize Feature Hallucination for One-Shot Image Generation
AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis
Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation [supp]
Glass: Geometric Latent Augmentation for Shape Spaces
COAP: Compositional Articulated Occupancy of People [supp]
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization [supp]
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities [supp]
Deterministic Point Cloud Registration via Novel Transformation Decomposition [supp]
Motion-Adjustable Neural Implicit Video Representation
Neural Prior for Trajectory Estimation [supp]
DPICT: Deep Progressive Image Compression Using Trit-Planes [supp]
Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation [supp]
Long-Tailed Recognition via Weight Balancing [supp]
Text to Image Generation With Semantic-Spatial Aware GAN
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization [supp]
ShapeFormer: Transformer-Based Shape Completion via Sparse Representation [supp]
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures [supp]
Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation [supp]
Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization [supp]
Learning Optical Flow With Kernel Patch Attention
Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model [supp]
TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation [supp]
General Incremental Learning With Domain-Aware Categorical Representations [supp]
Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images
ActiveZero: Mixed Domain Learning for Active Stereovision With Zero Annotation [supp]
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers [supp]
Global-Aware Registration of Less-Overlap RGB-D Scans [supp]
RayMVSNet: Learning Ray-Based 1D Implicit Fields for Accurate Multi-View Stereo [supp]
ContrastMask: Contrastive Learning To Segment Every Thing [supp]
Efficient Deep Embedded Subspace Clustering [supp]
Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture [supp]
Revisiting Temporal Alignment for Video Restoration [supp]
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning [supp]
Neural Reflectance for Shape Recovery With Shadow Handling [supp]
Rep-Net: Efficient On-Device Learning via Feature Reprogramming [supp]
Surface Representation for Point Clouds [supp]
Implicit Motion Handling for Video Camouflaged Object Detection [supp]
OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation [supp]
DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer [supp]
WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery [supp]
Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification [supp]
Optical Flow Estimation for Spiking Camera [supp]
MetaFormer Is Actually What You Need for Vision [supp]
GradViT: Gradient Inversion of Vision Transformers [supp]
Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning
InstaFormer: Instance-Aware Image-to-Image Translation With Transformer [supp]
Revisiting Near/Remote Sensing With Geospatial Attention [supp]
Joint Global and Local Hierarchical Priors for Learned Image Compression [supp]
Knowledge Distillation via the Target-Aware Transformer [supp]
Recurring the Transformer for Video Action Recognition [supp]
Subspace Adversarial Training [supp]
3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection [supp]
Image Segmentation Using Text and Image Prompts [supp]
AutoMine: An Unmanned Mine Dataset [supp]
Neural Data-Dependent Transform for Learned Image Compression [supp]
Background Activation Suppression for Weakly Supervised Object Localization [supp]
How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting [supp]
Evaluation-Oriented Knowledge Distillation for Deep Face Recognition
Improving Subgraph Recognition With Variational Graph Information Bottleneck
Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation [supp]
Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos [supp]
Efficient Video Instance Segmentation via Tracklet Query and Proposal [supp]
Synthetic Generation of Face Videos With Plethysmograph Physiology
TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting [supp]
Hallucinated Neural Radiance Fields in the Wild [supp]
NeuralHDHair: Automatic High-Fidelity Hair Modeling From a Single Image Using Implicit Neural Representations [supp]
The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization [supp]
Global Tracking Transformers
Backdoor Attacks on Self-Supervised Learning [supp]
Multimodal Token Fusion for Vision Transformers [supp]
Exploring Frequency Adversarial Attacks for Face Forgery Detection
GMFlow: Learning Optical Flow via Global Matching [supp]
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [supp]
FLAVA: A Foundational Language and Vision Alignment Model [supp]
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production [supp]
Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
OCSampler: Compressing Videos to One Clip With Single-Step Sampling [supp]
Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning
Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction
Scanline Homographies for Rolling-Shutter Plane Absolute Pose [supp]
TableFormer: Table Structure Understanding With Transformers [supp]
Exemplar-Based Pattern Synthesis With Implicit Periodic Field Network
Grounded Language-Image Pre-Training [supp]
Spectral Unsupervised Domain Adaptation for Visual Recognition [supp]
AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement [supp]
PatchFormer: An Efficient Point Transformer With Patch Attention
Recurrent Glimpse-Based Decoder for Detection With Transformer [supp]
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers [supp]
SimMIM: A Simple Framework for Masked Image Modeling [supp]
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion [supp]
Label Matching Semi-Supervised Object Detection [supp]
RegionCLIP: Region-Based Language-Image Pretraining [supp]
Video Frame Interpolation Transformer
An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation [supp]
Fast Light-Weight Near-Field Photometric Stereo [supp]
BCOT: A Markerless High-Precision 3D Object Tracking Benchmark [supp]
Omni-DETR: Omni-Supervised Object Detection With Transformers [supp]
Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching [supp]
High-Resolution Image Synthesis With Latent Diffusion Models [supp]
Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations
Transferable Sparse Adversarial Attack
CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping
Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos [supp]
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers [supp]
Text Spotting Transformers
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields [supp]
VALHALLA: Visual Hallucination for Machine Translation [supp]
StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation [supp]
Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [supp]
GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras [supp]
HINT: Hierarchical Neuron Concept Explainer [supp]
Capturing and Inferring Dense Full-Body Human-Scene Contact [supp]
Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions [supp]
Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection [supp]
En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning [supp]
Neural Face Identification in a 2D Wireframe Projection of a Manifold Object [supp]
LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network [supp]
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation [supp]
Deep Rectangling for Image Stitching: A Learning Baseline [supp]
PCL: Proxy-Based Contrastive Learning for Domain Generalization [supp]
SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation [supp]
Learning 3D Object Shape and Layout Without 3D Supervision [supp]
An Empirical Study of End-to-End Temporal Action Detection [supp]
SimVP: Simpler Yet Better Video Prediction [supp]
Object Localization Under Single Coarse Point Supervision [supp]
Unsupervised Learning of Accurate Siamese Tracking [supp]
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection [supp]
Brain-Supervised Image Editing [supp]
3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces [supp]
Unified Transformer Tracker for Object Tracking [supp]
Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo [supp]
Equalized Focal Loss for Dense Long-Tailed Object Detection [supp]
Generating High Fidelity Data From Low-Density Regions Using Diffusion Models [supp]
DeepDPM: Deep Clustering With an Unknown Number of Clusters [supp]
Spiking Transformers for Event-Based Single Object Tracking [supp]
FocalClick: Towards Practical Interactive Image Segmentation
ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation [supp]
Unsupervised Domain Adaptation for Nighttime Aerial Tracking [supp]
Balanced Multimodal Learning via On-the-Fly Gradient Modulation [supp]
RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs [supp]
Understanding Uncertainty Maps in Vision With Statistical Testing [supp]
CAFE: Learning To Condense Dataset by Aligning Features
Causality Inspired Representation Learning for Domain Generalization [supp]
Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction
A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration
Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency
PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation [supp]
Block-NeRF: Scalable Large Scene Neural View Synthesis [supp]
Coupling Vision and Proprioception for Navigation of Legged Robots [supp]
Fine-Grained Predicates Learning for Scene Graph Generation
Generalized Few-Shot Semantic Segmentation [supp]
Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation [supp]
Neural Head Avatars From Monocular RGB Videos [supp]
B-Cos Networks: Alignment Is All We Need for Interpretability [supp]
EMOCA: Emotion Driven Monocular Face Capture and Animation [supp]
Burst Image Restoration and Enhancement [supp]
What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors [supp]
Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis [supp]
Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [supp]
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis [supp]
Localized Adversarial Domain Generalization [supp]
X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning [supp]
How Much Does Input Data Type Impact Final Face Model Accuracy? [supp]
Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [supp]
HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video [supp]
PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound
Which Images To Label for Few-Shot Medical Landmark Detection?
Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis [supp]
Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention [supp]
AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation [supp]
Self-Distillation From the Last Mini-Batch for Consistency Regularization
Interactive Multi-Class Tiny-Object Detection [supp]
Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection [supp]
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection [supp]
Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry [supp]
Learning To Collaborate in Decentralized Learning of Personalized Models
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields [supp]
ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation [supp]
Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields [supp]
360-Attack: Distortion-Aware Perturbations From Perspective-Views
Targeted Supervised Contrastive Learning for Long-Tailed Recognition [supp]
Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding
Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition [supp]
Balanced Contrastive Learning for Long-Tailed Visual Recognition [supp]
Slimmable Domain Adaptation [supp]
Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees
NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration [supp]
DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow [supp]
Few-Shot Object Detection With Fully Cross-Transformer [supp]
Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation
Decoupling Makes Weakly Supervised Local Feature Better [supp]
Cross-Architecture Self-Supervised Video Representation Learning
High-Resolution Image Harmonization via Collaborative Dual Transformations [supp]
Homography Loss for Monocular 3D Object Detection
A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors [supp]
Dynamic Sparse R-CNN
MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [supp]
Stable Long-Term Recurrent Video Super-Resolution [supp]
Dual-Generator Face Reenactment
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
Self-Supervised Neural Articulated Shape and Appearance Models [supp]
A Hybrid Quantum-Classical Algorithm for Robust Fitting [supp]
Topology Preserving Local Road Network Estimation From Single Onboard Camera Image [supp]
Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes [supp]
Human Instance Matting via Mutual Guidance and Multi-Instance Refinement [supp]
TCTrack: Temporal Contexts for Aerial Tracking [supp]
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing [supp]
GAN-Supervised Dense Visual Alignment [supp]
SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition [supp]
Multi-Level Feature Learning for Contrastive Multi-View Clustering
RendNet: Unified 2D/3D Recognizer With Latent Space Rendering
iPLAN: Interactive and Procedural Layout Planning [supp]
Video Frame Interpolation With Transformer [supp]
GIFS: Neural Implicit Function for General Shape Representation [supp]
Deblur-NeRF: Neural Radiance Fields From Blurry Images [supp]
Egocentric Prediction of Action Target in 3D [supp]
TemporalUV: Capturing Loose Clothing With Temporally Coherent UV Coordinates [supp]
Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction
DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering [supp]
Towards Real-World Navigation With Deep Differentiable Planners [supp]
An Iterative Quantum Approach for Transformation Estimation From Point Sets [supp]
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation [supp]
UnweaveNet: Unweaving Activity Stories [supp]
Balanced MSE for Imbalanced Visual Regression [supp]
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [supp]
PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer
Dimension Embeddings for Monocular 3D Object Detection
Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator [supp]
NeRFReN: Neural Radiance Fields With Reflections [supp]
Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel [supp]
Finding Good Configurations of Planar Primitives in Unorganized Point Clouds [supp]
PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images [supp]
SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization [supp]
Beyond Fixation: Dynamic Window Visual Transformer
Progressive End-to-End Object Detection in Crowded Scenes [supp]
FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification [supp]
Improving GAN Equilibrium by Raising Spatial Awareness [supp]
Neural Convolutional Surfaces [supp]
HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet [supp]
A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes [supp]
ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes [supp]
Source-Free Domain Adaptation via Distribution Estimation [supp]
Robust Combination of Distributed Gradients Under Adversarial Perturbations [supp]
Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network
VisCUIT: Visual Auditor for Bias in CNN Image Classifier
Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis [supp]
Transferability Estimation Using Bhattacharyya Class Separability [supp]
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
Hierarchical Self-Supervised Representation Learning for Movie Understanding
Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality
Does Robustness on ImageNet Transfer to Downstream Tasks? [supp]
Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples [supp]
Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory [supp]
Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations [supp]
Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection [supp]
Proto2Proto: Can You Recognize the Car, the Way I Do? [supp]
Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation [supp]
Learning Video Representations of Human Motion From Synthetic Data [supp]
TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing
Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution
FS6D: Few-Shot 6D Pose Estimation of Novel Objects [supp]
Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale [supp]
The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions [supp]
Vision-Language Pre-Training for Boosting Scene Text Detectors
Reflection and Rotation Symmetry Detection via Equivariant Learning [supp]
BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation
Simple but Effective: CLIP Embeddings for Embodied AI [supp]
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition [supp]
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
Collaborative Transformers for Grounded Situation Recognition [supp]
DyRep: Bootstrapping Training With Dynamic Re-Parameterization [supp]
Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection [supp]
CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild [supp]
Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition [supp]
Interactive Disentanglement: Learning Concepts by Interacting With Their Prototype Representations [supp]
CDGNet: Class Distribution Guided Network for Human Parsing [supp]
Recall@k Surrogate Loss With Large Batches and Similarity Mixup [supp]
Direct Voxel Grid Optimization: Super-Fast Convergence for Radiance Fields Reconstruction [supp]
Continual Test-Time Domain Adaptation [supp]
URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement [supp]
Towards Multi-Domain Single Image Dehazing via Test-Time Training
Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks [supp]
Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase [supp]
Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information [supp]
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network [supp]
ScanQA: 3D Question Answering for Spatial Scene Understanding [supp]
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering [supp]
Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation [supp]
Learning Program Representations for Food Images and Cooking Recipes
Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport [supp]
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering [supp]
Federated Learning With Position-Aware Neurons [supp]
Fair Contrastive Learning for Facial Attribute Classification [supp]
MDAN: Multi-Level Dependent Attention Network for Visual Emotion Analysis
Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design [supp]
BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras [supp]
RGB-Depth Fusion GAN for Indoor Depth Completion [supp]
Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer
RCL: Recurrent Continuous Localization for Temporal Action Detection [supp]
C2SLR: Consistency-Enhanced Continuous Sign Language Recognition [supp]
Human Trajectory Prediction With Momentary Observation
FoggyStereo: Stereo Matching With Fog Volume Representation [supp]
Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video [supp]
Directional Self-Supervised Learning for Heavy Image Augmentations [supp]
Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation [supp]
No-Reference Point Cloud Quality Assessment via Domain Adaptation
Generating Representative Samples for Few-Shot Classification [supp]
Comprehending and Ordering Semantics for Image Captioning
Dynamic Scene Graph Generation via Anticipatory Pre-Training
A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection [supp]
GaTector: A Unified Framework for Gaze Object Prediction [supp]
ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding [supp]
CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows [supp]
LaTr: Layout-Aware Transformer for Scene-Text VQA [supp]
Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification [supp]
ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks [supp]
Enhancing Face Recognition With Self-Supervised 3D Reconstruction
HeadNeRF: A Real-Time NeRF-Based Parametric Head Model
FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction [supp]
Reduce Information Loss in Transformers for Pluralistic Image Inpainting [supp]
Replacing Labeled Real-Image Datasets With Auto-Generated Contours
Cross-Modal Transferable Adversarial Attacks From Images to Videos [supp]
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [supp]
Do Explanations Explain? Model Knows Best [supp]
WebQA: Multihop and Multimodal QA [supp]
Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture [supp]
BasicVSR++: Improving Video Super-Resolution With Enhanced Propagation and Alignment [supp]
IDR: Self-Supervised Image Denoising via Iterative Data Refinement [supp]
MogFace: Towards a Deeper Appreciation on Face Detection [supp]
GuideFormer: Transformers for Image Guided Depth Completion [supp]
Multi-Label Iterated Learning for Image Classification With Label Ambiguity [supp]
Region-Aware Face Swapping
Towards Language-Free Training for Text-to-Image Generation [supp]
Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers [supp]
Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees [supp]
Physical Simulation Layer for Accurate 3D Modeling [supp]
Deformable Sprites for Unsupervised Video Decomposition [supp]
CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation [supp]
FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos [supp]
Learning To Detect Mobile Objects From LiDAR Scans Without Labels [supp]
BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion [supp]
Probabilistic Representations for Video Contrastive Learning [supp]
EnvEdit: Environment Editing for Vision-and-Language Navigation [supp]
Omnivore: A Single Model for Many Visual Modalities [supp]
Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors
Reflash Dropout in Image Super-Resolution [supp]
WildNet: Learning Domain Generalized Semantic Segmentation From the Wild [supp]
Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage [supp]
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
DECORE: Deep Compression With Reinforcement Learning
Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving [supp]
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection [supp]
Task Discrepancy Maximization for Fine-Grained Few-Shot Classification [supp]
FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction [supp]
Efficient Classification of Very Large Images With Tiny Objects [supp]
SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization [supp]
Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation [supp]
Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers [supp]
Generating Diverse 3D Reconstructions From a Single Occluded Face Image [supp]
RBGNet: Ray-Based Grouping for 3D Object Detection [supp]
Stand-Alone Inter-Frame Attention in Video Models
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [supp]
Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources [supp]
Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening
Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer [supp]
Large-Scale Pre-Training for Person Re-Identification With Noisy Labels [supp]
Adiabatic Quantum Computing for Multi Object Tracking [supp]
Feature Erasing and Diffusion Network for Occluded Person Re-Identification
Is Mapping Necessary for Realistic PointGoal Navigation? [supp]
Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification
Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting [supp]
Masked Feature Prediction for Self-Supervised Visual Pre-Training [supp]
Critical Regularizations for Neural Surface Reconstruction in the Wild [supp]
EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning [supp]
Object-Relation Reasoning Graph for Action Recognition
Semantic Segmentation by Early Region Proxy [supp]
GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation [supp]
Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers
FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset [supp]
Bring Evanescent Representations to Life in Lifelong Class Incremental Learning [supp]
Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data [supp]
LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition [supp]
SimVQA: Exploring Simulated Environments for Visual Question Answering [supp]
Thin-Plate Spline Motion Model for Image Animation [supp]
Learning Local Displacements for Point Cloud Completion [supp]
Human Hands As Probes for Interactive Object Understanding [supp]
Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training [supp]
Certified Patch Robustness via Smoothed Vision Transformers [supp]
Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling
UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation
HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture [supp]
RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising [supp]
Rethinking Visual Geo-Localization for Large-Scale Applications [supp]
Learning Based Multi-Modality Image and Video Compression [supp]
