A Roundup of CVPR 2024 Papers in Selected Research Directions (continuously updated)
Long-Tailed Distribution (Long-Tailed)
- DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Full text: DeiT-LT (rangwani-harsh.github.io)
Domain Adaptation
- Learning CNN on ViT: A Hybrid Model to Explicitly Class-Specific Boundaries for Domain Adaptation
Full text: Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation | Project Page (dotrannhattuong.github.io)
- Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Vision-Language Pretraining
- A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Full text: CLAP (jusiro.github.io)
- Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
- FairCLIP: Harnessing Fairness in Vision-Language Learning
- Efficient Test-Time Adaptation of Vision-Language Models
- ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
- Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
- Transductive Zero-Shot and Few-Shot CLIP
- LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
- PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition
- Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
- PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
- JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
- Label Propagation for Zero-shot Classification with Vision-Language Models
- ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
- Active Prompt Learning in Vision Language Models
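Several entries above (e.g., LP++) revolve around linear probing of frozen vision-language embeddings. As background only, here is a minimal generic sketch of a linear probe: plain multinomial logistic regression trained by gradient descent on fixed features. This is not the specific LP++ method from the paper; all function names and hyperparameters are illustrative.

```python
import numpy as np

def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=200):
    """Generic linear probe: multinomial logistic regression on frozen
    features, trained with full-batch gradient descent (illustrative only)."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # d(cross-entropy)/d(logits)
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_predict(features, W, b):
    """Predict class indices with the trained probe."""
    return (features @ W + b).argmax(axis=1)
```

In practice the "features" would come from a frozen CLIP image encoder; here any fixed embedding matrix works, which is the point of a probe: only the linear head is trained.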
Large Multimodal Models
- ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Full text: ViP-LLaVA
- Generative Multimodal Models are In-Context Learners
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Full text: Q-Instruct | [CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo for fine-tuned checkpoints. (q-future.github.io)
Few-Shot Learning
- Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning
- Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners
- OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
- Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
- Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
- Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
- Large Language Models are Good Prompt Learners for Low-Shot Image Classification
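A common baseline underlying much of the few-shot literature above is nearest-prototype classification: each class prototype is the mean of its support embeddings, and queries take the label of the closest prototype. The sketch below shows that classic idea (as in prototypical networks), not any specific method from these papers; names are illustrative.

```python
import numpy as np

def prototype_classify(support, support_labels, query, num_classes):
    """Nearest-prototype few-shot classification (illustrative sketch).

    support: (n_support, d) embeddings; support_labels: (n_support,) ints;
    query: (n_query, d) embeddings. Returns predicted class indices."""
    # Class prototype = mean of that class's support embeddings.
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in range(num_classes)])
    # Squared Euclidean distance from each query to each prototype.
    d2 = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

With 5 support examples per class this is the standard 5-shot setup; the embeddings would normally come from a learned backbone.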
Diffusion Models (Diffusion Model)
- GenTron: Diffusion Transformers for Image and Video Generation
- DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Full text: DiffuseMix
- Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
Full text: Lodge (li-ronghui.github.io)
- TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Full text: TokenCompose: Grounding Diffusion with Token-level Supervision (mlpc-ucsd.github.io)
- FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
- LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
- FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
- Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
- CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
- Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
- L-MAGIC: Language Model Assisted Generation of Images with Consistency
- InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
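All of the papers above build on the standard DDPM forward process, where a clean sample $x_0$ is noised in closed form: $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I)$. A minimal sketch of that shared machinery (textbook DDPM with a linear beta schedule; not any one paper's method, and the schedule constants are the common defaults, used here only for illustration):

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bars, noise):
    """Closed-form forward noising: x_t = sqrt(a_bar_t) x0 + sqrt(1 - a_bar_t) eps."""
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
```

Because $\bar\alpha_t$ decreases monotonically toward zero, the signal term shrinks and the noise term grows as $t \to T$, which is what the reverse (denoising) model learns to invert.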
Class-Incremental Learning (Class Incremental Learning)
- Class Incremental Learning with Multi-Teacher Distillation
- Dual-Enhanced Coreset Selection with Class-wise Collaboration for Online Blurry Class Incremental Learning
- Generative Multi-modal Models are Good Class Incremental Learners
- Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Noisy Label Learning
- Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning