ECCV 2024 | AIGC Paper Collection (if you find this helpful, please like and bookmark)
- Awesome-ECCV2024-AIGC
- 1. Image Generation / Image Synthesis
- Accelerating Diffusion Sampling with Optimized Time Steps
- AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
- A Watermark-Conditioned Diffusion Model for IP Protection
- BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
- ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image
- Data Augmentation for Saliency Prediction via Latent Diffusion
- Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics
- DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
- DiffiT: Diffusion Vision Transformers for Image Generation
- Large-scale Reinforcement Learning for Diffusion Models
- MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation
- Memory-Efficient Fine-Tuning for Quantized Diffusion Model
- OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
- Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
- 2. Image Editing
- A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
- FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
- StableDrag: Stable Dragging for Point-based Image Editing
- TinyBeauty: Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
- 3. Video Generation / Video Synthesis
- Audio-Synchronized Visual Animation
- Dyadic Interaction Modeling for Social Behavior Generation
- EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
- FreeInit: Bridging Initialization Gap in Video Diffusion Models
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
- ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
- 4. Video Editing
- Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
- DragAnything: Motion Control for Anything using Entity Representation
- 5. 3D Generation / 3D Synthesis
- EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
- GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
- GVGEN: Text-to-3D Generation with Volumetric Representation
- Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
- ParCo: Part-Coordinating Text-to-Motion Synthesis
- Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
- 6. 3D Editing
- Gaussian Grouping: Segment and Edit Anything in 3D Scenes
- SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
- Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing
- 7. Multi-Modal Large Language Models
- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
- ControlCap: Controllable Region-level Captioning
- DriveLM: Driving with Graph Visual Question Answering
- Elysium: Exploring Object-level Perception in Videos via MLLM
- Empowering Multimodal Large Language Model as a Powerful Data Generator
- GiT: Towards Generalist Vision Transformer through Universal Language Interface
- How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
- Long-CLIP: Unlocking the Long-Text Capability of CLIP
- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
- Merlin: Empowering Multimodal LLMs with Foresight Minds
- Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
- MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
- PointLLM: Empowering Large Language Models to Understand Point Clouds
- R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
- SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
- ST-LLM: Large Language Models Are Effective Temporal Learners
- TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
- UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
- 8. Others
- References
- Related Collections
Awesome-ECCV2024-AIGC
A Collection of Papers and Code for ECCV 2024 AIGC
A curated collection of ECCV 2024 AIGC-related papers and code, organized as follows.
Stars, forks, and PRs are welcome~
Updates land first on GitHub: Awesome-ECCV2024-AIGC. Stars welcome~
Zhihu: https://zhuanlan.zhihu.com/p/706699484
Please credit this source when referencing or reposting.
ECCV 2024 website: https://eccv.ecva.net/
ECCV accepted paper list:
ECCV full paper library:
Conference dates: September 29 to October 4, 2024
Acceptance announcement: 2024
【Contents】
- 1. Image Generation / Image Synthesis
- 2. Image Editing
- 3. Video Generation / Video Synthesis
- 4. Video Editing
- 5. 3D Generation / 3D Synthesis
- 6. 3D Editing
- 7. Multi-Modal Large Language Models
- 8. Others
1. Image Generation / Image Synthesis
Accelerating Diffusion Sampling with Optimized Time Steps
- Paper: https://arxiv.org/abs/2402.17376
- Code: https://github.com/scxue/DM-NonUniform
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
- Paper: https://arxiv.org/abs/2406.18958
- Code: https://github.com/open-mmlab/AnyControl
A Watermark-Conditioned Diffusion Model for IP Protection
- Paper:
- Code: https://github.com/rmin2000/WaDiff
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
- Paper: https://arxiv.org/abs/2404.04544
- Code: https://github.com/gwang-kim/BeyondScene
ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image
- Paper: https://arxiv.org/abs/2402.11849
- Code:
Data Augmentation for Saliency Prediction via Latent Diffusion
- Paper:
- Code: https://github.com/IVRL/AugSal
Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics
- Paper: https://arxiv.org/abs/2310.17316
- Code: https://github.com/EnVision-Research/Defect_Spectrum
DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
- Paper:
- Code: https://github.com/murphytju/DiffFAS
DiffiT: Diffusion Vision Transformers for Image Generation
- Paper: https://arxiv.org/abs/2312.02139
- Code: https://github.com/NVlabs/DiffiT
Large-scale Reinforcement Learning for Diffusion Models
- Paper: https://arxiv.org/abs/2401.12244
- Code: https://github.com/pinterest/atg-research/tree/main/joint-rl-diffusion
MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation
- Paper: https://arxiv.org/abs/2405.05806
- Code: https://github.com/csyxwei/MasterWeaver
Memory-Efficient Fine-Tuning for Quantized Diffusion Model
- Paper:
- Code: https://github.com/ugonfor/TuneQDM
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
- Paper: https://arxiv.org/abs/2403.10983
- Code: https://github.com/kongzhecn/OMG
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
- Paper: https://arxiv.org/abs/2403.09176
- Code: https://github.com/byeongjun-park/Switch-DiT
2. Image Editing
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
- Paper: https://arxiv.org/abs/2312.03594
- Code: https://github.com/open-mmlab/PowerPaint
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
- Paper: https://arxiv.org/abs/2403.06976
- Code: https://github.com/TencentARC/BrushNet
FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
- Paper:
- Code: https://github.com/kookie12/FlexiEdit
StableDrag: Stable Dragging for Point-based Image Editing
- Paper: https://arxiv.org/abs/2403.04437
- Code:
TinyBeauty: Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
- Paper: https://arxiv.org/abs/2403.15033
- Code: https://github.com/TinyBeauty/TinyBeauty
3. Video Generation / Video Synthesis
Audio-Synchronized Visual Animation
- Paper: https://arxiv.org/abs/2403.05659
- Code: https://github.com/lzhangbj/ASVA
Dyadic Interaction Modeling for Social Behavior Generation
- Paper: https://arxiv.org/abs/2403.09069
- Code: https://github.com/Boese0601/Dyadic-Interaction-Modeling
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
- Paper: https://arxiv.org/abs/2404.01647
- Code: https://github.com/tanshuai0219/EDTalk
FreeInit: Bridging Initialization Gap in Video Diffusion Models
- Paper: https://arxiv.org/abs/2312.07537
- Code: https://github.com/TianxingWu/FreeInit
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
- Paper: https://arxiv.org/abs/2405.20222
- Code: https://github.com/MyNiuuu/MOFA-Video
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
- Paper: https://arxiv.org/abs/2310.01324
- Code:
4. Video Editing
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
- Paper: https://arxiv.org/abs/2403.13745
- Code: https://github.com/G-U-N/Be-Your-Outpainter
DragAnything: Motion Control for Anything using Entity Representation
- Paper: https://arxiv.org/abs/2403.07420
- Code: https://github.com/showlab/DragAnything
5. 3D Generation / 3D Synthesis
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
- Paper: https://arxiv.org/abs/2405.00915
- Code: https://github.com/ymxlzgy/echoscene
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
- Paper: https://arxiv.org/abs/2305.16037
- Code: https://github.com/ibrahimethemhamamci/GenerateCT
GVGEN: Text-to-3D Generation with Volumetric Representation
- Paper:
- Code: https://github.com/SOTAMak1r/GVGEN
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
- Paper: https://arxiv.org/abs/2403.07487
- Code: https://github.com/steve-zeyu-zhang/MotionMamba
ParCo: Part-Coordinating Text-to-Motion Synthesis
- Paper: https://arxiv.org/abs/2403.18512
- Code: https://github.com/qrzou/ParCo
Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
- Paper: https://arxiv.org/abs/2311.17050
- Code: https://github.com/Yzmblog/SurfD
6. 3D Editing
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
- Paper: https://arxiv.org/abs/2312.00732
- Code: https://github.com/lkeab/gaussian-grouping
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
- Paper: https://arxiv.org/abs/2404.03736
- Code: https://github.com/JarrentWu1031/SC4D
Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing
- Paper: https://arxiv.org/abs/2403.10050
- Code: https://github.com/slothfulxtx/Texture-GS
7. Multi-Modal Large Language Models
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
- Paper: https://arxiv.org/abs/2403.06764
- Code: https://github.com/pkunlp-icler/FastV
ControlCap: Controllable Region-level Captioning
- Paper: https://arxiv.org/abs/2401.17910
- Code: https://github.com/callsys/ControlCap
DriveLM: Driving with Graph Visual Question Answering
- Paper: https://arxiv.org/abs/2312.14150
- Code: https://github.com/OpenDriveLab/DriveLM
Elysium: Exploring Object-level Perception in Videos via MLLM
- Paper: https://arxiv.org/abs/2403.16558
- Code: https://github.com/Hon-Wong/Elysium
Empowering Multimodal Large Language Model as a Powerful Data Generator
- Paper:
- Code: https://github.com/zhaohengyuan1/Genixer
GiT: Towards Generalist Vision Transformer through Universal Language Interface
- Paper: https://arxiv.org/abs/2403.09394
- Code: https://github.com/Haiyang-W/GiT
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
- Paper: https://arxiv.org/abs/2311.16101
- Code: https://github.com/UCSC-VLAA/vllm-safety-benchmark
Long-CLIP: Unlocking the Long-Text Capability of CLIP
- Paper: https://arxiv.org/abs/2403.15378
- Code: https://github.com/beichenzbc/Long-CLIP
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
- Paper: https://arxiv.org/abs/2403.14624
- Code: https://github.com/ZrrSkywalker/MathVerse
Merlin: Empowering Multimodal LLMs with Foresight Minds
- Paper: https://arxiv.org/abs/2312.00589
- Code: https://github.com/Ahnsun/merlin
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
- Paper: https://arxiv.org/abs/2403.11755
- Code: https://github.com/jmiemirza/Meta-Prompting
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
- Paper: https://arxiv.org/abs/2311.17600
- Code: https://github.com/isXinLiu/MM-SafetyBench
PointLLM: Empowering Large Language Models to Understand Point Clouds
- Paper: https://arxiv.org/abs/2308.16911
- Code: https://github.com/OpenRobotLab/PointLLM
R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
- Paper: https://arxiv.org/abs/2403.04924
- Code: https://github.com/lxa9867/r2bench
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
- Paper:
- Code: https://github.com/AI-Application-and-Integration-Lab/SAM4MLLM
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
- Paper: https://arxiv.org/abs/2311.12793
- Code: https://github.com/ShareGPT4Omni/ShareGPT4V
ST-LLM: Large Language Models Are Effective Temporal Learners
- Paper: https://arxiv.org/abs/2404.00308
- Code: https://github.com/TencentARC/ST-LLM
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
- Paper: https://arxiv.org/abs/2404.00384
- Code: https://github.com/shjo-april/TTD
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
- Paper: https://arxiv.org/abs/2311.17136
- Code: https://github.com/TIGER-AI-Lab/UniIR
8. Others
Continuously updated~