[All-in-one] Multimodal Prompt Perceiver: Empowering Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Phoenixtree_DongZhao

Last modified: 2024-08-17 10:40:26

Categories: MyDLNote-Enhancement, Multi-modal. Tags: deep learning, image restoration

First published: 2024-08-17 10:39:08

Article link: https://blog.csdn.net/u014546828/article/details/141276388

Multimodal Prompt Perceiver:

Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

https://arxiv.org/pdf/2312.02918

GitHub

Abstract

Despite substantial progress, all-in-one image restoration (IR) grapples with persistent challenges in handling intricate real-world degradations. This paper introduces MPerceiver: a novel multimodal prompt learning approach that harnesses Stable Diffusion (SD) priors to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration. Specifically, we develop a dual-branch module to master two types of SD prompts: textual for holistic representation and visual for multiscale detail representation. Both prompts are dynamically adjusted by degradation predictions from the CLIP image encoder, enabling adaptive responses to diverse unknown degradations. Moreover, a plug-in detail refinement module improves restoration fidelity via direct encoder-to-decoder information transformation. To assess our method, MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods across most tasks. Post multitask pre-training, MPerceiver attains a generalized representation in low-level vision, exhibiting remarkable zero-shot and few-shot capabilities in unseen tasks. Extensive experiments on 16 IR tasks unde