2023-Inpaint Anything: Segment Anything Meets Image Inpainting

WX Chen

License: CC 4.0 BY-SA

Tags: artificial intelligence

Article link: https://blog.csdn.net/kl1411/article/details/132034528

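This post introduces Inpaint Anything, a SAM-based image inpainting approach that needs no fine-grained mask and supports several inpainting applications. It also covers Grounding DINO, which combines DINO with grounded (language) pre-training for open-set object detection, and Segment Anything. Technical topics include Transformers, LaMa, Fourier convolutions, and Gradio for image-processing demos.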

Code

Fix for the error: If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or confi...
https://blog.csdn.net/weixin_38130913/article/details/104604587
https://github.com/geekyutao/Inpaint-Anything

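Replacing anything from text descriptions (Inpaint-Anything-Description)

Selecting points by hand is a slow way to interact, so Inpaint-Anything-Description was developed over a holiday. It only needs three inputs: the original image, a text description of the target to replace, and a text description of what should replace it.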

python -m pip install -e segment_anything

export CUDA_HOME=/usr/local/cuda
python -m pip install -e GroundingDINO
https://github.com/IDEA-Research/Grounded-Segment-Anything/issues/167
Code modifications
https://github.com/THUDM/ChatGLM-6B/issues/50
https://blog.csdn.net/qq_41994006/article/details/130500086
Code
https://github.com/Atlas-wuu/Inpaint-Anything-Description

Model download
https://huggingface.co/camenduru/big-lama/tree/main

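Commentary

Built on SAM, the paper proposes the Inpaint Anything (IA) model.
Unlike traditional image inpainting models, IA does not require the user to craft a fine-grained mask. A single click marks the selected object, and IA can then Remove Anything, Fill Anything, or Replace Anything, covering typical inpainting scenarios such as object removal, object filling, and background replacement.

Building on SAM, the researchers make a first attempt at mask-free image inpainting and establish a new "clicking and filling" paradigm, which they call Inpaint Anything (IA).

IA combines vision foundation models such as SAM, image inpainting models (for example LaMa), and AIGC models (for example Stable Diffusion) to deliver user-friendly mask-free inpainting, with foolproof operations such as "click to remove, prompt to fill".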
https://blog.csdn.net/amusi1994/article/details/130279100
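
The "clicking and filling" flow described above can be sketched in a few lines. This is only a toy illustration in pure Python, not the IA implementation: `segment_from_click` is a hypothetical stand-in for SAM (a flood fill over similar pixel values) and `fill_masked` is a hypothetical stand-in for LaMa (a mean fill). Only the overall shape of the pipeline, click to mask to filled image, is taken from the text.

```python
from collections import deque


def segment_from_click(image, click, tol=10):
    """Toy stand-in for SAM: flood-fill pixels whose value is close to the clicked one."""
    h, w = len(image), len(image[0])
    r0, c0 = click
    seed = image[r0][c0]
    mask = [[False] * w for _ in range(h)]
    mask[r0][c0] = True
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and not mask[nr][nc] and abs(image[nr][nc] - seed) <= tol:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def fill_masked(image, mask):
    """Toy stand-in for LaMa: replace masked pixels with the mean of the unmasked ones."""
    known = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if not m]
    fill = round(sum(known) / len(known)) if known else 0
    return [[fill if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def remove_anything(image, click):
    """Click -> mask -> inpaint, mirroring the 'Remove Anything' flow."""
    mask = segment_from_click(image, click)
    return fill_masked(image, mask), mask


# A 4x4 grayscale image with a bright 2x2 object on a dark background.
image = [
    [20, 20, 20, 20],
    [20, 200, 205, 20],
    [20, 198, 202, 20],
    [20, 20, 20, 20],
]
result, mask = remove_anything(image, click=(1, 1))
```

In a real setup the two stand-ins would be replaced by a SAM predictor and an inpainting model; the point here is only that the user supplies a click rather than a mask.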

2023-Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

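Combines the Transformer-based detector DINO with grounded pre-training.

The key to open-set detection is introducing language into a closed-set detector so that it can generalize to open-set concepts.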
https://blog.csdn.net/qq_41994006/article/details/130168808
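
One way to picture "introducing language into a closed-set detector" is to replace a fixed class list with a similarity match between region features and text features. The sketch below is a toy under that assumption, not Grounding DINO's actual architecture: the 3-dimensional embeddings are made up for illustration, and `label_regions` and its `threshold` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_regions(region_feats, text_feats, threshold=0.8):
    """Give each region its best-matching phrase, or None if nothing clears the threshold."""
    labels = []
    for feat in region_feats:
        best_phrase, best_score = None, threshold
        for phrase, text_feat in text_feats.items():
            score = cosine(feat, text_feat)
            if score >= best_score:
                best_phrase, best_score = phrase, score
        labels.append(best_phrase)
    return labels


# Hypothetical embeddings; a real model would compute these with image and text encoders.
text_feats = {"dog": [1.0, 0.0, 0.0], "red umbrella": [0.0, 1.0, 0.0]}
region_feats = [[0.9, 0.1, 0.0], [0.1, 0.95, 0.05], [0.0, 0.0, 1.0]]
labels = label_regions(region_feats, text_feats)
```

Because the label set is just the dictionary of phrases, new concepts can be queried by adding text, without retraining a classifier head.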

2023-CVPR-Segment Anything

Code
https://github.com/facebookresearch/segment-anything

Introduction
https://www.datalearner.com/blog/1051680736366178

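Commentary

By design, SAM takes the original image together with a prompt (points, boxes, masks, or text) and outputs a segmentation result that depends on the prompt. It also supports interactive segmentation with different prompts.

The SAM model consists of three parts: an image encoder, a prompt encoder, and a mask decoder.

The image encoder is built on a Vision Transformer (ViT) backbone.

The prompt encoder supports sparse prompts (points, boxes, text) and dense prompts (masks).

The mask decoder is built on a Transformer decoder, followed by a dynamic prediction head.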
https://blog.csdn.net/qq_14845119/article/details/130628575
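
A minimal sketch of prompting SAM with a single point, assuming the `segment_anything` package from the repository linked above is installed and a checkpoint file has been downloaded. `segment_with_click` and `pick_best_mask` are hypothetical helper names, and `checkpoint_path` is a placeholder for a local weights file. `set_image` runs the image encoder once, and each `predict` call then runs the prompt encoder and mask decoder.

```python
def pick_best_mask(masks, scores):
    """Return the candidate mask with the highest predicted quality score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return masks[best]


def segment_with_click(image_rgb, x, y, checkpoint_path, model_type="vit_h"):
    """Segment the object under pixel (x, y) using one foreground click.

    image_rgb: HxWx3 uint8 RGB array.
    checkpoint_path: path to a local SAM checkpoint (placeholder, not provided here).
    """
    # Imported lazily so pick_best_mask stays usable without these packages.
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # image encoder runs once per image
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),  # 1 marks a foreground point
        multimask_output=True,  # ask for several candidate masks
    )
    return pick_best_mask(masks, scores)
```

Requesting several candidate masks and keeping the highest-scoring one is a common way to handle ambiguous clicks, where a point could refer to a part or to the whole object.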

2022-CVPR-High-Resolution Image Synthesis with Latent Diffusion Models

Code
https://github.com/CompVis/stable-diffusion

Compared with other compression-based approaches, this method generates more detailed images and also handles high-resolution generation well (landscapes and similar images).
https://zhuanlan.zhihu.com/p/519179626

2022-WACV-LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions

Commentary
https://blog.csdn.net/qq_14845119/article/details/122734750
https://zhuanlan.zhihu.com/p/573505798
https://blog.csdn.net/qq_42951560/article/details/126528156

Demo: https://huggingface.co/spaces/akhaliq/lama
https://blog.csdn.net/weixin_47196664/article/details/120838103

2020-NeurIPS-Fast Fourier Convolution
https://zhuanlan.zhihu.com/p/358187931
Fast Fourier Convolution
https://blog.csdn.net/m0_55780358/article/details/128039641
Fast Fourier Convolution — A detailed view - Medium
https://medium.com/mlearning-ai/fast-fourier-convolution-a-detailed-view-a5149aae36c4

Fourier convolution in PyTorch: the math and code for efficiently computing large-kernel convolutions via FFT
https://zhuanlan.zhihu.com/p/300603589
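
The idea behind computing convolutions via FFT is the convolution theorem: convolution in the signal domain equals pointwise multiplication in the frequency domain. The sketch below checks this in pure Python on a tiny 1-D example. It is not the PyTorch code from the linked article, and `fft`, `fft_convolve`, and `direct_convolve` are illustrative names.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via zero-padding, FFT, pointwise product, inverse FFT."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:  # pad to a power of two to avoid circular wrap-around
        size *= 2
    fa = fft(list(signal) + [0] * (size - len(signal)))
    fb = fft(list(kernel) + [0] * (size - len(kernel)))
    product = [a * b for a, b in zip(fa, fb)]
    result = fft(product, inverse=True)
    # The inverse transform above is unnormalized, so scale by 1/size.
    return [v.real / size for v in result[:out_len]]


def direct_convolve(signal, kernel):
    """Reference O(N*K) convolution for comparison."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


signal = [1, 2, 3, 4]
kernel = [1, 0, -1]
fast = fft_convolve(signal, kernel)
slow = direct_convolve(signal, kernel)
```

The direct version costs on the order of N*K multiplications, while the FFT route costs on the order of N log N regardless of kernel size, which is why it pays off for large kernels.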

Fast Fourier Transform
https://zhuanlan.zhihu.com/p/149528521

===============================================================================
Learning Gradio
https://www.gradio.app/docs/image

Gradio also provides a unique public URL through which other people can access your application, for example https://51358.gradio.app
https://juejin.cn/post/7167333522154192909
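
A minimal sketch of an image demo with a public link, assuming the `gradio` package is installed. `invert` and `build_demo` are illustrative names, and the inversion is just a placeholder for a real image-processing function such as an inpainting call.

```python
def invert(image):
    """Invert an 8-bit image; Gradio's Image component passes a NumPy array by default."""
    return 255 - image


def build_demo():
    """Wrap `invert` in a simple image-in, image-out web interface."""
    # Imported lazily so `invert` can be used without Gradio installed.
    import gradio as gr

    return gr.Interface(fn=invert, inputs=gr.Image(), outputs=gr.Image())


# To serve the app and request a temporary public link:
# build_demo().launch(share=True)
```

Passing `share=True` to `launch` is what asks Gradio for the shareable public URL; without it the app is only served locally.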