Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

Phoenixtree_DongZhao

Article link: https://blog.csdn.net/u014546828/article/details/141070466

Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

Arxiv

GitHub

Abstract

High-performance Multimodal Large Language Models (MLLMs) rely heavily on data quality. This study introduces a novel dataset named Img-Diff, designed to enhance fine-grained image recognition in MLLMs by leveraging insights from contrastive learning and image difference captioning. By analyzing object differences between similar images, we challenge models to identify both matching and distinct components. We utilize the Stable-Diffusion-XL model and advanced image editing techniques to create pairs of similar images that highlight object replacements. Our methodology includes a Difference Area Generator for identifying object differences, followed by a Difference Captions Generator for detaile