超越GPT4V，最强多模态MiniCPM-V2.6模型分享

最新推荐文章于 2024-11-01 09:46:54 发布

置顶杰说新技术

最新推荐文章于 2024-11-01 09:46:54 发布

阅读量1k

点赞数 25

分类专栏： AIGC 多模态文章标签： AIGC 人工智能

本文链接：https://blog.csdn.net/m0_71062934/article/details/141533897

版权

AIGC 同时被 2 个专栏收录

35 篇文章 1 订阅

订阅专栏

多模态

11 篇文章 0 订阅

订阅专栏

MiniCPM-V2.6是由OpenBMB开发的一款多模态大型语言模型（MLLM），专为视觉-语言理解设计。

MiniCPM-V2.6模型能够处理图像、视频和文本输入，并提供高质量的文本输出。

MiniCPM-V 2.6模型在单图像理解方面超越了广泛使用的专有模型，如GPT-4o mini、GPT-4V、Gemini 1.5 Pro和Claude 3.5 Sonnet。

MiniCPM-V 2.6还能够执行多图像理解和上下文学习，并且在Mantis-Eval、BLINK、Mathverse mv和Sciverse mv等流行的多图像基准测试中取得了最先进的性能。

此外，MiniCPM-V 2.6还能够接受视频输入，进行对话并为时空信息提供密集的字幕，性能超过了GPT-4V、Claude 3.5 Sonnet和LLaVA-NeXT-Video-34B。

github项目地址：https://github.com/OpenBMB/MiniCPM-V。

一、环境安装

1、python环境

建议安装python版本在3.10以上。

2、pip库安装

pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2 --extra-index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

3、MiniCPM-V-2_6模型下载：

git lfs install

git clone https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6

4、MiniCPM-V-2_6-gguf模型下载：

git lfs install

git clone https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf

5、MiniCPM-V-2_6-int4模型下载：

git lfs install

git clone https://www.modelscope.cn/models/openbmb/minicpm-v-2_6-int4

二、功能测试

1、运行测试：

（1）python代码调用测试

import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer
import os

def load_model_and_tokenizer(model_name='OpenBMB/MiniCPM-V-2_6'):
    model = AutoModel.from_pretrained(
        model_name, 
        trust_remote_code=True,
        attn_implementation='sdpa',
        torch_dtype=torch.bfloat16
    ).eval().cuda()
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    return model, tokenizer

def load_image(image_path):
    try:
        with Image.open(image_path).convert('RGB') as image:
            return image
    except Exception as e:
        print(f"Error loading image: {e}")
        return None

def generate_response(model, tokenizer, image, question, sampling=False, stream=False):
    msgs = [{'role': 'user', 'content': [image, question]}]

    res = model.chat(
        image=None,
        msgs=msgs,
        tokenizer=tokenizer,
        sampling=sampling,
        stream=stream
    )
    
    if stream:
        generated_text = ""
        for new_text in res:
            generated_text += new_text
            print(new_text, flush=True, end='')
        return generated_text
    else:
        return res

def main():
    model_name = 'OpenBMB/MiniCPM-V-2_6'
    image_path = 'image.png'
    question = 'What is in the image?'

    if not os.path.exists(image_path):
        print(f"Image path {image_path} does not exist.")
        return
    
    model, tokenizer = load_model_and_tokenizer(model_name)
    image = load_image(image_path)
    
    if image is None:
        return

    response = generate_response(model, tokenizer, image, question)
    print(response)

    # if you want to use streaming
    print("\nStreaming response:")
    generate_response(model, tokenizer, image, question, sampling=True, stream=True)

if __name__ == "__main__":
    main()

未完......

更多详细的欢迎关注：杰哥新技术