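MiniCPM-V 2.6 is a multimodal large language model (MLLM) developed by OpenBMB and designed for vision-language understanding.
It accepts image, video, and text inputs and produces high-quality text output.
In single-image understanding, MiniCPM-V 2.6 outperforms widely used proprietary models such as GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet.
It also supports multi-image understanding and in-context learning, and achieves state-of-the-art results on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv, and Sciverse mv.
In addition, MiniCPM-V 2.6 accepts video input, can hold a conversation about it, and produces dense captions covering spatio-temporal information, outperforming GPT-4V, Claude 3.5 Sonnet, and LLaVA-NeXT-Video-34B.
GitHub project: https://github.com/OpenBMB/MiniCPM-V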
I. Environment Setup
1. Python environment
Python 3.10 or later is recommended.
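A quick way to confirm that the interpreter meets this recommendation is a small check like the one below. It is only a sketch: the helper name `meets_minimum` is hypothetical, and the `(3, 10)` threshold simply mirrors the recommendation above.

```python
import sys

# Assumption: mirrors the "3.10 or later" recommendation in this guide.
MIN_VERSION = (3, 10)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or current) interpreter version is >= minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "older than recommended"
    print(f"Python {current}: {status}")
```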
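2. Installing pip packages
Install the CUDA 11.8 builds of PyTorch first, then the remaining dependencies from the project's requirements.txt (run the second command from the root of the cloned GitHub repository):
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple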
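After installation, it can help to verify that the CUDA build of PyTorch was actually picked up. The snippet below is a minimal sketch of such a check; `describe_torch_install` is a hypothetical helper name, and it imports torch lazily so the script still runs (and reports the problem) when torch is missing.

```python
import importlib.util


def describe_torch_install():
    """Summarize the installed torch build, or report that torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so this check also runs without torch

    cuda_build = torch.version.cuda  # None for CPU-only builds
    gpu_available = torch.cuda.is_available()
    return (
        f"torch {torch.__version__}, CUDA build {cuda_build}, "
        f"GPU available: {gpu_available}"
    )


if __name__ == "__main__":
    print(describe_torch_install())
```

With the install command above, the reported version would be expected to end in `+cu118`; a CUDA build of `None` suggests that a CPU-only wheel was installed instead.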
3. Downloading the MiniCPM-V-2_6 model:
git lfs install
git clone https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6
4. Downloading the MiniCPM-V-2_6-gguf model:
git lfs install
git clone https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf
5. Downloading the MiniCPM-V-2_6-int4 model:
git lfs install
git clone https://www.modelscope.cn/models/openbmb/minicpm-v-2_6-int4
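Once a model has been cloned locally, the test script can point at that directory instead of downloading the weights again through the hub id. The helper below is a sketch under two assumptions: a usable model directory contains a `config.json`, and `from_pretrained` accepts a local directory path in place of a hub id. The name `resolve_model_source` is hypothetical.

```python
import os


def resolve_model_source(local_dir, hub_id="OpenBMB/MiniCPM-V-2_6"):
    """Prefer a local clone if it looks like a model directory, else the hub id."""
    # Assumption: a complete model checkout contains config.json at its root.
    config_path = os.path.join(local_dir, "config.json")
    if os.path.isfile(config_path):
        return os.path.abspath(local_dir)
    return hub_id


if __name__ == "__main__":
    # "MiniCPM-V-2_6" is the directory that `git clone` creates by default.
    print(resolve_model_source("MiniCPM-V-2_6"))
```

The returned value can then be passed as the model name when loading the model and tokenizer.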
II. Functional Testing
1. Running the test:
(1) Calling the model from Python
import os

import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer


def load_model_and_tokenizer(model_name='OpenBMB/MiniCPM-V-2_6'):
    """Load the model in bfloat16 on the GPU together with its tokenizer."""
    model = AutoModel.from_pretrained(
        model_name,
        trust_remote_code=True,
        attn_implementation='sdpa',
        torch_dtype=torch.bfloat16
    ).eval().cuda()
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    return model, tokenizer


def load_image(image_path):
    """Open an image and return an RGB copy, or None if loading fails."""
    try:
        # Convert inside the `with` block so the file handle is closed
        # while the returned RGB copy stays usable.
        with Image.open(image_path) as image:
            return image.convert('RGB')
    except Exception as e:
        print(f"Error loading image: {e}")
        return None


def generate_response(model, tokenizer, image, question, sampling=False, stream=False):
    """Ask one question about one image; optionally stream the answer."""
    # The image is passed inside the message content, so `image` stays None.
    msgs = [{'role': 'user', 'content': [image, question]}]
    res = model.chat(
        image=None,
        msgs=msgs,
        tokenizer=tokenizer,
        sampling=sampling,
        stream=stream
    )
    if stream:
        # In streaming mode `res` yields text chunks; print them as they arrive.
        generated_text = ""
        for new_text in res:
            generated_text += new_text
            print(new_text, flush=True, end='')
        return generated_text
    else:
        return res


def main():
    model_name = 'OpenBMB/MiniCPM-V-2_6'
    image_path = 'image.png'
    question = 'What is in the image?'

    if not os.path.exists(image_path):
        print(f"Image path {image_path} does not exist.")
        return

    model, tokenizer = load_model_and_tokenizer(model_name)
    image = load_image(image_path)
    if image is None:
        return

    # Non-streaming call: the full answer is returned at once.
    response = generate_response(model, tokenizer, image, question)
    print(response)

    # Streaming call: the answer is printed chunk by chunk.
    print("\nStreaming response:")
    generate_response(model, tokenizer, image, question, sampling=True, stream=True)


if __name__ == "__main__":
    main()
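The introduction notes that MiniCPM-V 2.6 also handles multi-image understanding. The following is a minimal sketch of how the single-image call above might be extended, assuming `model.chat` accepts several PIL images in one `content` list the same way it accepts one. The helper names `build_multi_image_msgs` and `ask_about_images` are hypothetical and not part of the model's API.

```python
def build_multi_image_msgs(images, question):
    """Build a single-turn user message holding several images and a question."""
    if not images:
        raise ValueError("at least one image is required")
    # Images come first, followed by the text question, in one content list.
    return [{'role': 'user', 'content': [*images, question]}]


def ask_about_images(model, tokenizer, images, question):
    """Send all images and the question to the model in one chat call."""
    msgs = build_multi_image_msgs(images, question)
    # Assumption: same call shape as the single-image example (image=None).
    return model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
```

A call such as `ask_about_images(model, tokenizer, [image1, image2], 'Compare the two images.')` would then reuse a model and tokenizer that are already loaded.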