Phi-3-Vision-128K-Instruct模型的安装与使用教程

最新推荐文章于 2025-01-13 11:57:38 发布

汤垣骥

最新推荐文章于 2025-01-13 11:57:38 发布

阅读量972

点赞数 9

本文链接：https://blog.csdn.net/gitblog_02764/article/details/144420735

版权

Phi-3-Vision-128K-Instruct模型的安装与使用教程

Phi-3-vision-128k-instruct 项目地址: https://gitcode.com/mirrors/Microsoft/Phi-3-vision-128k-instruct

Phi-3-Vision-128K-Instruct 是一款轻量级、先进的开放多模态模型，基于包含合成数据和过滤后的公开可用网站的数据集构建，专注于文本和视觉方面的高质量、推理密集型数据。该模型属于 Phi-3 模型家族，多模态版本支持 128K 上下文长度（以 token 为单位）。模型经过了严格的增强过程，结合了监督微调和直接偏好优化，以确保精确的指令遵循和稳健的安全措施。

安装前准备

系统和硬件要求

操作系统：Linux/Windows/MacOS
硬件：GPU（建议使用 NVIDIA GPU）
硬盘空间：至少 10GB 可用空间

必备软件和依赖项

Python 3.8 或更高版本
Transformers 库（版本 4.40.2 或更高版本）
PyTorch（版本 2.3.0 或更高版本）
Pillow（用于图像处理）
其他依赖项请参考 Phi-3-Vision-128K-Instruct 的官方文档

安装步骤

1. 下载模型资源

Phi-3-Vision-128K-Instruct 模型资源存储在 Hugging Face 平台上，您可以通过以下链接获取：https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

2. 安装过程详解

安装 Python 环境及必备依赖项：
```
pip install -r requirements.txt
```

安装 Transformers 库（确保版本为 4.40.2 或更高）：

pip install git+https://github.com/huggingface/transformers.git

确认 Transformers 版本：
```
pip list | grep transformers
```

使用 from_pretrained() 函数加载 Phi-3-Vision-128K-Instruct 模型：

from transformers import AutoModelForCausalLM
model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

3. 常见问题及解决

无法加载模型：请确保您使用的 Transformers 库版本为 4.40.2 或更高，并且已经正确设置 trust_remote_code=True 参数。
GPU 使用问题：请确保您的 GPU 驱动程序已更新至最新版本，并正确配置 PyTorch 和 CUDA。

基本使用方法

1. 加载模型

如上所述，使用 from_pretrained() 函数加载 Phi-3-Vision-128K-Instruct 模型。

2. 简单示例演示

以下是一个使用 Phi-3-Vision-128K-Instruct 模型进行图像描述的示例：

from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

url = "https://example.com/path/to/image.jpg"
image = Image.open(requests.get(url, stream=True).raw)

prompt = processor.tokenizer("<|image_1|>\nWhat is shown in this image?", return_tensors="pt")
input_ids = processor.encode(prompt, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=500, temperature=0.7)
response = processor.decode(output_ids[0], skip_special_tokens=True)

print(response)

3. 参数设置说明

Phi-3-Vision-128K-Instruct 模型支持多种参数设置，例如：

max_new_tokens：生成文本的最大长度（以 token 为单位）
temperature：控制生成文本的随机性，值越小表示生成的文本越接近于训练数据，值越大表示生成的文本越具有随机性
do_sample：是否使用随机采样生成文本，设置为 True 时，会根据温度参数进行随机采样；设置为 False 时，会使用贪婪搜索策略生成文本

结论

本文介绍了 Phi-3-Vision-128K-Instruct 模型的安装与使用方法。Phi-3-Vision-128K-Instruct 是一款功能强大的多模态模型，适用于图像描述、OCR、图表理解等多种场景。开发者可以根据实际需求调整参数设置，发挥模型的潜力。在使用过程中，请遵循负责任的 AI 最佳实践，确保使用场景符合相关法律法规。