Using llama3.2-vision for Image Information Recognition

Environment Setup

Create a Python virtual environment

conda create -n ollama_service python=3.10
conda activate ollama_service

Install the dependency package

pip install ollama

Notes:

  • When installing, remove any domestic (Chinese) mirror source configured for pip (the commands after this list show how to check).
  • On Windows, the pip mirror configuration lives under C:\Users\{user_name}\AppData\Roaming\pip
  • On Ubuntu, the pip configuration path is /home/ubuntu/.config/pip
  • The network environment needs to be set up so that Google can be reached.
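If you are not sure whether a mirror is configured, pip itself can show and clear the setting. A minimal check, assuming pip 10 or newer so that the pip config subcommand is available:

pip config list
pip config unset global.index-url

Alternatively, edit or delete the pip.ini / pip.conf file under the paths listed above.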

The installation process is demonstrated below:

(ollama_service) ubuntu@mySvr:~$ pip install ollama
Collecting ollama
  Downloading ollama-0.4.2-py3-none-any.whl.metadata (4.8 kB)
Collecting httpx<0.28.0,>=0.27.0 (from ollama)
  Using cached httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting pydantic<3.0.0,>=2.9.0 (from ollama)
  Downloading pydantic-2.10.3-py3-none-any.whl.metadata (172 kB)
Collecting anyio (from httpx<0.28.0,>=0.27.0->ollama)
  Downloading anyio-4.6.2.post1-py3-none-any.whl.metadata (4.7 kB)
Collecting certifi (from httpx<0.28.0,>=0.27.0->ollama)
  Using cached certifi-2024.8.30-py3-none-any.whl.metadata (2.2 kB)
Collecting httpcore==1.* (from httpx<0.28.0,>=0.27.0->ollama)
  Using cached httpcore-1.0.7-py3-none-any.whl.metadata (21 kB)
Collecting idna (from httpx<0.28.0,>=0.27.0->ollama)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting sniffio (from httpx<0.28.0,>=0.27.0->ollama)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<0.28.0,>=0.27.0->ollama)
  Using cached h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting annotated-types>=0.6.0 (from pydantic<3.0.0,>=2.9.0->ollama)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.27.1 (from pydantic<3.0.0,>=2.9.0->ollama)
  Downloading pydantic_core-2.27.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting typing-extensions>=4.12.2 (from pydantic<3.0.0,>=2.9.0->ollama)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting exceptiongroup>=1.0.2 (from anyio->httpx<0.28.0,>=0.27.0->ollama)
  Using cached exceptiongroup-1.2.2-py3-none-any.whl.metadata (6.6 kB)
Downloading ollama-0.4.2-py3-none-any.whl (13 kB)
Using cached httpx-0.27.2-py3-none-any.whl (76 kB)
Using cached httpcore-1.0.7-py3-none-any.whl (78 kB)
Downloading pydantic-2.10.3-py3-none-any.whl (456 kB)
Downloading pydantic_core-2.27.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 6.9 MB/s eta 0:00:00
Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading anyio-4.6.2.post1-py3-none-any.whl (90 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached sniffio-1.3.1-py3-none-any.whl (10 kB)
Using cached certifi-2024.8.30-py3-none-any.whl (167 kB)
Using cached exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: typing-extensions, sniffio, idna, h11, exceptiongroup, certifi, annotated-types, pydantic-core, httpcore, anyio, pydantic, httpx, ollama
Successfully installed annotated-types-0.7.0 anyio-4.6.2.post1 certifi-2024.8.30 exceptiongroup-1.2.2 h11-0.14.0 httpcore-1.0.7 httpx-0.27.2 idna-3.10 ollama-0.4.2 pydantic-2.10.3 pydantic-core-2.27.1 sniffio-1.3.1 typing-extensions-4.12.2
(ollama_service) ubuntu@mySvr:~$ 

The installed dependency packages are as follows:

(ollama_service) ubuntu@mySvr:~$ pip freeze
annotated-types==0.7.0
anyio==4.6.2.post1
certifi==2024.8.30
exceptiongroup==1.2.2
h11==0.14.0
httpcore==1.0.7
httpx==0.27.2
idna==3.10
ollama==0.4.2
pydantic==2.10.3
pydantic_core==2.27.1
sniffio==1.3.1
typing_extensions==4.12.2
(ollama_service) ubuntu@mySvr:~$ 
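As a quick sanity check that the client library imports cleanly in the new environment, the installed version can also be read from Python. A small sketch using the standard importlib.metadata module:

import importlib.metadata

import ollama  # should import without error inside the ollama_service environment

print(importlib.metadata.version('ollama'))  # prints 0.4.2 for the install shown above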

Download and Run the Model

ollama run llama3.2-vision

The process is demonstrated below:

(base) ubuntu@mySvr:~$ ollama run llama3.2-vision
pulling manifest 
pulling 11f274007f09... 100% ▕██████████████████████████████████████████████████████▏ 6.0 GB                         
pulling ece5e659647a... 100% ▕██████████████████████████████████████████████████████▏ 1.9 GB                         
pulling 715415638c9c... 100% ▕██████████████████████████████████████████████████████▏  269 B                         
pulling 0b4284c1f870... 100% ▕██████████████████████████████████████████████████████▏ 7.7 KB                         
pulling fefc914e46e6... 100% ▕██████████████████████████████████████████████████████▏   32 B                         
pulling fbd313562bb7... 100% ▕██████████████████████████████████████████████████████▏  572 B                         
verifying sha256 digest 
writing manifest 
success 
>>> 
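To confirm that the model files were pulled successfully, the locally available models can be listed with the standard Ollama CLI:

ollama list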

Test the Model

>>> How many state in USA?
There are 50 states in the United States of America.

>>> 美国有多少个州?
美国有 50 个州。

>>> 中国的首都是哪里?
中国的首都是北京(Beijing)。

>>> 帮我写一个50字的故事,关于猴子的
有一只名叫毛毛的猴子,住在树顶上的家里。毛毛喜欢跳舞,经常会随着音乐跳起来。有一天,风暴来临,树枝开始摇晃,毛毛害
怕。然而,他想起了自己爱好的舞蹈,于是开始跳舞,舞动的身体使他稳定在树上,风暴也最终过去了。

>>> Send a message (/? for help)

Model Testing via the Python API

Program Code 1

import ollama

# Send a single chat request with an image attachment to the locally
# running Ollama server (default endpoint http://127.0.0.1:11434).
response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/1.png']   # local path to the test image
    }]
)

print(response)

Test Result 1

Running the program returns the following result:

(ollama_service) (base) ubuntu@mySvr:~/code/ollama_service$ /home/ubuntu/miniconda3/envs/ollama_service/bin/python /home/ubuntu/code/ollama_service/main.py
model='llama3.2-vision' created_at='2024-12-05T01:38:43.918812229Z' done=True done_reason='stop' total_duration=2291537793 load_duration=23306166 prompt_eval_count=18 prompt_eval_duration=1059000000 eval_count=80 eval_duration=988000000 message=Message(role='assistant', content='The image depicts a golden dragon figurine, featuring intricate scales and detailing. The dragon appears to be made of gold or metal and has its mouth open as if roaring. It is sitting on an ornate base that seems to be made of the same material. The overall effect is one of grandeur and majesty, suggesting that this might be a decorative piece for a wealthy collector or enthusiast.', images=None, tool_calls=None)
(ollama_service) (base) ubuntu@mySvr:~/code/ollama_service$ 

The content field reads: "The image depicts a golden dragon figurine, featuring intricate scales and detailing. The dragon appears to be made of gold or metal and has its mouth open as if roaring. It is sitting on an ornate base that seems to be made of the same material. The overall effect is one of grandeur and majesty, suggesting that this might be a decorative piece for a wealthy collector or enthusiast."

The result after running it through Baidu Translate is:

这幅图像描绘了一个金龙雕像,具有复杂的鳞片和细节。龙似乎是由黄金或金属制成的,它张着嘴,好像在咆哮。它坐在一个华丽的底座上,底座似乎也是由同样的材料制成的。整体效果是宏伟和威严,这表明这可能是一件富有的收藏家或爱好者的装饰品。

The test image is:

[Test image: a photo of a golden dragon figurine]
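print(response) in Program Code 1 dumps the entire response object, as shown above. If only the description text is needed, it can be extracted directly. A short sketch assuming the ollama Python client 0.4.x installed earlier, whose response object supports both dictionary-style and attribute access:

import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/1.png']
    }]
)

# print only the assistant's text instead of the whole response object
print(response['message']['content'])
# equivalent attribute-style access
print(response.message.content)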

Program Code 2

from ollama import Client

# Connect to an Ollama server running on another host in the LAN;
# the custom header is optional and only included as an example.
client = Client(
  host='http://192.168.0.160:12345',
  headers={'x-some-header': 'some-value'}
)

response = client.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/test.png']   # path on the machine running this script
    }]
)

print(response)

Test Result 2

(ollama_service) (base) ubuntu@ubuntu-server:~/code/ollama_service$ /home/ubuntu/miniconda3/envs/ollama_service/bin/python /home/ubuntu/code/ollama_service/demo_rest.py
model='llama3.2-vision' created_at='2024-12-05T02:31:55.446849548Z' done=True done_reason='stop' total_duration=1700043974 load_duration=14954191 prompt_eval_count=18 prompt_eval_duration=25000000 eval_count=117 eval_duration=1448000000 message=Message(role='assistant', content='This image shows a family having a meal together. There are several plates of food on the table, including what appears to be rice, salad, and meat or seafood dishes. The people are seated around the table, with some holding chopsticks and others pouring tea from a teapot.\n\nThe atmosphere seems to be one of joy and togetherness, as the family is laughing and chatting while they eat. The image suggests that this is a special occasion, possibly a holiday or celebration, given the large number of people gathered around the table and the variety of dishes being served.', images=None, tool_calls=None)
(ollama_service) (base) ubuntu@ubuntu-server:~/code/ollama_service$ 

The content field reads: "This image shows a family having a meal together. There are several plates of food on the table, including what appears to be rice, salad, and meat or seafood dishes. The people are seated around the table, with some holding chopsticks and others pouring tea from a teapot.\n\nThe atmosphere seems to be one of joy and togetherness, as the family is laughing and chatting while they eat. The image suggests that this is a special occasion, possibly a holiday or celebration, given the large number of people gathered around the table and the variety of dishes being served."

The Baidu translation is:

图为一家人在一起吃饭。\n\n*桌子上摆着几盘食物和茶杯。\n*桌子上有几盘食物,包括看似主菜的食物和一些蔬菜或沙拉。\n*桌子上还有几个茶杯,这表明一家人正在花时间一起吃饭。\n*图片中的人似乎玩得很开心。\他们都面带微笑,似乎很享受彼此的陪伴。\n*这表明他们珍视彼此的关系,并乐于共度时光。\n\n总的来说,这张照片传达了家庭成员之间的温暖和联系。

The test image is:

[Test image: a family having a meal together]
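For longer descriptions it can be more convenient to stream the text as it is generated instead of waiting for the complete reply. A sketch under the assumption that the same remote Ollama server at http://192.168.0.160:12345 and the same test image are used:

from ollama import Client

client = Client(host='http://192.168.0.160:12345')

# stream=True yields partial responses; each chunk carries a fragment of the text
stream = client.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/test.png']
    }],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)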

Test conclusions:

  • Llama3.2-vision performs well on image recognition, with fairly high accuracy.
  • Chinese is not among the languages Llama3.2-vision currently supports, which is a drawback; in a project, the recognition results have to be translated before they can be used (see the sketch after this list).
  • Llama3.2-vision is simple to deploy and easy to get started with and verify.
  • The Llama3.2-vision model is relatively small, only about 7.9 GB, so inference is fast.
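One way to handle that translation step is to pipe the English description through a second, Chinese-capable text model on the same Ollama instance. The sketch below only illustrates the idea; the qwen2.5 model name is an assumption (it is not part of the setup described above and would need to be pulled separately), and any other translation service could be used instead:

import ollama

# Step 1: describe the image in English with llama3.2-vision
vision = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/1.png']
    }]
)
english_text = vision['message']['content']

# Step 2: translate the description with a Chinese-capable text model
# (qwen2.5 is an assumed example and must already be available locally)
translated = ollama.chat(
    model='qwen2.5',
    messages=[{
        'role': 'user',
        'content': 'Translate the following into Chinese:\n' + english_text
    }]
)
print(translated['message']['content'])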