Using llama3.2-vision for Image Information Recognition

Environment Setup

Create a Python virtual environment

conda create -n ollama_service python=3.10
conda activate ollama_service

Install the dependency package

pip install ollama

Notes:

  • When installing, remove any domestic (Chinese) mirror source configured for pip (the commands after this list show how to check).
  • On Windows, the pip mirror configuration lives under C:\Users\{user_name}\AppData\Roaming\pip
  • On Ubuntu, the pip configuration path is /home/ubuntu/.config/pip
  • The network environment needs to be set up so that Google can be reached.
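If you are not sure whether a mirror is configured, pip itself can show and clear the setting. A minimal check, assuming pip 10 or newer so that the pip config subcommand is available:

pip config list
pip config unset global.index-url

Alternatively, edit or delete the pip.ini / pip.conf file under the paths listed above.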

The installation process is demonstrated below:

(ollama_service) ubuntu@mySvr:~$ pip install ollama
Collecting ollama
  Downloading ollama-0.4.2-py3-none-any.whl.metadata (4.8 kB)
Collecting httpx<0.28.0,>=0.27.0 (from ollama)
  Using cached httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting pydantic<3.0.0,>=2.9.0 (from ollama)
  Downloading pydantic-2.10.3-py3-none-any.whl.metadata (172 kB)
Collecting anyio (from httpx<0.28.0,>=0.27.0->ollama)
  Downloading anyio-4.6.2.post1-py3-none-any.whl.metadata (4.7 kB)
Collecting certifi (from httpx<0.28.0,>=0.27.0->ollama)
  Using cached certifi-2024.8.30-py3-none-any.whl.metadata (2.2 kB)
Collecting httpcore==1.* (from httpx<0.28.0,>=0.27.0->ollama)
  Using cached httpcore-1.0.7-py3-none-any.whl.metadata (21 kB)
Collecting idna (from httpx<0.28.0,>=0.27.0->ollama)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting sniffio (from httpx<0.28.0,>=0.27.0->ollama)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<0.28.0,>=0.27.0->ollama)
  Using cached h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting annotated-types>=0.6.0 (from pydantic<3.0.0,>=2.9.0->ollama)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.27.1 (from pydantic<3.0.0,>=2.9.0->ollama)
  Downloading pydantic_core-2.27.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting typing-extensions>=4.12.2 (from pydantic<3.0.0,>=2.9.0->ollama)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting exceptiongroup>=1.0.2 (from anyio->httpx<0.28.0,>=0.27.0->ollama)
  Using cached exceptiongroup-1.2.2-py3-none-any.whl.metadata (6.6 kB)
Downloading ollama-0.4.2-py3-none-any.whl (13 kB)
Using cached httpx-0.27.2-py3-none-any.whl (76 kB)
Using cached httpcore-1.0.7-py3-none-any.whl (78 kB)
Downloading pydantic-2.10.3-py3-none-any.whl (456 kB)
Downloading pydantic_core-2.27.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 6.9 MB/s eta 0:00:00
Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading anyio-4.6.2.post1-py3-none-any.whl (90 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached sniffio-1.3.1-py3-none-any.whl (10 kB)
Using cached certifi-2024.8.30-py3-none-any.whl (167 kB)
Using cached exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: typing-extensions, sniffio, idna, h11, exceptiongroup, certifi, annotated-types, pydantic-core, httpcore, anyio, pydantic, httpx, ollama
Successfully installed annotated-types-0.7.0 anyio-4.6.2.post1 certifi-2024.8.30 exceptiongroup-1.2.2 h11-0.14.0 httpcore-1.0.7 httpx-0.27.2 idna-3.10 ollama-0.4.2 pydantic-2.10.3 pydantic-core-2.27.1 sniffio-1.3.1 typing-extensions-4.12.2
(ollama_service) ubuntu@mySvr:~$ 

The installed dependency packages are as follows:

(ollama_service) ubuntu@mySvr:~$ pip freeze
annotated-types==0.7.0
anyio==4.6.2.post1
certifi==2024.8.30
exceptiongroup==1.2.2
h11==0.14.0
httpcore==1.0.7
httpx==0.27.2
idna==3.10
ollama==0.4.2
pydantic==2.10.3
pydantic_core==2.27.1
sniffio==1.3.1
typing_extensions==4.12.2
(ollama_service) ubuntu@mySvr:~$ 
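As a quick sanity check that the client library imports cleanly in the new environment, the installed version can also be read from Python. A small sketch using the standard importlib.metadata module:

import importlib.metadata

import ollama  # should import without error inside the ollama_service environment

print(importlib.metadata.version('ollama'))  # prints 0.4.2 for the install shown above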

Download and Run the Model

ollama run llama3.2-vision

The process is demonstrated below:

(base) ubuntu@mySvr:~$ ollama run llama3.2-vision
pulling manifest 
pulling 11f274007f09... 100% ▕██████████████████████████████████████████████████████▏ 6.0 GB                         
pulling ece5e659647a... 100% ▕██████████████████████████████████████████████████████▏ 1.9 GB                         
pulling 715415638c9c... 100% ▕██████████████████████████████████████████████████████▏  269 B                         
pulling 0b4284c1f870... 100% ▕██████████████████████████████████████████████████████▏ 7.7 KB                         
pulling fefc914e46e6... 100% ▕██████████████████████████████████████████████████████▏   32 B                         
pulling fbd313562bb7... 100% ▕██████████████████████████████████████████████████████▏  572 B                         
verifying sha256 digest 
writing manifest 
success 
>>> 
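To confirm that the model files were pulled successfully, the locally available models can be listed with the standard Ollama CLI:

ollama list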

Test the Model

>>> How many state in USA?
There are 50 states in the United States of America.

>>> 美国有多少个州?
美国有 50 个州。

>>> 中国的首都是哪里?
中国的首都是北京(Beijing)。

>>> 帮我写一个50字的故事,关于猴子的
有一只名叫毛毛的猴子,住在树顶上的家里。毛毛喜欢跳舞,经常会随着音乐跳起来。有一天,风暴来临,树枝开始摇晃,毛毛害
怕。然而,他想起了自己爱好的舞蹈,于是开始跳舞,舞动的身体使他稳定在树上,风暴也最终过去了。

>>> Send a message (/? for help)

Model Testing via the Python API

Program Code 1

import ollama

# Send a single chat request with an image attachment to the locally
# running Ollama server (default endpoint http://127.0.0.1:11434).
response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/1.png']   # local path to the test image
    }]
)

print(response)

Test Result 1

Running the program returns the following result:

(ollama_service) (base) ubuntu@mySvr:~/code/ollama_service$ /home/ubuntu/miniconda3/envs/ollama_service/bin/python /home/ubuntu/code/ollama_service/main.py
model='llama3.2-vision' created_at='2024-12-05T01:38:43.918812229Z' done=True done_reason='stop' total_duration=2291537793 load_duration=23306166 prompt_eval_count=18 prompt_eval_duration=1059000000 eval_count=80 eval_duration=988000000 message=Message(role='assistant', content='The image depicts a golden dragon figurine, featuring intricate scales and detailing. The dragon appears to be made of gold or metal and has its mouth open as if roaring. It is sitting on an ornate base that seems to be made of the same material. The overall effect is one of grandeur and majesty, suggesting that this might be a decorative piece for a wealthy collector or enthusiast.', images=None, tool_calls=None)
(ollama_service) (base) ubuntu@mySvr:~/code/ollama_service$ 

The content field reads: "The image depicts a golden dragon figurine, featuring intricate scales and detailing. The dragon appears to be made of gold or metal and has its mouth open as if roaring. It is sitting on an ornate base that seems to be made of the same material. The overall effect is one of grandeur and majesty, suggesting that this might be a decorative piece for a wealthy collector or enthusiast."

The result after running it through Baidu Translate is:

这幅图像描绘了一个金龙雕像,具有复杂的鳞片和细节。龙似乎是由黄金或金属制成的,它张着嘴,好像在咆哮。它坐在一个华丽的底座上,底座似乎也是由同样的材料制成的。整体效果是宏伟和威严,这表明这可能是一件富有的收藏家或爱好者的装饰品。

The test image is:

[Test image: a photo of a golden dragon figurine]
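print(response) in Program Code 1 dumps the entire response object, as shown above. If only the description text is needed, it can be extracted directly. A short sketch assuming the ollama Python client 0.4.x installed earlier, whose response object supports both dictionary-style and attribute access:

import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/1.png']
    }]
)

# print only the assistant's text instead of the whole response object
print(response['message']['content'])
# equivalent attribute-style access
print(response.message.content)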

Program Code 2

from ollama import Client

# Connect to an Ollama server running on another host in the LAN;
# the custom header is optional and only included as an example.
client = Client(
  host='http://192.168.0.160:12345',
  headers={'x-some-header': 'some-value'}
)

response = client.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/test.png']   # path on the machine running this script
    }]
)

print(response)

Test Result 2

(ollama_service) (base) ubuntu@ubuntu-server:~/code/ollama_service$ /home/ubuntu/miniconda3/envs/ollama_service/bin/python /home/ubuntu/code/ollama_service/demo_rest.py
model='llama3.2-vision' created_at='2024-12-05T02:31:55.446849548Z' done=True done_reason='stop' total_duration=1700043974 load_duration=14954191 prompt_eval_count=18 prompt_eval_duration=25000000 eval_count=117 eval_duration=1448000000 message=Message(role='assistant', content='This image shows a family having a meal together. There are several plates of food on the table, including what appears to be rice, salad, and meat or seafood dishes. The people are seated around the table, with some holding chopsticks and others pouring tea from a teapot.\n\nThe atmosphere seems to be one of joy and togetherness, as the family is laughing and chatting while they eat. The image suggests that this is a special occasion, possibly a holiday or celebration, given the large number of people gathered around the table and the variety of dishes being served.', images=None, tool_calls=None)
(ollama_service) (base) ubuntu@ubuntu-server:~/code/ollama_service$ 

The content field reads: "This image shows a family having a meal together. There are several plates of food on the table, including what appears to be rice, salad, and meat or seafood dishes. The people are seated around the table, with some holding chopsticks and others pouring tea from a teapot.\n\nThe atmosphere seems to be one of joy and togetherness, as the family is laughing and chatting while they eat. The image suggests that this is a special occasion, possibly a holiday or celebration, given the large number of people gathered around the table and the variety of dishes being served."

The Baidu translation is:

图为一家人在一起吃饭。\n\n*桌子上摆着几盘食物和茶杯。\n*桌子上有几盘食物,包括看似主菜的食物和一些蔬菜或沙拉。\n*桌子上还有几个茶杯,这表明一家人正在花时间一起吃饭。\n*图片中的人似乎玩得很开心。\他们都面带微笑,似乎很享受彼此的陪伴。\n*这表明他们珍视彼此的关系,并乐于共度时光。\n\n总的来说,这张照片传达了家庭成员之间的温暖和联系。

The test image is:

[Test image: a family having a meal together]
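For longer descriptions it can be more convenient to stream the text as it is generated instead of waiting for the complete reply. A sketch under the assumption that the same remote Ollama server at http://192.168.0.160:12345 and the same test image are used:

from ollama import Client

client = Client(host='http://192.168.0.160:12345')

# stream=True yields partial responses; each chunk carries a fragment of the text
stream = client.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/test.png']
    }],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)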

Test conclusions:

  • Llama3.2-vision performs well on image recognition, with fairly high accuracy.
  • Chinese is not among the languages Llama3.2-vision currently supports, which is a drawback; in a project, the recognition results have to be translated before they can be used (see the sketch after this list).
  • Llama3.2-vision is simple to deploy and easy to get started with and verify.
  • The Llama3.2-vision model is relatively small, only about 7.9 GB, so inference is fast.
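One way to handle that translation step is to pipe the English description through a second, Chinese-capable text model on the same Ollama instance. The sketch below only illustrates the idea; the qwen2.5 model name is an assumption (it is not part of the setup described above and would need to be pulled separately), and any other translation service could be used instead:

import ollama

# Step 1: describe the image in English with llama3.2-vision
vision = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['/home/ubuntu/1.png']
    }]
)
english_text = vision['message']['content']

# Step 2: translate the description with a Chinese-capable text model
# (qwen2.5 is an assumed example and must already be available locally)
translated = ollama.chat(
    model='qwen2.5',
    messages=[{
        'role': 'user',
        'content': 'Translate the following into Chinese:\n' + english_text
    }]
)
print(translated['message']['content'])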