Multimodal LLM: Image Reasoning with Anthropic Models

llzwxh888

Article link: https://blog.csdn.net/ppoojjj/article/details/140284389

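In this article, we show how to use Anthropic's multimodal LLMs for image understanding and reasoning. Anthropic recently released its latest multimodal models, Claude 3 Opus and Claude 3 Sonnet. We demonstrate image reasoning with these models through LlamaIndex's AnthropicMultiModal class, with code examples along the way.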

Installing Dependencies

Before starting, install the required Python libraries (the leading ! runs each command from a notebook cell):

!pip install llama-index-multi-modal-llms-anthropic
!pip install llama-index-vector-stores-qdrant
!pip install matplotlib
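
Before running the examples, it can help to confirm that the packages actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this illustration and is not part of LlamaIndex.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names taken from the pip commands above.
REQUIRED = [
    "llama-index-multi-modal-llms-anthropic",
    "llama-index-vector-stores-qdrant",
    "matplotlib",
]

for name in REQUIRED:
    found = installed_version(name)
    print(f"{name}: {found if found else 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, rerun the corresponding pip command in the same environment the notebook kernel uses.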

Reasoning over a Local Image

First, we use Anthropic's API to interpret an image stored in a local directory.

import os
from PIL import Image
import matplotlib.pyplot as plt
from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal

# Set the API key
os.environ["ANTHROPIC_API_KEY"] = ""  # fill in your Anthropic API key here

# Open and display the local image
img = Image.open("../data/images/prometheus_paper_card.png")
plt.imshow(img)

# Load the image as LlamaIndex documents
image_documents = SimpleDirectoryReader(
    input_files=["../data/images/prometheus_paper_card.png"]
).load_data()

# Initialize the Anthropic multimodal LLM wrapper
anthropic_mm_llm = AnthropicMultiModal(max_tokens=300)

# Ask the model to describe the image
response = anthropic_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_documents,
)

print(response)
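
For intuition about what happens to a local image before it reaches the model: Anthropic's Messages API accepts images as base64-encoded content blocks with a declared media type. The sketch below builds such a block by hand with the standard library. It is an illustration only, not a description of how `AnthropicMultiModal` is implemented internally, and the helper name `to_image_block` is hypothetical.

```python
import base64
from pathlib import Path

# File extensions mapped to the image media types Anthropic documents as supported.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def to_image_block(path: str) -> dict:
    """Encode an image file as a base64 image content block (hypothetical helper)."""
    file_path = Path(path)
    media_type = MEDIA_TYPES.get(file_path.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image extension: {file_path.suffix!r}")
    # The file contents are not inspected; only the extension decides the media type.
    data = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

Calling `to_image_block("../data/images/prometheus_paper_card.png")` would return a dictionary whose `source.data` field holds the encoded image. With LlamaIndex this step is handled for you, so the helper is only useful for understanding or debugging.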

Reasoning over an Image from a URL

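Next, we use the AnthropicMultiModal class to load an image from a URL and run inference on it. This reuses the anthropic_mm_llm instance created in the previous section.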

from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

image_urls = [
    "https://venturebeat.com/wp-content/uploads/2024/03/Screenshot-2024-03-04-at-12.49.41%E2%80%AFAM.png",
    # add your own URLs here
]

img_response = requests.get(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)

image_url_documents = load_image_urls(image_urls)

response = anthropic_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_url_documents,
)

print(response)
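
Since the URL list is meant to be extended with your own entries, a quick pre-check can catch entries that are not plausible image links before any request is made. This is a minimal sketch, assuming that checking the scheme and file extension is enough for your inputs. `looks_like_image_url` is a hypothetical helper, the two example.com URLs are placeholders, and the extension list reflects the formats Anthropic documents as supported (JPEG, PNG, GIF, WebP).

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def looks_like_image_url(url: str) -> bool:
    """Return True for http(s) URLs whose path ends in a known image extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Decode percent-escapes before reading the extension from the path.
    suffix = PurePosixPath(unquote(parsed.path)).suffix.lower()
    return suffix in IMAGE_EXTENSIONS


candidates = [
    "https://venturebeat.com/wp-content/uploads/2024/03/Screenshot-2024-03-04-at-12.49.41%E2%80%AFAM.png",
    "ftp://example.com/chart.png",      # placeholder: wrong scheme
    "https://example.com/report.pdf",   # placeholder: not an image extension
]
usable = [u for u in candidates if looks_like_image_url(u)]
print(usable)
```

An extension check says nothing about the actual content behind the URL, so a server can still return something other than an image; treat this as a cheap first filter rather than validation.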

Generating Structured Output from an Image

We can also use a multimodal Pydantic program to generate structured output from an image.

from llama_index.core import SimpleDirectoryReader
from PIL import Image
import matplotlib.pyplot as plt
from pydantic import BaseModel
from typing import List

class TickerInfo(BaseModel):
    direction: str
    ticker: str
    company: str
    shares_traded: int
    percent_of_total_etf: float

class TickerList(BaseModel):
    fund: str
    tickers: List[TickerInfo]

image_documents = SimpleDirectoryReader(
    input_files=["../data/images/ark_email_sample.PNG"]
).load_data()

img = Image.open("../data/images/ark_email_sample.PNG")
plt.imshow(img)

from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser

prompt_template_str = """
Can you get the stock information in the image \
and return the answer? Pick just one fund. 

Make sure the answer is a JSON format corresponding to a Pydantic schema. The Pydantic schema is given below.
"""

anthropic_mm_llm = AnthropicMultiModal(max_tokens=300)

llm_program = MultiModalLLMCompletionProgram.from_defaults(
    output_cls=TickerList,
    image_documents=image_documents,
    prompt_template_str=prompt_template_str,
    multi_modal_llm=anthropic_mm_llm,
    verbose=True,
)

response = llm_program()

print(str(response))
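
Structured output ultimately depends on the model returning parseable JSON, and models sometimes wrap the JSON in explanatory text. The sketch below shows one way to pull the first JSON object out of a raw text reply using only the standard library. It is not what `MultiModalLLMCompletionProgram` does internally; `extract_first_json_object` is a hypothetical helper, and the sample reply is invented for illustration rather than taken from a real model run.

```python
import json
from typing import Any, Dict


def extract_first_json_object(text: str) -> Dict[str, Any]:
    """Return the first JSON object embedded in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`.
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    raise ValueError("No JSON object found in text")


# Invented reply, shaped like the TickerList schema defined earlier.
raw_reply = (
    "Here is the extracted information: "
    '{"fund": "EXAMPLE", "tickers": [{"direction": "Buy", "ticker": "ABC", '
    '"company": "Example Corp", "shares_traded": 100, "percent_of_total_etf": 0.25}]} '
    "Let me know if you need anything else."
)
parsed = extract_first_json_object(raw_reply)
print(parsed["fund"], len(parsed["tickers"]))
```

A dictionary recovered this way could then be validated with `TickerList(**parsed)`, which raises a Pydantic validation error if a field is missing or has the wrong type.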