Multimodal Image Understanding with DashScope

llzwxh888

Tags: python

Article link: https://blog.csdn.net/ppoojjj/article/details/140706826

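This article shows how to use the DashScope qwen-vl multimodal LLM class for image understanding and reasoning. Note that asynchronous calls are not supported yet. The walkthrough covers the following functions of the DashScope LLM:

complete (sync): handles a single prompt together with a list of images.
chat (sync): handles a list of chat messages.
stream complete (sync): streams the output of a completion.
stream chat (sync): streams the output of a chat.
Multi-turn conversation.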

Installation

!pip install -U llama-index-multi-modal-llms-dashscope

Set the API key as an environment variable

# Set the API key (replace the placeholder with your own key)
%env DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
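
The `%env` line above is IPython/Jupyter syntax and does not work in a plain Python script. The following is a minimal sketch of a script-friendly check that the key is present before any request is made. The helper name `get_dashscope_api_key` and the `PLACEHOLDER` constant are illustrative and not part of any library.

```python
import os

# Placeholder value used in this article; a real key must replace it.
PLACEHOLDER = "YOUR_DASHSCOPE_API_KEY"


def get_dashscope_api_key(env=os.environ):
    """Return the DashScope API key found in `env`, or raise if it is unusable."""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("DASHSCOPE_API_KEY is not set to a real key")
    return key
```

In a script, the key can be exported in the shell or assigned with `os.environ["DASHSCOPE_API_KEY"] = "..."` before the model object is created; calling the helper first turns a missing key into an early, readable error.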

Initialize DashScope and load image URLs

from llama_index.multi_modal_llms.dashscope import (
    DashScopeMultiModal,
    DashScopeMultiModalModels,
)

from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

image_urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
]

image_documents = load_image_urls(image_urls)

dashscope_multi_modal_llm = DashScopeMultiModal(
    model_name=DashScopeMultiModalModels.QWEN_VL_MAX,
)
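
Before passing a list to `load_image_urls`, it can help to filter out strings that are obviously not web URLs. The sketch below uses only the standard library; `split_valid_urls` is a hypothetical helper, and it performs a purely syntactic check (scheme and host), so it does not prove that the image can actually be downloaded.

```python
from urllib.parse import urlparse


def split_valid_urls(urls):
    """Separate http(s) URLs that have a host from everything else."""
    valid, invalid = [], []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(url)
        else:
            invalid.append(url)
    return valid, invalid
```

Only the `valid` list would then be handed to `load_image_urls`, while `invalid` can be logged or reported to the user.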

Complete a prompt with an image

complete_response = dashscope_multi_modal_llm.complete(
    prompt="What's in the image?",
    image_documents=image_documents,
)
print(complete_response)

Output:

The image captures a serene moment on a sandy beach at sunset. A woman, dressed in a blue and white plaid shirt, is seated on the ground. She is holding a treat in her hand, which is being gently taken by a dog. The dog, wearing a blue harness, is sitting next to the woman, its paw resting on her leg. The backdrop of this heartwarming scene is the vast ocean, with the sun setting in the distance, casting a warm glow over the entire landscape. The image beautifully encapsulates the bond between the woman and her dog, set against the tranquil beauty of nature.

Prompting with multiple images

multi_image_urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/panda.jpeg",
]

multi_image_documents = load_image_urls(multi_image_urls)
complete_response = dashscope_multi_modal_llm.complete(
    prompt="What animals are in the pictures?",
    image_documents=multi_image_documents,
)
print(complete_response)

Output:

There is a dog in Picture 1, and there is a panda in Picture 2.

Streaming a completion

stream_complete_response = dashscope_multi_modal_llm.stream_complete(
    prompt="What's in the image?",
    image_documents=image_documents,
)

for r in stream_complete_response:
    print(r.delta, end="")

Output:

The image captures a serene moment on a sandy beach at sunset. A woman, dressed in a blue and white plaid shirt, is seated on the ground. She is holding a treat in her hand, which is being gently taken by a dog. The dog, wearing a blue harness, is sitting next to the woman, its paw resting on her leg. The backdrop of this heartwarming scene is the vast ocean, with the sun setting in the distance, casting a warm glow over the entire landscape. The image beautifully encapsulates the bond between the woman and her dog, set against the tranquil beauty of nature.
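
The streaming loop above prints each `delta` and then discards it. If the full text is also needed afterwards, the pieces can be accumulated while printing. This is a minimal sketch: `collect_stream` is a hypothetical helper, the `SimpleNamespace` objects stand in for real streamed chunks, and treating a missing `delta` as an empty string is an assumption rather than documented behavior.

```python
from types import SimpleNamespace


def collect_stream(chunks, echo=True):
    """Print each chunk's delta as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        delta = chunk.delta or ""  # assumption: delta may be None on some chunks
        if echo:
            print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)


# Stand-in chunks for illustration; real chunks would come from stream_complete(...).
fake_chunks = [SimpleNamespace(delta=d) for d in ("A dog ", "and ", "a girl.")]
full_text = collect_stream(fake_chunks, echo=False)
```

With the objects from this article, the call would look like `collect_stream(stream_complete_response)`.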

Multi-turn conversation

from llama_index.core.base.llms.types import MessageRole
from llama_index.multi_modal_llms.dashscope.utils import (
    create_dashscope_multi_modal_chat_message,
)

chat_message_user_1 = create_dashscope_multi_modal_chat_message(
    "What's in the image?", MessageRole.USER, image_documents
)
chat_response = dashscope_multi_modal_llm.chat([chat_message_user_1])
print(chat_response.message.content[0]["text"])
chat_message_assistent_1 = create_dashscope_multi_modal_chat_message(
    chat_response.message.content[0]["text"], MessageRole.ASSISTANT, None
)
chat_message_user_2 = create_dashscope_multi_modal_chat_message(
    "what are they doing?", MessageRole.USER, None
)
chat_response = dashscope_multi_modal_llm.chat(
    [chat_message_user_1, chat_message_assistent_1, chat_message_user_2]
)
print(chat_response.message.content[0]["text"])

Output:

The image shows two photos of a panda sitting on a wooden log in an enclosure. In the top photo, the panda is sitting upright with its front paws on the log, facing three crows that are perched on the log. The panda looks alert and curious, while the crows seem to be observing the panda. In the bottom photo, the panda is lying down on the log, its head resting on its front paws. One crow has landed on the ground next to the log, and it seems to be interacting with the panda. The background of the photo shows green plants and a wire fence, creating a natural and relaxed atmosphere.

The woman is sitting on the beach with her dog, and they are giving each other high fives. The panda and the crows are sitting together on a log, and the panda seems to be communicating with the crows.
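
The multi-turn example works because every call resends the whole history: the first user message (with its images), each assistant reply, and the next question. That pattern can be sketched as a small loop. `continue_conversation` and its callable parameters are hypothetical names introduced here so the sketch runs without the library; the role strings `"user"` and `"assistant"` are placeholders as well.

```python
def continue_conversation(first_message, follow_ups, chat, make_message, get_text):
    """Run a multi-turn exchange, resending the full history on every call.

    first_message: the opening user message (for example, the one carrying images)
    follow_ups:    later user questions, as plain strings
    chat:          callable taking the message list and returning a response
    make_message:  callable (text, role) -> text-only message
    get_text:      callable extracting the reply text from a response
    """
    history = [first_message]
    answers = []
    for question in [None, *follow_ups]:
        if question is not None:
            history.append(make_message(question, "user"))
        reply = get_text(chat(history))
        answers.append(reply)
        # The assistant reply becomes part of the context for the next turn.
        history.append(make_message(reply, "assistant"))
    return answers, history
```

With the objects from this article, `chat` would be `dashscope_multi_modal_llm.chat`, `get_text` would read `response.message.content[0]["text"]`, and `make_message` would wrap `create_dashscope_multi_modal_chat_message(text, role, None)` with the role mapped to `MessageRole.USER` or `MessageRole.ASSISTANT`.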

Streaming chat

stream_chat_response = dashscope_multi_modal_llm.stream_chat(
    [chat_message_user_1, chat_message_assistent_1, chat_message_user_2]
)
for r in stream_chat_response:
    print(r.delta, end="")

Output:

The woman is sitting on the beach, holding a treat in her hand, while the dog is sitting next to her, taking the treat from her hand.

Using local image files

from llama_index.multi_modal_llms.dashscope.utils import load_local_images

local_images = [
    "file://THE_FILE_PATH1",
    "file://THE_FILE_PATH2",
]

image_documents = load_local_images(local_images)
chat_message_local = create_dashscope_multi_modal_chat_message(
    "What animals are in the pictures?", MessageRole.USER, image_documents
)
chat_response = dashscope_multi_modal_llm.chat([chat_message_local])
print(chat_response.message.content[0]["text"])

Output:

There is a dog in Picture 1, and there is a panda in Picture 2.
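
The `file://THE_FILE_PATH1` entries above are placeholders for real paths. A minimal sketch for building such strings from ordinary paths, and for failing early when a file is missing, is shown below. `to_file_urls` is a hypothetical helper, and the `file://` plus absolute path format simply follows the placeholder pattern used in this article; whether Windows drive paths are accepted in this form is not confirmed here.

```python
from pathlib import Path


def to_file_urls(paths):
    """Turn local paths into 'file://<absolute path>' strings, rejecting missing files."""
    urls = []
    for p in paths:
        resolved = Path(p).expanduser().resolve()
        if not resolved.is_file():
            raise FileNotFoundError(f"Image file not found: {resolved}")
        urls.append(f"file://{resolved}")
    return urls
```

The returned list could then be passed to `load_local_images` in place of the hand-written `local_images` list.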

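Common errors and fixes

API key error:
- Error message: Invalid API Key
- Fix: confirm that the API key is set correctly and contains no typos.
Network connection error:
- Error message: Network Connection Error
- Fix: make sure the network connection is working, then call the API again.
Image loading failure:
- Error message: Failed to load images from URL
- Fix: check that the image URL is valid; for local files, make sure the file path is correct and the file exists.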
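
For the network connection error, "call the API again" can be automated with a small retry wrapper using exponential backoff. This is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, and the default `retry_on` tuple uses Python's built-in `ConnectionError` and `TimeoutError` because the article does not say which exception types the DashScope client raises.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=1.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * 2 ** attempt)
```

A call from this article could be wrapped as `call_with_retry(lambda: dashscope_multi_modal_llm.complete(prompt="What's in the image?", image_documents=image_documents))`. An invalid API key should not be retried, since repeating the request cannot fix it.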