Multimodal Conversation and Image Recognition with Dashscope and Tongyi Qianwen

Latest recommended article published on 2024-07-12 16:16:27

m0_63764739


Tags: python

Article link: https://blog.csdn.net/m0_63764739/article/details/139352736


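In this post, we look at how to use Dashscope and the Tongyi Qianwen API for multimodal conversation and image recognition. We walk through several code examples that show how to call the API to turn images into text and to generate interactive conversations.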

#### Multimodal Conversation Example

First, let's look at a simple multimodal conversation example. The user supplies an image and a question, and the model generates an answer based on both.

```python
from http import HTTPStatus
import dashscope

# Set the API key
dashscope.api_key = "your_api_key_here"

def simple_multimodal_conversation_call(img, question):
    # A single user turn that carries both the image and the question
    messages = [
        {
            "role": "user",
            "content": [
                {"image": img},
                {"text": question}
            ]
        }
    ]
    response = dashscope.MultiModalConversation.call(model='qwen-vl-plus', messages=messages)
    if response.status_code == HTTPStatus.OK:
        # Print the text of the first content part of the first choice
        print(response.output.choices[0]['message']['content'][0]['text'])
    else:
        # Print the error code and message returned by the API
        print(response.code)
        print(response.message)

# Example image and question
img = 'https://example.com/image.jpg'
text = 'What is in this picture?'

simple_multimodal_conversation_call(img, text)
```

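In this example, the `simple_multimodal_conversation_call` function takes an image URL and a user question, calls Dashscope's `qwen-vl-plus` model, and prints the text of the reply, or the error code and message if the request fails.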
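The `messages` payload in the call above is a plain Python list, so it can be built and checked separately from the network request. The sketch below is illustrative only: `build_messages` and `extract_text` are hypothetical helper names that are not part of the Dashscope SDK, and the sample reply is a hand-written stand-in shaped like the `content` list that the example indexes into.

```python
def build_messages(img, question):
    """Return a single-turn user message holding one image part and one text part."""
    return [{"role": "user", "content": [{"image": img}, {"text": question}]}]


def extract_text(content):
    """Join the text parts of a content list, skipping parts without a 'text' key."""
    return "".join(part["text"] for part in content if "text" in part)


messages = build_messages("https://example.com/image.jpg", "What is in this picture?")
print(messages[0]["content"][1]["text"])

# Hand-written stand-in for response.output.choices[0]['message']['content'];
# it is not real model output.
sample_content = [{"text": "A plate of "}, {"text": "donuts."}]
print(extract_text(sample_content))
```

Joining every text part, rather than reading only index `0`, is a defensive choice in case a reply is split across several parts.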

#### Image-to-Text Example

Next, let's look at an image-to-text example. It shows how to use Dashscope's `qwen_turbo` model to describe an image in text.

```python
from http import HTTPStatus
import dashscope

# Set the API key
dashscope.api_key = "your_api_key_here"

def call_with_messages(image_reference):
    # image_reference is the image link or detail text placed in the last user message
    messages = [{'role': 'system', 'content': 'You are a helpful poster designer.'},
                {'role': 'user', 'content': 'Describe this image: '},
                {'role': 'user', 'content': image_reference}]

    response = dashscope.Generation.call(
        dashscope.Generation.Models.qwen_turbo,
        messages=messages,
        result_format='message',  # set the result to be "message" format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))

if __name__ == '__main__':
    # Placeholder URL reused from the first example; replace it with your own image reference
    call_with_messages('https://example.com/image.jpg')
```

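In this example, the `call_with_messages` function defines a list of messages and sends them to the `qwen_turbo` model to generate a description. Because `qwen_turbo` is a text generation model, the description is based on whatever image information appears in the message text. The pattern is useful for tasks that need a written description produced from an image reference.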
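Since the conversation is just a list of role-tagged messages, a follow-up turn can be added by appending to that list before the next call. The sketch below shows the idea without any network access: `add_turn` is a hypothetical helper, not a Dashscope function, and the assistant reply is a hand-written stand-in rather than real model output.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more turn appended."""
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unexpected role: {role}")
    return messages + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a helpful poster designer."}]
history = add_turn(history, "user", "Describe this image: ")

# Hand-written stand-in for a reply message; not real model output.
reply = {"role": "assistant", "content": "Could you share more detail about the image?"}
history = add_turn(history, reply["role"], reply["content"])
history = add_turn(history, "user", "It shows three donuts on a wooden board.")

print([m["role"] for m in history])
```

Returning a new list instead of mutating the input keeps earlier snapshots of the conversation intact, which can make retries easier to reason about.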

#### Batch Image Captioning Example

Next, we show how to caption images in batches. This example processes several images in one call and generates a description for each.

```python
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Initialize the image captioning pipeline
img_captioning = pipeline(Tasks.image_captioning, model='damo/ofa_image-caption_coco_large_en', model_revision='v1.0.1')

# Process a single image
result = img_captioning('https://shuangqing-public.oss-cn-zhangjiakou.aliyuncs.com/donuts.jpg')
print(result[OutputKeys.CAPTION])  # Print the image caption

# Process images in a batch
images = [{'image': 'https://shuangqing-public.oss-cn-zhangjiakou.aliyuncs.com/donuts.jpg'} for _ in range(3)]
result = img_captioning(images, batch_size=2)
for r in result:
    print(r[OutputKeys.CAPTION])
```

This example uses the `pipeline` function from the `modelscope` library, which hosts the `damo/ofa_image-caption_coco_large_en` model, to caption a single image and then a batch of images.
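To make the effect of `batch_size=2` on three inputs concrete, the sketch below partitions a list into consecutive batches. It illustrates the general idea of batching only and is not the pipeline's internal implementation; `split_into_batches` and the file names are hypothetical.

```python
def split_into_batches(items, batch_size):
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical inputs shaped like the list passed to the captioning pipeline
images = [{"image": f"image_{i}.jpg"} for i in range(3)]
batches = split_into_batches(images, batch_size=2)

for number, batch in enumerate(batches, start=1):
    print(number, [item["image"] for item in batch])
```

With three inputs and a batch size of 2, the last batch holds the single leftover item.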

#### Streaming Output Example

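Finally, we show a streaming output example. It prints the text piece by piece as the model generates it, which gives the effect of content appearing dynamically.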

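```python
from http import HTTPStatus
import dashscope

# Set the API key
dashscope.api_key = "your_api_key_here"

def sample_sync_call_streaming():
    # The instruction to send to the model
    prompt_text = 'Give me a recipe that uses radish, potato, and eggplant.'
    # With stream=True, dashscope.Generation.call returns a generator of responses
    response_generator = dashscope.Generation.call(
        model='qwen-turbo',
        prompt=prompt_text,
        stream=True,
        top_p=0.8
    )

    head_idx = 0
    # Iterate over the response stream
    for resp in response_generator:
        # Stop and report the error if a response in the stream failed
        if resp.status_code != HTTPStatus.OK:
            print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                resp.request_id, resp.status_code, resp.code, resp.message))
            break
        # Text generated so far in this response
        paragraph = resp.output['text']
        # Rewrite the current line with everything after the last completed line
        print("\r%s" % paragraph[head_idx:], end='')
        # If the text contains a newline, move head_idx past the last one
        last_newline = paragraph.rfind('\n')
        if last_newline != -1:
            head_idx = last_newline + 1

# Run the streaming example
sample_sync_call_streaming()
```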

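In this example, the `sample_sync_call_streaming` function consumes a stream of responses and prints the text as it arrives, so it can be used to display generated content in real time.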
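The `head_idx` bookkeeping is easier to follow when run on fixed data. The sketch below replays the same logic offline, assuming each response carries the full text generated so far, as the slicing in the example implies. `stream_fragments` is a hypothetical helper and the snapshots are made up for illustration.

```python
def stream_fragments(snapshots):
    """Yield the string that would be printed for each cumulative snapshot."""
    head_idx = 0
    for paragraph in snapshots:
        # "\r" returns to the start of the line so the unfinished line is redrawn
        yield "\r" + paragraph[head_idx:]
        last_newline = paragraph.rfind("\n")
        if last_newline != -1:
            # Completed lines are never redrawn; start after the last newline
            head_idx = last_newline + 1


# Made-up cumulative snapshots standing in for resp.output['text']
snapshots = [
    "Step 1",
    "Step 1: wash\n",
    "Step 1: wash\nStep 2",
    "Step 1: wash\nStep 2: chop",
]
fragments = list(stream_fragments(snapshots))
for fragment in fragments:
    print(repr(fragment))
```

Only the unfinished last line is reprinted on each update, so completed lines stay on screen untouched.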

### Conclusion

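The examples above showed how to use Dashscope and Tongyi Qianwen for multimodal conversation, image-to-text description, batch image captioning, and streaming output. With them as a starting point, developers can get up to speed quickly and build their own multimodal applications. We hope these examples help you better understand and use these tools.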
