Vision Agent Open-Source Project Tutorial

Latest recommended article published on 2025-04-09 12:24:37

卢红梓


Article link: https://blog.csdn.net/gitblog_00837/article/details/142165404


Vision Agent project repository: https://gitcode.com/gh_mirrors/vi/vision-agent

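1. Project Overview

Vision Agent is an open-source library that helps users solve vision tasks by having an agent framework generate the code. Many vision problems today can take hours or even days to solve: you have to find a suitable model, work out how to use it, and write the program that completes the task. Vision Agent lets you describe the problem in plain text and has the agent framework generate code for the task, aiming to cut that turnaround to seconds.

2. Quick Start

Installation

To get started with Vision Agent, install the library with pip: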

pip install vision-agent

Make sure you have an OpenAI API key and that it is set as an environment variable:

export OPENAI_API_KEY="your-api-key"
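
A missing key tends to surface only when the first request is made, so a small check up front can save a confusing failure later. The sketch below uses only the Python standard library; `require_api_key` is a hypothetical helper name introduced for illustration, not part of Vision Agent.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the environment variable's value, or raise if it is unset or empty.

    Hypothetical helper for illustration; Vision Agent does not provide it.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the agent.")
    return value
```

Calling `require_api_key()` once at the top of a script, before constructing the agent, makes the failure mode explicit.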

Basic Usage

You can interact with the agent the same way you would with any LLM or LMM:

from vision_agent.agent import VisionAgent

agent = VisionAgent()
code = agent("What percentage of the area of the jar is filled with coffee beans", media="jar.jpg")
print(code)
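
Because the call returns generated code rather than a final answer, it is usually worth reviewing that code before running it. Here is a minimal sketch, assuming the return value is a string of Python source (as the `print(code)` call suggests). The helper `save_generated_code` and the default file name are hypothetical, not part of Vision Agent.

```python
import ast
from pathlib import Path


def save_generated_code(code: str, path: str = "generated_solution.py") -> Path:
    """Check that `code` parses as Python, then write it to `path` for review.

    Hypothetical helper; assumes the agent returned Python source as a string.
    """
    ast.parse(code)  # raises SyntaxError if the text is not valid Python
    out = Path(path)
    out.write_text(code, encoding="utf-8")
    return out
```

Parsing with `ast.parse` only checks syntax; it does not execute the code or say anything about whether the generated logic is correct.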

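3. Use Cases and Best Practices

Case 1: Detecting flowers in an image

Suppose you have an image that contains flowers. You can use Vision Agent to detect the flowers, draw bounding boxes around them, output the annotated image, and return the total number of flowers.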

from vision_agent.agent import VisionAgent

agent = VisionAgent()
result = agent("Detect the flowers in this image, draw boxes and output the image, and return the total number of flowers", media="flowers.jpg")
print(result)
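
The counting step in a task like this usually reduces to filtering detections by label and confidence. The sketch below illustrates that step only. The detection format (a dict with `label`, `score`, and `bbox` keys), the 0.5 threshold, and the name `count_detections` are assumptions made for illustration; the tools the agent actually selects, and their output format, may differ.

```python
from typing import Dict, List, Union

# Assumed detection format (hypothetical): {"label": str, "score": float, "bbox": [x1, y1, x2, y2]}
Detection = Dict[str, Union[str, float, List[float]]]


def count_detections(
    detections: List[Detection], label: str = "flower", min_score: float = 0.5
) -> int:
    """Count detections that match `label` with a score of at least `min_score`."""
    return sum(
        1 for d in detections if d["label"] == label and d["score"] >= min_score
    )
```

Raising `min_score` trades missed flowers for fewer false positives, so the reported total depends on the threshold chosen.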

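Case 2: Computing the percentage of a jar filled with coffee beans

You can use Vision Agent to compute what percentage of the jar's area is filled with coffee beans.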

from vision_agent.agent import VisionAgent

agent = VisionAgent()
code = agent("What percentage of the area of the jar is filled with coffee beans", media="jar.jpg")
print(code)
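
One plausible way to answer an area-percentage question is to compare two segmentation masks: one for the jar and one for the beans. The sketch below shows that arithmetic in pure Python. It assumes binary masks of equal size given as nested lists, and the names `mask_area` and `fill_percentage` are hypothetical; the code Vision Agent generates may take a different approach.

```python
from typing import List

Mask = List[List[int]]  # assumed format: rows of 0/1 pixel values


def mask_area(mask: Mask) -> int:
    """Number of nonzero pixels in a binary mask."""
    return sum(1 for row in mask for value in row if value)


def fill_percentage(jar_mask: Mask, beans_mask: Mask) -> float:
    """Percentage of the jar's pixels that are also bean pixels."""
    jar_area = mask_area(jar_mask)
    if jar_area == 0:
        raise ValueError("jar mask is empty")
    # Count only bean pixels that fall inside the jar region.
    overlap = sum(
        1
        for jar_row, beans_row in zip(jar_mask, beans_mask)
        for j, b in zip(jar_row, beans_row)
        if j and b
    )
    return 100.0 * overlap / jar_area
```

Note that this measures area in the 2D image, which matches the wording of the prompt but is not the same as the jar's fill volume.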