使用Dify为DeepSeek-R1添加多模态功能

最新推荐文章于 2025-03-16 17:36:35 发布

程序猿李巡天

最新推荐文章于 2025-03-16 17:36:35 发布

阅读量3.5k

点赞数 24

文章标签：人工智能开源知识图谱 microsoft 机器人

本文链接：https://blog.csdn.net/m0_59235945/article/details/145571570

版权

在DeepSeek-R1引发全球AI领域关注之际，其突破性的推理能力已通过多项测试得到验证：模型不仅将AIME数学竞赛准确率从15.6%提升至86.7%，更在Codeforces编程竞赛中超越96.3%人类参与者，展现出真实的数学直觉与迁移学习能力。然而作为纯文本模型，其官方版本存在多模态能力缺失与功能互斥的局限。

我选择通过Dify构建智能编排层：以DeepSeek-R1作为推理引擎，驱动更强大模型的多模态能力，实现文件解析与网络连接的协同运作。

在 Dify 中创建一个空白应用，选择 Chatflow 类型，打开工作区点击右上角的“功能”按钮，打开“文件上传”功能，勾选“文档”和“图片”类型。

按照上图编排工作流，核心思路就是解析文档和图片内容，交给 DeepSeek-R1 只生成推理内容，再把文档或图片以及解析到的内容和 R1 推理全部传给 Gemini 多模态模型，最终由 Gemini 来回答用户问题。

DeepSeek-R1 思考节点

DeepSeek-R1 扮演“优等生”的角色，专注于问题分解和逻辑推理。其核心任务是输出完整的思考过程，而不是直接提供答案。

在编写系统提示时，建议编写结构化提示，例如使用 XML 格式，这可以增强模型对问题任务的分解。

提示词如下：

<Role>``You are an LLM with reasoning capabilities.``Unlike other LLMs, you can output your complete thinking process.``</Role>``<Task>``Your task is to assist other LLMs that lack reasoning capabilities.``You need to output complete thinking processes for other LLMs based on user questions.``<Steps>``"Step 1": "Receive questions from users."``"Step 2": "Conduct deep reasoning and analysis on user questions."``"Step 3": "Elaborate on the reasoning process and logic, ensuring the process is complete and easy to understand."``"Step 4": "Output the complete reasoning process, no final answer needed."``</Steps>``</Task>``<Limitations>``Do not output the final answer, only output the thinking process.``Do not explain your own capabilities or limitations.``</Limitations>``   ``In addition, we need to adjust the user input content, adding the content from the doc extractor:``<User Query>``{{Start}}``</User Query>``<File>``{{text}}``</File>

Gemini 多模态节点

Gemini 是一种具有强大视觉能力的多模态模型，依靠 R1 推理框架结合多模态数据并生成最终答案。其优势在于图像解析和结果优化。注意需要在此节点中启用LLM的视觉功能以获得解析图片和文档的能力。

提示词如下：

<Role>``You are an LLM that excels at learning.``</Role>``<Task>``You need to learn from others' thinking processes about problems, enhance your results with their thinking, and then provide your answer.``<Steps>``"Step 1": "Receive thinking process from DeepSeek-R1 model."``"Step 2": "Carefully study and understand DeepSeek-R1's reasoning logic and steps."``"Step 3": "Generate final answer based on DeepSeek-R1's thinking, combined with image capabilities."``"Step 4": "Output the final answer, no need to explain the thinking process."``</Steps>``</Task>``<Limitations>``Do not repeat DeepSeek-R1's thinking process, only output the final answer.``Do not explain your own capabilities or learning process.``Ensure the answer is accurate and relevant to the question.``</Limitations>

如何学习大模型 AI ？

由于新岗位的生产效率，要优于被取代岗位的生产效率，所以实际上整个社会的生产效率是提升的。

但是具体到个人，只能说是：

“最先掌握AI的人，将会比较晚掌握AI的人有竞争优势”。

这句话，放在计算机、互联网、移动互联网的开局时期，都是一样的道理。

我在一线互联网企业工作十余年里，指导过不少同行后辈。帮助很多人得到了学习和成长。

我意识到有很多经验和知识值得分享给大家，也可以通过我们的能力和经验解答大家在人工智能学习中的很多困惑，所以在工作繁忙的情况下还是坚持各种整理和分享。但苦于知识传播途径有限，很多互联网行业朋友无法获得正确的资料得到学习提升，故此将并将重要的AI大模型资料包括AI大模型入门学习思维导图、精品AI大模型学习书籍手册、视频教程、实战学习等录播视频免费分享出来。

在这里插入图片描述