How Spring AI Calls a Locally Deployed Large Model

Latest recommended article published on 2025-05-01 23:42:01

fzip


Categories: Large Models, Spring. Tags: spring, artificial intelligence, java

Article link: https://blog.csdn.net/zpf_940810653842/article/details/147627905


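Spring AI calls locally deployed large models mainly by integrating with Ollama (a framework for running large models locally), and builds on Spring Boot's modular design to provide enterprise-grade invocation support. The detailed steps and the underlying mechanics follow: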

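I. Environment Setup

1. Deploy the Ollama service

Download and install the local service from the Ollama website.
Once the Ollama service is running, pull a model from the command line (for example, ollama pull llama3 or ollama pull deepseek-r1:7b).
Verify that the service is available: open http://localhost:11434 and confirm that it responds (a running server replies with the text "Ollama is running").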
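The availability check can also be done from Java. The following is a minimal sketch that uses only the JDK's java.net.http client. The class name OllamaHealthCheck and its helper methods are hypothetical, and the sketch assumes Ollama listens on the default address shown above. It queries /api/tags, the Ollama endpoint that lists locally pulled models.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: checks whether a local Ollama server answers HTTP requests. */
final class OllamaHealthCheck {

    /** Builds a GET request for the model-listing endpoint under the given base URL. */
    static HttpRequest buildTagsRequest(String baseUrl) {
        // Strip a trailing slash so the path does not become "//api/tags".
        String normalized = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(normalized + "/api/tags"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
    }

    /** Returns true if the server answers with HTTP 200, false on any I/O failure. */
    static boolean isAvailable(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        try {
            HttpResponse<String> response = client.send(
                    buildTagsRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // connection refused, timeout, and similar failures
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Ollama available: " + isAvailable("http://localhost:11434"));
    }
}
```

A check like this could run at application startup to fail fast with a clear message when the model server is down, rather than surfacing the problem on the first chat request.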

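2. Containerized deployment with Docker (optional)

Docker simplifies the environment setup, for example:

docker run -d -p 11434:11434 --name ollama ollama/ollama:0.6.2
docker exec -it ollama ollama pull deepseek-r1:7b  # pull the model

This approach suits scenarios that need an isolated environment or GPU acceleration. Note that GPU acceleration additionally requires the NVIDIA Container Toolkit on the host and the --gpus=all flag on docker run.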

II. Spring Boot Project Configuration

1. Add the dependency

Add the Spring AI Ollama module to pom.xml:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

Spring Boot 3.2.x or later is also required.

2. Configure the Ollama connection

Specify the model service address and the default model in application.yml:

spring:
  ai:
    ollama:
      base-url: http://localhost:11434  # Ollama service address
      chat:
        options:
          model: deepseek-r1:7b         # name of the model called by default

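With this configuration, Spring Boot auto-configures an Ollama chat model bean that can be injected directly. In 1.0.0-M6 the bean type is OllamaChatModel; it was called OllamaChatClient in the older 0.8.x releases.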

III. Code Implementation

1. Define a Controller

Create REST endpoints that receive requests and forward them to the large model:

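import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class OllamaController {
    // Auto-configured by the Ollama starter (OllamaChatClient in the older 0.8.x releases)
    @Autowired
    private OllamaChatModel chatModel;

    // Plain (blocking) text call
    @GetMapping("/ai/chat")
    public String chat(@RequestParam String message) {
        Prompt prompt = new Prompt(message);
        // getText() supersedes the older getContent() accessor
        return chatModel.call(prompt).getResult().getOutput().getText();
    }

    // Streaming call (suited to real-time conversation)
    @GetMapping("/ai/chat/stream")
    public Flux<String> streamChat(@RequestParam String message) {
        Prompt prompt = new Prompt(message);
        return chatModel.stream(prompt)
                .map(response -> response.getResult().getOutput().getText());
    }
}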

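The streaming call returns a Flux that emits the text incrementally, chunk by chunk, as the model generates it, which suits scenarios that need real-time responses.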
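To see the incremental output from the caller's side, the streaming endpoint can be consumed with the JDK's HTTP client. This is a minimal sketch: the class name StreamChatClient is hypothetical, and it assumes the application runs on Spring Boot's default port 8080 and exposes the /ai/chat/stream path from the controller above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client that prints a streamed chat reply as it arrives. */
final class StreamChatClient {

    /** Builds the endpoint URI, URL-encoding the message so spaces and '&' are safe. */
    static URI streamUri(String appBaseUrl, String message) {
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8);
        return URI.create(appBaseUrl + "/ai/chat/stream?message=" + encoded);
    }

    /** Sends the request and echoes the response body to stdout while it is still arriving. */
    static void printStream(String appBaseUrl, String message)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(streamUri(appBaseUrl, message)).GET().build();
        // ofInputStream() hands over the body immediately instead of buffering it fully.
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (Reader reader = new InputStreamReader(response.body(), StandardCharsets.UTF_8)) {
            char[] buffer = new char[256];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                System.out.print(new String(buffer, 0, read));
                System.out.flush();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            printStream("http://localhost:8080", "Tell me a short joke");
        } catch (IOException e) {
            System.err.println("Could not reach the application: " + e.getMessage());
        }
    }
}
```

Reading from the input stream rather than waiting for the whole body is what makes the output appear progressively; a client that buffers the full response would behave no differently from the blocking /ai/chat endpoint.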

2. Custom prompts and parameters

A Prompt object can carry several messages, which supports more complex interactions:

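// The {prompt} placeholder is filled from the map passed to createMessage
SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate("You are a professional assistant. {prompt}");
Message systemMessage = systemPromptTemplate.createMessage(Map.of("prompt", "Please answer in Chinese."));
// Put the system message first so it frames the user message that follows
Prompt prompt = new Prompt(List.of(systemMessage, new UserMessage(message)));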

SystemPromptTemplate can be used in this way to control the format of the model's output.
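A variable passed to createMessage only influences the result when the template text contains a matching {name} placeholder. The sketch below illustrates that substitution with plain JDK classes. It is a simplified stand-in for illustration, not Spring AI's actual rendering engine, and the class name PlaceholderTemplate is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified model of {name} placeholder substitution. */
final class PlaceholderTemplate {

    // Matches placeholders such as {prompt}; group 1 captures the variable name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    /** Replaces every {name} in the template with the value stored under that name. */
    static String render(String template, Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!variables.containsKey(name)) {
                throw new IllegalArgumentException("No value for placeholder: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(variables.get(name)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(
                "You are a professional assistant. {prompt}",
                Map.of("prompt", "Please answer in Chinese.")));
    }
}
```

In this sketch, a template without any placeholder comes back unchanged no matter which variables are supplied, so an instruction passed only through the map would never reach the model.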

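IV. Advanced Features and Optimization

1. Multi-turn conversation support

Cache the conversation context in Redis to support multi-turn dialogue: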

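@Autowired
private RedisTemplate<String, String> redisTemplate;
@Autowired
private OllamaChatModel chatModel;

public String chatWithContext(String sessionId, String message) {
    // A new session has no cached history, so guard against null
    String history = redisTemplate.opsForValue().get(sessionId);
    String fullMessage = (history == null ? "" : history + "\n") + "User: " + message;
    // Call the model with the accumulated conversation
    String reply = chatModel.call(new Prompt(fullMessage)).getResult().getOutput().getText();
    // Cache the reply as well, so the next turn sees both sides of the conversation
    redisTemplate.opsForValue().set(sessionId, fullMessage + "\nAssistant: " + reply);
    return reply;
}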
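Because the cached history grows with every turn, it will eventually exceed the model's context window unless it is trimmed. The sketch below shows one possible way to bound it, under loud assumptions: an in-memory map stands in for Redis, a Function<String, String> stands in for the model call, and the class name ConversationMemory and the maxLines limit are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for the Redis-backed conversation history. */
final class ConversationMemory {

    private final Map<String, Deque<String>> sessions = new ConcurrentHashMap<>();
    private final int maxLines; // upper bound on history lines kept per session

    ConversationMemory(int maxLines) {
        this.maxLines = maxLines;
    }

    /** Appends the user message, calls the model with the bounded history, stores the reply. */
    String chat(String sessionId, String message, Function<String, String> model) {
        Deque<String> history = sessions.computeIfAbsent(sessionId, id -> new ArrayDeque<>());
        // One lock per session keeps turns ordered; the model call is held inside it for simplicity.
        synchronized (history) {
            history.addLast("User: " + message);
            trim(history);
            String reply = model.apply(String.join("\n", history));
            history.addLast("Assistant: " + reply);
            trim(history);
            return reply;
        }
    }

    /** Drops the oldest lines until the history fits within maxLines. */
    private void trim(Deque<String> history) {
        while (history.size() > maxLines) {
            history.removeFirst();
        }
    }

    public static void main(String[] args) {
        ConversationMemory memory = new ConversationMemory(4);
        // The lambda simply reports how long the prompt was; a real model call would go here.
        Function<String, String> fakeModel = prompt -> "seen " + prompt.length() + " chars";
        System.out.println(memory.chat("s1", "hi", fakeModel));
        System.out.println(memory.chat("s1", "again", fakeModel));
    }
}
```

Dropping the oldest lines is the simplest policy; counting tokens or summarizing older turns are alternatives when early context matters. With Redis, setting an expiry on the session key would also keep abandoned sessions from accumulating.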