Streaming Output
LLMs generate one token at a time, so many LLM providers offer a way to stream the response token by token instead of waiting for the entire text to be generated. This significantly improves the user experience: the user no longer has to wait an unknown amount of time and can start reading the response almost immediately.
- To integrate streaming output in Spring Boot, add the following dependency:
<!-- Streaming output -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
- LangChain4j's API has a corresponding streaming API; you only need to add the following to the application.properties configuration file:
langchain4j.community.dashscope.streaming-chat-model.api-key=${QWEN_API_KEY}
langchain4j.community.dashscope.streaming-chat-model.model-name=qwq-32b
- Write a test method in a Controller:
@RequestMapping(value = "/stream_qwen",produces = "text/stream;charset=UTF8")
public Flux<String> stream(@RequestParam(defaultValue="你是谁") String message){
Flux<String> flux = Flux.create(fluxSink -> {
qwenStreamingChatModel.chat(message, new StreamingChatResponseHandler() {
@Override
public void onPartialResponse(String s) {
log.info(s);
fluxSink.next(s);
}
@Override
public void onCompleteResponse(ChatResponse chatResponse) {
log.info(chatResponse.toString());
fluxSink.complete();
}
@Override
public void onError(Throwable throwable) {
log.error(throwable.getMessage());
fluxSink.error(throwable);
}
});
});
return flux;
}
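The qwenStreamingChatModel used above is assumed to be the streaming model auto-configured by the dashscope starter from the properties shown earlier; a minimal sketch of how it would be injected into the Controller (the field name is simply what the code above implies):
@Autowired
StreamingChatLanguageModel qwenStreamingChatModel;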
Chat Memory
In the examples above, every conversation with the LLM is stateless: the model does not remember what you asked in the previous turn. For example, if you tell the model your name in the first message and then ask it what your name is in the second, it will not know. So how do we give the model the appearance of "memory"?
An example implemented by hand:
@Test
void testMemoryChat() {
    OpenAiChatModel model = OpenAiChatModel.builder()
            .baseUrl("http://langchain4j.dev/demo/openai/v1")
            .apiKey("demo")
            .modelName("gpt-4o-mini")
            .build();

    UserMessage userMessage1 = UserMessage.userMessage("你好,我是kizzo");
    ChatResponse response1 = model.chat(userMessage1);
    // First response
    AiMessage aiMessage1 = response1.aiMessage();
    System.out.println(aiMessage1.text());
    System.out.println("---");

    // Pass the previous turn back in so the model can "remember" it
    ChatResponse response2 = model.chat(userMessage1, aiMessage1, UserMessage.userMessage("我叫什么?"));
    // Second response
    AiMessage aiMessage2 = response2.aiMessage();
    System.out.println(aiMessage2.text());
}
- As the example above shows, manually maintaining and managing ChatMessages is tedious. LangChain4j therefore provides the ChatMemory abstraction along with several out-of-the-box implementations. ChatMemory can be used as a standalone low-level component, or as part of a high-level component such as AI Services.
- ChatMemory encapsulates the chat history and lets you cap how many messages are stored; by default the storage is in-memory, and AiServices wires it in through a JDK dynamic proxy. Here we create a configuration class that defines a chat assistant interface, and chat through that assistant:
package com.kizzo.langchain4j_spingboot_demo.config;

import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.TokenStream;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AiConfig {

    public interface Assistant {
        String chat(String message);

        // Streaming response
        TokenStream stream(String message);
    }

    @Bean
    public Assistant assistant(ChatLanguageModel chatLanguageModel, StreamingChatLanguageModel streamingChatLanguageModel) {
        // Cap on how many messages are kept
        ChatMemory chatMemory = MessageWindowChatMemory.withMaxMessages(10);
        // AiServices builds a dynamic proxy for Assistant: each chat() stores the conversation
        // in ChatMemory, and on the next call the stored history is pulled out of ChatMemory
        // and included in the current request
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(chatLanguageModel)
                .streamingChatLanguageModel(streamingChatLanguageModel)
                .chatMemory(chatMemory)
                .build();
        return assistant;
    }
}
Looking inside the MessageWindowChatMemory class, its default store keeps the chat history (the "memory") in a Map.
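Conceptually, the default in-memory store behaves like a Map keyed by memoryId; a simplified sketch of the idea (not the actual library source):
// Simplified sketch of what the default in-memory ChatMemoryStore does
public class InMemoryStoreSketch implements ChatMemoryStore {

    private final Map<Object, List<ChatMessage>> store = new ConcurrentHashMap<>();

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        return store.getOrDefault(memoryId, new ArrayList<>());
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        store.put(memoryId, messages);
    }

    @Override
    public void deleteMessages(Object memoryId) {
        store.remove(memoryId);
    }
}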
Add the core dependency:
<!-- Core -->
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j</artifactId>
<version>${langchain4j.version}</version>
</dependency>
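With the core dependency in place, ChatMemory can also be used standalone as the low-level component mentioned above; a minimal sketch, reusing the OpenAiChatModel from the manual test and assuming the same chat(...) API also accepts a message list:
// The caller reads the history from ChatMemory and appends to it
ChatMemory chatMemory = MessageWindowChatMemory.withMaxMessages(10);

chatMemory.add(UserMessage.userMessage("你好,我是kizzo"));
AiMessage answer1 = model.chat(chatMemory.messages()).aiMessage();
chatMemory.add(answer1);

chatMemory.add(UserMessage.userMessage("我叫什么?"));
// The request now carries the whole history, so the model can answer "kizzo"
System.out.println(model.chat(chatMemory.messages()).aiMessage().text());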
Add methods to the Controller. Call memoryChat first and then memoryStreamChat, and the model will return my name. Under the hood this still works by passing the result of the first call along with the second call.
@Autowired
AiConfig.Assistant assistant;

// Memory-backed chat
@RequestMapping(value = "/memory_chat")
public String memoryChat(@RequestParam(defaultValue = "我是kizzo") String message) {
    return assistant.chat(message);
}

// Memory-backed streaming chat
@RequestMapping(value = "/memory_chat_stream", produces = "text/event-stream;charset=UTF-8")
public Flux<String> memoryStreamChat(@RequestParam(defaultValue = "我是谁") String message) {
    TokenStream stream = assistant.stream(message);
    return Flux.create(sink -> {
        stream.onPartialResponse(s -> sink.next(s))
                .onCompleteResponse(c -> sink.complete())
                .onError(sink::error)
                .start();
    });
}
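The same memory behavior can also be verified without HTTP by calling the assistant twice in a test; a minimal sketch using the Assistant defined above:
@Autowired
AiConfig.Assistant assistant;

@Test
void testAssistantMemory() {
    // First turn: introduce the name
    System.out.println(assistant.chat("你好,我是kizzo"));
    // Second turn: thanks to ChatMemory, the model can now answer "kizzo"
    System.out.println(assistant.chat("我叫什么?"));
}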
Conversation Isolation
When we use AI products day to day, each newly created conversation has no connection to the previous one. The examples above do not separate chat sessions at all; ChatMemory covers this too, letting you distinguish sessions by a memoryId.
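The AssistantUnique interface used below is not defined anywhere in the original text; a minimal sketch, assuming LangChain4j's @MemoryId and @UserMessage annotations (from dev.langchain4j.service) and nesting it inside AiConfig alongside Assistant:
public interface AssistantUnique {
    String chat(@MemoryId int memoryId, @UserMessage String message);

    // Streaming variant
    TokenStream stream(@MemoryId int memoryId, @UserMessage String message);
}
With the interface in place, add a new Bean to the configuration class: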
@Bean
public AssistantUnique assistantUnique(ChatLanguageModel chatLanguageModel, StreamingChatLanguageModel streamingChatLanguageModel) {
    AssistantUnique assistant = AiServices.builder(AssistantUnique.class)
            .chatLanguageModel(chatLanguageModel)
            .streamingChatLanguageModel(streamingChatLanguageModel)
            // chatMemory becomes chatMemoryProvider: each memoryId is bound to its own
            // chat history, with the memoryId serving as the Map key
            .chatMemoryProvider(memoryId -> MessageWindowChatMemory.builder().maxMessages(10).id(memoryId).build())
            .build();
    return assistant;
}
Then autowire the object in the Controller and add a new endpoint:
@Autowired
AiConfig.AssistantUnique assistantUnique;

// Memory-isolated chat
@RequestMapping(value = "/memoryId_chat")
public String memoryIdChat(@RequestParam(defaultValue = "我是kizzo") String message, Integer userId) {
    return assistantUnique.chat(userId, message);
}
The result of a run is shown in the figure.
Persisting Chat Memory
By default, ChatMemory implementations store ChatMessages in memory.
If persistence is needed, you can implement a custom ChatMemoryStore and save the ChatMessages in a persistent store. Using MySQL as an example:
- Create the MySQL database and table:
CREATE TABLE chat_messages (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
memory_id VARCHAR(255) NOT NULL,
message_json TEXT NOT NULL,
gmt_create TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
- Add the dependencies:
<properties>
<mysql-connector.version>8.0.33</mysql-connector.version>
<mybatis-spring-boot.version>3.0.1</mybatis-spring-boot.version>
</properties>
<dependencies>
<!-- MyBatis -->
<dependency>
<groupId>org.mybatis.spring.boot</groupId>
<artifactId>mybatis-spring-boot-starter</artifactId>
<version>${mybatis-spring-boot.version}</version>
</dependency>
<!-- MySQL driver -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql-connector.version}</version>
</dependency>
</dependencies>
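MyBatis also needs a datasource configured; none is shown in the original, but a typical application.properties entry would look like this (URL, database name, and credentials are placeholders for your environment):
spring.datasource.url=jdbc:mysql://localhost:3306/your_db?useUnicode=true&characterEncoding=utf8
spring.datasource.username=root
spring.datasource.password=your_password
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver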
- Add the mapper interface and its XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.kizzo.langchain4j_spingboot_demo.mapper.ChatMessageMapper">
    <select id="selectMessagesByMemoryId" resultType="string">
        SELECT message_json FROM chat_messages WHERE memory_id = #{memoryId} ORDER BY gmt_create DESC LIMIT 10
    </select>
    <delete id="deleteMessagesByMemoryId">
        DELETE FROM chat_messages WHERE memory_id = #{memoryId}
    </delete>
    <insert id="insertMessages">
        INSERT INTO chat_messages (memory_id, message_json)
        VALUES (#{memoryId}, #{messageJson})
    </insert>
</mapper>
package com.kizzo.langchain4j_spingboot_demo.mapper;

import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;

import java.util.List;

@Mapper
public interface ChatMessageMapper {

    List<String> selectMessagesByMemoryId(@Param("memoryId") String memoryId);

    int deleteMessagesByMemoryId(@Param("memoryId") String memoryId);

    int insertMessages(@Param("memoryId") String memoryId, @Param("messageJson") String messageJson);
}
- Under the config package, add a MyBatis configuration class and the chat-memory persistence class:
package com.kizzo.langchain4j_spingboot_demo.config;

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.annotation.EnableTransactionManagement;

/**
 * MyBatis configuration
 */
@Configuration
@MapperScan({"com.kizzo.langchain4j_spingboot_demo.mapper"})
@EnableTransactionManagement
public class MyBatisConfig {
}
package com.kizzo.langchain4j_spingboot_demo.config;

import com.kizzo.langchain4j_spingboot_demo.mapper.ChatMessageMapper;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ChatMessageDeserializer;
import dev.langchain4j.data.message.ChatMessageSerializer;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.stream.Collectors;

@Component
public class PersistentChatMemoryStore implements ChatMemoryStore {

    private final ChatMessageMapper chatMessageMapper;

    public PersistentChatMemoryStore(ChatMessageMapper chatMessageMapper) {
        this.chatMessageMapper = chatMessageMapper;
    }

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        String memoryIdStr = memoryId.toString();
        // Each row holds a serialized list of messages; deserialize and flatten
        List<String> jsonMessages = chatMessageMapper.selectMessagesByMemoryId(memoryIdStr);
        return jsonMessages.stream()
                .map(ChatMessageDeserializer::messagesFromJson)
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        String memoryIdStr = memoryId.toString();
        String json = ChatMessageSerializer.messagesToJson(messages);
        // updateMessages receives the full message window every time, so replace the
        // previous snapshot instead of appending a new row (otherwise getMessages
        // would return duplicated history)
        chatMessageMapper.deleteMessagesByMemoryId(memoryIdStr);
        chatMessageMapper.insertMessages(memoryIdStr, json);
    }

    @Override
    public void deleteMessages(Object memoryId) {
        chatMessageMapper.deleteMessagesByMemoryId(memoryId.toString());
    }
}
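For reference, ChatMessageSerializer and ChatMessageDeserializer round-trip the entire message list as a single JSON string, which is why one database row can hold a full snapshot of the window; a small sketch:
List<ChatMessage> messages = List.of(
        UserMessage.userMessage("你好,我是kizzo"),
        AiMessage.aiMessage("你好,kizzo!"));

String json = ChatMessageSerializer.messagesToJson(messages);            // one JSON array string
List<ChatMessage> restored = ChatMessageDeserializer.messagesFromJson(json);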
- Add a new Bean to the configuration class:
@Bean
public AssistantUnique assistantUniqueStore(ChatLanguageModel chatLanguageModel,
                                            StreamingChatLanguageModel streamingChatLanguageModel,
                                            PersistentChatMemoryStore store) {
    ChatMemoryProvider chatMemoryProvider = memoryId -> MessageWindowChatMemory
            .builder()
            // This only limits the in-memory MessageWindowChatMemory window;
            // it does not automatically cap how much data is written to the database
            .maxMessages(10)
            .chatMemoryStore(store)
            .id(memoryId)
            .build();

    AssistantUnique assistant = AiServices.builder(AssistantUnique.class)
            .chatLanguageModel(chatLanguageModel)
            .streamingChatLanguageModel(streamingChatLanguageModel)
            // chatMemoryProvider binds each memoryId to its own chat history,
            // with the memoryId serving as the Map key
            .chatMemoryProvider(chatMemoryProvider)
            .build();
    return assistant;
}
- Autowire the assistant and add endpoints backed by database persistence:
@Autowired
AiConfig.AssistantUnique assistantUniqueStore;

/**
 * Memory chat endpoint with memoryId (database-persisted)
 */
@RequestMapping("/memory_id_chat_store")
public String memoryIdChatWithStore(@RequestParam("message") String message,
                                    @RequestParam("userId") Integer userId) {
    return assistantUniqueStore.chat(userId, message);
}

/**
 * Streaming memory chat endpoint with memoryId (database-persisted)
 */
@RequestMapping(value = "/memory_id_chat_store_stream", produces = "text/event-stream;charset=UTF-8")
public Flux<String> memoryIdChatWithStoreStream(@RequestParam("message") String message,
                                                @RequestParam("userId") Integer userId) {
    TokenStream stream = assistantUniqueStore.stream(userId, message);
    return Flux.create(sink -> {
        stream.onPartialResponse(s -> sink.next(s))
                .onCompleteResponse(c -> sink.complete())
                .onError(sink::error)
                .start();
    });
}
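Because the history now lives in MySQL, it survives application restarts. What actually gets persisted can be inspected directly in the database (memory_id '1' is just an example userId):
SELECT memory_id, message_json, gmt_create
FROM chat_messages
WHERE memory_id = '1';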