Building a Powerful Chat Model with ChatLlamaCpp: From Configuration to Advanced Applications

tt_jishu

Published 2024-10-04 21:18:12

Article link: https://blog.csdn.net/tt_jishu/article/details/142707479

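Introduction

Chat models are used more and more widely in AI-driven applications. This article shows how to use ChatLlamaCpp, which integrates with the llama-cpp-python library, to build a capable chat model. It covers initial setup, model instantiation, tool and function calling, structured output, and streaming.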

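Main Content

1. Overview

ChatLlamaCpp is a chat model integration provided by the LangChain community package. It supports tool calling, structured output, token-level streaming, and more. Its feature support is summarized below: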

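Tool calling: supported
Structured output: supported
JSON mode: not supported
Image input: not supported
Audio input: not supported
Video input: not supported
Token-level streaming: supported
Native async: not supported
Token usage: supported
Logprobs: supported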
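To make the "tool calling" entry above more concrete, consider what happens when a model supports it. Its reply carries structured calls instead of only free text. In LangChain these appear as `AIMessage.tool_calls`, a list of entries with `name`, `args`, and `id`, and your own code is responsible for running the matching function. The sketch below is a minimal, model-free illustration under that assumption. The `get_weather` tool, the `TOOLS` registry, and `dispatch_tool_calls` are hypothetical names. The tool-call dicts are written by hand rather than produced by a model.

```python
from typing import Any, Callable, Dict, List


# Hypothetical tool: a real application might call a weather API here.
def get_weather(city: str) -> str:
    return f"It is sunny in {city}."


# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run each requested tool and collect its output, keyed by call id.

    Each entry is assumed to have the shape of LangChain's ToolCall:
    {"name": str, "args": dict, "id": str}.
    """
    results = []
    for call in tool_calls:
        func = TOOLS.get(call["name"])
        if func is None:
            # The model asked for a tool we never registered.
            output = f"Unknown tool: {call['name']}"
        else:
            output = func(**call["args"])
        results.append({"tool_call_id": call.get("id"), "output": output})
    return results


# Hand-written example of what a tool-calling model might return.
example_calls = [{"name": "get_weather", "args": {"city": "Paris"}, "id": "call_1"}]
print(dispatch_tool_calls(example_calls))
```

In a LangChain application, each result would typically be wrapped in a `ToolMessage` (from `langchain_core.messages`) with the matching `tool_call_id` and sent back to the model so it can compose its final answer.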

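2. Setup

To take advantage of these features, we recommend a model that has been fine-tuned for tool calling. This article uses NousResearch's Hermes-2-Pro-Llama-3-8B-GGUF model.

For a deeper look at running local models, refer to the following guides: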

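3. Installation

To install the LangChain LlamaCpp integration (the langchain-community package plus llama-cpp-python), run the following command in a Jupyter notebook. In a regular shell, use pip install without the % prefix.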

%pip install -qU langchain-community llama-cpp-python
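Before loading a multi-gigabyte model, it can help to confirm that both packages actually installed. The following is a small sketch that uses only the standard library's `importlib.metadata`. The `installed_versions` helper is a hypothetical name, and the package names are simply the ones from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names taken from the pip command above.
REQUIRED = ("langchain-community", "llama-cpp-python")


def installed_versions(packages: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Return each package's installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for pkg, ver in installed_versions().items():
    print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```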

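4. Instantiating the Model

The following code instantiates the model object, which is then used to generate chat responses in the next section: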

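import multiprocessing

from langchain_community.chat_models import ChatLlamaCpp

# Path to your local model weights (GGUF file)
local_model = "local/path/to/Hermes-2-Pro-Llama-3-8B-Q8_0.gguf"

llm = ChatLlamaCpp(
    temperature=0.5,
    model_path=local_model,
    n_ctx=10000,  # context window size, in tokens
    n_gpu_layers=8,  # number of model layers offloaded to the GPU
    n_batch=300,  # Should be between 1 and n_ctx; consider the amount of VRAM in your GPU
    max_tokens=512,  # maximum number of tokens to generate per reply
    n_threads=max(1, multiprocessing.cpu_count() - 1),  # leave one CPU core free, but use at least one thread
    repeat_penalty=1.5,
    top_p=0.5,
    verbose=True,
)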
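Two of the settings above interact. First, `n_batch` should stay between 1 and `n_ctx`. Second, the context window has to hold both the prompt and the generated reply. A reasonable rule of thumb is therefore that the prompt can use at most `n_ctx - max_tokens` tokens. The helper below, `check_llamacpp_settings`, is a hypothetical sanity check written under that assumption and is not part of LangChain or llama-cpp-python. It also guards against `cpu_count() - 1` evaluating to 0 on a single-core machine.

```python
import multiprocessing


def check_llamacpp_settings(n_ctx: int, n_batch: int, max_tokens: int) -> dict:
    """Hypothetical helper: sanity-check a few ChatLlamaCpp settings."""
    if not 1 <= n_batch <= n_ctx:
        raise ValueError(f"n_batch must be between 1 and n_ctx ({n_ctx}), got {n_batch}")
    if max_tokens >= n_ctx:
        raise ValueError("max_tokens must leave room for the prompt inside n_ctx")
    return {
        # Leave one core free, but never drop below one thread.
        "n_threads": max(1, multiprocessing.cpu_count() - 1),
        # Tokens left for the prompt once the reply budget is reserved.
        "prompt_budget": n_ctx - max_tokens,
    }


settings = check_llamacpp_settings(n_ctx=10000, n_batch=300, max_tokens=512)
print(settings["prompt_budget"])  # 9488
```

With the values used above, roughly 9488 tokens remain for the system prompt, chat history, and user message.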

5. Invoking the Model

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]

ai_msg = llm.invoke(messages)
print(ai_msg.content)
# Example output (varies with the model and sampling settings):
# J'aime programmer. (In France, "programming" is often used in its original sense of scheduling or organizing events.) 
# If you meant computer-programming: Je suis amoureux de la programmation informatique.
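The `(role, content)` tuples passed to `invoke` are LangChain shorthand for message objects. llama-cpp-python's chat completion API, by contrast, works with OpenAI-style dictionaries that carry a `role` and a `content`. An integration therefore has to map roles such as `"human"` to `"user"`. The sketch below illustrates that kind of mapping in plain Python. `to_chat_dicts` and `ROLE_MAP` are hypothetical names, and this is an assumption about the general idea rather than ChatLlamaCpp's actual internals.

```python
from typing import Dict, List, Tuple

# Assumed mapping from LangChain tuple roles to OpenAI-style chat roles.
ROLE_MAP = {
    "system": "system",
    "human": "user",
    "user": "user",
    "ai": "assistant",
    "assistant": "assistant",
}


def to_chat_dicts(messages: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert (role, content) tuples into {"role": ..., "content": ...} dicts."""
    converted = []
    for role, content in messages:
        if role not in ROLE_MAP:
            raise ValueError(f"Unsupported role: {role!r}")
        converted.append({"role": ROLE_MAP[role], "content": content})
    return converted


messages = [
    ("system", "You are a helpful assistant that translates English to French. Translate the user sentence."),
    ("human", "I love programming."),
]
print(to_chat_dicts(messages))
```

Dictionaries of this shape are what `llama_cpp.Llama.create_chat_completion(messages=...)` accepts. You could use them if you ever need to call llama-cpp-python directly without LangChain.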