Mastering llama-cpp-python: Running LLMs Efficiently in LangChain

adfyvatbia

Published 2024-10-06 17:36:54

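Introduction

llama-cpp-python is the Python binding for llama.cpp and a practical way to run large language models (LLMs) on your own machine. This article explains how to use llama-cpp-python from LangChain for model inference, which is especially useful for developers who need to run models locally without an API token. It also covers installation, configuration, and basic usage.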

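Main Content

Installation Guide

Basic Installation

For a basic CPU-only installation, run the following command (the %pip prefix is notebook syntax; in a regular shell, use plain pip):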

%pip install --upgrade --quiet llama-cpp-python
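
After installing, it can be useful to confirm that the package is actually visible to the current interpreter. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of llama-cpp-python.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("llama-cpp-python")
    if version is None:
        print("llama-cpp-python is not installed in this environment")
    else:
        print(f"llama-cpp-python {version} is installed")
```

Running it in the same notebook kernel or virtual environment used for the install avoids checking the wrong interpreter.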

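BLAS Acceleration

To build with a BLAS backend for faster processing, pass the backend flag through the CMAKE_ARGS environment variable and set FORCE_CMAKE=1 to force a CMake build. The example below enables the cuBLAS backend, which targets NVIDIA GPUs. Flag names have changed across llama.cpp releases, so check the README of the version you are installing if the build ignores the flag: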

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

Metal Acceleration (Apple Silicon)

On Apple Silicon devices, enable Metal support at build time:

!CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python
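
The install commands above differ only in the environment variables they set. As a rough sketch, the same choice can be expressed in Python by building an environment dictionary per backend. The names BACKEND_FLAGS and build_install_env are hypothetical, and the flags are copied from the commands in this article rather than from any particular llama.cpp release.

```python
import os

# Backend name -> CMake flag, taken from the install commands in this article.
# Flag names may differ between llama.cpp releases; treat these as examples.
BACKEND_FLAGS = {
    "cpu": "",
    "cublas": "-DLLAMA_CUBLAS=on",
    "metal": "-DLLAMA_METAL=on",
}


def build_install_env(backend: str, base_env=None) -> dict:
    """Return a copy of the environment with CMAKE_ARGS and FORCE_CMAKE set for `backend`."""
    if backend not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend: {backend!r}")
    env = dict(os.environ if base_env is None else base_env)
    flag = BACKEND_FLAGS[backend]
    if flag:
        env["CMAKE_ARGS"] = flag
        env["FORCE_CMAKE"] = "1"
    return env
```

The returned dictionary can be passed as env= to subprocess.run([sys.executable, "-m", "pip", "install", "llama-cpp-python"], check=True), which is equivalent to prefixing the pip command with the variables in a shell.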

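Model Conversion

Newer versions of llama-cpp-python load GGUF model files, so existing GGML models must be converted first. The conversion script ships with llama.cpp; run it from that repository as follows: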

python ./convert-llama-ggmlv3-to-gguf.py --eps 1e-5 --input models/openorca-platypus2-13b.ggmlv3.q4_0.bin --output models/openorca-platypus2-13b.gguf.q4_0.bin
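
A quick way to check whether a model file is already in GGUF format, or whether a conversion produced one, is to look at its first four bytes, which for GGUF files are the ASCII characters GGUF. The sketch below assumes only that; the helper name is_gguf is hypothetical and it does not validate the rest of the file.

```python
GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def is_gguf(path) -> bool:
    """Return True if the file at `path` starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    import sys

    # Usage: python check_gguf.py models/some-model.bin
    for model_path in sys.argv[1:]:
        status = "GGUF" if is_gguf(model_path) else "not GGUF (may need conversion)"
        print(f"{model_path}: {status}")
```

Note that the file extension is not a reliable signal here: the converted file in the command above still ends in .bin.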

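Windows-Specific Installation

Make sure git, python, cmake, and Visual Studio Community (with the C++ development workload selected) are installed. Clone the llama-cpp-python repository, including its llama.cpp submodule, then set the following environment variables (set LLAMA_CUBLAS to ON instead if you have an NVIDIA GPU):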

set FORCE_CMAKE=1
set CMAKE_ARGS=-DLLAMA_CUBLAS=OFF

Then, from the cloned repository, install with:

python -m pip install -e .
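
Before starting a source build, it may save time to confirm that the command-line prerequisites listed above can be found on PATH. This is a minimal sketch with a hypothetical helper name, missing_tools. It cannot detect whether the Visual Studio C++ workload is installed, so that still needs to be checked by hand.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["git", "python", "cmake"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("git, python, and cmake were all found on PATH")
```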

Code Example

The following Python example shows how to run inference with LangChain and llama-cpp-python together:

from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""
prompt = PromptTemplate.from_template(template)

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path="/path/to/model/openorca-platypus2-13b.gguf.q4_0.bin",  # path to the local GGUF model file
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True
)

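question = "A rap battle between Stephen Colbert and John Oliver"
# Pipe the prompt template into the model so that {question} is filled in;
# tokens are streamed to stdout by the callback handler as they are generated.
chain = prompt | llm
chain.invoke({"question": question})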
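
To see what text a template like the one above produces once its {question} placeholder is filled, the substitution can be reproduced with plain str.format, without loading a model. This is only an illustrative sketch: it assumes the template contains no other braces, and the helper name render_prompt is hypothetical rather than a LangChain function.

```python
TEMPLATE = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""


def render_prompt(question: str) -> str:
    """Substitute `question` into the {question} placeholder of TEMPLATE."""
    return TEMPLATE.format(question=question)


if __name__ == "__main__":
    print(render_prompt("A rap battle between Stephen Colbert and John Oliver"))
```

Because the template already supplies the "Question:" label, the value passed in should be the bare question text; otherwise the label appears twice in the final prompt.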