[如何利用Llama2Chat增强Llama-2聊天模型体验]-CSDN博客

本文链接：https://blog.csdn.net/tt_jishu/article/details/142892234

如何利用Llama2Chat增强Llama-2聊天模型体验

引言

在大语言模型（LLM）领域，Llama-2是一个备受关注的开源模型。通过使用Llama2Chat这个包装器，我们可以有效地增强Llama-2模型的聊天能力。本文旨在讲解如何使用Llama2Chat与不同的接口（如HuggingFaceTextGenInference和LlamaCpp等）结合，实现Llama-2的聊天功能。

主要内容

Llama2Chat包装器的工作原理

Llama2Chat是一个通用的包装器，实现了BaseChatModel接口。它可以将消息列表转换成所需的聊天提示格式，并将格式化后的提示转发给封装的LLM。

使用HuggingFaceTextGenInference进行模型推理

HuggingFaceTextGenInference封装了对文本生成推理服务器的访问。以下是启动推理服务器的Docker命令示例：

docker run \
  --rm \
  --gpus all \
  --ipc=host \
  -p 8080:80 \
  -v ~/.cache/huggingface/hub:/data \
  -e HF_API_TOKEN=${HF_API_TOKEN} \
  ghcr.io/huggingface/text-generation-inference:0.9 \
  --hostname 0.0.0.0 \
  --model-id meta-llama/Llama-2-13b-chat-hf \
  --quantize bitsandbytes \
  --num-shard 4

利用LlamaCpp接口

使用LlamaCpp可以访问本地存储的Llama模型。以下是创建LlamaCpp实例的示例代码：

from os.path import expanduser
from langchain_community.llms import LlamaCpp

model_path = expanduser("~/Models/llama-2-7b-chat.Q4_0.gguf")
llm = LlamaCpp(model_path=model_path, streaming=False)
model = Llama2Chat(llm=llm)

代码示例

以下是如何使用Llama2Chat与LLMChain结合的完整代码示例：

from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_experimental.chat_models import Llama2Chat
from langchain_core.messages import SystemMessage
from langchain_core.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder

# 设置聊天提示模板
template_messages = [
    SystemMessage(content="You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    HumanMessagePromptTemplate.from_template("{text}"),
]
prompt_template = ChatPromptTemplate.from_messages(template_messages)

# 使用HuggingFaceTextGenInference
from langchain_community.llms import HuggingFaceTextGenInference
llm = HuggingFaceTextGenInference(
    inference_server_url="http://127.0.0.1:8080/",
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
)
model = Llama2Chat(llm=llm)

# 创建LLMChain
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = LLMChain(llm=model, prompt=prompt_template, memory=memory)

# 运行聊天
print(chain.run(text="What can I see in Vienna? Propose a few locations. Names only, no details."))