[InternLM (书生浦语) Hands-On] Check-in: LlamaIndex + InternLM2 RAG Practice

Article link: https://blog.csdn.net/weixin_45575017/article/details/141790876

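LlamaIndex + InternLM2 RAG Practice

What is RAG
What is LlamaIndex
Environment and model preparation
Downloading the embedding model: Sentence Transformer
Installing nltk
Q&A without RAG
Q&A with RAG
Appendix 1: Q&A call example without RAG
Appendix 2: Q&A call example with RAG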

What is RAG

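RAG stands for Retrieval Augmented Generation. It injects new knowledge into a model at query time, much as you might read a function's documentation and briefly remember how to use that function. This gives a base model a non-parametric knowledge update: it can handle knowledge from a new domain without any training.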
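To make the retrieve-then-generate idea concrete, here is a minimal toy sketch. It is not how LlamaIndex works internally: the bag-of-words `embed` function stands in for a real embedding model, and the two sample documents and every function name are made up for illustration. It only shows that new knowledge reaches the model through the prompt, not through its weights.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Put the retrieved text in front of the question; no weights are changed."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


# Hypothetical two-document knowledge base.
docs = [
    "XTuner is a toolkit for fine-tuning large models.",
    "LMDeploy is a toolkit for deploying large models.",
]
query = "What is XTuner?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```

The resulting prompt is what would be sent to the LLM; a real pipeline replaces `embed` with a sentence-embedding model and the list scan with a vector index.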

What is LlamaIndex

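LlamaIndex is a context-augmentation framework for LLMs. It extends what large language models (LLMs) can do by integrating them with context-specific datasets, so you can build applications that use the strengths of LLMs while also drawing on your private or domain-specific data.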

Environment and model preparation

Dependencies:

pip install einops==0.7.0 protobuf==5.26.1
pip install llama-index==0.10.38 llama-index-llms-huggingface==0.2.0 "transformers[torch]==4.41.1" "huggingface_hub[inference]==0.23.1" huggingface_hub==0.23.1 sentence-transformers==2.7.0 sentencepiece==0.2.0 llama-index-embeddings-instructor==0.1.3

Downloading the embedding model: Sentence Transformer

import os

# Set the environment variable so downloads go through the mirror
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

# Download the model
os.system('huggingface-cli download --resume-download sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --local-dir /root/model/sentence-transformer')
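One thing to watch with `os.system` is that a failed download only shows up as a return code that the script above never checks. A possible variant, sketched here with the same CLI arguments, uses `subprocess.run(..., check=True)` so that a non-zero exit raises an exception. The function names `build_download_command` and `download` are my own, not part of any library.

```python
import os
import subprocess

REPO_ID = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
LOCAL_DIR = "/root/model/sentence-transformer"


def build_download_command(repo_id: str, local_dir: str) -> list[str]:
    """Build the same huggingface-cli call as above, as an argument list (no shell)."""
    return [
        "huggingface-cli", "download", "--resume-download",
        repo_id, "--local-dir", local_dir,
    ]


def download(repo_id: str = REPO_ID, local_dir: str = LOCAL_DIR) -> None:
    """Run the download; raises CalledProcessError if the CLI exits non-zero."""
    # Pass the mirror endpoint only to the child process.
    env = dict(os.environ, HF_ENDPOINT="https://hf-mirror.com")
    subprocess.run(build_download_command(repo_id, local_dir), env=env, check=True)


# Call download() to fetch the model; it is not invoked automatically here.
```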

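Installing nltk

Building embeddings with an open-source embedding model requires some resources from the third-party library nltk. Normally these are downloaded automatically from the internet, but the download can be interrupted by network problems. Instead, we can fetch the resources from a mirror repository hosted in China and store them on the server. The following commands download the nltk resources and unpack them on the server: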

cd /root
git clone https://gitee.com/yzy0612/nltk_data.git  --branch gh-pages
cd nltk_data
mv packages/*  ./
cd tokenizers
unzip punkt.zip
cd ../taggers
unzip averaged_perceptron_tagger.zip
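Before moving on, it can help to confirm that the resources ended up where nltk will look for them. The sketch below only checks for folders on disk; it assumes that each archive unpacks into a folder of the same name (`punkt`, `averaged_perceptron_tagger`) and that nltk searches `/root/nltk_data`, which is the default `~/nltk_data` location when running as root. The helper name is hypothetical.

```python
import os

# Resource folders that the unzip steps above are assumed to produce.
EXPECTED = ("tokenizers/punkt", "taggers/averaged_perceptron_tagger")


def missing_nltk_resources(root: str = "/root/nltk_data") -> list[str]:
    """Return the expected resource folders that are not present under root."""
    return [rel for rel in EXPECTED if not os.path.isdir(os.path.join(root, rel))]


missing = missing_nltk_resources()
print("missing:", missing if missing else "none")
```

An empty list means both folders are in place; anything listed points at the unzip step that needs to be rerun.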

Q&A without RAG

Calling the 1.8B model directly and asking "xtuner是什么" ("What is xtuner") produces a wrong answer.

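Q&A with RAG

After vectorizing all the documents in the xtuner repository, retrieving the relevant vectors, and answering from them, the result is correct. The first answer came back in English. I am not sure why; possibly because the repository contains both Chinese and English README files. A second attempt that explicitly asked for Chinese got a Chinese reply.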
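The fix that worked in the second attempt was simply to state the answer language in the question. A small helper along these lines keeps that instruction consistent across queries; the name `with_language_hint` is hypothetical, and the hint text is the one used in the second attempt.

```python
def with_language_hint(question: str, hint: str = "请用中文回答") -> str:
    """Append an explicit answer-language instruction unless it is already there."""
    question = question.strip()
    return question if hint in question else f"{question}{hint}"


print(with_language_hint("xtuner是什么?"))
```

With the query engine from Appendix 2 this would be used as `query_engine.query(with_language_hint("xtuner是什么?"))`. Whether a small model follows the instruction is not guaranteed; it did in the run reported here.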

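Below is the answer to "xtuner是什么?请用中文回答" ("What is xtuner? Please answer in Chinese"), reproduced as returned; the captured text is cut off mid-word: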

📖 Introduction

XTuner 是一个高效、灵活和全功能的工具包，用于对大型模型进行精细调优。

**Efficient**

- 支持 LLM、VLM 预训练 / 微调，几乎所有 GPU 都能胜任。XTuner 能够对 7B LLM 进行单卡微调，同时也能支持 70B 以上的模型进行多卡微调。
- 自动将高性能操作（如 FlashAttention 和 Triton 内核）调度到训练中，以提高训练速度。
- 兼容 DeepSpeed 🚀，轻松利用 ZeRO 优化技术。

**Flexible**

- 支持各种 LLM（InternLM、Mixtral-8x7B、Llama 2、ChatGLM、Qwen、Baichuan 等）。
- 支持 VLM（LLaVA）。LLaVA-InternLM2-20B 的性能非常出色。
- 设计了完善的数据管道，支持各种数据格式，包括但不限于开源和自定义格式。
- 支持各种训练算法（QLoRA、LoRA、full-parameter fune-t

To see which files were retrieved to produce the answer:

print(response.source_nodes)

NodeWithScore(node=TextNode(id_='9e953ecb-3b6f-493a-83d6-cde00ed330f0', embedding=None, metadata={
'file_path': '/root/llamaindex_demo/data/xtuner/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 14885, 'creation_date': '2024-09-01', 'last_modified_date': '2024-09-01'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='da59d96e-3594-4248-a638-817498a2ccb6', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '/root/llamaindex_demo/data/xtuner/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 14885, 'creation_date': '2024-09-01', 'last_modified_date': '2024-09-01'}, hash='0b3ef96fb00ee94a50b089ac95ae3c047286432a0abd5c5c79494c50a2b1d673')}, 
text='📖 Introduction\n\nXTuner is an efficient, flexible and full-featured toolkit for fine-tuning large models.\n\n**Efficient**\n\n- Support LLM, VLM pre-training / fine-tuning on almost all GPUs. XTuner is capable of fine-tuning 7B LLM on a single 8GB GPU, as well as multi-node fine-tuning of models exceeding 70B.\n- Automatically dispatch high-performance operators such as FlashAttention and Triton kernels to increase training throughput.\n- Compatible with DeepSpeed 🚀, easily utilizing a variety of ZeRO optimization techniques.\n\n**Flexible**\n\n- Support various LLMs (InternLM, Mixtral-8x7B, Llama 2, ChatGLM, Qwen, Baichuan, ...).\n- Support VLM (LLaVA). The performance of LLaVA-InternLM2-20B is outstanding.\n- Well-designed data pipeline, accommodating datasets in any format, including but not limited to open-source and custom formats.\n- Support various training algorithms (QLoRA, LoRA, full-parameter fune-tune), allowing users to choose the most suitable solution for their requirements.\n\n**Full-featured**\n\n- Support continuous pre-training, instruction fine-tuning, and agent fine-tuning.\n- Support chatting with large models with pre-defined templates.\n- The output models can seamlessly integrate with deployment and server toolkit (LMDeploy), and large-scale evaluation toolkit (OpenCompass, VLMEvalKit).', mimetype='text/plain', start_char_idx=2, end_char_idx=1314, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.4452818699249386), 

NodeWithScore(node=TextNode(id_='dbd8f4a7-2a34-4b02-b819-7dbb6714ac11', embedding=None, metadata={
'file_path': '/root/llamaindex_demo/data/xtuner/README_zh-CN.md', 'file_name': 'README_zh-CN.md', 'file_type': 'text/markdown', 'file_size': 14846, 'creation_date': '2024-09-01', 'last_modified_date': '2024-09-01'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='76b99e54-a9fe-43bc-a546-67890ca06d34', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '/root/llamaindex_demo/data/xtuner/README_zh-CN.md', 'file_name': 'README_zh-CN.md', 'file_type': 'text/markdown', 'file_size': 14846, 'creation_date': '2024-09-01', 'last_modified_date': '2024-09-01'}, hash='590aacf18932dfb1576bd23bc439af33d8c22fe0eff966e46d1c295a4aec1f9a')}, 
text='对话\n\nXTuner 提供与大语言模型对话的工具。\n\n\`\`\`shell\nxtuner chat ${NAME_OR_PATH_TO_LLM} --adapter {NAME_OR_PATH_TO_ADAPTER} [optional arguments]\n```\n\n例如：\n\n与 InternLM2.5-Chat-7B 对话：\n\n\`\`\`shell\nxtuner chat internlm/internlm2-chat-7b --prompt-template internlm2_chat\n```\n\n更多示例，请查阅文档。', mimetype='text/plain', start_char_idx=2, end_char_idx=264, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.4139588845586523)]
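The raw repr above is hard to scan. A compact summary, one line per retrieved chunk, is often enough to see which files were hit and how similar they were. The sketch below relies only on the `score`, `node.metadata`, and `node.text` fields that are visible in the output above; `summarize_sources` is a hypothetical helper, and the stand-in object exists only so the snippet runs without loading a model.

```python
from types import SimpleNamespace


def summarize_sources(source_nodes, preview_chars: int = 40) -> list[str]:
    """One line per retrieved chunk: similarity score, file name, start of the text."""
    lines = []
    for item in source_nodes:
        score = item.score if item.score is not None else float("nan")
        name = item.node.metadata.get("file_name", "?")
        preview = item.node.text[:preview_chars].replace("\n", " ")
        lines.append(f"{score:.4f}  {name}  {preview}")
    return lines


# Stand-in object shaped like one entry of response.source_nodes.
fake_nodes = [
    SimpleNamespace(
        score=0.4452818699249386,
        node=SimpleNamespace(
            metadata={"file_name": "README.md"},
            text="📖 Introduction\n\nXTuner is an efficient, flexible toolkit",
        ),
    ),
]

for line in summarize_sources(fake_nodes):
    print(line)
```

In the real session, pass `response.source_nodes` instead of `fake_nodes`.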

Appendix 1: Q&A call example without RAG

from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.llms import ChatMessage
llm = HuggingFaceLLM(
    model_name="/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b",
    tokenizer_name="/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b",
    model_kwargs={"trust_remote_code":True},
    tokenizer_kwargs={"trust_remote_code":True}
)

rsp = llm.chat(messages=[ChatMessage(content="xtuner是什么？")])
print(rsp)

Appendix 2: Q&A call example with RAG

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM

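# Initialize a HuggingFaceEmbedding object, which converts text into vector representations
embed_model = HuggingFaceEmbedding(
    # Path to a pretrained sentence-transformer model
    model_name="/root/model/sentence-transformer"
)
# Assign the embedding model to the global Settings.embed_model attribute
# so that later index construction uses it.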
Settings.embed_model = embed_model

llm = HuggingFaceLLM(
    model_name="/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b",
    tokenizer_name="/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b",
    model_kwargs={"trust_remote_code":True},
    tokenizer_kwargs={"trust_remote_code":True}
)
# Set the global llm attribute so that index queries use this model.
Settings.llm = llm

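# Read all documents from the given directory and load them into memory
documents = SimpleDirectoryReader("/root/llamaindex_demo/data/xtuner").load_data()
# Create a VectorStoreIndex and build the index from the loaded documents.
# The index converts the documents into vectors and stores them for fast retrieval.
index = VectorStoreIndex.from_documents(documents)
# Create a query engine that accepts a query and returns a response based on the relevant documents.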
query_engine = index.as_query_engine()
response = query_engine.query("xtuner是什么?")

print(response)