Datawhale 动手学大模型应用开发第四五章笔记向量数据库 / Prompt / 检索chain / 记忆

本文链接：https://blog.csdn.net/qq_46231296/article/details/134541644

Ch4&5 加载向量数据库 / Prompt / 检索chain / 记忆

Ch4 & 5 思维导图
第四章加载向量数据库
第五章 Prompt，检索chain，记忆
仓库project相关代码解析

Ch4 & 5 思维导图

在这里插入图片描述

第四章加载向量数据库

第一节是介绍不同格式数据的加载，分块，进一步的embedding方式，
第二节是介绍Chroma向量数据库，通过传入文档列表构建一个向量库，
然后之后可以根据query索引出topk块(similarity_search和max_marginal_relevance_search)。
最后介绍了进一步介绍 langchain检索问答链 ( RetrievalQA.from_chain_type(llm, retriever=vectordb.as_retriever()) )
第三节则是给出了构建本知识库应用项目的向量数据库整体构建代码
思维导图很清晰地说明了第四章的脉络，可以多看看

第五章 Prompt，检索chain，记忆

本项目作者邹雨衡在其之前的另一个项目datawhale/prompt-engineering-for-developers第一章对提示工程prompt-engineering有更多的教学，想了解更多内容的同学们可以去看之前那个项目仓库或者去看吴恩达老师的课程《ChatGPT Prompt Engineering for Developers》。

第一节 Prompt技巧原则

原则一: 指令应该清晰且具体

1.1 使用分隔符将指令和材料分开

原因：如果下面这句话没有分隔符，这句话本身就是有歧义的。

将 下面这句被三重双引号包围的句子 翻译成英文。
””“将这句中文转成法文”””

良好的prompt

prompt = f"""
Summarize the text delimited by triple backticks into a single sentence.
```{text}```
"""

1.2 规定输出格式 (规定结构 or 答案空间)

良好的prompt

prompt = f"""
Generate a list of three made-up book titles along with their authors and genres. 
Provide them in JSON format with the following keys: 
book_id, title, author, genre.
"""

1.3 让模型自我检查

良好的prompt

prompt = f"""
You will be provided with text delimited by triple quotes. 
If it contains a sequence of instructions, \ 
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \ 
then simply write \"No steps provided.\"

\"\"\"{text}\"\"\"
"""

1.4 给出示例(ICL，让模型自己学到输入输出风格)

良好的prompt

prompt = f"""
Your task is to answer in a consistent style.

<child>: Teach me about patience.

<grandparent>: The river that carves the deepest \ 
valley flows from a modest spring; the \ 
grandest symphony originates from a single note; \ 
the most intricate tapestry begins with a solitary thread.

<child>: Teach me about resilience.
"""

原则二: 给模型时间去思考

2.1 给出步骤(Think step by step)

良好的prompt

prompt_1 = f"""
Perform the following actions: 
1 - Summarize the following text delimited by triple \
backticks with 1 sentence.
2 - Translate the summary into French.
3 - List each name in the French summary.
4 - Output a json object that contains the following \
keys: french_summary, num_names.

Separate your answers with line breaks.

Text:
```{text}```
"""

2.2 让模型下结论前，自己重头思考

良好的prompt

prompt = f"""
Determine if the student's solution is correct or not.

Question:
I'm building a solar power installation and I need \
 help working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \ 
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations 
as a function of the number of square feet.

Student's Solution:
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
"""

第二节完整的检索问答链条RetrievalQA

2.1 核心代码

from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
template = """使用以下上下文来回答最后的问题。如果你不知道答案，就说你不知道，不要试图编造答\
案。最多使用三句话。尽量使答案简明扼要。总是在回答的最后说“谢谢你的提问！”。
上下文: {context}
问题: {question}
有用的回答:"""
QA_CHAIN_PROMPT = PromptTemplate(
		input_variables=["context","question"], template=template
)
####-----传入加载好的向量数据库和template即可
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=vectordb.as_retriever(),
                                       return_source_documents=True,
                                       chain_type_kwargs={"prompt":QA_CHAIN_PROMPT})

question_1 = "什么是南瓜书？"
result_with_RetrievalChain = qa_chain({"query": question_1})
print('有向量数据库技术支持的llm回答:\n', result["result"])
print('无向量数据库技术支持的llm回答:\n', llm(question_1))

第三节记忆历史对话

3.1 核心代码

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",  # 与 prompt 的输入变量保持一致。
    return_messages=True  # 将以消息列表的形式返回聊天记录，而不是单个字符串
)
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)
question = "我可以学习到关于强化学习的知识吗？"
result = qa({"question": question})
print(result['answer'])
# 是的，根据提供的上下文，这门课程会教授关于强化学习的知识。
question = "为什么这门课需要教这方面的知识？"
result = qa({"question": question})
print(result['answer'])
# 这门课需要教授关于强化学习的知识，是因为强化学习是一种用来学习如何做出一系列好的决策的方法。在人工智能领域，强化学习的应用非常广泛，可以用于控制机器人、实现自动驾驶、优化推荐系统等。学习强化学习可以帮助我们理解和应用这一领域的核心算法和方法，从而更好地解决实际问题。

仓库project相关代码解析

# 第七章第一节有结构简介，之后随着task5摆上来

Datawhale 动手学大模型应用开发 第四五章笔记 向量数据库 / Prompt / 检索chain / 记忆

Ch4&5 加载向量数据库 / Prompt / 检索chain / 记忆

Ch4 & 5 思维导图

第四章 加载向量数据库

第五章 Prompt，检索chain，记忆

第一节 Prompt技巧原则

原则一: 指令应该清晰且具体

1.1 使用分隔符 将 指令 和 材料 分开

1.2 规定输出格式 (规定结构 or 答案空间)

1.3 让模型自我检查

1.4 给出示例(ICL，让模型自己学到输入输出风格)

原则二: 给模型时间去思考

2.1 给出步骤(Think step by step)

2.2 让模型下结论前，自己重头思考

第二节 完整的检索问答链条RetrievalQA

2.1 核心代码

第三节 记忆历史对话

3.1 核心代码

仓库project相关代码解析

Datawhale 动手学大模型应用开发第四五章笔记向量数据库 / Prompt / 检索chain / 记忆

第四章加载向量数据库

1.1 使用分隔符将指令和材料分开

第二节完整的检索问答链条RetrievalQA

第三节记忆历史对话