1. Create a conda environment
conda create -n mylongchain python=3.9
conda activate mylongchain
pip install langchain==0.0.235 openai
2. Code walkthrough
1) Hello LangChain
Here I use Tongyi Qianwen's free API (DashScope).
from http import HTTPStatus
import dashscope

dashscope.api_key = "sk-"  # fill in your own DashScope API key before calling

def call_with_messages():
    messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                {'role': 'user', 'content': 'hello langchain'}]
    response = dashscope.Generation.call(
        dashscope.Generation.Models.qwen_turbo,
        messages=messages,
        result_format='message',  # return the result in message format
    )
    if response.status_code == HTTPStatus.OK:
        # extract the content field from the response
        content = response['output']['choices'][0]['message']['content']
        print(content)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))

if __name__ == '__main__':
    call_with_messages()
2) Calling models from Hugging Face or OpenAI
LangChain supports many model providers, including OpenAI, ChatGLM, Hugging Face, and others.
1) Hugging Face
# Or an open-source model hosted on Hugging Face
# pip install huggingface_hub
# Requires the HUGGINGFACEHUB_API_TOKEN environment variable to be set
from langchain import HuggingFaceHub
llm = HuggingFaceHub(repo_id="google/flan-t5-xl")
# The llm instance takes a prompt string as input and returns a string as output
prompt = u"中国的首都是?"
completion = llm(prompt)
2) OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.schema import AIMessage, HumanMessage, SystemMessage
chat = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
response = chat.predict_messages([
    HumanMessage(content="What is AI?")
])
print(response)
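The SystemMessage class imported above can steer the assistant's behavior; a small sketch reusing the same chat instance (the system instruction here is an illustrative assumption):
# Sketch: combine a system instruction with the user message.
messages = [
    SystemMessage(content="You are a helpful assistant that answers concisely."),
    HumanMessage(content="What is AI?")
]
print(chat.predict_messages(messages).content)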
3) Data connection
LangChain's data connection concept covers loading, transforming, storing, and querying user data through the following components:
Document loaders: load documents from different data sources
Document transformers: split documents, convert them into Q&A format, drop redundant documents, and so on
Text embedding models: convert unstructured text into arrays of floating-point numbers, also known as vectors
Vector stores: store and search embedded data (vectors)
Retrievers: a generic interface for querying the data
# ## Load documents
from langchain.document_loaders import TextLoader
loader = TextLoader("./README.md")
docs = loader.load()
# ## Split documents
# ### Split by character
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
split_docs = text_splitter.split_documents(docs)
print(len(docs[0].page_content))
for split_doc in split_docs:
    print(len(split_doc.page_content))
# ### Split code
# %%
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language
PYTHON_CODE = """
def hello_langchain():
    print("Hello, Langchain!")

# Call the function
hello_langchain()
"""
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
python_docs = python_splitter.create_documents([PYTHON_CODE])
python_docs
# ### Split Markdown documents
from langchain.text_splitter import MarkdownHeaderTextSplitter
markdown_document = "# Chapter 1\n\n ## Section 1\n\nHi this is the 1st section\n\nWelcome\n\n ### Module 1 \n\n Hi this is the first module \n\n ## Section 2\n\n Hi this is the 2nd section"
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
splits = splitter.split_text(markdown_document)
splits
# ### Split recursively by character
# %%
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
)
texts = text_splitter.split_documents(docs)
print(len(docs[0].page_content))
for split_doc in texts:
    print(len(split_doc.page_content))
# ### Split by token
!pip install -q tiktoken
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=100, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs)
split_docs
# ## Embed document chunks
from langchain.embeddings import OpenAIEmbeddings
embeddings_model = OpenAIEmbeddings(openai_api_key="")  # fill in your OpenAI API key
embeddings = embeddings_model.embed_documents(
    [
        "你好!",
        "Langchain!",
        "你真棒!"
    ]
)
embeddings
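For a single query string there is also embed_query, which returns one vector; a minimal sketch with the same embeddings model:
# Embed a single query string; the result is one list of floats.
embedded_query = embeddings_model.embed_query("What is Langchain?")
len(embedded_query)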
# ## Vector stores
# ### Store
!pip install -q chromadb
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(docs)
db = Chroma.from_documents(documents, OpenAIEmbeddings(openai_api_key=""))  # fill in your OpenAI API key
# ### Retrieve
query = "什么是WTF Langchain?"
docs = db.similarity_search(query)
docs
docs = db.similarity_search_with_score(query)
docs
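The retriever component listed under data connection can be obtained directly from this vector store; a minimal sketch (the k value is an assumption):
# Wrap the vector store as a retriever, the generic query interface mentioned above.
retriever = db.as_retriever(search_kwargs={"k": 2})
retriever.get_relevant_documents(query)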
4) Prompts
LangChain provides a set of classes and functions that simplify building and handling prompts:
- Prompt templates (Prompt Template): parameterize prompts to make code more reusable.
- Example selectors (Example Selector): dynamically choose which examples to include in a prompt.
Prompt templates
1)
from langchain import PromptTemplate
template = """
你精通多种语言,是专业的翻译官。你负责{src_lang}到{dst_lang}的翻译工作。
"""
prompt = PromptTemplate.from_template(template)
prompt.format(src_lang="英文", dst_lang="中文")
2)
multiple_input_prompt = PromptTemplate(
    input_variables=["color", "animal"],
    template="A {color} {animal} ."
)
multiple_input_prompt.format(color="black", animal="bear")
3) Example selectors
When building LLM applications, you may need to pick a subset of a larger pool of examples to include in a prompt. The example selector (Example Selector) is the component for exactly this, and it is typically used together with few-shot prompts. LangChain provides a base interface class, BaseExampleSelector; every selector class must implement the select_examples function. LangChain ships several selectors based on different use cases and algorithms:
LengthBasedExampleSelector
MaxMarginalRelevanceExampleSelector
NGramOverlapExampleSelector
SemanticSimilarityExampleSelector
This section demonstrates the length-based selector, LengthBasedExampleSelector: the longer the input, the fewer examples it selects; the shorter the input, the more it selects.
from langchain.prompts import PromptTemplate
from langchain.prompts import FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector
# These are a lot of examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    # the candidate examples
    examples=examples,
    # the prompt template used to format each example
    example_prompt=example_prompt,
    # the maximum length of the formatted examples, measured by the get_text_length function
    max_length=25
)
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)
# The input is very short, so all examples are selected
print(dynamic_prompt.format(adjective="big"))
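Conversely, a long input leaves less room under max_length, so fewer examples are kept; a small check:
# With a long input, the length-based selector keeps fewer examples.
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))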
5) Output parsers
# %% [markdown]
# # 05 Output Parsers
#
# An LLM's output is text, but in a program we often want more structured data than text to display. This is where output parsers (Output Parsers) come in.
# %%
!pip install -q langchain==0.0.235 openai
# %% [markdown]
# ## List Parser
#
# The List Parser parses comma-separated text into a list.
# %%
from langchain.output_parsers import CommaSeparatedListOutputParser
output_parser = CommaSeparatedListOutputParser()
output_parser.parse("black, yellow, red, green, white, blue")
# %% [markdown]
# ## Structured Output Parser
#
# Use this parser when you want a JSON-like structure with multiple fields. It generates instructions that help the LLM return structured text, and it also parses that text back into structured data.
# %%
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
# Define the response structure (JSON) with two fields: answer and source.
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source referred to answer the user's question, should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
# Get the formatting instructions for the response
format_instructions = output_parser.get_format_instructions()
format_instructions
# %%
# partial_variables lets you pre-fill some of a prompt template's variables in code, similar to the relationship between an interface and an abstract class.
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)
model = OpenAI(temperature=0, openai_api_key="your valid OpenAI API key")
response = prompt.format_prompt(question="Who is the CEO of Tesla?")
output = model(response.to_string())
output_parser.parse(output)
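ChatPromptTemplate and HumanMessagePromptTemplate are imported above but not used; as a sketch, the same structured-output flow with a chat model could look like this (the model choice and temperature are assumptions):
# Sketch: the same structured-output flow with a chat model (illustrative).
from langchain.chat_models import ChatOpenAI

chat_model = ChatOpenAI(temperature=0)
chat_prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template(
            "answer the users question as best as possible.\n{format_instructions}\n{question}"
        )
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)
chat_output = chat_model(chat_prompt.format_prompt(question="Who is the CEO of Tesla?").to_messages())
output_parser.parse(chat_output.content)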
# %% [markdown]
# ## 自定义输出解析器
#
# Extend CommaSeparatedListOutputParser so that the list it returns is sorted.
# %%
from typing import List

class SortedCommaSeparatedListOutputParser(CommaSeparatedListOutputParser):
    def parse(self, text: str) -> List[str]:
        lst = super().parse(text)
        return sorted(lst)
output_parser = SortedCommaSeparatedListOutputParser()
output_parser.parse("black, yellow, red, green, white, blue")
6) Chains
llm = OpenAI(model_name="gpt-3.5-turbo",temperature=0, openai_api_key="sk-")
# %% [markdown]
# <a href="https://colab.research.google.com/github/sugarforever/wtf-langchain/blob/main/06_Chains/06_Chains.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
# %%
!pip install langchain==0.0.235 openai --quiet
# %%
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
llm = OpenAI(temperature=0, openai_api_key="your valid OpenAI API key")
prompt = PromptTemplate(
    input_variables=["color"],
    template="What is the hex code of color {color}?",
)
# %%
from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)
# %%
print(chain.run("green"))
print(chain.run("cyan"))
print(chain.run("magento"))
# %%
from langchain.chains import load_chain
import os
os.environ['OPENAI_API_KEY'] = "您的有效openai api key"
chain = load_chain("lc://chains/llm-math/chain.json")
# %%
chain.run("whats the area of a circle with radius 2?")
from langchain.chains import LLMChain
chain = LLMChain(llm = llm, prompt = prompt)
chain.run("大狗")
from langchain.chains import LLMChain, SimpleSequentialChain
# The first chain was defined in the code above
# ...
# Create the second chain
second_prompt = PromptTemplate(
    input_variables=["petname"],
    template="写一篇关于{petname}的小诗。",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)
# Chain the two chains together
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)
# Provide only the initial input and the whole chain runs in order
catchphrase = overall_chain.run("大狗")
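Since the first chain is elided above, here is a self-contained sketch; the pet-naming prompt used for the first chain is an assumption, not part of the original code:
# Self-contained sketch of a two-step sequential chain (the first prompt is hypothetical).
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.prompts import PromptTemplate

first_prompt = PromptTemplate(
    input_variables=["animal"],
    template="给{animal}起一个好听的名字。",  # hypothetical first step: suggest a pet name
)
chain_one = LLMChain(llm=llm, prompt=first_prompt)
chain_two = LLMChain(llm=llm, prompt=second_prompt)
overall_chain = SimpleSequentialChain(chains=[chain_one, chain_two], verbose=True)
print(overall_chain.run("大狗"))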
7) Memory
Most LLM applications have a conversational interface, and an essential part of a conversation is information from earlier in the dialogue. We call the ability to store information about past exchanges "memory". LangChain provides a set of memory utilities that can be used on their own or integrated seamlessly into a chain.
# %% [markdown]
# <a href="https://colab.research.google.com/github/sugarforever/wtf-langchain/blob/main/07_Memory/07_Memory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
# %%
import os
os.environ['OPENAI_API_KEY'] = ''
# %%
!pip install langchain==0.0.235 --quiet
# %%
from langchain.memory import ConversationBufferMemory
# %%
memory = ConversationBufferMemory()
memory.save_context({"input": "hi"}, {"output": "whats up"})
# %%
memory.chat_memory.messages
# %%
memory.load_memory_variables({})
# %%
memory = ConversationBufferMemory(return_messages=True)
memory.save_context({"input": "hi"}, {"output": "whats up"})
# %%
memory.load_memory_variables({})
# %%
from langchain.memory import ConversationBufferWindowMemory
# %%
memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "Hi, LangChain!"}, {"output": "Hey!"})
memory.save_context({"input": "Where are you?"}, {"output": "By your side"})
# %%
memory.load_memory_variables({})
# %%
memory.chat_memory.messages
# %%
!pip install openai
# %%
from langchain.memory import ConversationSummaryMemory, ChatMessageHistory
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
template = """You are a chatbot having a conversation with a human.
{conversation_history}
Human: {input}
Chatbot:"""
prompt = PromptTemplate(
    input_variables=["conversation_history", "input"], template=template
)
memory = ConversationBufferMemory(memory_key="conversation_history")
llm_chain = LLMChain(
    llm=OpenAI(),
    prompt=prompt,
    verbose=True,
    memory=memory,
)
# %%
llm_chain.predict(input="Where is Paris?")
# %%
llm_chain.predict(input="What did I just ask?")
from langchain.memory import ConversationSummaryMemory
from langchain.llms import OpenAI
memory = ConversationSummaryMemory(llm=OpenAI(temperature=0, openai_api_key="your valid OpenAI API key"))
memory.save_context({"input": "Hi, LangChain!"}, {"output": "Hey!"})
memory.save_context({"input": "How to start with Next.js development?"}, {"output": "You can get started with its official developer guide."})
memory.save_context({"input": "Show me the link of the guide."}, {"output": "I'm looking for you now. Please stand by!"})
memory.load_memory_variables({})
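These memory components can also be plugged straight into a chain; a minimal sketch with ConversationChain (the zero-temperature LLM is an assumption):
# Minimal sketch: using a summary memory inside a conversation chain.
from langchain.chains import ConversationChain

conversation = ConversationChain(
    llm=OpenAI(temperature=0),
    memory=ConversationSummaryMemory(llm=OpenAI(temperature=0)),
    verbose=True,
)
conversation.predict(input="Hi, LangChain!")
conversation.predict(input="What did I just say?")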
8) Agents
The core idea of an agent is to use a language model to choose a sequence of actions to take. A LangChain chain hard-codes its sequence of actions in code, whereas an agent uses the language model as a reasoning engine to decide which actions to take and in what order.
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI
llm = OpenAI(model_name="gpt-3.5-turbo",temperature=0, openai_api_key="sk-")
tools = load_tools(["ddg-search", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Has Eason Chan been injured?")
Wikipedia search
# pip install wikipedia
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools,
                         llm,
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True)
agent.run("奥巴马的生日是哪天? 到2023年他多少岁了?")
9) Callbacks
Callbacks are LangChain's hook mechanism: they let us plug into the various stages of an LLM application, which is useful for logging, monitoring, streaming, and similar tasks. The execution logic for these tasks is defined by a callback handler (CallbackHandler).
In Python, a callback handler is implemented by subclassing BaseCallbackHandler. The BaseCallbackHandler interface defines one callback function for each subscribable event, and subclasses implement the callbacks they want to handle. When an event fires, LangChain's callback manager, CallbackManager, invokes the corresponding callback function.
# %% [markdown]
# <a href="https://colab.research.google.com/github/sugarforever/wtf-langchain/blob/main/09_Callbacks/09_Callbacks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
# %% [markdown]
# # LangChain Callback 示例
# %% [markdown]
# ## 准备环境
# %% [markdown]
# 1. Install langchain 0.0.235 and openai
# %%
!pip install -q -U langchain==0.0.235 openai
# %% [markdown]
# 2. Set the OpenAI API key
# %%
import os
os.environ['OPENAI_API_KEY'] = "您的有效openai api key"
# %% [markdown]
# ## Example code
# %% [markdown]
# 1. The built-in callback handler `StdOutCallbackHandler`
# %%
from langchain.callbacks import StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
handler = StdOutCallbackHandler()
llm = OpenAI()
prompt = PromptTemplate.from_template("Who is {name}?")
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
chain.run(name="Super Mario")
# %% [markdown]
# 2. A custom callback handler
#
# Let's implement a handler that measures the processing time of each `LLM` interaction.
# %%
from langchain.callbacks.base import BaseCallbackHandler
import time

class TimerHandler(BaseCallbackHandler):

    def __init__(self) -> None:
        super().__init__()
        self.previous_ms = None
        self.durations = []

    def current_ms(self):
        return int(time.time() * 1000 + time.perf_counter() % 1 * 1000)

    def on_chain_start(self, serialized, inputs, **kwargs) -> None:
        self.previous_ms = self.current_ms()

    def on_chain_end(self, outputs, **kwargs) -> None:
        if self.previous_ms:
            duration = self.current_ms() - self.previous_ms
            self.durations.append(duration)

    def on_llm_start(self, serialized, prompts, **kwargs) -> None:
        self.previous_ms = self.current_ms()

    def on_llm_end(self, response, **kwargs) -> None:
        if self.previous_ms:
            duration = self.current_ms() - self.previous_ms
            self.durations.append(duration)
# %%
llm = OpenAI()
timerHandler = TimerHandler()
prompt = PromptTemplate.from_template("What is the HEX code of color {color_name}?")
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[timerHandler])
response = chain.run(color_name="blue")
print(response)
response = chain.run(color_name="purple")
print(response)
timerHandler.durations
# %% [markdown]
# 3. `Model` 与 `callbacks`
#
# `callbacks` 可以在构造函数中指定,也可以在执行期间的函数调用中指定。
#
# 请参考如下代码:
# %%
timerHandler = TimerHandler()
llm = OpenAI(callbacks=[timerHandler])
response = llm.predict("What is the HEX code of color BLACK?")
print(response)
timerHandler.durations
# %%
timerHandler = TimerHandler()
llm = OpenAI()
response = llm.predict("What is the HEX code of color BLACK?", callbacks=[timerHandler])
print(response)
timerHandler.durations
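Besides the model and chain constructors, callbacks can also be passed when a chain is invoked; a small sketch reusing the prompt and handler defined above:
# Sketch: passing callbacks at call time instead of at construction time.
timerHandler = TimerHandler()
chain = LLMChain(llm=OpenAI(), prompt=prompt)
response = chain.run(color_name="red", callbacks=[timerHandler])
print(response)
timerHandler.durations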
10) A complete example
Mind the Python library versions:
pip install chromadb==0.4.15
# %% [markdown]
# <a href="https://colab.research.google.com/github/sugarforever/wtf-langchain/blob/main/10_Example/10_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
# %% [markdown]
# 1. Install the required `Python` packages
# %%
!pip install -q langchain==0.0.235 openai chromadb==0.4.15 pymupdf tiktoken
# %% [markdown]
# 2. Set up the OpenAI environment
# %%
import os
os.environ['OPENAI_API_KEY'] = ''
# %% [markdown]
# 3. Download the PDF file: AWS Serverless Developer Guide
# %%
!wget https://docs.aws.amazon.com/pdfs/serverless/latest/devguide/serverless-core.pdf
PDF_NAME = 'serverless-core.pdf'
# %% [markdown]
# 4. Load the PDF file
# %%
from langchain.document_loaders import PyMuPDFLoader
docs = PyMuPDFLoader(PDF_NAME).load()
print(f'There are {len(docs)} document(s) in {PDF_NAME}.')
print(f'There are {len(docs[0].page_content)} characters in the first page of your document.')
# %% [markdown]
# 5. Split the documents and store the text embeddings as vector data
# %%
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(split_docs, embeddings, collection_name="serverless_guide")
# %% [markdown]
# 6. Create a QA chain based on OpenAI
# %%
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
llm = OpenAI(model_name="gpt-3.5-turbo",temperature=0, openai_api_key="sk-")
chain = load_qa_chain(llm, chain_type="stuff")
# %% [markdown]
# 7. Run a similarity search based on the question
# %%
query = "What is the use case of AWS Serverless?"
similar_docs = vectorstore.similarity_search(query, 3, include_metadata=True)
# %%
similar_docs
# %% [markdown]
# 8. Answer the question with the QA chain, using the relevant documents
# %%
chain.run(input_documents=similar_docs, question=query)
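Steps 7 and 8 can also be combined into one retriever-backed chain; a sketch using RetrievalQA with the same vector store and LLM (the k value is an assumption):
# Sketch: RetrievalQA performs the similarity search internally and then answers.
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)
qa_chain.run(query)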