Source:
GitHub - langchain-ai/rag-from-scratch
Study notes based on the langchain-ai material
Overview
First, you need two API keys: one for LangChain (used here for LangSmith tracing) and one for OpenAI
LangChain-API-Key
OpenAI-API-Key
Initialize the environment
- Fill in the API keys in the corresponding places
import os
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = '<your LangChain API key>'
os.environ["OPENAI_API_BASE"] = '<your OpenAI API base URL>'
os.environ['OPENAI_API_KEY'] = '<your OpenAI API key>'
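Before running the rest of the notes, it can help to confirm that the keys were actually filled in. The following is a minimal sketch, not part of the original notebook: the helper name `missing_env_vars` is hypothetical, and treating any value wrapped in `<...>` as an unfilled placeholder is an assumption based on the template above.

```python
import os

# The two keys that the initialization step above asks you to fill in.
REQUIRED_VARS = ("LANGCHAIN_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset, empty, or still look like a placeholder.

    Hypothetical helper: a value wrapped in <...> is assumed to be an
    unfilled template placeholder.
    """
    env = os.environ if env is None else env
    missing = []
    for name in names:
        value = env.get(name, "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_env_vars()
    if problems:
        print("Please set:", ", ".join(problems))
    else:
        print("All required keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.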
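- Run a quick end-to-end test of the pipeline
The overall flow has three parts:
- INDEXING
- RETRIEVAL
- GENERATION
Later notes cover each of these three parts in more detail.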
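To make the three stages concrete, here is a toy sketch that uses only the Python standard library. It is not how LangChain implements them: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector store, and all function names (`embed`, `cosine`, `build_index`, `retrieve`, `build_prompt`) are made up for illustration. The generation stage stops at building the prompt, since calling an LLM needs an API key.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """INDEXING: store each chunk alongside its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=1):
    """RETRIEVAL: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """GENERATION (input side): the prompt that would be sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Task decomposition breaks a complex task into smaller steps.",
    "Memory lets an agent retain information over time.",
    "Tool use lets an agent call external APIs.",
]
index = build_index(chunks)
top = retrieve(index, "What is task decomposition?", k=1)
print(build_prompt("What is task decomposition?", top))
```

The real pipeline below follows the same shape, with a web loader and text splitter feeding the index, a Chroma retriever doing the similarity search, and a chat model consuming the prompt.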
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
#### INDEXING ####
# Load Documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    # Only parse the post body, title, and header from the page
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
# Embed
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
#### RETRIEVAL and GENERATION ####
# Prompt
prompt = hub.pull("rlm/rag-prompt")
# LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# Post-processing
def format_docs(docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)
# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Question
rag_chain.invoke("What is Task Decomposition?")
- This returns the following result:
Task decomposition is the process of breaking down a complex task into smaller and simpler steps. This can be done using prompting techniques such as Chain of thought and Tree of Thoughts, or with task-specific instructions. The goal is to make the task more manageable and to shed light on the interpretation of the model's thinking process.
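The Split step in the code above uses `chunk_size=1000` and `chunk_overlap=200`. To illustrate what overlap means, here is a simplified sketch using a fixed-size sliding window over characters. This is an assumption-laden simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and sentences), and the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that share an overlap.

    Simplified illustration only: each window starts
    (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so a sentence that straddles a chunk boundary still appears intact in at least one chunk, which is the usual motivation for a nonzero overlap.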