1. RAG & LLM Study Notes from Scratch

Moxean

Last modified 2024-06-21 12:15:25


First published 2024-06-21 00:27:04

Article link: https://blog.csdn.net/weixin_47059517/article/details/139845420


Source:

GitHub - langchain-ai/rag-from-scratch

These are my study notes based on the langchain-ai material.

Overview

First, you need two API keys, one for LangChain and one for OpenAI:

LangChain API key: create one in LangSmith

OpenAI API key: create one on the OpenAI Platform

Initialize the environment

Fill in your API keys in the corresponding places:

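import os

# LangSmith tracing (lets you inspect each run of the chain in LangSmith)
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = '<your LangChain API key>'
# Only needed for a proxy or OpenAI-compatible endpoint; delete this line to use the default OpenAI URL
os.environ["OPENAI_API_BASE"] = '<your OpenAI API base URL>'
os.environ['OPENAI_API_KEY'] = '<your OpenAI API key>'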
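If a placeholder is left unedited, the problem only surfaces once the chain actually runs. Below is a minimal sketch of an up-front check. It assumes the variable names from the setup code above. `looks_unset` and `missing_keys` are hypothetical helpers of my own, not part of LangChain or the OpenAI SDK. Treating `OPENAI_API_BASE` as optional is also my assumption, so it is flagged only when present but still a placeholder.

```python
import os

# Names taken from the setup code above.
REQUIRED_VARS = ("LANGCHAIN_API_KEY", "OPENAI_API_KEY")
OPTIONAL_VARS = ("OPENAI_API_BASE",)  # assumption: only needed for a proxy / compatible endpoint


def looks_unset(value):
    """True if a value is empty or still an unedited '<...>' placeholder."""
    value = value.strip()
    return not value or (value.startswith("<") and value.endswith(">"))


def missing_keys(env=None):
    """List variables that still need a real value (hypothetical helper, not a LangChain API)."""
    env = os.environ if env is None else env
    problems = [name for name in REQUIRED_VARS if looks_unset(env.get(name, ""))]
    # Optional variables are only flagged if they are present but left as a placeholder.
    problems += [name for name in OPTIONAL_VARS
                 if name in env and looks_unset(env[name])]
    return problems


# Prints the names of any variables that still need to be filled in.
print(missing_keys())
```

Running this right after the setup cell catches a forgotten key before any documents are loaded or embedded.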

A quick end-to-end test

The whole pipeline has three parts:

INDEXING
RETRIEVAL
GENERATION

The following notes go through each of these three parts in more detail.
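The three stages can be illustrated without any API key using a dependency-free toy. This is a sketch under loudly stated assumptions, not how LangChain implements them. Bag-of-words cosine similarity stands in for `OpenAIEmbeddings` and the Chroma vector store. One-sentence chunks stand in for `RecursiveCharacterTextSplitter`. The "generation" step only builds a prompt string (my own wording, not the `rlm/rag-prompt` template) instead of calling an LLM. The corpus text is made up for illustration.

```python
import math
import re
from collections import Counter

# Made-up toy corpus standing in for the loaded blog post.
DOCUMENT = (
    "Task decomposition breaks a complex task into smaller steps. "
    "Chain of thought prompts the model to think step by step. "
    "Memory lets an agent store and recall past observations. "
    "Tool use lets an agent call external APIs."
)


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


# INDEXING: split into chunks (one sentence each) and "embed" each as a word-count vector.
def build_index(text):
    chunks = [s.strip() for s in text.split(".") if s.strip()]
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# RETRIEVAL: rank chunks by cosine similarity to the question and keep the top k.
def retrieve(index, question, k=2):
    q = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# GENERATION: a real pipeline sends this prompt to an LLM; here we only build it.
def build_prompt(context_chunks, question):
    context = "\n\n".join(context_chunks)
    return f"Use the context to answer the question.\nContext:\n{context}\nQuestion: {question}\nAnswer:"


index = build_index(DOCUMENT)
top = retrieve(index, "What is task decomposition?")
print(build_prompt(top, "What is task decomposition?"))
```

The real pipeline replaces each toy piece with a learned component. Dense embeddings match on meaning rather than exact words, and the prompt is answered by the chat model.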

import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

#### INDEXING ####

# Load Documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

#### RETRIEVAL and GENERATION ####

# Prompt
prompt = hub.pull("rlm/rag-prompt")

# LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Post-processing
def format_docs(docs):
    # Join the retrieved chunks' text, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)

# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Question
print(rag_chain.invoke("What is Task Decomposition?"))

You should get an answer along these lines:

Task decomposition is the process of breaking down a complex task into smaller and simpler steps. This can be done using prompting techniques such as Chain of thought and Tree of Thoughts, or with task-specific instructions. The goal is to make the task more manageable and to shed light on the interpretation of the model's thinking process.
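The chain in the script is compact, so it helps to spell out its data flow. In LCEL, a dict inside a chain is coerced into a `RunnableParallel`: every value receives the same input (here the question string) and the outputs are collected under the same keys. `RunnablePassthrough()` returns its input unchanged. A plain function piped after a runnable, like `format_docs`, is wrapped as a `RunnableLambda`. `StrOutputParser()` turns the chat model's message into a plain string.

Below is a plain-Python sketch of that flow. Every `fake_*` function and the `Doc` class are hypothetical stand-ins, not LangChain APIs. The sketch also runs the two dict branches one after the other, whereas `RunnableParallel` can run them concurrently.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document (only page_content is used here)."""
    page_content: str


def fake_retriever(question):
    # Hypothetical: a real retriever runs a similarity search in the vector store.
    return [Doc("Task decomposition splits a task into steps."), Doc("CoT means chain of thought.")]


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def fake_prompt(inputs):
    # Hypothetical template; the real one comes from hub.pull("rlm/rag-prompt").
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}\nAnswer:"


def fake_llm(prompt_text):
    # Hypothetical: the real chain calls ChatOpenAI here and gets back a message,
    # which StrOutputParser() then converts to a string. This stub returns a string directly.
    return f"(answer generated from {len(prompt_text)} characters of prompt)"


def run_chain(question):
    # Step 1: the dict fans the same input out to both branches.
    inputs = {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                              # RunnablePassthrough()
    }
    # Steps 2-4: prompt | llm | StrOutputParser(), applied in sequence.
    return fake_llm(fake_prompt(inputs))


print(run_chain("What is Task Decomposition?"))
```

This is why the same string passed to `rag_chain.invoke(...)` ends up both as the retrieval query and as the `question` slot in the prompt.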