Step-by-Step: Building an Intelligent Question-Answering System with a Vector Database + LangChain (Zilliz Cloud, Milvus)
Question Answering over Documents with Zilliz Cloud and LangChain
The code in this post mainly follows the second article above, with the first video as a supplement.
If you run into package-installation problems, try the following commands:
pip install langchain
pip install -U langchain-community
pip install openai
pip install pymilvus
pip install -U langchain-openai
The following code is run in a Jupyter notebook:
!python -m pip install --upgrade pymilvus langchain openai tiktoken
I'm using a gpt-4 relay endpoint for the embedding model here; the gpt-3.5 relay provider I asked doesn't support embeddings.
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Zilliz
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
import os
# 1. Set up the name of the collection to be created.
COLLECTION_NAME = 'doc_qa_db'
# 2. Set up the dimension of the embeddings.
# (OpenAI's default text-embedding-ada-002 model produces 1536-dimensional vectors.)
DIMENSION = 1536
# 3. Set up the OpenAI API key.
OPENAI_API_KEY = "sk-sy…………0a2e"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
# 4. Set up the connection parameters for your Zilliz Cloud cluster.
URI = 'https://in05-6……3.serverless.ali-cn-hangzhou.cloud.zilliz.com.cn'
# 5. Set up the token for your Zilliz Cloud cluster.
# You can either use an API key or a set of cluster username and password joined by a colon.
TOKEN = 'e……cda'
# Use the WebBaseLoader to load specified web pages into documents
loader = WebBaseLoader([
'https://milvus.io/docs/overview.md',
])
docs = loader.load()
# Split the documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
all_splits = text_splitter.split_documents(docs)
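To build intuition for what chunk_size and chunk_overlap control, here is a minimal pure-Python sketch of a character-window splitter. This is not the actual RecursiveCharacterTextSplitter, which additionally tries to break on separators ("\n\n", "\n", " ") so chunks end at natural boundaries; it only illustrates the windowing arithmetic.

```python
def split_text(text, chunk_size=1024, chunk_overlap=0):
    """Naive character-window splitter illustrating chunk_size / chunk_overlap.

    Each chunk holds at most chunk_size characters; consecutive chunks
    share chunk_overlap characters of context.
    """
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("a" * 2500, chunk_size=1024, chunk_overlap=0)
print([len(c) for c in chunks])  # [1024, 1024, 452]
```

With chunk_overlap > 0, adjacent chunks repeat some text so that a sentence cut at a chunk boundary still appears whole in at least one chunk.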
I'm going through a relay endpoint, so I use the base_url parameter.
from langchain_openai import OpenAIEmbeddings
# Your custom base URL
custom_base_url = 'https://api.xiaoai.plus/v1'
# If the OpenAIEmbeddings class accepts a base_url parameter
embeddings = OpenAIEmbeddings(base_url=custom_base_url)
connection_args = { 'uri': URI, 'token': TOKEN }
vector_store = Zilliz.from_documents(
    all_splits,
    embedding=embeddings,
    collection_name=COLLECTION_NAME,
    connection_args=connection_args,
    drop_old=True,   # drop any existing collection with this name
    auto_id=True     # let Milvus auto-generate primary keys (add this line)
)
query = "What are the main components of Milvus?"
docs = vector_store.similarity_search(query)
print(len(docs))
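Under the hood, similarity_search embeds the query and ranks the stored vectors by how close they are to it. A toy sketch of cosine-similarity ranking, with made-up 3-dimensional "embeddings" (real OpenAI embeddings have 1536 dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical tiny embeddings for three stored chunks and one query.
store = {
    "milvus components": [0.9, 0.1, 0.0],
    "pricing page":      [0.1, 0.9, 0.0],
    "release notes":     [0.2, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

ranked = sorted(store, key=lambda k: cosine(store[k], query_vec), reverse=True)
print(ranked[0])  # "milvus components" is the closest chunk
```

The vector database does the same ranking, but over millions of vectors with an approximate index instead of a brute-force loop.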
Here I switch back to gpt-3.5.
# Note: OPENAI_API_BASE_URL must already be set in your environment (e.g. the relay URL above).
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    openai_api_base=os.getenv("OPENAI_API_BASE_URL"),
)
retriever = vector_store.as_retriever()
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
rag_prompt = PromptTemplate.from_template(template)
rag_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| rag_prompt
| llm
)
print(rag_chain.invoke("Explain IVF_FLAT in Milvus.").content)
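One refinement worth knowing about (not in the chain above): the retriever returns a list of Document objects, and the prompt will stringify that whole list, metadata and all, into {context}. Joining just the page_content fields usually gives the LLM cleaner context. A sketch using a minimal stand-in Document class with made-up contents:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Minimal stand-in for LangChain's Document, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs):
    """Join retrieved chunks into one plain-text context block."""
    return "\n\n".join(doc.page_content for doc in docs)

docs = [
    Document("Milvus separates compute from storage."),
    Document("IVF_FLAT partitions vectors into clusters."),
]
print(format_docs(docs))
```

In the chain it would plug in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`.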