RAG in Practice 12 - Implementing Contextual Compression and Filtering in a RAG Pipeline

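  1. Use some basic retriever to retrieve a range of information;
  2. Pass the retrieved information to a document compressor;
  3. The compressor filters and processes this information, extracting only what is useful for answering the question.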


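  • The contextual compression retriever passes the query to the base retriever;
  • It then takes the initial documents and passes them to the document compressor;
  • The document compressor takes the list of documents and shortens it, either by reducing the content of documents or by dropping documents entirely.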
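
The flow described in the bullets above can be sketched in plain Python. This is only a toy illustration, not LangChain's implementation: the word-overlap retriever, the sentence-level compressor, and all function names here are assumptions made for the example.

```python
import re
from typing import List


def tokenize(text: str) -> set:
    """Lowercase a text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def base_retriever(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy base retriever: rank documents by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def sentence_compressor(query: str, docs: List[str]) -> List[str]:
    """Toy compressor: keep only sentences sharing a word with the query
    and drop a document entirely when no sentence survives."""
    q = tokenize(query)
    compressed = []
    for doc in docs:
        sentences = [s.strip() for s in doc.split(".") if s.strip()]
        kept = [s for s in sentences if q & tokenize(s)]
        if kept:
            compressed.append(". ".join(kept) + ".")
    return compressed


def compression_retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Base retriever first, then the compressor on its results."""
    return sentence_compressor(query, base_retriever(query, corpus, k))


corpus = [
    "Coinsurance is the share of a bill the insured pays. Deductibles are separate.",
    "An adjuster evaluates losses. An agent sells policies.",
    "Annuitant means a person who receives annuity payments.",
]
result = compression_retrieve("coinsurance", corpus, k=2)
print(result)
```

The base retriever returns two documents, but the compressor shortens the first one to its single relevant sentence and drops the second one completely, which mirrors the two ways the list can shrink.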



llmware BLING model: large language model (experimental)

Chromadb: vector database


4.1 Install the required dependencies

!pip install -qU langchain huggingface_hub chromadb pypdf python-dotenv transformers sentence-transformers

4.2 Import the required packages

from langchain.llms import HuggingFaceHub
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv

4.3 Set the HuggingFace Hub token

import os
from getpass import getpass
os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Enter HuggingFace Hub Token:")

4.4 Load the data

loader = PyPDFLoader("/content/CommonInsuranceTerms.pdf")
documents = loader.load()

#### RESPONSE ####
Glossary of Common Insurance Terms 
NOTICE:  This document is for informational purposes only and is not in tended to alter or replace the 
insurance policy. Additionally, this informational sheet is not  intended to fully set out your rights and 
obligations or the rights and obligations of the insurance comp any. If you have questions about your insurance, 
you should consult your insurance agent, the insurance company,  or the language of the insurance policy. 
Accelerated death benefits  - An insurance policy with an accelerated death benefits provi sion will pay - 
under certain conditions - all or part of the policy death bene fits while the policyholder is still alive. These 
conditions include proof that the policyholder is terminally il l, has a specified life-thr eatening disease or is in a 
long-term care facility such as a nursing home. By accepting an  accelerated benefit payment, a person could be 
ruled ineligible for Medicaid or  other government benefits. The  proceeds may also be taxable. 
Accident  - An unforeseen, unintended event. 
Accident-only policies  - Policies that pay only in cas es arising from an accident or injury. 
Accidental death benefits  - If a life insurance policy includes an accidental death bene fit, the cause of death 
will be examined to determine whether the insured´s death meets  the policy´s definition of accidental. 
Actual cash value (ACV)  - The value of your property, based on the current cost to rep lace it minus 
depreciation. Also see "replacement cost." 
Additional livin g expenses (ALE)  - Reimburses the policyholder for the cost of temporary housin g, food, and 
other essential living expenses, if the home is damaged by a co vered peril that makes the home temporarily 
Adjuster  - An individual employed by an insurer to evaluate losses and settle policyholder claims.  
Administrative expense charge  - An amount deducted, usually monthly, from the policy. 
Agent  - A person who sells insurance policies. Must be licensed by t he Alabama Department of Insurance to 
legally sell and transact insurance business.  
Annuitant  - A person who receives the payments from an annuity during hi s or her lifetime.

4.5 Set up the text splitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=700,chunk_overlap=70)
split_documents = text_splitter.split_documents(documents)

##### RESPONSE #####
Document(page_content='Glossary of Common Insurance Terms \nNOTICE:  This document is for informational purposes only and is not in tended to alter or replace the \ninsurance policy. Additionally, this informational sheet is not  intended to fully set out your rights and \nobligations or the rights and obligations of the insurance comp any. If you have questions about your insurance, \nyou should consult your insurance agent, the insurance company,  or the language of the insurance policy. \nA \nAccelerated death benefits  - An insurance policy with an accelerated death benefits provi sion will pay - \nunder certain conditions - all or part of the policy death bene fits while the policyholder is still alive. These', metadata={'source': '/content/CommonInsuranceTerms (1).pdf', 'page': 0})
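
To make the effect of chunk_size=700 and chunk_overlap=70 concrete, here is a minimal sliding-window sketch. It is a simplification and an assumption for illustration: RecursiveCharacterTextSplitter additionally tries to split on separators such as paragraph and line breaks, which this helper (split_with_overlap, a hypothetical name) does not do.

```python
def split_with_overlap(text: str, chunk_size: int = 700, chunk_overlap: int = 70) -> list:
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Synthetic 1500-character text, used only to show the window arithmetic
text = "".join(str(i % 10) for i in range(1500))
chunks = split_with_overlap(text)
print([len(c) for c in chunks])
```

Each chunk starts 630 characters after the previous one, so the last 70 characters of one chunk are repeated at the start of the next. That repetition is why neighbouring chunks in the retrieval results can share text.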


4.6 Set up the embeddings

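industry-bert-insurance-v0.1 is one of a series of industry fine-tuned sentence_transformer embedding models.

industry-bert-insurance-v0.1 is a domain fine-tuned, BERT-based sentence transformer model with 768-dimensional embeddings, intended as a "drop-in substitute" for embeddings in the insurance domain. The model was trained on a wide range of publicly available insurance-industry documents.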

embeddings = SentenceTransformerEmbeddings(model_name="llmware/industry-bert-insurance-v0.1")

4.7 Set up the LLM

bling-sheared-llama-1.3b-0.1 is part of the BLING ("Best Little Instruction-following No-GPU-required") model series, obtained by instruction fine-tuning the Sheared-LLaMA-1.3B base model.


For a detailed introduction to the BLING models, see https://medium.com/@darrenoberst/small-instruct-following-llms-for-rag-use-case-54c55e4b41a8

repo_id ="llmware/bling-sheared-llama-1.3b-0.1"
llm = HuggingFaceHub(repo_id=repo_id, model_kwargs = {"temperature":0.3,"max_length":500}) 

4.8 Helper function for printing documents

def pretty_print_docs(docs):  
  print(f"\n{'-'* 100}\n".join([F"Document{i+1}:\n\n" + d.page_content for i,d in enumerate(docs)]))

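4.9 Set up the vector store

# NOTE: this call was cut off in the source; passing the embeddings as the second argument is the minimal completion
vectorstore = Chroma.from_documents(split_documents, embeddings)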

4.10 Set up the retriever

retriever = vectorstore.as_retriever(search_kwargs={"k":2})
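
Conceptually, a retriever with k=2 embeds the query and returns the two stored chunks whose vectors are closest to it. The sketch below illustrates that idea with hand-made 3-dimensional vectors and cosine similarity. Both are assumptions for the example: real embeddings have hundreds of dimensions, and the distance metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy vectors standing in for embedded chunks (hypothetical values)
doc_vecs = {
    "group_life": [0.9, 0.1, 0.0],
    "group_of_companies": [0.7, 0.3, 0.1],
    "collision": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]
hits = top_k(query_vec, doc_vecs, k=2)
print(hits)
```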

4.11 Retrieve the context relevant to the query

docs = retriever.get_relevant_documents(query="What is Group life insurance?")

##### RESPONSE #####

or claim payment. Insurance companies also may have grievance p rocedures. 
Group life insurance  - This type of life insurance provides coverage to a group of people under one contract. 
Most group contracts are sold to businesses that w ant to provid e life insurance for their employees. Group life 
insurance can also be sold to associations to cover their membe rs and to lending institutions to cover the 
amounts of their debtor loans. Most group policies are for term  insurance. Generally, the business will be 
issued a master policy and each person in the group will receiv e a certificate of insurance. 
Group of companies  - Several insurance companies u nder common ownership and often  common 

Most group contracts are sold to businesses that w ant to provid e life insurance for their employees. Group life 
insurance can also be sold to associations to cover their membe rs and to lending institutions to cover the 
amounts of their debtor loans. Most group policies are for term  insurance. Generally, the business will be 
issued a master policy and each person in the group will receiv e a certificate of insurance. 
Group of companies  - Several insurance companies u nder common ownership and often  common 

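4.12 Add contextual compression with LLMChainExtractor

  • Add an LLMChainExtractor that iterates over the initially returned documents;
  • Extract from each document only the context relevant to the query.

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
# build the compressor
compressor = LLMChainExtractor.from_llm(llm=llm)
# compression retriever = base retriever + compressor
compression_retriever = ContextualCompressionRetriever(base_retriever=retriever,
                                                       base_compressor=compressor)
# print the prompt used by the extractor (this line is missing in the source and is assumed from the output below)
print(compressor.llm_chain.prompt.template)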



###### RESPONSE #######
Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return NO_OUTPUT. 

Remember, *DO NOT* edit the extracted parts of the context.

> Question: {question}
> Context:
Extracted relevant parts:
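
The prompt above suggests how an extractor of this kind works: each retrieved document is sent to the LLM together with the question, and a reply of NO_OUTPUT means the document is dropped. The following is a minimal sketch of that loop, not LangChain's code. fake_llm is a hypothetical stand-in for a real model call, and the position of the {context} slot in PROMPT is an assumption.

```python
PROMPT = (
    "Given the following question and context, extract any part of the context "
    "*AS IS* that is relevant to answer the question. If none of the context is "
    "relevant return NO_OUTPUT.\n\n"
    "> Question: {question}\n"
    "> Context:\n{context}\n"
    "Extracted relevant parts:"
)


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stand-in: returns the context lines that mention
    the last word of the question, or NO_OUTPUT when there are none."""
    question = prompt.split("> Question: ")[1].split("\n")[0]
    context = prompt.split("> Context:\n")[1].rsplit("\nExtracted relevant parts:", 1)[0]
    keyword = question.rstrip("?").split()[-1].lower()
    hits = [line for line in context.split("\n") if keyword in line.lower()]
    return "\n".join(hits) if hits else "NO_OUTPUT"


def extract_relevant(question, docs, llm):
    """Run the extraction prompt on each document; drop NO_OUTPUT results."""
    compressed = []
    for doc in docs:
        output = llm(PROMPT.format(question=question, context=doc)).strip()
        if output and output != "NO_OUTPUT":
            compressed.append(output)
    return compressed


docs = [
    "Claimant - A person who makes an insurance claim.\n"
    "Coinsurance - The percentage of each health care bill a person must pay.",
    "Adjuster - An individual employed by an insurer to evaluate losses.",
]
compressed = extract_relevant("What is Coinsurance?", docs, fake_llm)
print(compressed)
```

Note that this approach costs one LLM call per retrieved document, so its quality and latency depend heavily on the model behind it.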

4.13 Add filters to the contextual compression





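from langchain.chains import RetrievalQA
# NOTE: the remaining arguments were cut off in the source; the retriever and verbose flag below are assumed
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=compression_retriever, verbose=True)
# Ask a question
qa("What is Coinsurance?")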

##### RESPONSE #####
> Entering new RetrievalQA chain...

> Finished chain.
{'query': 'What is Coinsurance?',
 'result': ' Coinsurance is the percentage of each health care bill a person must pay out of their own pocket. Non-covered charges and deductibles are in addition to this amount. Coinsurance maximum is the most you will have to pay in coinsurance during a policy period (usually a year) before your health plan begins paying 100 percent of the cost of your covered health services. The coinsurance maximum generally does not apply to copayments or other expenses you might be required to pay.\n\nC'}

qa("What is Group Life Insurance?")
###### RESPONSE #######
> Entering new RetrievalQA chain...

> Finished chain.
{'query': 'What is Group Life Insurance?',
 'result': ' Group life insurance provides coverage to a group of people under one contract. \nMost group contracts are sold to businesses that w ant to provid e life insurance for their employees. Group life \ninsurance can also be sold to associations to cover their membe rs and to lending institutions to cover the \namounts of their debtor loans. Most group policies are for term  insurance. Generally, the business will be \nissued a master policy and each person in the group will receiv e a certificate of insurance. \nGroup of companies  - Several insurance companies u nder common ownership and often  common \nmanagement'}






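
Here we build a pipeline made of a redundancy filter plus a relevance filter: the redundancy filter removes duplicated context, and the relevance filter keeps only the context relevant to the query.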



from langchain.document_transformers import EmbeddingsRedundantFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline, EmbeddingsFilter
# drops documents whose embeddings are near-duplicates of one another
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
# keeps up to k documents whose embeddings are most similar to the query
relevant_filter = EmbeddingsFilter(embeddings=embeddings,k=5)
# build the pipeline
pipeline_compressor = DocumentCompressorPipeline(transformers=[redundant_filter,relevant_filter])
# compression retriever
compression_retriever_pipeline = ContextualCompressionRetriever(base_retriever=retriever,
                                                                base_compressor=pipeline_compressor)
## Get relevant documents
compressed_docs = compression_retriever_pipeline.get_relevant_documents(query="What is Coinsurance?")

##### RESPONSE #####

Claimant  - A person who makes an insurance claim. 
Coinsurance  - The percentage of each health care bill a person must pay ou t of their own pocket. Non-covered 
charges and deductibles are in addition to this amount. 
Coinsurance maximum  - The most you will have to pay in coinsurance during a policy  period (usually a 
year) before your health plan begins paying 100 percent of the cost of your covered health services. The 
coinsurance maximum generally does not apply to copayments or o ther expenses you might be required to pay. 
Collision coverage  - Pays for damage to a car with out regard to who caused an acc ident. The company must
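
The redundancy-plus-relevance pipeline used for this query can be approximated in a few lines of plain Python. This is a sketch under stated assumptions rather than the library's logic: the 2-dimensional vectors are made up, the 0.95 threshold is an illustrative value, and keeping the first of two near-duplicates is a simplification.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def redundant_filter(docs, threshold=0.95):
    """Keep a document only if it is not too similar to one already kept."""
    kept = []
    for text, vec in docs:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept


def relevant_filter(docs, query_vec, k=5):
    """Keep the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(d[1], query_vec), reverse=True)
    return ranked[:k]


def run_pipeline(docs, query_vec, k=5, threshold=0.95):
    """Apply the filters in order: redundancy first, then relevance."""
    return relevant_filter(redundant_filter(docs, threshold), query_vec, k)


# (text, embedding) pairs with hypothetical toy embeddings
docs = [
    ("Coinsurance - percentage of each bill", [1.0, 0.0]),
    ("Coinsurance - percentage of each bill (overlapping chunk)", [0.99, 0.01]),
    ("Collision coverage - pays for damage to a car", [0.0, 1.0]),
]
query_vec = [0.9, 0.1]
result = [text for text, _ in run_pipeline(docs, query_vec, k=1)]
print(result)
```

Because both stages only compare embeddings, this pipeline needs no LLM call, which makes it cheaper than the extractor-based compressor.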

compressed_docs = compression_retriever_pipeline.get_relevant_documents(query="What is Earned premium?")

##### RESPONSE #####

replacement cost or the actual cash value, which includes depre ciation. 
Replacement cost  - Insurance coverage that pays the dollar amount needed to rep lace the structure or 
damaged personal property without deducting for depreciation bu t limited by the policy's maximum dollar 
Rescission  - The termination of an insurance contract by the insurer when  material misrepresentation has 
Return premium  - A portion of the premium returned to a policy owner as a res ult of cancelation, rate 
adjustment, or a calculation that an advance premium was in exc ess of the actual premium.

compressed_docs = compression_retriever_pipeline.get_relevant_documents(query="What is Group Insurance Policy?")
##### RESPONSE #######

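5.1 Implement question answering with the llmware/bling-sheared-llama-1.3b-0.1 model

This model generates short text replies, which mainly suits chatbot-style applications where longer answers are not needed. Compared with stronger options such as Zephyr-7b-beta or OpenAI models, this LLM does not produce high-quality responses and is used here for experimentation only. Choosing a more capable LLM can improve the correctness of the generated answers.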


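from langchain.prompts import PromptTemplate
# NOTE: the Context/Question placeholder lines and the closing quotes were lost in the source;
# they are reconstructed here from input_variables and are an assumption
template ="""
Context: {context}

Question: {question}

Use the above Context to answer the user's question. Consider only the Context provided above to formulate the response. If the Question asked does not match the Context provided, just say 'I do not know the answer'.
"""

prompt = PromptTemplate(input_variables=["context","question"],template=template)
chain_type_kwargs = {"prompt":prompt}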
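
A prompt with context and question input variables is typically filled by concatenating the retrieved chunks into one context string. The sketch below shows that "stuffing" step with plain string formatting. The TEMPLATE layout, the separator, and the helper name build_stuff_prompt are assumptions for illustration, not the exact internals of the chain.

```python
TEMPLATE = (
    "Context: {context}\n\n"
    "Question: {question}\n\n"
    "Use the above Context to answer the user's question."
)


def build_stuff_prompt(question, docs, separator="\n\n"):
    """Join all retrieved chunks into one context string and fill the template."""
    context = separator.join(docs)
    return TEMPLATE.format(context=context, question=question)


prompt_text = build_stuff_prompt(
    "What is Group Insurance Policy?",
    ["chunk one", "chunk two"],
)
print(prompt_text)
```

Since every retrieved chunk ends up in a single prompt, compressing the chunks beforehand directly reduces the prompt length the LLM has to handle.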

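from langchain.chains import RetrievalQA
# NOTE: the remaining arguments were cut off in the source; those below are inferred from the output that follows
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=compression_retriever_pipeline,
                                 chain_type_kwargs=chain_type_kwargs, return_source_documents=True, verbose=True)
qa("What is Group Insurance Policy?")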

###### RESPONSE ############
> Entering new RetrievalQA chain...

> Finished chain.
{'query': 'What is Group Insurance Policy?',
 'result': '<bot>: Group insurance policy is a policy that covers the life of a group of people. \nIt is usually sold to businesses that want to provide life insurance for their employees. \nIt can also be sold to associations to cover their members and to lending institutions to cover the amount of their debtor loans. \nMost group policies are for term insurance.<|endoftext|> Хронологија Хронологија Хронологија Хронологија Хронологија instanceof instanceof instanceof instanceof instanceof Хронологија Хронологија instanceof instanceof instanceof instanceofbolds Хронологија Narodowecka Narodowecka Narodowecka Narodowecka<|endoftext|><|endoftext|>bolds<|endoftext|>boldstrightarrow</boldsightarrow <|endoftext|></trightarrow',
 'source_documents': [_DocumentWithState(page_content='Most group contracts are sold to businesses that w ant to provid e life insurance for their employees. Group life \ninsurance can also be sold to associations to cover their membe rs and to lending institutions to cover the \namounts of their debtor loans. Most group policies are for term  insurance. Generally, the business will be \nissued a master policy and each person in the group will receiv e a certificate of insurance. \nGroup of companies  - Several insurance companies u nder common ownership and often  common \nmanagement.', metadata={'page': 5, 'source': '/content/CommonInsuranceTerms (1).pdf'}, state={'embedded_doc': [0.21783578395843506, 0.10463877022266388, -0.027319835498929024, -0.3879217505455017, 0.40784013271331787, 0.26043280959129333, 
 -0.4364112913608551, -0.06394162029027939, 0.059370845556259155, -0.10004513710737228, 0.41433295607566833, 0.02755933254957199, -0.29557839035987854, 0.8827390670776367], 'query_similarity_score': 0.5921743311683283})]}

response = qa("What is Long-term care benefits?")

###### RESPONSE ######
 Entering new RetrievalQA chain...

> Finished chain.
Long-term care benefits - Coverage that provides help for people when they are unable to care for themselves because of prolonged illness or disability. Benefits are triggered by specific findings of "cognitive impairment" or inability to perform certain actions known as "Activities of Daily Living." Benefits can range from help with daily activities while recuperating at home to skilled nursing care provided in a nursing home.

######### RESPONSE #######
{'query': 'What is Long-term care benefits?',
 'result': 'Long-term care benefits - Coverage that provides help for people when they are unable to care for themselves because of prolonged illness or disability. Benefits are triggered by specific findings of "cognitive impairment" or inability to perform certain actions known as "Activities of Daily Living." Benefits can range from help with daily activities while recuperating at home to skilled nursing care provided in a nursing home.<|endoftext|> Хронологија Хронологија Хронологија Хронологија instanceof instanceof instanceof instanceof instanceof instanceofboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsboldsbolds',
 'source_documents': [_DocumentWithState(page_content='Long-term care benefits  - Coverage that provides help f or people when they are unable to care for themselves \nbecause of prolonged illness or disability. Benefits are trigge red by specific findings of "cognitive impairment" \nor inability to perform certain actions known as "Activities of  Daily Living." Benefits  can range from help \nwith daily activities while recuperating at home to skilled nur sing care provided in a nursing home.', metadata={'page': 7, 'source': '/content/CommonInsuranceTerms (1).pdf'}, state={'embedded_doc': [0.10907948762178421, 0.09334351122379303, 0.41147854924201965, -0.3373779356479645, 0.4558241069316864, 0.5461165308952332, -0.052463024854660034, -0.43229037523269653, -0.06157633662223816, 0.09955484420061111, -0.6033256649971008, -0.05812996253371239, 0.7388888001441956, -0.011550833471119404, -0.5817828178405762, -0.08132432401180267, 
 -0.04400945082306862, 0.12993864715099335, 0.10278826206922531, -0.3340747654438019, 0.02115345373749733, -0.7229974269866943, 0.07934501022100449, -1.2799934148788452, -0.006999789737164974], 'query_similarity_score': 0.6116580581049508}),
  _DocumentWithState(page_content='because of prolonged illness or disability. Benefits are trigge red by specific findings of "cognitive impairment" \nor inability to perform certain actions known as "Activities of  Daily Living." Benefits  can range from help \nwith daily activities while recuperating at home to skilled nur sing care provided in a nursing home.', metadata={'page': 7, 'source': '/content/CommonInsuranceTerms (1).pdf'}, state={'embedded_doc': [0.2193884253501892, 0.3908122181892395, 0.014659926295280457, -0.2731555104255676, 0.2924192249774933, 0.6941999793052673, 0.2676321566104889, -0.45316028594970703, -0.018273770809173584, 0.05401374399662018, -0.4481824040412903, -0.008003979921340942, 0.7279651165008545, -0.04766766354441643, -0.6827098727226257, -0.015044593252241611, 0.2702178359031677, 
  0.24355649948120117, 0.014441099017858505, -0.16064004600048065, -0.51854407787323, -0.6001731157302856, 0.21660731732845306, 0.031298816204071045, 0.8995707035064697, 0.33247408270835876, 0.0323009230196476, -0.33297884464263916, -0.855561375617981, -0.6020331978797913, -0.4133726954460144, 0.13149243593215942, -0.20236220955848694, 0.3687146008014679, -0.08876236528158188, -0.2355964183807373, -0.7732605934143066, 0.13224034011363983, -1.1086665391921997, -0.11192735284566879], 'query_similarity_score': 0.5300057624305469})]}
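
Both raw results above continue with stray tokens after the <|endoftext|> marker. One possible post-processing step, which is not part of the original notebook and is shown only as a sketch, is to cut the text at the first marker and strip a leading "<bot>:" prefix. The helper name clean_result is hypothetical.

```python
END_OF_TEXT = "<|endoftext|>"
BOT_PREFIX = "<bot>:"


def clean_result(raw: str) -> str:
    """Cut the generation at the first end-of-text marker and drop a leading bot prefix."""
    text = raw.split(END_OF_TEXT, 1)[0].strip()
    if text.startswith(BOT_PREFIX):
        text = text[len(BOT_PREFIX):]
    return text.strip()


raw = (
    "<bot>: Group insurance policy is a policy that covers the life of a group of people."
    "<|endoftext|> instanceof instanceof instanceof"
)
cleaned = clean_result(raw)
print(cleaned)
```

With the chain output stored in response, this could be applied as clean_result(response["result"]).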

5.2 Create a new pipeline



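from langchain.llms import OpenAI
# api_key must hold an OpenAI API key; reading it with getpass is an assumption, the source does not show how it was set
api_key = getpass("Enter OpenAI API Key:")
compressor = LLMChainExtractor.from_llm(llm=OpenAI(temperature=0.3,openai_api_key=api_key))
new_pipeline = DocumentCompressorPipeline(transformers=[compressor,redundant_filter,relevant_filter])
new_compression_retriever = ContextualCompressionRetriever(base_retriever=retriever,
                                                           base_compressor=new_pipeline)
compressed_docs = new_compression_retriever.get_relevant_documents(query="What is Coinsurance?")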

###### RESPONSE ##########

Coinsurance - The percentage of each health care bill a person must pay out of their own pocket. Coinsurance maximum - The most you will have to pay in coinsurance during a policy period (usually a year) before your health plan begins paying 100 percent of the cost of your covered health services.


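from langchain.chains import RetrievalQA
# NOTE: the remaining arguments were cut off in the source; the retriever and verbose flag below are assumed
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=new_compression_retriever, verbose=True)
response = qa("What is Coinsurance?")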

##### RESPONSE #######
  Finished chain.
 No, Coinsurance is the percentage of each health care bill a person must pay out of their own pocket.




[1] https://medium.aiplanet.com/implement-contextual-compression-and-filtering-in-rag-pipeline-4e9d4a92aa8f

