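1. Why a Reranker Is Needed
Retrieval-augmented generation (RAG) is an emerging AI technology stack that augments a large language model (LLM) by supplying it with additional, up-to-date knowledge.
A basic RAG application has four key technical components:
Embedding model: converts external documents and user queries into embedding vectors
Vector database: stores the embedding vectors and runs vector similarity search (retrieving the Top-K most relevant pieces of information)
Prompt engineering: combines the user's question and the retrieved context into the input for the LLM
Large language model (LLM): generates the answer
This basic RAG architecture can effectively mitigate LLM "hallucination" and unreliable output. Vector similarity search alone, however, does not always rank the most relevant documents first, and that is the gap a Reranker is meant to fill.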
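To make the four components concrete, here is a minimal sketch of the retrieve-then-prompt flow. Everything in it is an illustrative assumption: the bag-of-words `embed` function stands in for a real embedding model, the brute-force `retrieve` function stands in for a vector database, and the final LLM call is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Toy stand-in for a vector database: brute-force Top-K similarity search."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Prompt engineering step: combine the question with the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"


docs = [
    "A vector database stores embedding vectors",
    "A reranker reorders retrieved documents",
    "Paris is the capital of France",
]
query = "what does a reranker do"
top_docs = retrieve(query, docs, top_k=2)
prompt = build_prompt(query, top_docs)  # this string would be sent to the LLM
print(prompt)
```

A Reranker would slot in between `retrieve` and `build_prompt`, reordering `top_docs` before they reach the prompt.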
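2. Introduction to Rerankers
A Reranker is an important part of the information retrieval (IR) ecosystem: it evaluates search results and reorders them to improve their relevance to the query. In a RAG application, the Reranker is mainly applied to the results of the vector (ANN) query, where it can judge the semantic relevance between each document and the query more accurately, reorder the results at a finer granularity, and ultimately improve search quality.
There are currently two main types of Reranker: statistical and deep-learning-based.
A statistical Reranker aggregates candidate lists from multiple sources and recomputes a score for every result, using either a weighted sum of the scores from each recall path or the reciprocal rank fusion (RRF) algorithm, and then reorders all candidates by that score. This type is computationally cheap and efficient, so it is widely used in traditional search systems that are sensitive to latency.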
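As an illustration of the statistical approach, the following is a minimal sketch of reciprocal rank fusion. The function name, the document IDs, and the two example result lists are made up for this example; `k=60` is the constant commonly used with RRF, not a value taken from this article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    where rank starts at 1 for the top result of a list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical recall paths returning overlapping candidates
dense_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that appear near the top of both lists (`d1`, `d3`) end up ahead of documents that appear in only one list, without any model inference.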
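A deep-learning-based Reranker is usually called a Cross-encoder Reranker. Neural networks trained specifically for the task can analyze the relevance between a question and a document very well, and this kind of Reranker assigns a semantic similarity score to each question-document pair. Because the score generally depends only on the text of the question and the document, not on the document's score or position in the recall results, it works for both single-path and multi-path recall.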
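The sketch below shows why a pairwise scorer is indifferent to where the candidates came from. It is only a sketch: `overlap_score` is a toy word-overlap function standing in for a real cross-encoder model, and `rerank` and the sample texts are hypothetical names introduced for this example.

```python
def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words found in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


def rerank(query, candidates, score_fn=overlap_score, top_n=3):
    """Score every (query, document) pair and return the top_n, best first."""
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep first-seen order
    scored = [(doc, score_fn(query, doc)) for doc in unique]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Candidates from two recall paths; their original scores and positions are ignored
path_a = ["rag combines retrieval and generation", "a reranker for rag pipelines"]
path_b = ["a reranker for rag pipelines", "vector search basics"]

ranked = rerank("reranker for rag", path_a + path_b, top_n=2)
print(ranked)
```

Swapping `score_fn` for a function that calls a real cross-encoder would leave the rest of the logic unchanged.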
3. Reranker Implementation
import os
import sys
# Make the project's own modules importable
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
from typing import Any, Optional, Sequence
from sentence_transformers import CrossEncoder
from langchain_core.documents import Document
from langchain.callbacks.manager import Callbacks
from langchain.retrievers.document_compressors.base import BaseDocumentCompressor
from llama_index.bridge.pydantic import Field, PrivateAttr
This code imports the required standard-library modules (os, sys, typing) along with the third-party libraries and modules.
sys.path.append() adds the directory two levels above the script's own directory to the Python interpreter's search path so that the project's own modules can be imported.
CrossEncoder is the model class imported from the sentence_transformers library; it takes a pair of texts and scores them jointly.
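To see which directory the nested `dirname` calls resolve to, here is a small sketch. The script path is a hypothetical layout chosen for illustration (the article does not state the real one), and `posixpath` is used so the result is the same on every platform.

```python
import posixpath

# Hypothetical location of the script; the real project layout is an assumption
script = "/project/server/reranker/reranker.py"

level1 = posixpath.dirname(script)   # directory containing the script
level2 = posixpath.dirname(level1)   # one level up
level3 = posixpath.dirname(level2)   # two levels up: what gets appended to sys.path

print(level1)
print(level2)
print(level3)
```

With this layout, top-level packages that sit directly under `/project` become importable from the script.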
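class LangchainReranker(BaseDocumentCompressor):
    """Document compressor that reranks documents with a local CrossEncoder model."""
    model_name_or_path: str = Field()
    _model: Any = PrivateAttr()
    top_n: int = Field()
    device: str = Field()
    max_length: int = Field()
    batch_size: int = Field()
    # show_progress_bar: bool = None
    num_workers: int = Field()
The LangchainReranker class inherits from BaseDocumentCompressor, LangChain's base class for document compressors, which declares the compress_documents interface implemented below.
The class has several attributes: model_name_or_path, top_n, device, max_length, batch_size, and num_workers, which specify the model name or path, the number of top documents to return, the device to run on, the maximum text length, the batch size, and the number of parallel workers, respectively. The private attribute _model holds the loaded CrossEncoder instance.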
4. The compress_documents Method
    def compress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        """
        Rerank documents with the CrossEncoder model and keep the top_n.
        Args:
            documents: A sequence of documents to rerank.
            query: The query to score each document against.
            callbacks: Callbacks to run during the compression process (unused here).
        Returns:
            The top_n documents, most relevant first, each with a
            "relevance_score" entry added to its metadata.
        """
        if len(documents) == 0:  # nothing to score
            return []
        doc_list = list(documents)
        _docs = [d.page_content for d in doc_list]
        # One [query, document] pair per candidate
        sentence_pairs = [[query, _doc] for _doc in _docs]
        results = self._model.predict(sentences=sentence_pairs,
                                      batch_size=self.batch_size,
                                      # show_progress_bar=self.show_progress_bar,
                                      num_workers=self.num_workers,
                                      # activation_fct=self.activation_fct,
                                      # apply_softmax=self.apply_softmax,
                                      convert_to_tensor=True
                                      )
        # Never request more results than there are candidates
        top_k = min(self.top_n, len(results))
        values, indices = results.topk(top_k)
        final_results = []
        for value, index in zip(values, indices):
            doc = doc_list[index]
            # value is a 0-dim tensor; value.item() would give a plain float
            doc.metadata["relevance_score"] = value
            final_results.append(doc)
        return final_results
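The selection step at the end of the method can be pictured without any tensor library. The sketch below is a plain-Python approximation: the `topk` helper mimics what the tensor `topk` call returns (values and indices, highest first), and the scores and document strings are invented for illustration.

```python
import heapq


def topk(scores, k):
    """Return (values, indices) of the k largest scores, highest first."""
    k = min(k, len(scores))  # same guard as top_k in compress_documents
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    indices = [i for i, _ in best]
    values = [s for _, s in best]
    return values, indices


docs = ["doc A", "doc B", "doc C"]
scores = [0.12, 0.93, 0.40]  # hypothetical relevance scores, one per document

# Asking for 5 results from 3 candidates simply returns all 3
values, indices = topk(scores, k=5)
reranked = [
    {"page_content": docs[i], "relevance_score": v}
    for v, i in zip(values, indices)
]
print(reranked)
```

The indices map each score back to its original document, which is how the method attaches `relevance_score` to the right `Document`.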
5. Program Entry Point: __main__
if __name__ == "__main__":
    from configs import (LLM_MODELS,
                         VECTOR_SEARCH_TOP_K,
                         SCORE_THRESHOLD,
                         TEMPERATURE,
                         USE_RERANKER,
                         RERANKER_MODEL,
                         RERANKER_MAX_LENGTH,
                         MODEL_PATH)
    from server.utils import embedding_device
    if USE_RERANKER:
        reranker_model_path = MODEL_PATH["reranker"].get(RERANKER_MODEL, "BAAI/bge-reranker-large")
        print("-----------------model path------------------")
        print(reranker_model_path)
        reranker_model = LangchainReranker(top_n=3,
                                           device=embedding_device(),
                                           max_length=RERANKER_MAX_LENGTH,
                                           model_name_or_path=reranker_model_path
                                           )
This code sits in the __main__ block, so it runs only when the script is executed directly; it initializes a LangchainReranker instance.
It imports the required configuration values and helper functions, such as the model paths and the device-selection function.
If the USE_RERANKER setting is enabled, it resolves the model path (falling back to "BAAI/bge-reranker-large" when RERANKER_MODEL has no entry) and creates the LangchainReranker instance reranker_model with the corresponding parameters.
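The article does not show the `configs` module, so the following is only a guess at the shape the entry point expects. The values and the second model entry are assumptions for illustration, and `resolve_reranker_path` is a hypothetical helper that wraps the same `.get(...)` lookup used above.

```python
# Hypothetical stand-in for the project's configs module (values are assumptions)
USE_RERANKER = True
RERANKER_MODEL = "bge-reranker-large"
RERANKER_MAX_LENGTH = 1024
MODEL_PATH = {
    "reranker": {
        "bge-reranker-large": "BAAI/bge-reranker-large",
        "bge-reranker-base": "BAAI/bge-reranker-base",
    }
}


def resolve_reranker_path(model_name: str) -> str:
    """Look up the model path, falling back to the same default as the entry point."""
    return MODEL_PATH["reranker"].get(model_name, "BAAI/bge-reranker-large")


if USE_RERANKER:
    print(resolve_reranker_path(RERANKER_MODEL))
    print(resolve_reranker_path("not-configured"))  # falls back to the default
```

The fallback means a misspelled or missing RERANKER_MODEL entry silently selects the default model instead of raising an error, which is worth keeping in mind when debugging.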
The complete listing follows, including the __init__ constructor that the walkthrough above leaves out:
import os
import sys
# Make the project's own modules importable
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
from typing import Any, Optional, Sequence
from sentence_transformers import CrossEncoder
from langchain_core.documents import Document
from langchain.callbacks.manager import Callbacks
from langchain.retrievers.document_compressors.base import BaseDocumentCompressor
from llama_index.bridge.pydantic import Field, PrivateAttr
class LangchainReranker(BaseDocumentCompressor):
    """Document compressor that reranks documents with a local CrossEncoder model."""
    model_name_or_path: str = Field()
    _model: Any = PrivateAttr()
    top_n: int = Field()
    device: str = Field()
    max_length: int = Field()
    batch_size: int = Field()
    # show_progress_bar: bool = None
    num_workers: int = Field()
    # activation_fct = None
    # apply_softmax = False
    def __init__(self,
                 model_name_or_path: str,
                 top_n: int = 3,
                 device: str = "cuda",
                 max_length: int = 1024,
                 batch_size: int = 32,
                 # show_progress_bar: bool = None,
                 num_workers: int = 0,
                 # activation_fct = None,
                 # apply_softmax = False,
                 ):
        # Let pydantic validate and store the public fields first
        super().__init__(
            top_n=top_n,
            model_name_or_path=model_name_or_path,
            device=device,
            max_length=max_length,
            batch_size=batch_size,
            # show_progress_bar=show_progress_bar,
            num_workers=num_workers,
            # activation_fct=activation_fct,
            # apply_softmax=apply_softmax
        )
        # Load the cross-encoder, honoring the max_length argument
        # (the original hard-coded 1024 here and ignored the parameter)
        self._model = CrossEncoder(model_name_or_path, max_length=max_length, device=device)
    def compress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        """
        Rerank documents with the CrossEncoder model and keep the top_n.
        Args:
            documents: A sequence of documents to rerank.
            query: The query to score each document against.
            callbacks: Callbacks to run during the compression process (unused here).
        Returns:
            The top_n documents, most relevant first, each with a
            "relevance_score" entry added to its metadata.
        """
        if len(documents) == 0:  # nothing to score
            return []
        doc_list = list(documents)
        _docs = [d.page_content for d in doc_list]
        # One [query, document] pair per candidate
        sentence_pairs = [[query, _doc] for _doc in _docs]
        results = self._model.predict(sentences=sentence_pairs,
                                      batch_size=self.batch_size,
                                      # show_progress_bar=self.show_progress_bar,
                                      num_workers=self.num_workers,
                                      # activation_fct=self.activation_fct,
                                      # apply_softmax=self.apply_softmax,
                                      convert_to_tensor=True
                                      )
        # Never request more results than there are candidates
        top_k = min(self.top_n, len(results))
        values, indices = results.topk(top_k)
        final_results = []
        for value, index in zip(values, indices):
            doc = doc_list[index]
            # value is a 0-dim tensor; value.item() would give a plain float
            doc.metadata["relevance_score"] = value
            final_results.append(doc)
        return final_results
if __name__ == "__main__":
    from configs import (LLM_MODELS,
                         VECTOR_SEARCH_TOP_K,
                         SCORE_THRESHOLD,
                         TEMPERATURE,
                         USE_RERANKER,
                         RERANKER_MODEL,
                         RERANKER_MAX_LENGTH,
                         MODEL_PATH)
    from server.utils import embedding_device
    if USE_RERANKER:
        reranker_model_path = MODEL_PATH["reranker"].get(RERANKER_MODEL, "BAAI/bge-reranker-large")
        print("-----------------model path------------------")
        print(reranker_model_path)
        reranker_model = LangchainReranker(top_n=3,
                                           device=embedding_device(),
                                           max_length=RERANKER_MAX_LENGTH,
                                           model_name_or_path=reranker_model_path
                                           )