Machine Learning Metrics
Introduction
In the first part of this post, I provided an introduction to 10 metrics used for evaluating classification and regression models. In this part, I am going to introduce the metrics used for evaluating models developed for ranking (AKA learning to rank), as well as metrics for statistical models. In particular, I will cover the following 5 metrics:
Mean reciprocal rank (MRR)
Precision at k
DCG and NDCG (normalized discounted cumulative gain)
Pearson correlation coefficient
Coefficient of determination (R²)
Ranking Related Metrics
Ranking is a fundamental problem in machine learning, which tries to rank a list of items based on their relevance to a particular task (e.g. ranking pages on Google based on their relevance to a given query). It has a wide range of applications in e-commerce and search engines, such as:
Movie recommendation (as in Netflix and YouTube),
Page ranking on Google,
Ranking E-commerce products on Amazon,
Query auto-completion,
Image search on Vimeo,
Hotel search on Expedia/Booking.
In the learning-to-rank problem, the model tries to predict the rank (or relative order) of a list of items for a given task¹. Algorithms for the ranking problem can be grouped into:
Point-wise models: which try to predict a (matching) score for each query-document pair in the dataset, and use it to rank the items.
Pair-wise models: which try to learn a binary classifier that, given a pair of documents, can tell which one is more relevant to the query.
List-wise models: which try to directly optimize the value of one of the ranking evaluation measures, averaged over all queries in the training data.
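To make the point-wise versus pair-wise distinction concrete, here is a minimal toy sketch. The scoring function, data, and function names are purely illustrative assumptions, not from any ranking library:

```python
# Toy illustration of point-wise vs. pair-wise ranking (hypothetical example).

def pointwise_rank(docs, score_fn):
    """Point-wise: score each query-document pair independently, then sort."""
    return sorted(docs, key=score_fn, reverse=True)

def pairwise_prefers(doc_a, doc_b, score_fn):
    """Pair-wise: a binary decision -- is doc_a more relevant than doc_b?"""
    return score_fn(doc_a) > score_fn(doc_b)

# Toy scoring function: pretend relevance is keyword overlap with the query.
query_terms = {"cheap", "hotel", "paris"}

def overlap_score(doc):
    return len(query_terms & set(doc.split()))

docs = ["hotel in paris", "cheap hotel paris", "flights to rome"]
ranked = pointwise_rank(docs, overlap_score)
# "cheap hotel paris" overlaps on 3 terms, "hotel in paris" on 2,
# "flights to rome" on 0, so it is ranked in that order.
```

In practice a learned model (e.g. gradient-boosted trees or a neural network) replaces the hand-written scoring function, but the two paradigms differ in exactly this way: one produces absolute scores, the other relative preferences.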
During evaluation, given the ground-truth order of the lists of items for several queries, we want to know how good the predicted order of those lists is.
There are various metrics proposed for evaluating ranking problems, such as:
- MRR
- Precision@K
- DCG & NDCG
- MAP
- Kendall’s tau
- Spearman’s rho
In this post, we focus on the first three metrics above, which are the most popular metrics for ranking problems.
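Before going into the definitions, a minimal Python sketch of these three metrics may help fix the ideas. This assumes binary relevance labels for MRR and Precision@k, graded relevance gains for DCG/NDCG, and a log2 position discount; it is an illustrative implementation, not code from the post:

```python
import math

def mrr(ranked_relevance_lists):
    """Mean reciprocal rank: average over queries of 1/rank of the
    first relevant item in each predicted ranking (0 if none is relevant)."""
    total = 0.0
    for rels in ranked_relevance_lists:
        rr = 0.0
        for position, rel in enumerate(rels, start=1):
            if rel:
                rr = 1.0 / position
                break
        total += rr
    return total / len(ranked_relevance_lists)

def precision_at_k(rels, k):
    """Fraction of the top-k ranked results that are relevant."""
    return sum(1 for r in rels[:k] if r) / k

def dcg_at_k(gains, k):
    """Discounted cumulative gain: relevance gains discounted by log2 of position."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))

def ndcg_at_k(gains, k):
    """DCG normalized by the ideal DCG (gains sorted in descending order),
    so a perfect ranking scores 1.0."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

For example, `mrr([[0, 1, 0], [1, 0, 0]])` averages reciprocal ranks 1/2 and 1/1, giving 0.75, and a list already sorted by relevance gets an NDCG of 1.0.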
Some of these metrics may be very trivial, but I decided to cover