Recommender System Metrics: MAP and NDCG (Java Code)

RebeccaCute

Article tags: recommender systems

Article link: https://blog.csdn.net/rebecca1809/article/details/121408513

Both metrics evaluate the quality of a recommendation list.

1. MAP (Mean Average Precision)

public double map(Triple[][] paraTrainingMatrix, Triple[][] paraTestingMatrix, int paraNumItems, int paraK) {
	int[] rankList;
	int tempNumUsers = paraTrainingMatrix.length;
	int groundTruthItem;
	double resultMap = 0;
	double mapForEachUser;

	for (int i = 0; i < tempNumUsers; i++) {
		mapForEachUser = 0;
		// Users without testing items contribute 0 but are still counted in tempNumUsers.
		if (paraTestingMatrix[i].length == 0) {
			continue;
		} // Of if
		int numOfTestingItemsInRankList = 0;
		for (int j = 0; j < paraTestingMatrix[i].length; j++) {
			// Each item in the testing set is treated as the ground-truth item (gtItem).
			groundTruthItem = paraTestingMatrix[i][j].item;
			// computeRankList: compute a rank list for the gtItem.
			// 1. Compute the predicted ratings for all items (do not exclude the items in the training set).
			// 2. Count the items whose predicted ratings are above that of the gtItem.
			// 3. If that count is larger than paraK, the gtItem cannot rank in the Top-K;
			//    otherwise output the rankList.
			rankList = computeRankList(i, paraNumItems, paraK, groundTruthItem);
			if (rankList == null) {
				continue;
			} // Of if
			// Find the rank of the gtItem in the rankList and compute the precision.
			for (int paraRank = 0; paraRank < rankList.length; paraRank++) {
				if (groundTruthItem == rankList[paraRank]) {
					numOfTestingItemsInRankList++;
					// Cast to double: integer division would truncate the precision.
					mapForEachUser += (double) numOfTestingItemsInRankList / (paraRank + 1);
					break;
				} // Of if
			} // Of for paraRank
		} // Of for j
		mapForEachUser /= paraTestingMatrix[i].length;
		System.out.println("mapForEachUser: " + mapForEachUser);
		resultMap += mapForEachUser;
	} // Of for i
	resultMap /= tempNumUsers;
	return resultMap;
} // Of map
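The comments inside map describe what computeRankList is expected to do, but its body is not shown in this article. The following is only a minimal sketch of those three steps, not the author's implementation. It assumes the predicted ratings of one user are already available as an array `predictions` indexed by item; that array, the class name `RankListSketch`, and the simplified signature are all hypothetical.

```java
import java.util.Arrays;

final class RankListSketch {
    /**
     * Returns the indices of the top-K items by predicted rating, or null if K or
     * more items are predicted strictly above the ground-truth item.
     * predictions[j] is a hypothetical predicted rating of item j for one user.
     */
    static int[] computeRankList(double[] predictions, int k, int groundTruthItem) {
        // Step 2: count the items predicted strictly above the ground-truth item.
        int numAbove = 0;
        for (double p : predictions) {
            if (p > predictions[groundTruthItem]) {
                numAbove++;
            }
        }
        // Step 3: with K or more items above it, the ground-truth item misses the Top-K.
        if (numAbove >= k) {
            return null;
        }
        // Otherwise sort item indices by predicted rating, descending, and keep the first K.
        // The sort is stable, so tied items keep ascending index order; a ground-truth
        // item that ties with others may therefore still fall outside the list.
        Integer[] order = new Integer[predictions.length];
        for (int j = 0; j < order.length; j++) {
            order[j] = j;
        }
        Arrays.sort(order, (a, b) -> Double.compare(predictions[b], predictions[a]));
        int size = Math.min(k, order.length);
        int[] rankList = new int[size];
        for (int r = 0; r < size; r++) {
            rankList[r] = order[r];
        }
        return rankList;
    }
}
```

This sketch uses `>=` in step 3, because an item with K items strictly above it sits at position K + 1 at best. Whether the original helper uses the same boundary is not stated in the article.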

$\text{MAP}@K = \frac{1}{m}\sum_{i \in U}\frac{1}{|\Omega^{te}_{i}|}\sum_{j = 1}^{K}\frac{|\{p \mid \Gamma_{ip} \in \Omega^{te+}_{i},\ p \le j\}|}{j}.$

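Here, $m$ is the number of users, $K$ is the length of the recommendation list, $i$ is the user index, $\Gamma_{i}$ is the recommendation list for user $i$ (with $\Gamma_{ip}$ its $p$-th item), and $\Omega^{te}_{i}$ is the set of items of user $i$ in the testing set. $\Delta_{i}^+$ denotes the intersection of $\Gamma_{i}$ with $\Pi_{i}$, the set of items the user likes.

The fraction on the right is the precision at position $j$, which is counted when the $j$-th item of the recommendation list is in $\Omega^{te}_{i}$:

The denominator is the item's rank $j$;

The numerator is the number of items ranked at or above position $j$ (the item itself included, since $p \le j$) that belong to $\Omega^{te}_{i}$.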
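To make the numerator and denominator concrete, here is a small sketch that computes the average precision for a single user from one ranked list. It follows the description above, accumulating precision only at positions whose item is in the testing set, and is meant as an illustration rather than a replacement for the map method. The class name `AveragePrecisionSketch` and the sample data are hypothetical, and `Set.of` assumes Java 9 or later.

```java
import java.util.Set;

final class AveragePrecisionSketch {
    /**
     * AP@K for one user: the sum of precision@j over the 1-based positions j whose
     * item is in the testing set, divided by the size of the testing set.
     */
    static double averagePrecisionAtK(int[] rankList, Set<Integer> testItems, int k) {
        if (testItems.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sumPrecision = 0.0;
        int limit = Math.min(k, rankList.length);
        for (int j = 1; j <= limit; j++) {
            if (testItems.contains(rankList[j - 1])) {
                hits++; // numerator: testing items ranked at or above position j
                sumPrecision += (double) hits / j; // denominator: the rank j
            }
        }
        return sumPrecision / testItems.size();
    }

    public static void main(String[] args) {
        int[] rankList = {7, 3, 9, 5, 1};          // hypothetical top-5 list
        Set<Integer> testItems = Set.of(3, 5, 8);  // hypothetical testing items
        System.out.println(averagePrecisionAtK(rankList, testItems, 5));
    }
}
```

In this sample, item 3 is hit at rank 2 (precision 1/2) and item 5 at rank 4 (precision 2/4), so the result is (1/2 + 2/4) / 3 = 1/3. MAP@K is then the mean of this value over all users.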

2. NDCG (Normalized Discounted Cumulative Gain)

$\text{NDCG}@K = \frac{1}{m}\sum_{i \in U}\frac{\text{DCG}_{i}@K}{\text{IDCG}_{i}@K}.$

Numerator: $\text{DCG}_{i}@K = \sum_{j=1}^{K} \frac{2^{rel(\Gamma_{ij})} - 1}{\log_2(j + 1)}$

Here, $rel(\Gamma_{ij})$ is the actual rating of the $j$-th item in $\Gamma_{i}$, used as its relevance.

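Denominator: $\text{IDCG}_{i}$ is computed with the same formula as $\text{DCG}_{i}$. The only change is that the items of the recommendation list are re-sorted by their actual ratings from high to low, which changes the sequence of $rel(\Gamma_{ij})$ values. The $\text{I}$ in $\text{IDCG}_{i}$ stands for "ideal". The given $\Gamma_{i}$ is itself sorted by predicted rating from high to low, and ideally that order agrees with the user's actual preferences. The ratio therefore reflects how closely the ordering of the recommendation list matches the user's true preferences.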
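The article gives Java code only for MAP. As a companion, the following is a minimal sketch of NDCG for one user, assuming the actual ratings of the recommended items are passed in ranking order as a `double[]`. The class name `NdcgSketch` and both method names are hypothetical. The ideal ordering is obtained by re-sorting the same list by actual rating, as described above.

```java
import java.util.Arrays;

final class NdcgSketch {
    /** DCG@K of relevance values given in ranking order (index 0 is rank 1). */
    static double dcgAtK(double[] relevances, int k) {
        double dcg = 0.0;
        int limit = Math.min(k, relevances.length);
        for (int j = 1; j <= limit; j++) {
            double gain = Math.pow(2.0, relevances[j - 1]) - 1.0;
            double discount = Math.log(j + 1) / Math.log(2.0); // log2(j + 1)
            dcg += gain / discount;
        }
        return dcg;
    }

    /** NDCG@K for one user; returns 0 when the ideal DCG is 0. */
    static double ndcgAtK(double[] relevances, int k) {
        // Ideal order: the same actual ratings sorted from high to low.
        double[] ideal = relevances.clone();
        Arrays.sort(ideal); // ascending
        for (int a = 0, b = ideal.length - 1; a < b; a++, b--) {
            double tmp = ideal[a];
            ideal[a] = ideal[b];
            ideal[b] = tmp;
        }
        double idcg = dcgAtK(ideal, k);
        return idcg == 0.0 ? 0.0 : dcgAtK(relevances, k) / idcg;
    }
}
```

A list that is already sorted by actual rating, such as `{3, 2, 1}`, gives an NDCG of 1. Any other order of the same ratings gives a value at or below 1. NDCG@K for the whole system is the mean of `ndcgAtK` over all users.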