日撸 Java 三百行学习笔记day54-55

第 54 天: 基于 M-distance 的推荐

所谓 M-distance, 就是根据平均分来计算两个用户 (或项目) 之间的距离.

初看还是不太懂,经过老师讲解论文里面的实例,现在也比较清晰了对于程序内容。

当然代码里的各种成员变量啊,方法啊也是不少。

/**
	 ************************* 
	 * Construct the rating matrix.
	 * 
	 * @param paraRatingFilename
	 *            the rating filename.
	 * @param paraNumUsers
	 *            number of users
	 * @param paraNumItems
	 *            number of items
	 * @param paraNumRatings
	 *            number of ratings
	 ************************* 
	 */
	public MBR(String paraFilename, int paraNumUsers, int paraNumItems, int paraNumRatings) throws Exception {
		// Step 1. Initialize these arrays
		numItems = paraNumItems;
		numUsers = paraNumUsers;
		numRatings = paraNumRatings;

		userDegrees = new int[numUsers];
		userStartingIndices = new int[numUsers + 1];
		userAverageRatings = new double[numUsers];
		itemDegrees = new int[numItems];
		compressedRatingMatrix = new int[numRatings][3];
		itemAverageRatings = new double[numItems];

		predictions = new double[numRatings];

		System.out.println("Reading " + paraFilename);

		// Step 2. Read the data file.
		File tempFile = new File(paraFilename);
		if (!tempFile.exists()) {
			System.out.println("File " + paraFilename + " does not exists.");
			System.exit(0);
		} // Of if
		BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
		String tempString;
		String[] tempStrArray;
		int tempIndex = 0;
		userStartingIndices[0] = 0;
		userStartingIndices[numUsers] = numRatings;
		while ((tempString = tempBufReader.readLine()) != null) {
			// Each line has three values
			tempStrArray = tempString.split(",");
			compressedRatingMatrix[tempIndex][0] = Integer.parseInt(tempStrArray[0]);
			compressedRatingMatrix[tempIndex][1] = Integer.parseInt(tempStrArray[1]);
			compressedRatingMatrix[tempIndex][2] = Integer.parseInt(tempStrArray[2]);

			userDegrees[compressedRatingMatrix[tempIndex][0]]++;
			itemDegrees[compressedRatingMatrix[tempIndex][1]]++;

			if (tempIndex > 0) {
				// Starting to read the data of a new user.
				if (compressedRatingMatrix[tempIndex][0] != compressedRatingMatrix[tempIndex - 1][0]) {
					userStartingIndices[compressedRatingMatrix[tempIndex][0]] = tempIndex;
				} // Of if
			} // Of if
			tempIndex++;
		} // Of while
		tempBufReader.close();

		double[] tempUserTotalScore = new double[numUsers];
		double[] tempItemTotalScore = new double[numItems];
		for (int i = 0; i < numRatings; i++) {
			tempUserTotalScore[compressedRatingMatrix[i][0]] += compressedRatingMatrix[i][2];
			tempItemTotalScore[compressedRatingMatrix[i][1]] += compressedRatingMatrix[i][2];
		} // Of for i

		for (int i = 0; i < numUsers; i++) {
			userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
		} // Of for i
		for (int i = 0; i < numItems; i++) {
			itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
		} // Of for i
	}// Of the first constructor

第一个构造器的内容还是比较多,先是传入各种参数,新建各种一维,二维数组。按行依次读入,一列一列的存储,统计了userDegrees和itemDegrees的数量,并且还考虑到userStartingIndices[]这个数组。最后还分别算出了平均值。

这里我们再来看计算MAE,RSME:

/**
	 ************************* 
	 * Compute the MAE based on the deviation of each leave-one-out.
	 * 
	 * @author Fan Min
	 ************************* 
	 */
	public double computeMAE() throws Exception {
		double tempTotalError = 0;
		for (int i = 0; i < predictions.length; i++) {
			tempTotalError += Math.abs(predictions[i] - compressedRatingMatrix[i][2]);
		} // Of for i

		return tempTotalError / predictions.length;
	}// Of computeMAE

	/**
	 ************************* 
	 * Compute the MAE based on the deviation of each leave-one-out.
	 * 
	 * @author Fan Min
	 ************************* 
	 */
	public double computeRSME() throws Exception {
		double tempTotalError = 0;
		for (int i = 0; i < predictions.length; i++) {
			tempTotalError += (predictions[i] - compressedRatingMatrix[i][2])
					* (predictions[i] - compressedRatingMatrix[i][2]);
		} // Of for i

		double tempAverage = tempTotalError / predictions.length;

		return Math.sqrt(tempAverage);
	}// Of computeRSME

其中computeMAE()算的是绝对距离的平均值,computeRSME()就更像欧式距离,是差距平方和的平均值,最后还开根号了。其中这个predictions.length的大小前面有给出,predictions = new double[numRatings];看主函数传入的参数也能知道就是说评价值为0 的没有计入。

最后就是重中之重,leaveOneOutPrediction()的内容:

/**
	 ************************* 
	 * Leave-one-out prediction. The predicted values are stored in predictions.
	 * 
	 * @see predictions
	 ************************* 
	 */
	public void leaveOneOutPrediction() {
		double tempItemAverageRating;
		// Make each line of the code shorter.
		int tempUser, tempItem, tempRating;
		System.out.println("\r\nLeaveOneOutPrediction for radius " + radius);

		numNonNeighbors = 0;
		for (int i = 0; i < numRatings; i++) {
			tempUser = compressedRatingMatrix[i][0];
			tempItem = compressedRatingMatrix[i][1];
			tempRating = compressedRatingMatrix[i][2];

			// Step 1. Recompute average rating of the current item.
			tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating)
					/ (itemDegrees[tempItem] - 1);

			// Step 2. Recompute neighbors, at the same time obtain the ratings
			// Of neighbors.
			int tempNeighbors = 0;
			double tempTotal = 0;
			int tempComparedItem;
			for (int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++) {
				tempComparedItem = compressedRatingMatrix[j][1];
				if (tempItem == tempComparedItem) {
					continue;// Ignore itself.
				} // Of if

				if (Math.abs(tempItemAverageRating - itemAverageRatings[tempComparedItem]) < radius) {
					tempTotal += compressedRatingMatrix[j][2];
					tempNeighbors++;
				} // Of if
			} // Of for j

			// Step 3. Predict as the average value of neighbors.
			if (tempNeighbors > 0) {
				predictions[i] = tempTotal / tempNeighbors;
			} else {
				predictions[i] = DEFAULT_RATING;
				numNonNeighbors++;
			} // Of if
		} // Of for i
	}// Of leaveOneOutPrediction

首先要明确的是其中的radius,也是自己设置的。因为是要leave one out,抹去其中一个,所以有tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating) / (itemDegrees[tempItem] - 1);用之前算得的总共的平均值乘以数量减去需要抹去的那个,得出的结果再除以数量减一。因为我们算是算一行的所以循环里的条件是(int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++);遇到自己就跳过continue,之后就是确定邻居数量,累计tempTotal,算出均值。

最后贴上运行结果:

 这种和Knn的区别也很明显,对于邻居的把控,Knn是自己控制自己选择,而这则是基于一个辐射半径,在这范围里的都被视作邻居。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值