Machine Learning: M-distance-Based Recommendation

颜妮儿

Last modified 2022-05-07 16:44:23


First published 2022-05-06 23:14:07

Article link: https://blog.csdn.net/Z__XY_/article/details/124617358


Algorithm source: a paper published by my advisor and a senior labmate.

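Scenario:
[Figure: the current rating table, with the entry to be predicted marked as $\color{Red}?$]
We want to predict the value of $\color{Red}?$ from the current rating table.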
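Data description:
$U=\{u_0,u_1,u_2,u_3,u_4\}$ is the set of users who provide ratings;
$M=\{m_0,m_1,m_2,m_3,m_4,m_5\}$ is the set of movies being rated;
$R$ is the rating matrix: $R=(r_{i,j})_{n\times m}$, $0\le i\le n-1$ and $0\le j\le m-1$, where $n$ is the number of users, $m$ is the number of movies, and $r_{i,j}=0$ means that $u_i$ has not rated $m_j$.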
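Prediction process (item-based recommendation):
1: Set a threshold $\delta$ that determines the neighbors of the entry to be predicted. Let $\overline{r}_j$ denote the average rating of movie $m_j$, and let $B(u_i,m_j)$ denote the neighboring movies, where $B(u_i,m_j)=\{k \mid 0\le k \le m-1,\ k\ne j,\ |\overline{r}_j-\overline{r}_k|\le \delta,\ r_{i,k}\ne 0\}$;
2: The predicted value is
$p_{i,j}= \begin{cases} \dfrac{\sum\limits_{k\in B(u_i,m_j)} r_{i,k}}{|B(u_i,m_j)|}, & |B(u_i,m_j)| > 0\\ \overline{r}_j, & \text{otherwise} \end{cases}$
where $\overline{r}_j$ serves as the default value when there are no neighbors. Note that the code below uses the constant `DEFAULT_RATING` as the fallback and a strict comparison (`< radius`).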
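
The rule above can be tried out on a small dense matrix. The following is only a minimal sketch under stated assumptions: the class `MDistanceSketch`, its method names, and the 3 x 4 example matrix are hypothetical and not part of the original code, unknown ratings are stored as 0, and the entry being predicted is itself 0 so that it does not affect the item averages.

```java
/** Hypothetical helper illustrating the M-distance prediction rule on a dense matrix. */
class MDistanceSketch {

	/** Average of the nonzero entries in column j (NaN if the item has no ratings). */
	static double itemAverage(int[][] r, int j) {
		double sum = 0;
		int count = 0;
		for (int[] row : r) {
			if (row[j] != 0) {
				sum += row[j];
				count++;
			}
		}
		return count == 0 ? Double.NaN : sum / count;
	}

	/** Predict r[i][j] from user i's ratings of items whose average is within delta. */
	static double predict(int[][] r, int i, int j, double delta) {
		double target = itemAverage(r, j);
		double total = 0;
		int neighbors = 0;
		for (int k = 0; k < r[i].length; k++) {
			if (k == j || r[i][k] == 0) {
				continue; // Skip the item itself and items the user has not rated.
			}
			if (Math.abs(target - itemAverage(r, k)) <= delta) {
				total += r[i][k];
				neighbors++;
			}
		}
		// Fall back to the item's own average when no neighbor exists.
		return neighbors > 0 ? total / neighbors : target;
	}
}
```

For the assumed matrix `{{0, 4, 5, 1}, {4, 4, 0, 2}, {5, 0, 4, 1}}`, the item averages are 4.5, 4.0, 4.5 and about 1.33. Calling `MDistanceSketch.predict(r, 0, 0, 0.5)` therefore averages user 0's ratings of items 1 and 2 and gives 4.5, while `predict(r, 2, 1, 0.1)` finds no neighbor and falls back to the item average 4.0.
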
Data declarations:

	/**
	 * Default rating for 1-5 points.
	 */
	public static final double DEFAULT_RATING = 3.0;

	/**
	 * The total number of users.
	 */
	private int numUsers;

	/**
	 * The total number of items.
	 */
	private int numItems;

	/**
	 * The total number of ratings (non-zero values)
	 */
	private int numRatings;

	/**
	 * The predictions.
	 */
	private double[] predictions;

	/**
	 * Compressed rating matrix. User-item-rating triples.
	 */
	public int[][] compressedRatingMatrix;

	/**
	 * The degree of each user (how many items the user has rated).
	 */
	private int[] userDegrees;

	/**
	 * The average rating given by each user.
	 */
	private double[] userAverageRatings;

	/**
	 * The degree of each item (how many users have rated the item).
	 */
	private int[] itemDegrees;

	/**
	 * The average rating received by each item.
	 */
	private double[] itemAverageRatings;

	/**
	 * The first user starts from 0. If the first user has x ratings, the second
	 * user starts from x.
	 */
	private int[] userStartingIndices;

	/**
	 * Number of non-neighbor objects.
	 */
	private int numNonNeighbors;

	/**
	 * The radius (delta) for determining the neighborhood.
	 */
	private double radius;

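Reading and processing the data (the constructor needs `java.io.BufferedReader`, `java.io.File`, and `java.io.FileReader` to be imported):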

	/**
	 *********************
	 * Construct the rating matrix.
	 * 
	 * @param paraFilename   The rating filename.
	 * @param paraNumUsers   The number of users.
	 * @param paraNumItems   The number of items.
	 * @param paraNumRatings The number of ratings.
	 *********************
	 */
	public MBR(String paraFilename, int paraNumUsers, int paraNumItems, int paraNumRatings) throws Exception {
		// Step 1. Initialize these arrays.
		numItems = paraNumItems;
		numUsers = paraNumUsers;
		numRatings = paraNumRatings;

		userDegrees = new int[numUsers];
		userStartingIndices = new int[numUsers + 1];
		userAverageRatings = new double[numUsers];
		itemDegrees = new int[numItems];
		compressedRatingMatrix = new int[numRatings][3];
		itemAverageRatings = new double[numItems];

		predictions = new double[numRatings];

		System.out.println("Reading " + paraFilename);

		// Step 2. Read the data file.
		File tempFile = new File(paraFilename);
		if (!tempFile.exists()) {
			System.out.println("File " + paraFilename + " does not exist.");
			System.exit(0);
		} // Of if
		BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
		String tempString;
		String[] tempStrArray;
		int tempIndex = 0;
		userStartingIndices[0] = 0;
		userStartingIndices[numUsers] = numRatings;
		while ((tempString = tempBufReader.readLine()) != null) {
			// Each line has three values
			tempStrArray = tempString.split(",");
			compressedRatingMatrix[tempIndex][0] = Integer.parseInt(tempStrArray[0]);
			compressedRatingMatrix[tempIndex][1] = Integer.parseInt(tempStrArray[1]);
			compressedRatingMatrix[tempIndex][2] = Integer.parseInt(tempStrArray[2]);

			userDegrees[compressedRatingMatrix[tempIndex][0]]++;
			itemDegrees[compressedRatingMatrix[tempIndex][1]]++;

			if (tempIndex > 0) {
				// Starting to read the data of a new user.
				if (compressedRatingMatrix[tempIndex][0] != compressedRatingMatrix[tempIndex - 1][0]) {
					userStartingIndices[compressedRatingMatrix[tempIndex][0]] = tempIndex;
				} // Of if
			} // Of if
			tempIndex++;
		} // Of while
		tempBufReader.close();

		double[] tempUserTotalScore = new double[numUsers];
		double[] tempItemTotalScore = new double[numItems];
		for (int i = 0; i < numRatings; i++) {
			tempUserTotalScore[compressedRatingMatrix[i][0]] += compressedRatingMatrix[i][2];
			tempItemTotalScore[compressedRatingMatrix[i][1]] += compressedRatingMatrix[i][2];

		} // Of for i

		for (int i = 0; i < numUsers; i++) {
			userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
		} // Of for i

		for (int i = 0; i < numItems; i++) {
			itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
		} // Of for i
	}// Of the constructor
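
The constructor records where each user's block of ratings begins, which appears to rely on the file being sorted by user and on every user having at least one rating (otherwise that user's starting index would stay at 0). As a hedged alternative, the starting indices can be derived by counting and prefix sums. The class `StartingIndexSketch` and its method are hypothetical names used only for illustration.

```java
/** Hypothetical helper: locate each user's block in user-sorted rating triples. */
class StartingIndexSketch {

	/**
	 * Returns an array s of length numUsers + 1 such that the ratings of user u
	 * occupy rows s[u] (inclusive) to s[u + 1] (exclusive) of the triples.
	 */
	static int[] startingIndices(int[][] triples, int numUsers) {
		int[] starts = new int[numUsers + 1];
		// Count each user's ratings, shifted by one slot.
		for (int[] t : triples) {
			starts[t[0] + 1]++;
		}
		// Prefix sums turn the counts into starting positions.
		for (int u = 0; u < numUsers; u++) {
			starts[u + 1] += starts[u];
		}
		return starts;
	}
}
```

With the seven triples `0,0,2 / 0,3,5 / 0,4,1 / 1,0,1 / 1,2,3 / 2,1,4 / 2,3,3` and three users, this yields `[0, 3, 5, 7]`: user 0 owns rows 0 to 2, user 1 rows 3 to 4, and user 2 rows 5 to 6. A user without ratings simply gets an empty range.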

Method for setting $\delta$:

	/**
	 ********************
	 * Set the radius(delta).
	 *
	 * @param paraRadius The given radius.
	 *********************
	 */
	public void setRadius(double paraRadius) {
		if (paraRadius > 0) {
			radius = paraRadius;
		} else {
			radius = 0.1;
		} // Of if
	}// Of setRadius

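Illustration of item-based recommendation testing:
[Figure: illustration of the item-based leave-one-out test]

The test uses leave-one-out: each time, one rating is removed from the original data set and predicted, with the remaining ratings serving as the training set.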

	/**
	 ********************
	 * Leave-one-out prediction. The predicted values are stored in predictions.
	 *
	 * @see #predictions
	 *********************
	 */
	public void leaveOneOutPrediction() {
		double tempItemAverageRating;
		// Make each line of the code shorter.
		int tempUser, tempItem, tempRating;
		System.out.println("\r\nLeaveOneOutPrediction for radius " + radius);

		numNonNeighbors = 0;
		for (int i = 0; i < numRatings; i++) {
			tempUser = compressedRatingMatrix[i][0];
			tempItem = compressedRatingMatrix[i][1];
			tempRating = compressedRatingMatrix[i][2];

			// Step 1. Recompute average rating of the current item.
			tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating)
					/ (itemDegrees[tempItem] - 1);
			// Step 2. Recompute neighbors,at the same time obtain the ratings of neighbors.
			int tempNeighbors = 0;
			double tempTotal = 0;
			int tempComparedItem;
			for (int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++) {
				tempComparedItem = compressedRatingMatrix[j][1];
				if (tempItem == tempComparedItem) {
					continue;// Ignore itself.
				} // Of if

				if (Math.abs(tempItemAverageRating - itemAverageRatings[tempComparedItem]) < radius) {
					tempTotal += compressedRatingMatrix[j][2];
					tempNeighbors++;
				} // Of if
			} // Of for j

			// Step 3. Predict as the average value of neighbors.
			if (tempNeighbors > 0) {
				predictions[i] = tempTotal / tempNeighbors;
			} else {
				predictions[i] = DEFAULT_RATING;
				numNonNeighbors++;
			} // Of if
		} // Of for i
	}// Of leaveOneOutPrediction
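
Step 1 above removes the held-out rating from the item's average without rescanning the data: the old sum is average times degree, so the new average is (average * degree - rating) / (degree - 1). The sketch below isolates that update. The class and method names are hypothetical, and the explicit guard for a degree of 1 is an addition for clarity. In the original code the same case evaluates to NaN through floating-point division, so no neighbor passes the comparison and the prediction falls back to `DEFAULT_RATING`.

```java
/** Hypothetical helper for the leave-one-out average update. */
class LeaveOneOutAverageSketch {

	/** Mean of the remaining ratings after removing one rating from a set with the given mean and size. */
	static double averageWithout(double average, int degree, int removedRating) {
		if (degree <= 1) {
			return Double.NaN; // Nothing is left to average.
		}
		return (average * degree - removedRating) / (degree - 1);
	}
}
```

For example, an item rated 4 and 5 has average 4.5, and leaving out the 5 gives `averageWithout(4.5, 2, 5)`, which is 4.0.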

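Two ways to evaluate the algorithm. Let $k$ be the number of predicted instances, let the array $p$ hold the predicted values, and let $r_{i,2}$ be the true rating stored in the third column of row $i$ of the compressed rating matrix. Then:
$\begin{aligned} MAE&=\frac{\sum_{i=0}^{k-1}|p_{i}-r_{i,2}|}{k}\\ RMSE&=\left(\frac{\sum_{i=0}^{k-1}(p_{i}-r_{i,2})^2}{k}\right)^{\frac{1}{2}} \end{aligned}$
MAE (mean absolute error) reflects the actual size of the prediction errors well;
RMSE (root mean square error, spelled RSME in the code) measures the deviation between the predicted and true values and weights large errors more heavily.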

	/**
	 ********************
	 * Compute the MAE based on the deviation of each leave-one-out.
	 *********************
	 */
	public double computeMAE() throws Exception {
		double tempTotalError = 0;
		for (int i = 0; i < predictions.length; i++) {
			tempTotalError += Math.abs(predictions[i] - compressedRatingMatrix[i][2]);
		} // Of for i

		return tempTotalError / predictions.length;
	}// Of computeMAE

	/**
	 ********************
	 * Compute the RSME based on the deviation of each leave-one-out.
	 *********************
	 */
	public double computeRSME() throws Exception {
		double tempTotalError = 0;
		for (int i = 0; i < predictions.length; i++) {
			tempTotalError += (predictions[i] - compressedRatingMatrix[i][2])
					* (predictions[i] - compressedRatingMatrix[i][2]);
		} // Of for i

		double tempAverage = tempTotalError / predictions.length;

		return Math.sqrt(tempAverage);
	}// Of computeRSME
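
To see how the two measures differ, here is a small standalone sketch with made-up numbers. The class `ErrorMeasureSketch` and the three sample predictions are hypothetical. The computation mirrors the two methods above but takes its inputs as arguments.

```java
/** Hypothetical standalone versions of the two error measures. */
class ErrorMeasureSketch {

	/** Mean absolute error between predictions and true ratings. */
	static double mae(double[] predicted, int[] actual) {
		double total = 0;
		for (int i = 0; i < predicted.length; i++) {
			total += Math.abs(predicted[i] - actual[i]);
		}
		return total / predicted.length;
	}

	/** Root mean square error between predictions and true ratings. */
	static double rmse(double[] predicted, int[] actual) {
		double total = 0;
		for (int i = 0; i < predicted.length; i++) {
			double diff = predicted[i] - actual[i];
			total += diff * diff;
		}
		return Math.sqrt(total / predicted.length);
	}
}
```

With predictions `{4.5, 3.0, 2.0}` and true ratings `{5, 3, 4}`, the absolute errors are 0.5, 0 and 2, so MAE is 2.5 / 3, roughly 0.83, while RMSE is the square root of 4.25 / 3, roughly 1.19. The single large error of 2 dominates the RMSE.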

Main function:

	/**
	 *********************
	 * The entrance of the program.
	 * 
	 * @param args Not used now.
	 *********************
	 */
	public static void main(String[] args) {
		try {
			MBR tempRecommender = new MBR("F:/sampledata-main/movielens-943u1682m.txt", 943, 1682, 100000);

			for (double tempRadius = 0.2; tempRadius < 0.6; tempRadius += 0.1) {
				tempRecommender.setRadius(tempRadius);

				tempRecommender.leaveOneOutPrediction();
				double tempMAE = tempRecommender.computeMAE();
				double tempRSME = tempRecommender.computeRSME();

				System.out.println("Radius = " + tempRadius + ", MAE = " + tempMAE + ", RSME = " + tempRSME
						+ ", numNonNeighbors = " + tempRecommender.numNonNeighbors);
			} // Of for tempRadius
		} catch (Exception ee) {
			System.out.println(ee);
		} // Of try
	}// Of main

Output:
[Figure: screenshot of the program output]
Supplement: user-based recommendation

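It is easy to see that this is the same operation applied to the transposed matrix. Because the ratings are stored as a compressed matrix, I simply traversed all of the data for every prediction, which, unsurprisingly, is very slow.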

	/**
	 ********************
	 * Leave-one-out prediction. The predicted values are stored in predictions.
	 *
	 * @see #predictions
	 *********************
	 */
	public void leaveOneOutPrediction2() {
		double tempUserAverageRating;
		// Make each line of the code shorter.
		int tempUser, tempItem, tempRating;
		System.out.println("\r\nLeaveOneOutPrediction2 for radius " + radius);

		numNonNeighbors = 0;
		for (int i = 0; i < numRatings; i++) {
			tempUser = compressedRatingMatrix[i][0];
			tempItem = compressedRatingMatrix[i][1];
			tempRating = compressedRatingMatrix[i][2];

			// Step 1. Recompute average rating of the current user.
			tempUserAverageRating = (userAverageRatings[tempUser] * userDegrees[tempUser] - tempRating)
					/ (userDegrees[tempUser] - 1);
			// Step 2. Recompute neighbors,at the same time obtain the ratings of neighbors.
			int tempNeighbors = 0;
			double tempTotal = 0;
			int tempComparedItem;

			for (int j = 0; j < numUsers; j++) {
				if (j == tempUser)
					continue;
				for (int k = userStartingIndices[j]; k < userStartingIndices[j + 1]; k++) {
					tempComparedItem = compressedRatingMatrix[k][1];
					if (tempComparedItem == tempItem
							&& Math.abs(tempUserAverageRating - userAverageRatings[j]) < radius) {
						tempTotal += compressedRatingMatrix[k][2];
						tempNeighbors++;
					} // Of if
				} // Of for k
			} // Of for j

			// Step 3. Predict as the average value of neighbors.
			if (tempNeighbors > 0) {
				predictions[i] = tempTotal / tempNeighbors;
			} else {
				predictions[i] = DEFAULT_RATING;
				numNonNeighbors++;
			} // Of if
		} // Of for i
	}// Of leaveOneOutPrediction2

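Output:
[Figure: screenshot of the program output]
Supplement:
As noted above, user-based recommendation handles the transposed matrix in the same way. The code below transposes the compressed matrix: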

	/**
	 ********************
	 * Transpose the compressed matrix.
	 *********************
	 */
	public void transformMatrix() {
		int[][] resultMatrix = new int[numRatings][3];
		int[] tempItemCounts = new int[numItems];
		int[] tempPointIndex = new int[numItems];
		int[] tempItemStartingIndices = new int[numItems + 1];

		// Count the number of every item.
		for (int i = 0; i < numRatings; i++) {
			tempItemCounts[compressedRatingMatrix[i][1]]++;
		} // Of for i

		// Get every item's starting index and initialize the pointer array.
		tempPointIndex[0] = 0;
		tempItemStartingIndices[0] = 0;
		tempItemStartingIndices[numItems] = numRatings;
		for (int i = 1; i < numItems; i++) {
			tempPointIndex[i] = tempItemCounts[i - 1] + tempPointIndex[i - 1];
			tempItemStartingIndices[i] = tempPointIndex[i];
		} // Of for i

		// Transform the matrix.
		int tempIndex;
		for (int i = 0; i < numRatings; i++) {
			tempIndex = tempPointIndex[compressedRatingMatrix[i][1]];
			resultMatrix[tempIndex][0] = compressedRatingMatrix[i][1];
			resultMatrix[tempIndex][1] = compressedRatingMatrix[i][0];
			resultMatrix[tempIndex][2] = compressedRatingMatrix[i][2];
			tempPointIndex[compressedRatingMatrix[i][1]]++;
		} // Of for i

		// Swap the value between users and items.
		int tempArray[], tempValue;
		double tempAverageArray[];
		compressedRatingMatrix = resultMatrix;
		userStartingIndices = tempItemStartingIndices;

		tempArray = userDegrees;
		userDegrees = itemDegrees;
		itemDegrees = tempArray;

		tempValue = numUsers;
		numUsers = numItems;
		numItems = tempValue;

		tempAverageArray = userAverageRatings;
		userAverageRatings = itemAverageRatings;
		itemAverageRatings = tempAverageArray;
		// leaveOneOutPrediction();
	}// Of transformMatrix

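A printed test case of the transposed compressed matrix, which may help with understanding:
[Figure: printed transposed compressed matrix for the test case]
Test data: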

0,0,2
0,3,5
0,4,1
1,0,1
1,2,3
2,1,4
2,3,3
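
The transposition can be traced by hand on the test data above. The following is a minimal standalone sketch of the same counting-sort idea, assuming the triples are `[user, item, rating]`. The class `TransposeSketch` and its method are hypothetical names and not part of the original class.

```java
/** Hypothetical helper: transpose user-sorted triples into item-sorted triples. */
class TransposeSketch {

	/** Returns [item, user, rating] triples grouped by item, keeping the user order within each item. */
	static int[][] transpose(int[][] triples, int numItems) {
		int[] next = new int[numItems + 1];
		// Count the ratings of each item, shifted by one slot.
		for (int[] t : triples) {
			next[t[1] + 1]++;
		}
		// Prefix sums: next[m] becomes the first free row for item m.
		for (int m = 0; m < numItems; m++) {
			next[m + 1] += next[m];
		}
		// Scatter every triple into its item's block, swapping user and item.
		int[][] result = new int[triples.length][];
		for (int[] t : triples) {
			result[next[t[1]]++] = new int[] { t[1], t[0], t[2] };
		}
		return result;
	}
}
```

For the seven test triples and five items, the result is `0,0,2 / 0,1,1 / 1,2,4 / 2,1,3 / 3,0,5 / 3,2,3 / 4,0,1`, that is, the same ratings regrouped by item with the first two columns swapped.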

Call it in the main function:

	System.out.println("\r\n-------user-based recommendation by transform -------");
			tempRecommender.transformMatrix();

			for (double tempRadius = 0.2; tempRadius < 0.6; tempRadius += 0.1) {
				tempRecommender.setRadius(tempRadius);

				tempRecommender.leaveOneOutPrediction();
				double tempMAE = tempRecommender.computeMAE();
				double tempRSME = tempRecommender.computeRSME();

				System.out.println("Radius = " + tempRadius + ", MAE = " + tempMAE + ", RSME=" + tempRSME
						+ ", numNonNeighbors = " + tempRecommender.numNonNeighbors);
			} // Of for tempRadius

The results are the same as in the figure above, but the run is faster:
[Figure: screenshot of the program output]
