Machine Learning: M-distance-Based Recommendation

Source of the algorithm: a paper published by my advisor and a senior labmate.

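Scenario:
[Figure: the user-movie rating table, with the entry to predict marked ?]
Let us predict the value of $\color{Red}{?}$ from the current rating table.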
Data description:
$U=\{u_0,u_1,u_2,u_3,u_4\}$ is the set of users who gave ratings;
$M=\{m_0,m_1,m_2,m_3,m_4,m_5\}$ is the set of rated movies;
$R=(r_{i,j})_{n\times m}$, with $0\le i\le n-1$ and $0\le j\le m-1$, is the rating matrix.
Prediction process (item-based recommendation):
1: Set a threshold $\delta$ to determine the neighbors of the entry to predict. Let $\overline{r}_j$ be the average rating of movie $m_j$, and let $B(u_i,m_j)$ denote the neighboring movies, where $B(u_i,m_j)=\{k \mid 0\le k\le m-1,\ k\ne j,\ |\overline{r}_j-\overline{r}_k|\le\delta,\ r_{i,k}\ne 0\}$. The condition $r_{i,k}\ne 0$ means that user $u_i$ has rated $m_k$.
2: The predicted value is
$$p_{i,j}=\begin{cases}\dfrac{\sum_{k\in B(u_i,m_j)} r_{i,k}}{|B(u_i,m_j)|}, & |B(u_i,m_j)|>0\\[6pt] \overline{r}_j, & \text{otherwise}\end{cases}$$
where $\overline{r}_j$ acts as the default value.
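The two steps above can be sketched on a small dense matrix. This is only an illustration under stated assumptions: the class `MDistanceSketch`, its method names, and the 3x4 `RATINGS` matrix are hypothetical (the rating table from the original figure is not available), and the sketch follows the formula literally, using $\le\delta$ and falling back to $\overline{r}_j$, whereas the full program later in this post uses a strict `<` and a constant default rating.

```java
/** Minimal sketch of item-based M-distance prediction on a small dense matrix. */
class MDistanceSketch {
    /** Hypothetical ratings: rows are users, columns are movies, 0 means "not rated". */
    static final int[][] RATINGS = {
        { 0, 4, 2, 5 }, // RATINGS[0][0] is the entry to predict
        { 4, 5, 0, 2 },
        { 5, 3, 4, 0 },
    };

    /** Average of the non-zero ratings of movie j (NaN if nobody rated it). */
    static double itemAverage(int[][] r, int j) {
        double total = 0;
        int count = 0;
        for (int[] row : r) {
            if (row[j] != 0) {
                total += row[j];
                count++;
            }
        }
        return total / count;
    }

    /** Predict p_{i,j}: mean of user i's ratings on movies whose average is within delta. */
    static double predict(int[][] r, int i, int j, double delta) {
        double targetAverage = itemAverage(r, j);
        double total = 0;
        int neighbors = 0;
        for (int k = 0; k < r[i].length; k++) {
            if (k == j || r[i][k] == 0) {
                continue; // skip the target itself and movies user i has not rated
            }
            if (Math.abs(targetAverage - itemAverage(r, k)) <= delta) {
                total += r[i][k];
                neighbors++;
            }
        }
        // No neighbor: fall back to the item average, as in the "otherwise" branch.
        return neighbors > 0 ? total / neighbors : targetAverage;
    }

    public static void main(String[] args) {
        // Movie averages are 4.5, 4.0, 3.0, 3.5.
        System.out.println(predict(RATINGS, 0, 0, 0.5)); // neighbor m1 only: 4.0
        System.out.println(predict(RATINGS, 0, 0, 1.0)); // neighbors m1 and m3: 4.5
        System.out.println(predict(RATINGS, 0, 0, 0.1)); // no neighbor, item average: 4.5
    }
}
```

Widening $\delta$ admits more neighbors, so the prediction moves from a single rating toward an average over several of the user's ratings.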
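Field declarations (the class is named MBR; the constructor below also needs java.io.BufferedReader, java.io.File, and java.io.FileReader imported):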

	/**
	 * Default rating for 1-5 points.
	 */
	public static final double DEFAULT_RATING = 3.0;

	/**
	 * The total number of users.
	 */
	private int numUsers;

	/**
	 * The total number of items.
	 */
	private int numItems;

	/**
	 * The total number of ratings (non-zero values)
	 */
	private int numRatings;

	/**
	 * The predictions.
	 */
	private double[] predictions;

	/**
	 * Compressed rating matrix. User-item-rating triples.
	 */
	public int[][] compressedRatingMatrix;

	/**
	 * The degree of each user (how many items the user has rated).
	 */
	private int[] userDegrees;

	/**
	 * The average rating of the current user.
	 */
	private double[] userAverageRatings;

	/**
	 * The degree of each item (how many users have rated the item).
	 */
	private int[] itemDegrees;

	/**
	 * The average rating of the current item.
	 */
	private double[] itemAverageRatings;

	/**
	 * The first user starts from 0. If the first user has x ratings, the second
	 * user starts from x.
	 */
	private int[] userStartingIndices;

	/**
	 * Number of non-neighbor objects.
	 */
	private int numNonNeighbors;

	/**
	 * The radius (delta) for determining the neighborhood.
	 */
	private double radius;

Reading and processing the data:

	/**
	 *********************
	 * Construct the rating matrix.
	 * 
	 * @param paraFilename   The rating filename.
	 * @param paraNumUsers   The number of users.
	 * @param paraNumItems   The number of items.
	 * @param paraNumRatings The number of ratings.
	 *********************
	 */
	public MBR(String paraFilename, int paraNumUsers, int paraNumItems, int paraNumRatings) throws Exception {
		// Step 1. Initialize these arrays.
		numItems = paraNumItems;
		numUsers = paraNumUsers;
		numRatings = paraNumRatings;

		userDegrees = new int[numUsers];
		userStartingIndices = new int[numUsers + 1];
		userAverageRatings = new double[numUsers];
		itemDegrees = new int[numItems];
		compressedRatingMatrix = new int[numRatings][3];
		itemAverageRatings = new double[numItems];

		predictions = new double[numRatings];

		System.out.println("Rating " + paraFilename);

		// Step 2. Read the data file.
		File tempFile = new File(paraFilename);
		if (!tempFile.exists()) {
			System.out.println("File " + paraFilename + " does not exist.");
			System.exit(0);
		} // Of if
		BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
		String tempString;
		String[] tempStrArray;
		int tempIndex = 0;
		userStartingIndices[0] = 0;
		userStartingIndices[numUsers] = numRatings;
		while ((tempString = tempBufReader.readLine()) != null) {
			// Each line has three values
			tempStrArray = tempString.split(",");
			compressedRatingMatrix[tempIndex][0] = Integer.parseInt(tempStrArray[0]);
			compressedRatingMatrix[tempIndex][1] = Integer.parseInt(tempStrArray[1]);
			compressedRatingMatrix[tempIndex][2] = Integer.parseInt(tempStrArray[2]);

			userDegrees[compressedRatingMatrix[tempIndex][0]]++;
			itemDegrees[compressedRatingMatrix[tempIndex][1]]++;

			if (tempIndex > 0) {
				// Starting to read the data of a new user.
				if (compressedRatingMatrix[tempIndex][0] != compressedRatingMatrix[tempIndex - 1][0]) {
					userStartingIndices[compressedRatingMatrix[tempIndex][0]] = tempIndex;
				} // Of if
			} // Of if
			tempIndex++;
		} // Of while
		tempBufReader.close();

		double[] tempUserTotalScore = new double[numUsers];
		double[] tempItemTotalScore = new double[numItems];
		for (int i = 0; i < numRatings; i++) {
			tempUserTotalScore[compressedRatingMatrix[i][0]] += compressedRatingMatrix[i][2];
			tempItemTotalScore[compressedRatingMatrix[i][1]] += compressedRatingMatrix[i][2];

		} // Of for i

		for (int i = 0; i < numUsers; i++) {
			userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
		} // Of for i

		for (int i = 0; i < numItems; i++) {
			itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
		} // Of for i
	}// Of the constructor
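To make the compressed format concrete, here is a minimal sketch of how the per-user start indices relate to the sorted user-item-rating triples. The class and method names are hypothetical, and the seven triples are the small test file listed later in this post. Note that it uses a prefix sum over user degrees instead of the constructor's "detect a change of user" check; the prefix sum also gives a usable start index for a user with no ratings, which the constructor appears to leave at 0.

```java
import java.util.Arrays;

/** Sketch: derive user start indices from triples sorted by user. */
class StartingIndicesSketch {
    /** The small test file from this post: {user, item, rating}. */
    static final int[][] TEST_DATA = {
        { 0, 0, 2 }, { 0, 3, 5 }, { 0, 4, 1 },
        { 1, 0, 1 }, { 1, 2, 3 },
        { 2, 1, 4 }, { 2, 3, 3 },
    };

    /** Returns an array s where user u owns rows s[u] (inclusive) to s[u + 1] (exclusive). */
    static int[] userStartingIndices(int[][] triples, int numUsers) {
        int[] degrees = new int[numUsers];
        for (int[] triple : triples) {
            degrees[triple[0]]++; // count ratings per user
        }
        int[] starts = new int[numUsers + 1];
        for (int u = 0; u < numUsers; u++) {
            starts[u + 1] = starts[u] + degrees[u]; // prefix sum of degrees
        }
        return starts;
    }

    public static void main(String[] args) {
        // Users 0, 1, 2 have 3, 2, 2 ratings respectively.
        System.out.println(Arrays.toString(userStartingIndices(TEST_DATA, 3)));
        // prints [0, 3, 5, 7]
    }
}
```

With these indices, all ratings of user u are found by scanning rows `starts[u]` to `starts[u + 1] - 1`, which is how the item-based prediction loop below walks a single user's ratings.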

Method for setting $\delta$:

	/**
	 ********************
	 * Set the radius(delta).
	 *
	 * @param paraRadius The given radius.
	 *********************
	 */
	public void setRadius(double paraRadius) {
		if (paraRadius > 0) {
			radius = paraRadius;
		} else {
			radius = 0.1;
		} // Of if
	}// Of setRadius

Illustration of the item-based recommendation test:
[Figures: two illustrations of the item-based test]

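The test is leave-one-out: each time, one rating is held out of the original data set and predicted, and the remaining ratings serve as the training set.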

	/**
	 ********************
	 * Leave-one-out prediction. The predicted values are stored in predictions.
	 *
	 * @see predictions
	 *********************
	 */
	public void leaveOneOutPrediction() {
		double tempItemAverageRating;
		// Make each line of the code shorter.
		int tempUser, tempItem, tempRating;
		System.out.println("\r\nLeaveOneOutPrediction for radius " + radius);

		numNonNeighbors = 0;
		for (int i = 0; i < numRatings; i++) {
			tempUser = compressedRatingMatrix[i][0];
			tempItem = compressedRatingMatrix[i][1];
			tempRating = compressedRatingMatrix[i][2];

			// Step 1. Recompute average rating of the current item.
			tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating)
					/ (itemDegrees[tempItem] - 1);
			// Step 2. Recompute neighbors,at the same time obtain the ratings of neighbors.
			int tempNeighbors = 0;
			double tempTotal = 0;
			int tempComparedItem;
			for (int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++) {
				tempComparedItem = compressedRatingMatrix[j][1];
				if (tempItem == tempComparedItem) {
					continue;// Ignore itself.
				} // Of if

				if (Math.abs(tempItemAverageRating - itemAverageRatings[tempComparedItem]) < radius) {
					tempTotal += compressedRatingMatrix[j][2];
					tempNeighbors++;
				} // Of if
			} // Of for j

			// Step 3. Predict as the average value of neighbors.
			if (tempNeighbors > 0) {
				predictions[i] = tempTotal / tempNeighbors;
			} else {
				predictions[i] = DEFAULT_RATING;
				numNonNeighbors++;
			} // Of if
		} // Of for i
	}// Of leaveOneOutPrediction
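Step 1 of the method above removes the held-out rating from the item's average without rescanning the data. As a worked equation, writing $d_j$ for the number of ratings of item $j$ (my own symbol for `itemDegrees[tempItem]`, not one used in the original text):

```latex
\overline{r}_j' = \frac{\overline{r}_j \, d_j - r_{i,j}}{d_j - 1}
\qquad\text{for example, ratings } \{5, 3, 4\}:\quad
\overline{r}_j = 4,\ d_j = 3,\ r_{i,j} = 5
\;\Rightarrow\;
\overline{r}_j' = \frac{4 \cdot 3 - 5}{3 - 1} = 3.5
```

When $d_j = 1$ the denominator is zero. In Java's double arithmetic this appears to yield NaN here, so no neighbor passes the radius comparison and the prediction falls back to `DEFAULT_RATING`.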

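Two ways to evaluate the algorithm. Let $k$ be the number of predicted instances, let the array $p$ hold the predicted values, and let $r_{i,2}$ be the true rating stored in the third column of the $i$-th triple. Then:
$$\mathrm{MAE}=\frac{\sum_{i=0}^{k-1}|p_i-r_{i,2}|}{k},\qquad \mathrm{RMSE}=\left(\frac{\sum_{i=0}^{k-1}|p_i-r_{i,2}|^2}{k}\right)^{\frac{1}{2}}$$
MAE (mean absolute error) directly reflects the average size of the prediction errors;
RMSE (root mean square error, spelled RSME in the code) measures the deviation between predicted and true values and weights large errors more heavily.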
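A small worked example may help compare the two metrics. The class name, method names, and the four prediction/rating pairs below are made up for illustration; they are not results from the data set used in this post.

```java
/** Sketch: MAE and RMSE on a tiny hand-made example. */
class ErrorMetricsSketch {
    /** Mean absolute error. */
    static double mae(double[] predicted, int[] actual) {
        double total = 0;
        for (int i = 0; i < predicted.length; i++) {
            total += Math.abs(predicted[i] - actual[i]);
        }
        return total / predicted.length;
    }

    /** Root mean square error. */
    static double rmse(double[] predicted, int[] actual) {
        double total = 0;
        for (int i = 0; i < predicted.length; i++) {
            double error = predicted[i] - actual[i];
            total += error * error;
        }
        return Math.sqrt(total / predicted.length);
    }

    public static void main(String[] args) {
        double[] predicted = { 3.5, 4.0, 2.0, 5.0 }; // hypothetical predictions
        int[] actual = { 4, 4, 1, 3 };               // hypothetical true ratings
        // Absolute errors are 0.5, 0, 1, 2, so MAE = 3.5 / 4 = 0.875.
        System.out.println("MAE  = " + mae(predicted, actual));
        // Squared errors are 0.25, 0, 1, 4, so RMSE = sqrt(5.25 / 4) = sqrt(1.3125).
        System.out.println("RMSE = " + rmse(predicted, actual));
    }
}
```

Here RMSE (about 1.146) exceeds MAE (0.875) because the single error of 2 dominates the squared sum.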

	/**
	 ********************
	 * Compute the MAE based on the deviation of each leave-one-out.
	 *********************
	 */
	public double computeMAE() throws Exception {
		double tempTotalError = 0;
		for (int i = 0; i < predictions.length; i++) {
			tempTotalError += Math.abs(predictions[i] - compressedRatingMatrix[i][2]);
		} // Of for i

		return tempTotalError / predictions.length;
	}// Of computeMAE

	/**
	 ********************
	 * Compute the RSME based on the deviation of each leave-one-out.
	 *********************
	 */
	public double computeRSME() throws Exception {
		double tempTotalError = 0;
		for (int i = 0; i < predictions.length; i++) {
			tempTotalError += (predictions[i] - compressedRatingMatrix[i][2])
					* (predictions[i] - compressedRatingMatrix[i][2]);
		} // Of for i

		double tempAverage = tempTotalError / predictions.length;

		return Math.sqrt(tempAverage);
	}// Of computeRSME

Main function:

	/**
	 *********************
	 * The entrance of the program.
	 * 
	 * @param args Not used now.
	 *********************
	 */
	public static void main(String[] args) {
		try {
			MBR tempRecommender = new MBR("F:/sampledata-main/movielens-943u1682m.txt", 943, 1682, 100000);

			for (double tempRadius = 0.2; tempRadius < 0.6; tempRadius += 0.1) {
				tempRecommender.setRadius(tempRadius);

				tempRecommender.leaveOneOutPrediction();
				double tempMAE = tempRecommender.computeMAE();
				double tempRSME = tempRecommender.computeRSME();

				System.out.println("Radius = " + tempRadius + ", MAE = " + tempMAE + ", RSME = " + tempRSME
						+ ", numNonNeighbors = " + tempRecommender.numNonNeighbors);
			} // Of for tempRadius
		} catch (Exception ee) {
			System.out.println(ee);
		} // Of try
	}// Of main

Output:
[Figure: screenshot of the console output]
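Addendum: user-based recommendation
[Figures: two illustrations of user-based recommendation]
It is easy to see that this amounts to transposing the original matrix and applying the same procedure. Because the matrix is stored in compressed form, I simply traversed all the data for every prediction, which, unsurprisingly, is very slow.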

	/**
	 ********************
	 * Leave-one-out prediction. The predicted values are stored in predictions.
	 *
	 * @see predictions
	 *********************
	 */
	public void leaveOneOutPrediction2() {
		double tempUserAverageRating;
		// Make each line of the code shorter.
		int tempUser, tempItem, tempRating;
		System.out.println("\r\nLeaveOneOutPrediction2 for radius " + radius);

		numNonNeighbors = 0;
		for (int i = 0; i < numRatings; i++) {
			tempUser = compressedRatingMatrix[i][0];
			tempItem = compressedRatingMatrix[i][1];
			tempRating = compressedRatingMatrix[i][2];

			// Step 1. Recompute average rating of the current user.
			tempUserAverageRating = (userAverageRatings[tempUser] * userDegrees[tempUser] - tempRating)
					/ (userDegrees[tempUser] - 1);
			// Step 2. Recompute neighbors,at the same time obtain the ratings of neighbors.
			int tempNeighbors = 0;
			double tempTotal = 0;
			int tempComparedItem;

			for (int j = 0; j < numUsers; j++) {
				if (j == tempUser)
					continue;
				for (int k = userStartingIndices[j]; k < userStartingIndices[j + 1]; k++) {
					tempComparedItem = compressedRatingMatrix[k][1];
					if (tempComparedItem == tempItem
							&& Math.abs(tempUserAverageRating - userAverageRatings[j]) < radius) {
						tempTotal += compressedRatingMatrix[k][2];
						tempNeighbors++;
					} // Of if
				} // Of for k
			} // Of for j

			// Step 3. Predict as the average value of neighbors.
			if (tempNeighbors > 0) {
				predictions[i] = tempTotal / tempNeighbors;
			} else {
				predictions[i] = DEFAULT_RATING;
				numNonNeighbors++;
			} // Of if
		} // Of for i
	}// Of leaveOneOutPrediction2

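Output:
[Figure: screenshot of the console output]
Addendum:
As noted above, user-based recommendation processes the transposed matrix in the same way. The code below transposes the compressed matrix: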

	/**
	 ********************
	 * Transform (transpose) the compressed matrix.
	 *********************
	 */
	public void transformMatrix() {
		int[][] resultMatrix = new int[numRatings][3];
		int[] tempItemCounts = new int[numItems];
		int[] tempPointIndex = new int[numItems];
		int[] tempItemStartingIndices = new int[numItems + 1];

		// Count the number of every item.
		for (int i = 0; i < numRatings; i++) {
			tempItemCounts[compressedRatingMatrix[i][1]]++;
		} // Of for i

		// Get every item's starting index and initial the point array.
		tempPointIndex[0] = 0;
		tempItemStartingIndices[0] = 0;
		tempItemStartingIndices[numItems] = numRatings;
		for (int i = 1; i < numItems; i++) {
			tempPointIndex[i] = tempItemCounts[i - 1] + tempPointIndex[i - 1];
			tempItemStartingIndices[i] = tempPointIndex[i];
		} // Of for i

		// Transform the matrix.
		int tempIndex;
		for (int i = 0; i < numRatings; i++) {
			tempIndex = tempPointIndex[compressedRatingMatrix[i][1]];
			resultMatrix[tempIndex][0] = compressedRatingMatrix[i][1];
			resultMatrix[tempIndex][1] = compressedRatingMatrix[i][0];
			resultMatrix[tempIndex][2] = compressedRatingMatrix[i][2];
			tempPointIndex[compressedRatingMatrix[i][1]]++;
		} // Of for i

		// Swap the value between users and items.
		int tempArray[], tempValue;
		double tempAvarageArray[];
		compressedRatingMatrix = resultMatrix;
		userStartingIndices = tempItemStartingIndices;

		tempArray = userDegrees;
		userDegrees = itemDegrees;
		itemDegrees = tempArray;

		tempValue = numUsers;
		numUsers = numItems;
		numItems = tempValue;

		tempAvarageArray = userAverageRatings;
		userAverageRatings = itemAverageRatings;
		itemAverageRatings = tempAvarageArray;
		// leaveOneOutPrediction();
	}// Of transformMatrix

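The following printed test case of the transposed compressed matrix may help in understanding it:
[Figure: printout of the transposed compressed matrix for the test data]
Test data: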

0,0,2
0,3,5
0,4,1
1,0,1
1,2,3
2,1,4
2,3,3
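As a rough check of what the transposition produces on this test data, here is a standalone sketch. The class and method names are hypothetical, and it uses a compact counting sort on the item column rather than the exact bookkeeping of `transformMatrix`; the idea (count per item, compute start positions, then place each triple) is the same.

```java
import java.util.Arrays;

/** Sketch: transpose sorted {user, item, rating} triples into {item, user, rating}. */
class TransposeSketch {
    static final int[][] TEST_DATA = {
        { 0, 0, 2 }, { 0, 3, 5 }, { 0, 4, 1 },
        { 1, 0, 1 }, { 1, 2, 3 },
        { 2, 1, 4 }, { 2, 3, 3 },
    };

    static int[][] transpose(int[][] triples, int numItems) {
        // next[j] will hold the next free row for item j.
        int[] next = new int[numItems + 1];
        for (int[] triple : triples) {
            next[triple[1] + 1]++; // count ratings per item, shifted by one
        }
        for (int j = 0; j < numItems; j++) {
            next[j + 1] += next[j]; // prefix sum: next[j] is now the start of item j
        }
        int[][] result = new int[triples.length][];
        for (int[] triple : triples) {
            // Swap the user and item columns; input order keeps users sorted per item.
            result[next[triple[1]]++] = new int[] { triple[1], triple[0], triple[2] };
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(transpose(TEST_DATA, 5)));
        // prints [[0, 0, 2], [0, 1, 1], [1, 2, 4], [2, 1, 3], [3, 0, 5], [3, 2, 3], [4, 0, 1]]
    }
}
```

Each output row reads item, user, rating, and rows are grouped by item, so the item-based prediction code can be reused unchanged with the roles of users and items swapped.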

Calling it from the main function:

	System.out.println("\r\n-------user-based recommendation by transform -------");
			tempRecommender.transformMatrix();

			for (double tempRadius = 0.2; tempRadius < 0.6; tempRadius += 0.1) {
				tempRecommender.setRadius(tempRadius);

				tempRecommender.leaveOneOutPrediction();
				double tempMAE = tempRecommender.computeMAE();
				double tempRSME = tempRecommender.computeRSME();

				System.out.println("Radius = " + tempRadius + ", MAE = " + tempMAE + ", RSME=" + tempRSME
						+ ", numNonNeighbors = " + tempRecommender.numNonNeighbors);
			} // Of for tempRadius

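The results are the same as in the earlier user-based output, but the run is faster:
[Figure: screenshot of the console output]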
