Machine Learning: Recommendation Based on M-distance (300 Lines of Java a Day, Days 54-55)

Latest recommended article published on 2022-05-06 23:14:07

陈序袁


Article link: https://blog.csdn.net/weixin_49592304/article/details/124589994


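1. What is M-distance?

M-distance is a rating prediction mechanism used in memory-based recommender (MBR) systems. The algorithm comes from the paper: Mei Zheng, Fan Min, Heng-Ru Zhang, Wen-Bin Chen, Fast recommendations with the M-distance, IEEE Access 4 (2016) 1464–1468. (The original post links to a download of the paper here.)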

2. The basic idea of M-distance

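Like kNN, the M-distance algorithm relies on the notion of "neighbors". Let $\bar{r}_{\cdot j}$ denote the average rating of item $j$, let $r_{ij}$ denote the rating that user $i$ gave item $j$ (with $r_{ij}=0$ meaning "not rated"), and let $m$ be the number of items. The neighbor set of item $j$ with respect to user $i$ is then:

$N_{ij}=\left\{ 1\leq j'\leq m \mid j'\neq j,\ r_{ij'}\neq 0,\ \left| \bar{r}_{\cdot j}-\bar{r}_{\cdot j'} \right| < \epsilon \right\}$

Unlike kNN, the number of neighbors is not controlled by k. Neighbors are found by comparing average ratings: every item rated by user $i$ whose average rating differs from that of item $j$ by less than the radius $\epsilon$ is a neighbor. The radius is usually set by hand. The predicted rating of user $i$ for item $j$ is:

$p_{ij}=\frac{\sum_{j'\in N_{ij}} r_{ij'}}{\left| N_{ij} \right|}$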

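As the figure in the original post illustrates, predicting the rating of user $u_{0}$ for item $m_{2}$ requires finding the neighbors of $m_{2}$. Items similar to $m_{2}$ are used to produce the predicted value. Similarity is measured through the average rating of each item from $m_{0}$ to $m_{5}$: the items whose average differs from the average of $m_{2}$ by less than the radius become its neighbors. Suppose, for example, that $m_{1}$ and $m_{3}$ are found to be neighbors. Adding up the ratings that $u_{0}$ gave to $m_{1}$ and $m_{3}$ and taking the mean yields the predicted rating for $m_{2}$.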
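
To make the example concrete, here is a minimal sketch of the prediction on a small dense matrix. The class name `MDistanceSketch`, the helper names, and the rating values are all made up for illustration (the figure's actual numbers are not available here), and 0 is assumed to mean "not rated".

```java
class MDistanceSketch {
    /** Average of the nonzero entries in column j (0 is assumed to mean "not rated"). */
    static double itemAverage(int[][] ratings, int j) {
        double sum = 0;
        int count = 0;
        for (int[] row : ratings) {
            if (row[j] != 0) {
                sum += row[j];
                count++;
            }
        }
        return count == 0 ? 0 : sum / count;
    }

    /** Predicts user i's rating of item j from the items whose average lies within the radius. */
    static double predict(int[][] ratings, int i, int j, double radius, double defaultRating) {
        double target = itemAverage(ratings, j);
        double total = 0;
        int neighbors = 0;
        for (int k = 0; k < ratings[i].length; k++) {
            if (k == j || ratings[i][k] == 0) {
                continue; // skip the target item and items the user has not rated
            }
            if (Math.abs(target - itemAverage(ratings, k)) < radius) {
                total += ratings[i][k];
                neighbors++;
            }
        }
        return neighbors > 0 ? total / neighbors : defaultRating;
    }

    public static void main(String[] args) {
        // Hypothetical ratings: rows are users u0..u2, columns are items m0..m3.
        int[][] ratings = {
            {1, 5, 0, 4},
            {2, 4, 4, 4},
            {3, 3, 4, 5}
        };
        // Item averages: m0 = 2.0, m1 = 4.0, m2 = 4.0, m3 = 13/3.
        // With radius 0.5, m1 and m3 are neighbors of m2, so the prediction is (5 + 4) / 2.
        System.out.println(predict(ratings, 0, 2, 0.5, 3.0)); // prints 4.5
    }
}
```

Shrinking the radius to 0.1 leaves only m1 as a neighbor in this toy data, and the prediction becomes 5.0.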

3. What is leave-one-out?

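Leave-one-out (LOO) is a cross-validation method in machine learning. As the name suggests, exactly one sample is left out at a time: it serves as the test set while all remaining samples are used for training, and the process is repeated until every sample has been tested once. It is the extreme case of k-fold cross-validation in which k equals the number of samples. By contrast, 10-fold cross-validation splits the data into 10 parts, trains on 9 of them, tests on the remaining one, and repeats this 10 times with a different test part each time. In this post, each rating is hidden in turn and predicted from the remaining ratings.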

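The advantage of leave-one-out is fairness: the algorithm is tested on every single data point. It is also deterministic, since no random split of the data is involved.

Its drawback is equally obvious. Machine learning data sets are large, so running leave-one-out with an inefficient algorithm is very time-consuming. Leave-one-out therefore places high demands on efficiency and is generally used only with fast algorithms. The M-distance algorithm in this post is one of them.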
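
One reason M-distance copes well with leave-one-out is that hiding a single rating only changes the average of one item, and that average can be updated from the stored mean and count without rescanning the data. The sketch below illustrates this update; the class name `LeaveOneOutMeanSketch`, the method name `meanWithout`, and the sample ratings are assumptions made for this example.

```java
class LeaveOneOutMeanSketch {
    /**
     * Mean of the remaining values after one value is removed from a
     * collection with the given mean and size. Returns NaN if nothing is left.
     */
    static double meanWithout(double mean, int count, double removed) {
        if (count <= 1) {
            return Double.NaN; // no other value to average
        }
        return (mean * count - removed) / (count - 1);
    }

    public static void main(String[] args) {
        double[] ratings = {4, 5, 3, 4}; // hypothetical ratings of one item
        double sum = 0;
        for (double r : ratings) {
            sum += r;
        }
        double mean = sum / ratings.length;
        // Hide each rating in turn; each update costs O(1).
        for (double r : ratings) {
            System.out.println("hide " + r + " -> mean of the rest = "
                    + meanWithout(mean, ratings.length, r));
        }
    }
}
```

A naive alternative would recompute the average from scratch for every hidden rating, which multiplies the cost by the number of ratings per item.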

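4. Algorithm workflow and implementation

Before walking through the algorithm, let us look at the data set. This post uses a movie rating data set with 100,000 ratings (download link in the original post). Each record has three attributes: user, movie, and score. For example, [0,64,4] means that user 0 watched movie 64 and rated it 4.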

User	Movie	Score
0	64	4
0	65	4
...	...	...
942	1329	3

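The following steps use the movie rating data set to illustrate the basic workflow of the M-distance algorithm.

① Initialize the member variables and read the data file, filling its contents into the member variables and arrays.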

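import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

//Member variables (all fields and methods in this post are members of class MBR)
	public static final double DEFAULT_RATING=3.0;//prediction used when an item has no neighbor
	private int numUsers;//number of users
	private int numItems;//number of items
	private int numRating;//number of ratings
	private double[] predictions;//predicted value for each rating
	private int[][] compressedRatingMatrix;//compressed rating matrix: one (user, item, rating) triple per row
	private double[] userAverageRatings;//average rating of each user
	private int[] userDegrees;//number of ratings given by each user
	private int[] itemDegrees;//number of ratings received by each item
	private double[] itemAverageRatings;//average rating of each item
	private int[] userStartingIndices;//index of each user's first row in compressedRatingMatrix
	private int numNonNeighbors;//number of predictions made without any neighbor
	private double radius;//neighbor threshold, set manually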
	public MBR(String paraFilename,int paraNumUsers,int paraNumItems,int paraNumRatings)throws Exception{
		//Step 1: initialize
		numItems=paraNumItems;
		numUsers=paraNumUsers;
		numRating=paraNumRatings;
		
		userDegrees=new int[numUsers];
		userStartingIndices=new int[numUsers+1];
		userAverageRatings=new double[numUsers];
		itemDegrees=new int[numItems];
		compressedRatingMatrix=new int[numRating][3];
		itemAverageRatings=new double[numItems];//one average per item
		predictions=new double[numRating];//one prediction per rating; needed by leaveOneOutPrediction()
		
		System.out.println("Reading " + paraFilename);
		
		//Step 2: read the data file
		File tempFile=new File(paraFilename);
		if(!tempFile.exists()) {//the file does not exist
			System.out.println("File " + paraFilename + " does not exist.");
			System.exit(0);
		}//of if
		BufferedReader tempBufReader=new BufferedReader(new FileReader(tempFile));
		String tempString;
		String[] tempStrArray;
		int tempIndex=0;
		userStartingIndices[0]=0;
		userStartingIndices[numUsers]=numRating;
		while((tempString=tempBufReader.readLine())!=null) {//read line by line until the end of the file
			//each line holds three comma-separated values
			tempStrArray=tempString.split(",");
			compressedRatingMatrix[tempIndex][0]=Integer.parseInt(tempStrArray[0]);//column 0: user
			compressedRatingMatrix[tempIndex][1]=Integer.parseInt(tempStrArray[1]);//column 1: item
			compressedRatingMatrix[tempIndex][2]=Integer.parseInt(tempStrArray[2]);//column 2: rating
			
			userDegrees[compressedRatingMatrix[tempIndex][0]]++;//one more rating for this user
			itemDegrees[compressedRatingMatrix[tempIndex][1]]++;//one more rating for this item
			
			if(tempIndex>0) {//a new user starts where the user id changes (the file must be sorted by user)
				if(compressedRatingMatrix[tempIndex][0]!=compressedRatingMatrix[tempIndex-1][0]) {
					userStartingIndices[compressedRatingMatrix[tempIndex][0]]=tempIndex;
				}//of if
			}//of if
			tempIndex++;
		}//of while
		tempBufReader.close();
		
		double[] tempUserTotalScore=new double[numUsers];//sum of the ratings given by each user
		double[] tempItemTotalScore=new double[numItems];//sum of the ratings received by each item
		for(int i=0;i<numRating;i++) {
			tempUserTotalScore[compressedRatingMatrix[i][0]]+=compressedRatingMatrix[i][2];//add the rating (column 2) to its user (column 0)
			tempItemTotalScore[compressedRatingMatrix[i][1]]+=compressedRatingMatrix[i][2];//add the rating (column 2) to its item (column 1)
		}//of for i
		for(int i=0;i<numUsers;i++) {
			userAverageRatings[i]=tempUserTotalScore[i]/userDegrees[i];//user average = user total / number of ratings by the user
		}
		for(int i=0;i<numItems;i++) {
			itemAverageRatings[i]=tempItemTotalScore[i]/itemDegrees[i];//item average = item total / number of ratings of the item
		}
	}//of MBR
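
The constructor records a starting index whenever the user id changes, which presumes that the file is sorted by user and that every user has at least one rating. As a hedged alternative (not the code from this post), the starting indices can also be derived from the per-user counts with a prefix sum, which stays correct for users without ratings. The class name `StartingIndicesSketch` and the sample triples below are made up for illustration.

```java
import java.util.Arrays;

class StartingIndicesSketch {
    /**
     * For (user, item, rating) triples sorted by user id, returns an array s
     * such that the ratings of user u occupy rows s[u] .. s[u + 1] - 1.
     */
    static int[] userStartingIndices(int[][] triples, int numUsers) {
        int[] degrees = new int[numUsers];
        for (int[] t : triples) {
            degrees[t[0]]++; // count the ratings of each user
        }
        int[] starts = new int[numUsers + 1];
        for (int u = 0; u < numUsers; u++) {
            starts[u + 1] = starts[u] + degrees[u]; // prefix sum of the counts
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical triples in the same layout as the data file.
        int[][] triples = {{0, 64, 4}, {0, 65, 4}, {1, 10, 3}, {2, 5, 5}, {2, 7, 2}};
        int[] starts = userStartingIndices(triples, 3);
        System.out.println(Arrays.toString(starts)); // prints [0, 2, 3, 5]
        // Iterate over the ratings of user 2 only.
        for (int k = starts[2]; k < starts[3]; k++) {
            System.out.println("item " + triples[k][1] + " rated " + triples[k][2]);
        }
    }
}
```

This index is what lets the prediction step visit only the ratings of one user instead of scanning all triples.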

② Set the radius used later to find neighbors.

	public void setRadius(double paraRadius) {
		if(paraRadius>0) {
			radius=paraRadius;
		}
		else {
			radius=0.1;
		}
	}//of setRadius

③ Predict with leave-one-out and store the predicted values.

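	public void leaveOneOutPrediction() {//hide one rating at a time and predict it from the rest
		double tempItemAverageRating;//item average with the hidden rating removed
		int tempUser,tempItem,tempRating;
		System.out.println("\r\nleaveOneOutPrediction for radius " + radius);
		numNonNeighbors=0;//number of ratings predicted without any neighbor
		for(int i=0;i<numRating;i++) {
			tempUser=compressedRatingMatrix[i][0];//column 0: user
			tempItem=compressedRatingMatrix[i][1];//column 1: item
			tempRating=compressedRatingMatrix[i][2];//column 2: actual rating, hidden in this round
			
			//Step 1: recompute the average of the current item without the hidden rating
			//(if this is the item's only rating the result is NaN, no neighbor matches, and the default is used)
			tempItemAverageRating=(itemAverageRatings[tempItem]*itemDegrees[tempItem]-tempRating)/(itemDegrees[tempItem]-1);
		    //Step 2: find the neighbors and sum the user's ratings of them
			int  tempNeighbors=0;
			double tempTotal=0;
			int tempComparedItem;
			for(int j=userStartingIndices[tempUser];j<userStartingIndices[tempUser+1];j++) {//only items rated by the same user
				tempComparedItem=compressedRatingMatrix[j][1];//item to compare with
				if(tempItem==tempComparedItem) {//skip the hidden item itself
					continue;
				}
				if(Math.abs(tempItemAverageRating-itemAverageRatings[tempComparedItem])<radius) {//the two averages differ by less than the radius
					tempTotal+=compressedRatingMatrix[j][2];//add the user's rating of this neighbor
					tempNeighbors++;//one more neighbor
				}
			}//of for j
			//Step 3: predict the hidden rating from the neighbors
			if(tempNeighbors>0) {//at least one neighbor exists
				predictions[i]=tempTotal/tempNeighbors;//prediction = sum of neighbor ratings / number of neighbors
			}
			else {
				predictions[i]=DEFAULT_RATING;//fall back to the default rating
				numNonNeighbors++;
			}//of if
		}//of for i
	}//of leaveOneOutPrediction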

④ Compute MAE and RMSE, two evaluation metrics that reflect the prediction accuracy of the algorithm.

	public double computeMAE()throws Exception{//MAE: mean absolute error
		double tempTotalError=0;
		for(int i=0;i<predictions.length;i++) {
			tempTotalError+=Math.abs(predictions[i]-compressedRatingMatrix[i][2]);//absolute difference between the prediction and the actual rating
		}
		return tempTotalError/predictions.length;//total absolute error / number of predictions
	}//of computeMAE
	public double computeRSME()throws Exception{//RMSE: root mean square error (the method name keeps the spelling used in this post)
		double tempTotalError=0;
		for(int i=0;i<predictions.length;i++) {
			tempTotalError+=(predictions[i]-compressedRatingMatrix[i][2])*(predictions[i]-compressedRatingMatrix[i][2]);//unlike MAE, the error is squared
		}
		double tempAverage=tempTotalError/predictions.length;
		return Math.sqrt(tempAverage);//take the square root of the mean squared error
	}//of computeRSME
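
As a small worked example of the two metrics, the sketch below evaluates them on four made-up prediction/rating pairs. The class name `ErrorMetricsSketch`, the method names, and the numbers are assumptions for illustration only.

```java
class ErrorMetricsSketch {
    /** Mean absolute error between predictions and actual ratings. */
    static double mae(double[] predicted, int[] actual) {
        double total = 0;
        for (int i = 0; i < predicted.length; i++) {
            total += Math.abs(predicted[i] - actual[i]);
        }
        return total / predicted.length;
    }

    /** Root mean square error between predictions and actual ratings. */
    static double rmse(double[] predicted, int[] actual) {
        double total = 0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            total += diff * diff;
        }
        return Math.sqrt(total / predicted.length);
    }

    public static void main(String[] args) {
        double[] predicted = {4.5, 3.0, 2.0, 5.0}; // hypothetical predictions
        int[] actual = {4, 3, 4, 5};               // hypothetical actual ratings
        // Absolute errors: 0.5, 0, 2, 0 -> MAE = 2.5 / 4
        System.out.println("MAE  = " + mae(predicted, actual)); // prints MAE  = 0.625
        // Squared errors: 0.25, 0, 4, 0 -> RMSE = sqrt(4.25 / 4), about 1.03
        System.out.println("RMSE = " + rmse(predicted, actual));
    }
}
```

The single error of 2 dominates the RMSE, which shows how squaring penalizes large errors more heavily than MAE does.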

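⑤ Increase the radius from 0.2 in steps of 0.1 while it stays below 0.6 (that is, 0.2, 0.3, 0.4, and 0.5), find the neighbors and predict for each value, and print the prediction accuracy obtained for each radius.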

	public static void main(String args[]) {
		try {
			MBR tempRecommender=new MBR("D:/software/eclipse/eclipse-workspace/day51/movielens-943u1682m.txt",943,1682,100000);//read the data
			for(int i=2;i<6;i++) {//radius 0.2, 0.3, 0.4, 0.5
				double tempRadius=i/10.0;//derived from an int counter to avoid accumulated floating-point error
				tempRecommender.setRadius(tempRadius);//set the radius
				tempRecommender.leaveOneOutPrediction();//predict
				double tempMAE=tempRecommender.computeMAE();//compute MAE
				double tempRSME=tempRecommender.computeRSME();//compute RMSE
				System.out.println("Radius = " + tempRadius + " , MAE = " + tempMAE + " , RMSE = " + tempRSME );
			}
		}catch (Exception ee) {
			System.out.println(ee);
		}
	}//of main
