Notes on Recommendation

Latest recommended article published 2023-04-17 20:48:32

axman


Category: Machine Learning

Article link: https://blog.csdn.net/axman/article/details/37902879


1. Collect baseline preference data, mainly through votes and ratings (reviews), to obtain the raw data.

Baseline data

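	/*
	 * Feature index. Ratings are stored in plain arrays, so the array position
	 * identifies the item a rating was collected for. For example
	 * {2.5, 0, 3.5, 3.0, 3.5, 4} means {
	 * "Lady in the Water":2.5, "Snakes on a Plane":0, "Just My Luck":3.5,
	 * "Superman Returns":3.0, "The Night Listener":3.5,
	 * "You, Me and Dupree":4.0}. A value of 0 means "not rated".
	 * These fragments belong to one class and need java.util.HashMap and
	 * java.util.Map imported.
	 */
	public static String[] indexName = { "Lady in the Water",
			"Snakes on a Plane", "Just My Luck", "Superman Returns",
			"The Night Listener", "You, Me and Dupree" };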

	public static Map<String, double[]> data = new HashMap<String, double[]>();

	static {
		/**
		 * Generated from the Python critics dictionary with (Python 2):
		 * names = critics['Lisa Rose'] 
		 * for person in critics: 
		 * 		print 'data.put(\"%s\",new double[]{' %person, 
		 * 		for item in names.keys(): 
		 * 			if item not in critics[person]: 
		 * 				print '0.0,', 
		 * 			else :
		 * 				print '%s,' %critics[person][item], 
		 * 		print '});'
		 */
		data.put("Jack Matthews",
				new double[] { 3.0, 4.0, 0.0, 5.0, 3.0, 3.5, });
		data.put("Mick LaSalle", new double[] { 3.0, 4.0, 2.0, 3.0, 3.0, 2.0, });
		data.put("Claudia Puig", new double[] { 0.0, 3.5, 3.0, 4.0, 4.5, 2.5, });
		data.put("Lisa Rose", new double[] { 2.5, 3.5, 3.0, 3.5, 3.0, 2.5, });
		data.put("Toby", new double[] { 0.0, 4.5, 0.0, 4.0, 0.0, 1.0, });
		data.put("Gene Seymour", new double[] { 3.0, 3.5, 1.5, 5.0, 3.0, 3.5, });
		data.put("Michael Phillips", new double[] { 2.5, 3.0, 0.0, 3.5, 4.0,
				0.0, });

	}

2. Finding similar data

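Euclidean distance: square the difference of each feature, sum the squares, then take the square root. (Key point: every feature must be measured on the same scale, otherwise a feature with large peaks will dominate the result.) sqrt(pow(Fa1-Fa2, 2) + pow(Fb1-Fb2, 2))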

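The smaller the value, the closer the two points. To make a larger value mean "more similar", take the reciprocal: 1 / (1 + sqrt(pow(Fa1-Fa2, 2) + pow(Fb1-Fb2, 2))). The 1 is added to the denominator to avoid dividing by zero when sqrt(pow(Fa1-Fa2, 2) + pow(Fb1-Fb2, 2)) is zero.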
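The distance-based score above can be sketched in Java as follows. This is a minimal sketch, not code from the original post: the class name `EuclideanSimilarity` and the method name `simDistance` are made up for illustration, and it assumes the same convention as the data above, where 0.0 means "not rated" and only items rated by both people are compared.

```java
import java.util.HashMap;
import java.util.Map;

class EuclideanSimilarity {
    /**
     * Returns 1 / (1 + Euclidean distance) over the items both people rated.
     * Returns 0 when the two people share no rated item (an assumption of
     * this sketch: no overlap is treated as no similarity).
     */
    static double simDistance(Map<String, double[]> data, String p1, String p2) {
        double[] d1 = data.get(p1);
        double[] d2 = data.get(p2);
        double sumOfSquares = 0.0;
        int shared = 0;
        for (int i = 0; i < d1.length; i++) {
            if (d1[i] != 0.0 && d2[i] != 0.0) { // 0.0 marks "not rated"
                double diff = d1[i] - d2[i];
                sumOfSquares += diff * diff;
                shared++;
            }
        }
        if (shared == 0) {
            return 0.0;
        }
        return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        Map<String, double[]> data = new HashMap<>();
        data.put("Lisa Rose", new double[] { 2.5, 3.5, 3.0, 3.5, 3.0, 2.5 });
        data.put("Gene Seymour", new double[] { 3.0, 3.5, 1.5, 5.0, 3.0, 3.5 });
        System.out.println(simDistance(data, "Lisa Rose", "Gene Seymour"));
    }
}
```

The result always lies in (0, 1] for people with shared items, and two people with identical ratings score exactly 1.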

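Pearson correlation: measures how well two sets of ratings fit a straight line (the best-fit line). It corrects for inflated scores: a critic who consistently rates higher than another can still correlate strongly, because a constant offset in the scores does not change the fit.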

Steps: 1. sum all the ratings; 2. sum the squares; 3. sum the products; 4. compute the Pearson score.
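Written out as one formula, the four steps combine as shown here. The symbols are notation chosen for this note, not taken from the original: x_i and y_i are the two people's ratings of item i, and n is the number of items both have rated. In the code that follows they correspond to `sum1`, `sum2`, `sum1Sq`, `sum2Sq`, `pSum`, and `count`.

```latex
r = \frac{\sum_i x_i y_i - \dfrac{\sum_i x_i \, \sum_i y_i}{n}}
         {\sqrt{\left(\sum_i x_i^2 - \dfrac{\left(\sum_i x_i\right)^2}{n}\right)
                \left(\sum_i y_i^2 - \dfrac{\left(\sum_i y_i\right)^2}{n}\right)}}
```

The value ranges from -1 to 1, and the denominator is zero when either person gave the same rating to every shared item.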

	public static double pearson(Map<String, double[]> data, String p1,
			String p2) {
		double[] d1 = data.get(p1);
		double[] d2 = data.get(p2);
		int count = 0; // number of items rated by both people
		double sum1 = 0.0, sum2 = 0.0; // sums of the ratings
		double sum1Sq = 0.0, sum2Sq = 0.0; // sums of the squared ratings
		double pSum = 0.0; // sum of the products
		for (int i = 0; i < d1.length; i++) {
			// 0.0 marks "not rated"; use only items rated by both
			if (d1[i] != 0.0 && d2[i] != 0.0) {
				count++;
				sum1 += d1[i];
				sum2 += d2[i];
				sum1Sq += d1[i] * d1[i];
				sum2Sq += d2[i] * d2[i];
				pSum += d1[i] * d2[i];
			}
		}

		// No shared items means no evidence of similarity
		if (count == 0)
			return 0;

		double num = pSum - (sum1 * sum2 / count);
		double den = Math.sqrt((sum1Sq - Math.pow(sum1, 2) / count)
				* (sum2Sq - Math.pow(sum2, 2) / count));
		// One person gave the same score to every shared item
		if (den == 0.0)
			return 0;
		return num / den;
	}

3. Recommendation

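Picking items of interest straight from the nearest neighbor runs into a problem: the nearest neighbor may have no data for the feature (the person whose taste is closest to yours has not rated the movie you are curious about), or its data may be an outlier (that person rates a certain kind of movie very oddly). So the other points are consulted as well, and a weighted combination of their feature values is used as the estimate: each other critic's rating (the feature value) is multiplied by that critic's similarity to you (the weight), the products are summed and normalized by the sum of the weights, and the resulting scores are then sorted.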
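As a formula, the weighted estimate for one item can be written as below. The notation is an assumption made for this note: s_u is the similarity between you and critic u, r_{u,i} is critic u's rating of item i, and U_i is the set of other critics with s_u > 0 who have rated item i.

```latex
\hat{r}_i = \frac{\sum_{u \in U_i} s_u \, r_{u,i}}{\sum_{u \in U_i} s_u}
```

Dividing by the sum of the similarities keeps an item from scoring higher simply because more people rated it.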

	public static double[] getRecommendation(Map<String, double[]> data,
			String person) {
		double[] dp = data.get(person);
		double[] totals = new double[dp.length]; // sum of rating * similarity per item
		double[] simSums = new double[dp.length]; // sum of similarities per item
		for (String other : data.keySet()) {
			if (other.equals(person))
				continue;
			double sim = pearson(data, person, other);
			// Ignore critics with zero or negative similarity
			if (sim <= 0)
				continue;
			double[] dor = data.get(other);
			for (int i = 0; i < dor.length; i++) {
				// Only score items the other critic rated and this person has not
				if (dor[i] != 0 && dp[i] == 0) {
					totals[i] += dor[i] * sim;
					simSums[i] += sim;
				}
			}

		}

		// Normalize each total by the sum of the similarities
		double[] rankings = new double[dp.length];
		for (int i = 0; i < dp.length; i++) {
			if (simSums[i] == 0)
				rankings[i] = 0;
			else
				rankings[i] = totals[i] / simSums[i];
		}
		// Not sorted here so that each index still matches indexName above;
		// to sort, bind the scores to indexName first and sort them together.
		return rankings;
	}

Usage:

	public static void main(String[] args) {
		double[] ds = getRecommendation(data, "Toby");
		// A score of 0 means "nothing to recommend" for that item
		for (int i = 0; i < ds.length; i++)
			if (ds[i] != 0)
				System.out.println(indexName[i] + ": " + ds[i]);

	}
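If a sorted list is wanted, one option is to sort the item indices by score rather than the scores themselves, so each score stays bound to its title. The sketch below is illustrative only: `RankedRecommendations` and `sortedIndices` are hypothetical names, and the scores in `main` are placeholder values, not the output of `getRecommendation`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RankedRecommendations {
    /**
     * Returns the indices of all non-zero scores, ordered from the highest
     * score to the lowest. Zero scores (nothing to recommend) are dropped.
     */
    static List<Integer> sortedIndices(double[] scores) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] != 0.0) {
                order.add(i);
            }
        }
        order.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        return order;
    }

    public static void main(String[] args) {
        // Placeholder titles and scores, used only to show the ordering
        String[] names = { "Item A", "Item B", "Item C", "Item D" };
        double[] scores = { 3.0, 0.0, 2.0, 3.5 };
        for (int i : sortedIndices(scores)) {
            System.out.println(names[i] + ": " + scores[i]);
        }
    }
}
```

Because only the indices move, the same list can be used to look up both the title array and the score array.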

Similar items:

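Turn the problem around: ask which people like a given item, and which other items those people like, and use that to decide how similar two items are. The objects become the items and the features on each object become the people; the algorithm stays the same.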
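One way to reuse the same functions for items is to transpose the data first, so that each item maps to an array of ratings indexed by person. The following is a sketch under assumptions: `TransposePrefs` and `transpose` are hypothetical names, and the person order is taken from the map's key iteration order, which the caller has to reuse (without modifying the map) to know which position belongs to which person.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TransposePrefs {
    /**
     * Turns person -> ratings-by-item into item -> ratings-by-person.
     * Position p in each returned array belongs to the p-th key of
     * data.keySet() at the time of the call.
     */
    static Map<String, double[]> transpose(Map<String, double[]> data,
            String[] itemNames) {
        String[] people = data.keySet().toArray(new String[0]);
        Map<String, double[]> byItem = new LinkedHashMap<>();
        for (int i = 0; i < itemNames.length; i++) {
            double[] column = new double[people.length];
            for (int p = 0; p < people.length; p++) {
                column[p] = data.get(people[p])[i];
            }
            byItem.put(itemNames[i], column);
        }
        return byItem;
    }

    public static void main(String[] args) {
        // Small placeholder data set: two people, two items
        Map<String, double[]> data = new LinkedHashMap<>();
        data.put("Person 1", new double[] { 1.0, 2.0 });
        data.put("Person 2", new double[] { 3.0, 0.0 });
        Map<String, double[]> byItem =
                transpose(data, new String[] { "Item X", "Item Y" });
        for (Map.Entry<String, double[]> e : byItem.entrySet()) {
            System.out.println(e.getKey() + ": "
                    + java.util.Arrays.toString(e.getValue()));
        }
    }
}
```

The transposed map has the same shape as the original data, so a similarity function written for people can be called with two item names instead.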