Mahout Recommendation 11: Exploring User Neighborhoods

1. Fixed-size user neighborhood

package mahout;

import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.LoadEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.example.GroupLensDataModel;

public class GroupLensDataModelTest {

    public static void main(String[] args) throws Exception {
        // Data set
        DataModel dataModel = new GroupLensDataModel(new File("data/ratings.dat"));

        // Evaluator based on the average absolute difference
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

        // Builder for the recommender under evaluation
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel dataModel) throws TasteException {
                // User similarity
                UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
                // User neighborhood, fixed size of 100
                UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(100, userSimilarity, dataModel);
                // User-based recommender
                return new GenericUserBasedRecommender(dataModel, userNeighborhood, userSimilarity);
            }
        };

        // Evaluation score
        double score = evaluator.evaluate(recommenderBuilder, null, dataModel, 0.95, 0.05);
        System.out.println(score);
    }

    private static void recommend() throws IOException, TasteException {
        // Use the custom GroupLensDataModel when the data set has not been converted to CSV format
        DataModel dataModel = new GroupLensDataModel(new File("data/ratings.dat"));

        // Pearson correlation coefficient, measures user similarity
        UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);

        // Build the user neighborhood, 100 users
        UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(100, userSimilarity, dataModel);

        // Recommender engine
        Recommender recommender = new GenericUserBasedRecommender(dataModel, userNeighborhood, userSimilarity);

        // Run the load test
        LoadEvaluator.runLoad(recommender);
    }

}

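Here the user neighborhood size is 100. The last argument to evaluate, 0.05, means that only 5% of the users take part in the evaluation. The 0.95 means that 95% of each of those users' preferences are used to build the model under evaluation, and the remaining 5% are held out for testing.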

Output: 0.8316465777957014

14/08/05 10:14:20 INFO file.FileDataModel: Creating FileDataModel for file C:\Users\ADMINI~1\AppData\Local\Temp\ratings.txt

14/08/05 10:14:20 INFO file.FileDataModel: Reading file info...

14/08/05 10:14:21 INFO file.FileDataModel: Processed 1000000 lines

14/08/05 10:14:23 INFO file.FileDataModel: Processed 2000000 lines

14/08/05 10:14:24 INFO file.FileDataModel: Processed 3000000 lines

14/08/05 10:14:25 INFO file.FileDataModel: Processed 4000000 lines

14/08/05 10:14:27 INFO file.FileDataModel: Processed 5000000 lines

14/08/05 10:14:28 INFO file.FileDataModel: Processed 6000000 lines

14/08/05 10:14:29 INFO file.FileDataModel: Processed 7000000 lines

14/08/05 10:14:30 INFO file.FileDataModel: Processed 8000000 lines

14/08/05 10:14:31 INFO file.FileDataModel: Processed 9000000 lines

14/08/05 10:14:34 INFO file.FileDataModel: Processed 10000000 lines

14/08/05 10:14:34 INFO file.FileDataModel: Read lines: 10000054

14/08/05 10:14:34 INFO model.GenericDataModel: Processed 10000 users

14/08/05 10:14:35 INFO model.GenericDataModel: Processed 20000 users

14/08/05 10:14:35 INFO model.GenericDataModel: Processed 30000 users

14/08/05 10:14:35 INFO model.GenericDataModel: Processed 40000 users

14/08/05 10:14:36 INFO model.GenericDataModel: Processed 50000 users

14/08/05 10:14:39 INFO model.GenericDataModel: Processed 60000 users

14/08/05 10:14:40 INFO model.GenericDataModel: Processed 69878 users

14/08/05 10:14:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel

14/08/05 10:14:41 INFO model.GenericDataModel: Processed 3410 users

14/08/05 10:14:42 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 3120 users

14/08/05 10:14:42 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 3120 tasks in 4 threads

14/08/05 10:14:42 INFO eval.StatsCallable: Average time per recommendation: 46ms

14/08/05 10:14:42 INFO eval.StatsCallable: Approximate memory used: 311MB / 755MB

14/08/05 10:14:42 INFO eval.StatsCallable: Unable to recommend in 0 cases

14/08/05 10:15:04 INFO eval.StatsCallable: Average time per recommendation: 86ms

14/08/05 10:15:04 INFO eval.StatsCallable: Approximate memory used: 279MB / 639MB

14/08/05 10:15:04 INFO eval.StatsCallable: Unable to recommend in 4540 cases

14/08/05 10:15:25 INFO eval.StatsCallable: Average time per recommendation: 87ms

14/08/05 10:15:25 INFO eval.StatsCallable: Approximate memory used: 303MB / 641MB

14/08/05 10:15:25 INFO eval.StatsCallable: Unable to recommend in 9001 cases

14/08/05 10:15:46 INFO eval.StatsCallable: Average time per recommendation: 86ms

14/08/05 10:15:46 INFO eval.StatsCallable: Approximate memory used: 332MB / 641MB

14/08/05 10:15:46 INFO eval.StatsCallable: Unable to recommend in 13655 cases

14/08/05 10:15:49 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.8316465777957014

0.8316465777957014

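Replacing 100 with 10 for the fixed-size neighborhood (see userNeighborhood) gives an evaluation result of 0.8582162139625891. The score went up, and a higher score is worse, so this was a step in the wrong direction.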

With 500 the result is 0.7558864506703784, which is clearly better.

Running experiments like this on real data is an essential part of tuning a recommender.
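To make the idea of a fixed-size neighborhood concrete, here is a minimal sketch that does not use Mahout at all. It assumes the similarities between a target user and every other user have already been computed. The class name NearestNSketch, the method nearestN and the sample values are made up for illustration and are not part of the Mahout API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NearestNSketch {

    /** Returns the IDs of the n users most similar to the target user. */
    public static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarityToTarget.entrySet());
        // Sort by similarity, highest first.
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> neighbors = new ArrayList<>();
        // Keep at most n users, however weak the last ones are.
        for (Map.Entry<Long, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            neighbors.add(e.getKey());
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 2..5 to the target user.
        Map<Long, Double> sims = new LinkedHashMap<>();
        sims.put(2L, 0.9);
        sims.put(3L, 0.2);
        sims.put(4L, 0.75);
        sims.put(5L, -0.4);
        System.out.println(nearestN(sims, 2)); // prints [2, 4]
        System.out.println(nearestN(sims, 3)); // prints [2, 4, 3]
    }
}
```

Note that with n = 3 the weakly similar user 3 is pulled in simply to fill the quota. That is one plausible reason why the best neighborhood size has to be found by experiment.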

2. Threshold-based neighborhood

Think of drawing a circle with me at the center: everyone inside the circle is my neighbor.
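The circle picture can be sketched in a few lines of plain Java, independent of Mahout. The sketch assumes precomputed similarities to the target user and treats "inside the circle" as "similarity at least the threshold". The class name ThresholdNeighborhoodSketch, the method withinThreshold and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ThresholdNeighborhoodSketch {

    /** Returns the IDs of all users whose similarity to the target is at least the threshold. */
    public static List<Long> withinThreshold(Map<Long, Double> similarityToTarget, double threshold) {
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityToTarget.entrySet()) {
            // A NaN similarity fails this comparison and is therefore skipped.
            if (e.getValue() >= threshold) {
                neighbors.add(e.getKey());
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 2..5 to the target user.
        Map<Long, Double> sims = new LinkedHashMap<>();
        sims.put(2L, 0.9);
        sims.put(3L, 0.2);
        sims.put(4L, 0.75);
        sims.put(5L, -0.4);
        System.out.println(withinThreshold(sims, 0.7)); // prints [2, 4]
        System.out.println(withinThreshold(sims, 0.9)); // prints [2]
    }
}
```

Unlike a fixed-size neighborhood, the number of neighbors here depends on the data: a stricter threshold can leave a user with very few neighbors, or none.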

In buildRecommender, replace the neighborhood with:

UserNeighborhood userNeighborhood = new ThresholdUserNeighborhood(0.7, userSimilarity, dataModel);

The score is: 0.7949910282938843

14/08/05 10:27:18 INFO file.FileDataModel: Creating FileDataModel for file C:\Users\ADMINI~1\AppData\Local\Temp\ratings.txt

14/08/05 10:27:19 INFO file.FileDataModel: Reading file info...

14/08/05 10:27:20 INFO file.FileDataModel: Processed 1000000 lines

14/08/05 10:27:21 INFO file.FileDataModel: Processed 2000000 lines

14/08/05 10:27:22 INFO file.FileDataModel: Processed 3000000 lines

14/08/05 10:27:23 INFO file.FileDataModel: Processed 4000000 lines

14/08/05 10:27:24 INFO file.FileDataModel: Processed 5000000 lines

14/08/05 10:27:26 INFO file.FileDataModel: Processed 6000000 lines

14/08/05 10:27:27 INFO file.FileDataModel: Processed 7000000 lines

14/08/05 10:27:27 INFO file.FileDataModel: Processed 8000000 lines

14/08/05 10:27:28 INFO file.FileDataModel: Processed 9000000 lines

14/08/05 10:27:29 INFO file.FileDataModel: Processed 10000000 lines

14/08/05 10:27:29 INFO file.FileDataModel: Read lines: 10000054

14/08/05 10:27:32 INFO model.GenericDataModel: Processed 10000 users

14/08/05 10:27:33 INFO model.GenericDataModel: Processed 20000 users

14/08/05 10:27:33 INFO model.GenericDataModel: Processed 30000 users

14/08/05 10:27:34 INFO model.GenericDataModel: Processed 40000 users

14/08/05 10:27:34 INFO model.GenericDataModel: Processed 50000 users

14/08/05 10:27:34 INFO model.GenericDataModel: Processed 60000 users

14/08/05 10:27:35 INFO model.GenericDataModel: Processed 69878 users

14/08/05 10:27:39 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel

14/08/05 10:27:39 INFO model.GenericDataModel: Processed 3530 users

14/08/05 10:27:39 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 3234 users

14/08/05 10:27:39 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 3234 tasks in 4 threads

14/08/05 10:27:40 INFO eval.StatsCallable: Average time per recommendation: 94ms

14/08/05 10:27:40 INFO eval.StatsCallable: Approximate memory used: 625MB / 855MB

14/08/05 10:27:40 INFO eval.StatsCallable: Unable to recommend in 134 cases

14/08/05 10:28:03 INFO eval.StatsCallable: Average time per recommendation: 93ms

14/08/05 10:28:03 INFO eval.StatsCallable: Approximate memory used: 374MB / 781MB

14/08/05 10:28:03 INFO eval.StatsCallable: Unable to recommend in 3816 cases

14/08/05 10:28:28 INFO eval.StatsCallable: Average time per recommendation: 96ms

14/08/05 10:28:28 INFO eval.StatsCallable: Approximate memory used: 336MB / 781MB

14/08/05 10:28:28 INFO eval.StatsCallable: Unable to recommend in 7943 cases

14/08/05 10:28:51 INFO eval.StatsCallable: Average time per recommendation: 94ms

14/08/05 10:28:51 INFO eval.StatsCallable: Approximate memory used: 368MB / 664MB

14/08/05 10:28:51 INFO eval.StatsCallable: Unable to recommend in 11574 cases

14/08/05 10:28:57 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7949910282938843

0.7949910282938843

With a threshold of 0.9 the result is 0.8474542269736061. Getting this value kept the machine's CPU at full load.

With a threshold of 0.5 the result is 0.7409341920894663.

The lower the evaluation score, the better the accuracy.
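The reason lower is better: AverageAbsoluteDifferenceRecommenderEvaluator reports the average absolute difference between estimated and actual preference values. The sketch below computes that quantity on a handful of made-up ratings. The class name MeanAbsoluteDifferenceSketch, the method name and the numbers are illustrative only, and it does not reproduce how Mahout splits the data.

```java
final class MeanAbsoluteDifferenceSketch {

    /** Average of |estimated - actual| over all held-out ratings. */
    public static double meanAbsoluteDifference(double[] actual, double[] estimated) {
        if (actual.length != estimated.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / actual.length;
    }

    public static void main(String[] args) {
        // Hypothetical held-out ratings and the recommender's estimates for them.
        double[] actual = {4.0, 3.0, 5.0, 2.0};
        double[] estimated = {3.5, 3.0, 4.0, 2.5};
        // Differences are 0.5, 0.0, 1.0, 0.5, so the average is 0.5.
        System.out.println(meanAbsoluteDifference(actual, estimated)); // prints 0.5
    }
}
```

Read this way, a score of 0.74 means the estimates were off by about 0.74 rating points on average for the preferences that could be estimated.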

-- Be careful when running these evaluations; they put a heavy load on your machine.
