Big Data Recommender Systems (5): Mahout

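Development environment:
Linux + IntelliJ IDEA (IDE) + SBT (Simple Build Tool) and Maven (build and project management) + continuous integration with Jenkins (a Java-based continuous integration tool for running and monitoring repeated jobs)

Spark: in-memory computation, DAG-based scheduling, simple operators; written in Scala.
H2O: a platform for predictive analytics.
Flink: a stream-processing platform (it can also do batch processing).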

Mahout architecture: high-level
[Figure: Mahout high-level architecture]

Mahout architecture: low-level
[Figure: Mahout low-level architecture]

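Mahout recommender system
(1) Mahout implements a collaborative filtering framework
It uses historical data (ratings, clicks, purchases, and so on) as the basis for recommendations.
User-based: recommends items by finding similar users. Because user behavior changes frequently, this approach is hard to scale.
Item-based: recommends items by computing item-to-item similarity. Items change little, so the similarity matrix can be computed offline. (Originated at Amazon.)
MF-based: factorizes the original user-item matrix into smaller matrices, uncovering latent factors that are then used to explain user behavior. (Came out of the Netflix Prize.)

(2) Mahout's matrix-factorization implementations of collaborative filtering
Collaborative filtering based on SVD (Singular Value Decomposition) factorization
Collaborative filtering based on ALS (alternating least squares) (NMF)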
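
The matrix-factorization idea above can be written down compactly. The notation below is the standard textbook form and is an assumption here, not taken from this article: r_{ui} is the known rating of user u for item i, p_u and q_i are k-dimensional latent factor vectors, K is the set of observed (u, i) pairs, and lambda is a regularization weight.

```latex
\hat{r}_{ui} = p_u^{\top} q_i,
\qquad
\min_{P,\,Q} \sum_{(u,i) \in \mathcal{K}}
\left[ \left( r_{ui} - p_u^{\top} q_i \right)^2
     + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right) \right]
```

ALS minimizes this objective by alternating: it holds all q_i fixed and solves a least-squares problem for each p_u, then holds all p_u fixed and solves for each q_i, and repeats.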

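Mahout recommender architecture
[Figure: Mahout recommender architecture]
Input and output
Input: raw data (user preferences)
Output: estimated user preferences
Steps
Step 1: Map the raw data into Mahout's DataModel as (user, item, preference) triples
Step 2: Tune the recommender components
The similarity component, the neighborhood component, and so on
Step 3: Compute the estimated preference values used for ranking
Step 4: Evaluate the recommendation results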

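Mahout recommender components
Mahout's key abstractions are defined as Java interfaces
DataModel interface
Maps the raw data into a Mahout-compatible format
UserSimilarity interface
Computes the similarity between two users
ItemSimilarity interface
Computes the similarity between two items
UserNeighborhood interface
Defines which users count as "neighbors" of a given user
Recommender interface
Implements the actual recommendation algorithm and provides the recommendation functions (training, prediction, and so on)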

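(1) DataModel
Whatever the data source, all DataModel implementations share the same underlying representation
Basic object: Preference
A triple (user, item, score)
A user's preferences are stored in a PreferenceArray (for example GenericUserPreferenceArray)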

(2) UserSimilarity
UserSimilarity defines the similarity between two users
Likewise, ItemSimilarity defines the similarity between two items
Similarity implementations
Pearson Correlation
Spearman Correlation
Euclidean Distance
Tanimoto Coefficient
LogLikelihood Similarity
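
To make the first measure in this list concrete, here is a minimal plain-Java sketch of Pearson correlation between two users, computed only over the items both have rated. It does not use Mahout; the class name PearsonSketch and the map-based representation are assumptions made for illustration, and Mahout's PearsonCorrelationSimilarity may differ in details such as weighting and handling of missing values.

```java
import java.util.Map;

final class PearsonSketch {
    /**
     * Pearson correlation between two users over their co-rated items.
     * Keys are item IDs, values are ratings. Returns NaN when fewer than
     * two items overlap or when either user's ratings do not vary.
     */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // item not rated by the second user
            }
            double x = e.getValue();
            double y = other;
            n++;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double covariance = sumAB - sumA * sumB / n;
        double varianceA = sumAA - sumA * sumA / n;
        double varianceB = sumBB - sumB * sumB / n;
        double denominator = Math.sqrt(varianceA * varianceB);
        return denominator == 0 ? Double.NaN : covariance / denominator;
    }

    public static void main(String[] args) {
        Map<Long, Double> user1 = Map.of(101L, 3.0, 102L, 4.0, 103L, 5.0);
        Map<Long, Double> user2 = Map.of(101L, 1.0, 102L, 2.0, 103L, 3.0, 104L, 5.0);
        // Item 104 is ignored because user1 has not rated it.
        System.out.println(pearson(user1, user2));
    }
}
```

Two users whose ratings rise and fall together score close to 1 even if one of them rates systematically lower, which is why Pearson is a common default for rating data.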

(3) UserNeighborhood
[Figure: UserNeighborhood]
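
The listings later in this post use two neighborhood strategies, NearestNUserNeighborhood and ThresholdUserNeighborhood. The plain-Java sketch below shows the idea behind each: keep the N most similar users, or keep every user whose similarity reaches a threshold. The class and method names are hypothetical, and the input is assumed to be a precomputed map from user ID to similarity with the target user; this is not Mahout's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NeighborhoodSketch {
    /** Returns the IDs of the n users with the highest similarity, best first. */
    static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarities.entrySet());
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : entries) {
            if (neighbors.size() == n) {
                break;
            }
            neighbors.add(entry.getKey());
        }
        return neighbors;
    }

    /** Returns the IDs of all users with similarity >= threshold, in ascending ID order. */
    static List<Long> aboveThreshold(Map<Long, Double> similarities, double threshold) {
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : new TreeMap<>(similarities).entrySet()) {
            if (entry.getValue() >= threshold) {
                neighbors.add(entry.getKey());
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        Map<Long, Double> similarities = Map.of(2L, 0.9, 3L, 0.1, 4L, 0.6, 5L, -0.3);
        System.out.println(nearestN(similarities, 2));
        System.out.println(aboveThreshold(similarities, 0.5));
    }
}
```

A fixed N always yields a neighborhood of the same size but may include weakly similar users; a threshold keeps only strong neighbors but can leave some users with none.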

Recommender evaluation
Type 1: prediction-based measures
[Figure: prediction-based measures]
Type 2: IR-based measures
[Figure: IR-based measures]
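Example 1: preferences
Requirement
Create user-item preference data and print it
Implementation
Create the data with GenericUserPreferenceArray
Store the data through the PreferenceArray interface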

package com.dylan.example;

import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class CreatePreferenceArray {
    private CreatePreferenceArray() {
    }

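    public static void main(String[] args) {
        // One PreferenceArray holds all preferences of a single user (here: 2 entries).
        PreferenceArray user1Pref = new GenericUserPreferenceArray(2);
        user1Pref.setUserID(0, 1L);       // all entries belong to user 1
        user1Pref.setItemID(0, 101L);     // item 101, rated 3.0
        user1Pref.setValue(0, 3.0f);
        user1Pref.setItemID(1, 102L);     // item 102, rated 4.0
        user1Pref.setValue(1, 4.0f);
        Preference pref = user1Pref.get(1);   // the second entry as a Preference
        System.out.println(pref.getItemID() + " -> " + pref.getValue());
        System.out.println(user1Pref);
    }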
}

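Example 2: data model
A PreferenceArray stores the preferences of a single user
How should the preference data of all users be stored?
HashMap? No!
Mahout provides a data structure optimized for recommendation workloads:
FastByIDMap
Requirement
Load FastByIDMap data into a GenericDataModel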

package com.dylan.example;

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class CreateGenericDataModel {
    private CreateGenericDataModel() {
    }

    public static void main(String[] args) {
        FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();
        PreferenceArray User1Pref = new GenericUserPreferenceArray(2);
        User1Pref.setUserID(0, 1L);
        User1Pref.setItemID(0, 101L);
        User1Pref.setValue(0, 3.0f);
        User1Pref.setItemID(1, 102L);
        User1Pref.setValue(1, 4.0f);

        PreferenceArray User2Pref = new GenericUserPreferenceArray(2);
        User2Pref.setUserID(0, 2L);
        User2Pref.setItemID(0, 101L);
        User2Pref.setValue(0, 3.0f);
        User2Pref.setItemID(1, 102L);
        User2Pref.setValue(1, 4.0f);

        preferences.put(1L, User1Pref);
        preferences.put(2L, User2Pref);

        DataModel model = new GenericDataModel(preferences);
        System.out.println(model);
    }
}
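
Why not a plain HashMap? A java.util.HashMap<Long, V> boxes every key into a Long object and allocates a node per entry, which adds up when there are millions of user IDs. FastByIDMap is keyed by primitive long IDs instead. The sketch below illustrates that idea with a tiny open-addressing map; it is a simplified illustration under assumed design choices (linear probing, no removal), not Mahout's actual implementation, and the class name is hypothetical.

```java
final class LongKeyMapSketch<V> {
    private long[] keys = new long[16];
    private Object[] values = new Object[16];
    private boolean[] used = new boolean[16];
    private int size;

    /** Maps a key to a slot; capacity is always a power of two. */
    private static int slot(long key, int capacity) {
        long mixed = key * 0x9E3779B97F4A7C15L; // spread nearby IDs apart
        return (int) (mixed >>> 32) & (capacity - 1);
    }

    void put(long key, V value) {
        if (2 * (size + 1) > keys.length) {
            grow(); // keep the table at most half full
        }
        int i = slot(key, keys.length);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & (keys.length - 1); // linear probing
        }
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    V get(long key) {
        int i = slot(key, keys.length);
        while (used[i]) {
            if (keys[i] == key) {
                return (V) values[i];
            }
            i = (i + 1) & (keys.length - 1);
        }
        return null; // reached an empty slot: key is absent
    }

    int size() {
        return size;
    }

    @SuppressWarnings("unchecked")
    private void grow() {
        long[] oldKeys = keys;
        Object[] oldValues = values;
        boolean[] oldUsed = used;
        int capacity = oldKeys.length * 2;
        keys = new long[capacity];
        values = new Object[capacity];
        used = new boolean[capacity];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], (V) oldValues[j]);
            }
        }
    }

    public static void main(String[] args) {
        LongKeyMapSketch<String> map = new LongKeyMapSketch<>();
        map.put(1L, "preferences of user 1");
        map.put(2L, "preferences of user 2");
        System.out.println(map.get(2L));
    }
}
```

Keys live in a long[] array, so no Long objects or per-entry nodes are allocated; only the values are object references.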

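Example 3: Recommender
Requirement
Use user-based collaborative filtering to recommend 2 items to user 1
Implementation
Read the file with FileDataModel
Compute similarity with PearsonCorrelationSimilarity
Build the recommendation engine with GenericUserBasedRecommender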

package com.dylan.example;

import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;

import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;

import java.io.File;
import java.util.List;

public class RecommenderIntro {
    private RecommenderIntro() {
    }

    public static void main(String[] args) throws Exception{
        DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Recommend the top 2 items for user 1, as the requirement states.
        List<RecommendedItem> recommendedItems = recommender.recommend(1, 2);

        for (RecommendedItem recommendedItem: recommendedItems){
            System.out.println(recommendedItem);
        }
    }
}
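
How does a user-based recommender turn a neighborhood into a predicted rating? A common approach is a similarity-weighted average of the neighbors' ratings for the item. The plain-Java sketch below shows that calculation under simplifying assumptions: similarities are positive, and a neighbor who has not rated the item is marked with NaN. The class name is hypothetical and this is not Mahout's exact code.

```java
final class UserBasedEstimateSketch {
    /**
     * Estimates the target user's rating for one item.
     * similarities[i] is the similarity of neighbor i to the target user;
     * ratings[i] is neighbor i's rating of the item, or NaN if unrated.
     * Returns NaN when no neighbor has rated the item.
     */
    static double estimate(double[] similarities, double[] ratings) {
        if (similarities.length != ratings.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        double weightedSum = 0;
        double totalSimilarity = 0;
        for (int i = 0; i < ratings.length; i++) {
            if (Double.isNaN(ratings[i])) {
                continue; // this neighbor gives no evidence for the item
            }
            weightedSum += similarities[i] * ratings[i];
            totalSimilarity += similarities[i];
        }
        return totalSimilarity == 0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    public static void main(String[] args) {
        double[] similarities = {0.75, 0.25, 0.5};
        double[] ratings = {4.0, 2.0, Double.NaN};
        System.out.println(estimate(similarities, ratings));
    }
}
```

The recommender computes such an estimate for every candidate item the user has not rated and returns the highest-scoring ones.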

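Example 4: Recommender evaluation (1)
Requirement
Evaluate the quality of the recommender from Example 3
Implementation
Evaluate the model with AverageAbsoluteDifferenceRecommenderEvaluator and RMSRecommenderEvaluator
Supply the recommender under evaluation through a RecommenderBuilder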

package com.dylan.example;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

import java.io.File;

public class EvaluatorIntro {
    private EvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {

        RandomUtils.useTestSeed();

        final DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderEvaluator recommenderEvaluator = new RMSRecommenderEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Train on 70% of the data and test on the remaining 30%; use 100% of the users.
        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        double rmse = recommenderEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

        System.out.println("MAE: " + score);
        System.out.println("RMSE: " + rmse);
    }
}
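
The two scores printed above are prediction-based measures: the mean absolute error (MAE) and the root mean squared error (RMSE) between held-out ratings and the recommender's estimates. A minimal plain-Java sketch of both formulas follows; the class name is hypothetical and the arrays stand in for the held-out ratings and predictions that Mahout's evaluators collect internally.

```java
final class ErrorMetricsSketch {
    /** Mean absolute error: average of |actual - predicted|. */
    static double mae(double[] actual, double[] predicted) {
        checkLengths(actual, predicted);
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    /** Root mean squared error: square root of the average squared difference. */
    static double rmse(double[] actual, double[] predicted) {
        checkLengths(actual, predicted);
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    private static void checkLengths(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("Need two non-empty arrays of equal length");
        }
    }

    public static void main(String[] args) {
        double[] actual = {4.0, 3.0, 5.0};
        double[] predicted = {3.0, 3.0, 3.0};
        System.out.println("MAE: " + mae(actual, predicted));
        System.out.println("RMSE: " + rmse(actual, predicted));
    }
}
```

For both measures lower is better. RMSE penalizes large individual errors more heavily than MAE, so it is never smaller than MAE on the same data.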

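Example 5: Recommender evaluation (2)
Requirement
Evaluate the quality of the recommender from Example 3 using IR metrics
Implementation
Evaluate with RecommenderIRStatsEvaluator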

package com.dylan.example;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.*;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

import java.io.File;

public class IREvaluatorIntro {
    private IREvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {

        RandomUtils.useTestSeed();

        final DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

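        // Precision and recall at 5 recommendations; CHOOSE_THRESHOLD lets the evaluator
        // pick a relevance threshold per user; use 100% of the users.
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

        System.out.println("Precision: " + stats.getPrecision());
        System.out.println("Recall: " + stats.getRecall());
        System.out.println("F1: " + stats.getF1Measure());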
    }
}
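
The IR-based measures printed above treat recommendation as retrieval: of the items recommended, how many are relevant (precision), and of the relevant items, how many were recommended (recall). The plain-Java sketch below computes both, plus their harmonic mean F1, for a single user. The class name and the sample IDs are made up for illustration; Mahout's evaluator additionally decides which items count as relevant and averages over users.

```java
import java.util.List;
import java.util.Set;

final class IrMetricsSketch {
    private static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of recommended items that are relevant. */
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    /** Fraction of relevant items that were recommended. */
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }

    /** Harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        return precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        List<Long> recommended = List.of(1L, 2L, 3L, 4L, 5L);
        Set<Long> relevant = Set.of(2L, 5L, 9L);
        double p = precision(recommended, relevant);
        double r = recall(recommended, relevant);
        System.out.println("Precision: " + p + ", Recall: " + r + ", F1: " + f1(p, r));
    }
}
```

Unlike MAE and RMSE, these measures do not need predicted rating values, which makes them usable for boolean (click or purchase) data as well.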

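Example 6: MovieLens recommender
Requirement
Build a movie recommender on the MovieLens 1M dataset
Steps
Implement a DataModel for the MovieLens dataset
Implement item-based and user-based collaborative filtering recommenders and save the results

1. Build a new data model for the dataset
2. Compute the item similarity matrix with multiple threads, producing similarities.csv
3. Generate recommendations for the users, producing userRcomed.csv
Recommender cachingRecommender = new CachingRecommender(recommender); wraps the recommender with a cache, which helps when the data volume is large

(1) Convert the raw '::'-delimited data into ','-delimited data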

package com.dylan.MovieLens;

import org.apache.commons.io.Charsets;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.common.iterator.FileLineIterable;

import java.io.*;
import java.util.regex.Pattern;

public class MovieLensDataModel extends FileDataModel {

    private static final String COLON_DELIMITER = "::";
    private static final Pattern COLON_DELIMITER_PATTERN = Pattern.compile(COLON_DELIMITER);

    public MovieLensDataModel(File ratingsFile) throws IOException{
        super(convertFile(ratingsFile));
    }

    private static File convertFile(File originalFile) throws IOException{
        // Write the converted copy to the system temp directory, replacing any earlier copy.
        File resultFile = new File(System.getProperty("java.io.tmpdir"), "ratings.csv");
        if (resultFile.exists()){
            resultFile.delete();
        }
        try(Writer writer = new OutputStreamWriter(new FileOutputStream(resultFile), Charsets.UTF_8)) {

            for (String line: new FileLineIterable(originalFile, false)){
                // Drop the last field (the timestamp): keep everything before the final "::".
                int lastIndex = line.lastIndexOf(COLON_DELIMITER);

                if (lastIndex < 0 ){
                    throw new IOException("Invalid data!");
                }
                String subLine = line.substring(0, lastIndex);

                // userID::movieID::rating -> userID,movieID,rating
                String convertedSubLine = COLON_DELIMITER_PATTERN.matcher(subLine).replaceAll(",");
                writer.write(convertedSubLine);
                writer.write('\n');
            }
        } catch (IOException ioe){
            // Do not leave a half-written file behind.
            resultFile.delete();
            throw ioe;
        }
        return resultFile;
    }
}
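
The per-line conversion in the class above can be tried in isolation. The sketch below reproduces just that step in plain Java, without Mahout or file I/O; the class name is hypothetical and the sample line follows the userID::movieID::rating::timestamp layout that the converter assumes.

```java
final class MovieLensLineSketch {
    private static final String DELIMITER = "::";

    /** Drops the trailing timestamp field and turns "::" into ",". */
    static String convert(String line) {
        int lastIndex = line.lastIndexOf(DELIMITER);
        if (lastIndex < 0) {
            throw new IllegalArgumentException("Invalid data: " + line);
        }
        return line.substring(0, lastIndex).replace(DELIMITER, ",");
    }

    public static void main(String[] args) {
        System.out.println(convert("1::1193::5::978300760"));
    }
}
```

The result is a userID,itemID,preference line, which is the comma-separated layout FileDataModel reads.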

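(2) Generate results in batch with multiple threads, producing the similarity file. Item-based CF (ICF) similarities can be precomputed offline with multiple threads; user-based CF (UCF) cannot be handled this way.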

package com.dylan.MovieLens;


import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;
import org.apache.mahout.cf.taste.similarity.precompute.SimilarItemsWriter;

import java.io.File;

public class BatchItemSimilaritiesMovieLens {
    private BatchItemSimilaritiesMovieLens(){
    }

    public static void main(String[] args) throws Exception{

        if (args.length !=1){
            System.err.println("Needs MovieLens 1M dataset as argument!");
            System.exit(-1);
        }

        File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");

        DataModel dataModel = new MovieLensDataModel(new File(args[0]));
        ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
        ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);
        // Keep the 5 most similar items for each item.
        BatchItemSimilarities batchItemSimilarities = new MultithreadedBatchItemSimilarities(recommender, 5);

        SimilarItemsWriter writer = new FileSimilarItemsWriter(resultFile);

        // One worker thread per available core; the second argument is the maximum duration in hours.
        int numSimilarities = batchItemSimilarities.computeItemSimilarities(Runtime.getRuntime().availableProcessors(), 1, writer);

        System.out.println("Computed "+ numSimilarities+ " similarities for "+ dataModel.getNumItems()+" items and saved them to "+resultFile.getAbsolutePath());
    }
}

(3) User-based recommendation, producing userRcomed.csv

package com.dylan.MovieLens;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

public class UserRecommenderMovieLens {
    private UserRecommenderMovieLens(){
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 1) {
            System.err.println("Needs MovieLens 1M dataset as argument!");
            System.exit(-1);
        }

        File resultFile = new File(System.getProperty("java.io.tmpdir"), "userRcomed.csv");

        DataModel dataModel = new MovieLensDataModel(new File(args[0]));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, dataModel);

        Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
        Recommender cachingRecommender = new CachingRecommender(recommender);

        //Evaluate
        RMSRecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel dataModel) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, dataModel);
                return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
            }
        };
        double score = evaluator.evaluate(recommenderBuilder, null, dataModel, 0.9, 0.5);
        System.out.println("RMSE score is "+score);

        try(PrintWriter writer = new PrintWriter(resultFile)){
            for (int userID=1; userID <= dataModel.getNumUsers(); userID++){
                List<RecommendedItem> recommendedItems = cachingRecommender.recommend(userID, 2);
                String line = userID+" : ";
                for (RecommendedItem recommendedItem: recommendedItems){
                    line += recommendedItem.getItemID()+":"+recommendedItem.getValue()+",";
                }
                if (line.endsWith(",")){
                    line = line.substring(0, line.length()-1);
                }
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException ioe){
            resultFile.delete();
            throw ioe;
        }
        System.out.println("Recommended for "+dataModel.getNumUsers()+" users and saved them to "+resultFile.getAbsolutePath());
    }
}

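Example 7: a common open dataset, Book-Crossing
1. Content
Readers' ratings of books, collected from the Book-Crossing book community
2. Size (number of records)
Ratings by 278858 users on 271379 books, including both explicit and implicit ratings
3. Download
http://grouplens.org/datasets/book-crossing/

Explicit ratings: scores from 1 to 10
Implicit feedback: clicks, purchases, and so on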

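Requirement
Build two kinds of book recommenders on the BookCrossing dataset
Recommendation based on ratings
Recommendation without ratings (boolean preferences: 1 if clicked, 0 if not)

Steps
Implement a DataModel for the BookCrossing dataset
Implement the two recommenders
Use GenericBooleanPrefUserBasedRecommender
Implement a DataModelBuilder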
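
Without rating values, similarity has to be computed from which items two users have in common. One of the measures listed earlier that works this way is the Tanimoto coefficient: the size of the intersection of two users' item sets divided by the size of their union. The plain-Java sketch below shows the calculation; the class name is hypothetical. Note that the listings in this post use LogLikelihoodSimilarity for the boolean case, which is a different measure built on the same kind of overlap counts.

```java
import java.util.HashSet;
import java.util.Set;

final class TanimotoSketch {
    /**
     * Tanimoto (Jaccard) coefficient of two item sets:
     * |A intersect B| / |A union B|. Returns NaN if both sets are empty.
     */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return Double.NaN;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<Long> booksOfUser1 = Set.of(1L, 2L, 3L);
        Set<Long> booksOfUser2 = Set.of(2L, 3L, 4L);
        System.out.println(tanimoto(booksOfUser1, booksOfUser2));
    }
}
```

The value ranges from 0 (no items in common) to 1 (identical item sets).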

(1) Recommendation based on ratings
Data processing:

package com.dylan.BookCrossing;

import org.apache.commons.io.Charsets;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.common.iterator.FileLineIterable;

import java.io.*;
import java.util.regex.Pattern;

public class BXDataModel extends FileDataModel {

    // Matches every character that is neither a digit nor a semicolon.
    private static final Pattern NON_DIGIT_SEMICOLON_DELIMITER = Pattern.compile("[^0-9;]");

    public BXDataModel(File ratingsFile, boolean ignoreRatings) throws IOException{
        super(convertFile(ratingsFile, ignoreRatings));
    }

    private static File convertFile(File originalFile, boolean ignoreRatings) throws IOException{
        File resultFile = new File(System.getProperty("java.io.tmpdir"), "bookcrossing.csv");
        if (resultFile.exists()){
            resultFile.delete();
        }
        try(Writer writer = new OutputStreamWriter(new FileOutputStream(resultFile), Charsets.UTF_8)) {

            // The second argument skips the header line of the ratings file.
            for (String line: new FileLineIterable(originalFile, true)){
                // A rating of "0" marks an implicit rating; skip those lines.
                if (line.endsWith("\"0\"")){
                    continue;
                }
                // Strip quotes and all other non-digit characters, then turn ';' into ','.
                String convertedLine = NON_DIGIT_SEMICOLON_DELIMITER.matcher(line).replaceAll("").replace(';', ',');
                // Skip lines in which a field became empty after stripping.
                if (convertedLine.contains(",,")){
                    continue;
                }
                // For the boolean (no-rating) model, drop the rating column.
                if (ignoreRatings){
                    convertedLine = convertedLine.substring(0, convertedLine.lastIndexOf(','));
                }

                writer.write(convertedLine);
                writer.write('\n');
            }
        } catch (IOException ioe){
            resultFile.delete();
            throw ioe;
        }
        return resultFile;
    }
}
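
The line-cleaning rules in the class above are easier to follow on sample input. The sketch below applies the same steps to single lines in plain Java, without Mahout or file I/O. The class name is hypothetical, and the sample lines are made-up values in the semicolon-separated, double-quoted layout the converter expects ("user";"ISBN";"rating").

```java
import java.util.regex.Pattern;

final class BookCrossingLineSketch {
    private static final Pattern NON_DIGIT_SEMICOLON = Pattern.compile("[^0-9;]");

    /**
     * Converts one ratings line to "user,item,rating" (or "user,item" when
     * ignoreRatings is true). Returns null for lines the converter skips.
     */
    static String convert(String line, boolean ignoreRatings) {
        if (line.endsWith("\"0\"")) {
            return null; // implicit rating
        }
        String converted = NON_DIGIT_SEMICOLON.matcher(line).replaceAll("").replace(';', ',');
        if (converted.contains(",,")) {
            return null; // a field was empty after stripping
        }
        if (ignoreRatings) {
            converted = converted.substring(0, converted.lastIndexOf(','));
        }
        return converted;
    }

    public static void main(String[] args) {
        System.out.println(convert("\"276726\";\"0155061224\";\"5\"", false));
        System.out.println(convert("\"276726\";\"0155061224\";\"5\"", true));
        // An ISBN ending in 'X' loses that character, because only digits are kept.
        System.out.println(convert("\"276729\";\"052165615X\";\"3\"", false));
    }
}
```

One consequence worth knowing: because every non-digit character is removed, an ISBN with a trailing X check character is shortened, so distinct books could in principle end up with the same numeric item ID.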

Prepare a Recommender

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import java.util.Collection;
import java.util.List;

public class BXRecommender implements Recommender{
    private Recommender recommender;
    public BXRecommender(DataModel dataModel) throws TasteException{
        UserSimilarity similarity = new EuclideanDistanceSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, 0.2, similarity,dataModel, 0.2);
        recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
    }

    public List<RecommendedItem> recommend(long userID, int howMany) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, false);
    }

    public List<RecommendedItem> recommend(long userID, int howMany, boolean includeKnownItems) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, includeKnownItems);
    }

    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
        return recommender.recommend(userID, howMany, rescorer, false);
    }

    @Override
    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer idRescorer, boolean includeKnownItems) throws TasteException {
        // Pass the caller's rescorer through instead of discarding it.
        return recommender.recommend(userID, howMany, idRescorer, includeKnownItems);
    }

    @Override
    public float estimatePreference(long userID, long itemID) throws TasteException {
        return recommender.estimatePreference(userID, itemID);
    }

    public void setPreference(long userID, long itemID, float value) throws TasteException {
        recommender.setPreference(userID, itemID, value);
    }

    public void removePreference(long userID, long itemID) throws TasteException {
        recommender.removePreference(userID, itemID);
    }

    public DataModel getDataModel() {
        return recommender.getDataModel();
    }

    @Override
    public void refresh(Collection<Refreshable> collection) {
        recommender.refresh(collection);
    }
}

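Implement the RecommenderBuilder (this listing builds BXBooleanRecommender, the no-rating recommender defined in part (2))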

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class BXBooleanRecommenderBuilder implements RecommenderBuilder {
    @Override
    public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        return new BXBooleanRecommender(dataModel);
    }
}

Implement the evaluator (this listing evaluates the boolean recommender)

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;

import java.io.File;
import java.io.IOException;

public class BXBooleanRecommenderEvaluator {
    private BXBooleanRecommenderEvaluator(){
    }

    public static void main(String[] args) throws IOException, TasteException {
/*
        DataModel dataModel = new BXDataModel(new File(args[0]), true);
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = evaluator.evaluate(new BXBooleanRecommenderBuilder(), new BXDataModelBuilder(), dataModel, null, 3, Double.NEGATIVE_INFINITY, 1.0);

        System.out.println("Precision is "+stats.getPrecision()+"; Recall is "+stats.getRecall()+"; F1 is"+stats.getF1Measure());
*/
        DataModel dataModel = new BXDataModel(new File(args[0]), true);
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        double score = evaluator.evaluate(new BXBooleanRecommenderBuilder(), null, dataModel, 0.9, 0.3);

        System.out.println("MAE score is "+score);
    }
}

(2) Data without ratings
Prepare a Recommender

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.CachingUserSimilarity;

import java.util.Collection;
import java.util.List;

public class BXBooleanRecommender implements Recommender{
    private Recommender recommender;
    public BXBooleanRecommender(DataModel dataModel) throws TasteException{
        UserSimilarity similarity = new CachingUserSimilarity(new LogLikelihoodSimilarity(dataModel), dataModel);
        //UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, Double.NEGATIVE_INFINITY, similarity,dataModel, 1.0);
        UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, dataModel, 1.0);
        recommender = new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, similarity);
    }

    public List<RecommendedItem> recommend(long userID, int howMany) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, false);
    }

    public List<RecommendedItem> recommend(long userID, int howMany, boolean includeKnownItems) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, includeKnownItems);
    }

    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
        return recommender.recommend(userID, howMany, rescorer, false);
    }

    @Override
    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer idRescorer, boolean includeKnownItems) throws TasteException {
        // Pass the caller's rescorer through instead of discarding it.
        return recommender.recommend(userID, howMany, idRescorer, includeKnownItems);
    }

    @Override
    public float estimatePreference(long userID, long itemID) throws TasteException {
        return recommender.estimatePreference(userID, itemID);
    }

    public void setPreference(long userID, long itemID, float value) throws TasteException {
        recommender.setPreference(userID, itemID, value);
    }

    public void removePreference(long userID, long itemID) throws TasteException {
        recommender.removePreference(userID, itemID);
    }

    public DataModel getDataModel() {
        return recommender.getDataModel();
    }

    @Override
    public void refresh(Collection<Refreshable> collection) {
        recommender.refresh(collection);
    }
}

Implement the RecommenderBuilder (this listing builds BXRecommender, the rating-based recommender from part (1))

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class BXRecommenderBuilder implements RecommenderBuilder {
    @Override
    public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        return new BXRecommender(dataModel);
    }
}

Implement the evaluator (this listing evaluates the rating-based recommender)

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;

import java.io.File;
import java.io.IOException;

public class BXRecommenderEvaluator {
    private BXRecommenderEvaluator(){
    }

    public static void main(String[] args) throws IOException, TasteException {
        DataModel dataModel = new BXDataModel(new File(args[0]), false);
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        double score = evaluator.evaluate(new BXRecommenderBuilder(), null, dataModel, 0.9, 0.3);

        System.out.println("MAE score is "+score);
    }
}

Implement the DataModelBuilder interface

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class BXDataModelBuilder implements DataModelBuilder{
    @Override
    public DataModel buildDataModel(FastByIDMap<PreferenceArray> fastByIDMap) {
        return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(fastByIDMap));
    }
}

Optimizing the Recommender
1. Cache the user similarity computation:
UserSimilarity similarity = new CachingUserSimilarity(new LogLikelihoodSimilarity(dataModel), dataModel);
2. Select neighbors by a similarity threshold:
UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, dataModel, 1.0);
3. Change the evaluation metric

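Mahout recommendation exercise
Requirement: based on movie rating data stored in MySQL, use Mahout to recommend 3 movies to each user

  1. Prepare the database table
    (1) Create the table in the mahout database: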
use mahout;
CREATE TABLE taste_preferences (
user_id BIGINT NOT NULL,
item_id BIGINT NOT NULL,
preference FLOAT NOT NULL,
PRIMARY KEY (user_id, item_id),
INDEX (user_id),
INDEX (item_id)
);

Then import the first three columns of ratings.dat into the taste_preferences table.

LOAD DATA LOCAL INFILE "/home/root/code/MahoutRecommendation/src/main/resources/ratings.dat"
INTO TABLE mahout.taste_preferences
FIELDS TERMINATED BY '::' (user_id, item_id, preference);
  2. Implement the recommendation algorithm
    Use MySQLJDBCDataModel
package com.dylan.practice;

import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

public class MysqlDataMovieRecommend {
    private MysqlDataMovieRecommend() {
    }

    public static void main(String[] args) throws TasteException, IOException {
        File resultFile = new File("/tmp", "MysqlMovieRcomed.txt");
        //Mysql Connection
        MysqlDataSource mysqlDataSource = new MysqlDataSource();
        mysqlDataSource.setDatabaseName("mahout");
        mysqlDataSource.setServerName("127.0.0.1");
        mysqlDataSource.setUser("mahout");
        mysqlDataSource.setPassword("mahout");
        mysqlDataSource.setAutoReconnect(true);
        mysqlDataSource.setFailOverReadOnly(false);


        JDBCDataModel dataModel = new MySQLJDBCDataModel(mysqlDataSource, "taste_preferences", "user_id", "item_id", "preference", null);
        DataModel model = dataModel;

        //Recommendations
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        //UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, model, 1.0);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        try (PrintWriter writer = new PrintWriter(resultFile)) {
            for (int userID = 1; userID <= model.getNumUsers(); userID++) {
                List<RecommendedItem> recommendedItems = recommender.recommend(userID, 3);
                String line = userID + " : ";
                for (RecommendedItem recommendedItem : recommendedItems) {
                    line += recommendedItem.getItemID() + ":" + recommendedItem.getValue() + ",";
                }
                if (line.endsWith(",")) {
                    line = line.substring(0, line.length() - 1);
                }
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException ioe) {
            resultFile.delete();
            throw ioe;
        }
        System.out.println("Recommended for " + model.getNumUsers() + " users and saved them to " + resultFile.getAbsolutePath());
    }
}

1. Import the movie data into MySQL
2. Connect the Java program to MySQL

        File resultFile = new File("/tmp", "MysqlMovieRcomed.txt");
        //Mysql Connection
        MysqlDataSource mysqlDataSource = new MysqlDataSource();
        mysqlDataSource.setDatabaseName("mahout");
        mysqlDataSource.setServerName("127.0.0.1");
        mysqlDataSource.setUser("mahout");
        mysqlDataSource.setPassword("mahout");
        mysqlDataSource.setAutoReconnect(true);
        mysqlDataSource.setFailOverReadOnly(false);

3. Create the JDBCDataModel

        JDBCDataModel dataModel = new MySQLJDBCDataModel(mysqlDataSource, "taste_preferences", "user_id", "item_id", "preference", null);
        DataModel model = dataModel;

4. Build the recommender and write the results to a file

Improvements:
1. Move the MySQL connection settings into a configuration file

2. Set the number of nearest neighbors to 10
UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
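
Improvement 1 could look roughly like the sketch below, which reads the connection settings from a properties file with java.util.Properties instead of hard-coding them. The property keys (db.server, db.name, db.user, db.password) and the class name are assumptions made for this sketch; the original configuration file is not shown in the post. The loaded values would then be passed to the MysqlDataSource setters used in the listing above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class DbConfigSketch {
    final String server;
    final String database;
    final String user;
    final String password;

    private DbConfigSketch(String server, String database, String user, String password) {
        this.server = server;
        this.database = database;
        this.user = user;
        this.password = password;
    }

    /** Loads the settings from any Reader, for example a FileReader on a .properties file. */
    static DbConfigSketch load(Reader reader) throws IOException {
        Properties properties = new Properties();
        properties.load(reader);
        return new DbConfigSketch(
                require(properties, "db.server"),
                require(properties, "db.name"),
                require(properties, "db.user"),
                require(properties, "db.password"));
    }

    /** Convenience variant for text that is already in memory. */
    static DbConfigSketch parse(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String require(Properties properties, String key) {
        String value = properties.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        DbConfigSketch config = parse("db.server=127.0.0.1\ndb.name=mahout\ndb.user=mahout\ndb.password=mahout\n");
        System.out.println(config.server + "/" + config.database);
    }
}
```

Keeping credentials out of the source code also makes it possible to use different databases for development and production without recompiling.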
