Mahout: distributed item-based algorithm 3

Running recommendations with Hadoop

The glue that binds together the various Mapper and Reducer components is org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It configures and invokes the series of MapReduce jobs discussed previously.

[Figure: the MapReduce jobs chained by RecommenderJob and their relationships]
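Since RecommenderJob implements Hadoop's Tool interface, the same chain of jobs can also be driven from Java code. Below is a minimal sketch, assuming the input and output paths used later in this article; the class name RunRecommenderJob is mine, not Mahout's.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

    public class RunRecommenderJob {
      public static void main(String[] args) throws Exception {
        // Equivalent to the "hadoop jar ... RecommenderJob ..." command shown below.
        ToolRunner.run(new Configuration(), new RecommenderJob(), new String[] {
            "--input", "input/input.txt",       // userID,itemID[,pref] lines
            "--output", "output",
            "--usersFile", "input/users.txt",   // only compute recommendations for these users
            "--booleanData", "true"             // treat preferences as associations, not ratings
        });
      }
    }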
Run the example with Wikipedia data

bin/hadoop fs -put links-simple-sorted.txt input/input.txt

bin/hadoop fs -put users.txt input/users.txt
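Note the format mismatch here: each line of the Wikipedia link dump links-simple-sorted.txt lists one source article followed by all of the articles it links to, whereas Mahout's jobs expect one userID,itemID pair per line. Roughly (IDs made up for illustration):

    # links-simple-sorted.txt: "fromArticleID: toArticleID toArticleID ..."
    3: 5 764 5543

    # Mahout's default CSV input: "userID,itemID"
    3,5
    3,764
    3,5543

This mismatch is exactly why the note and code changes below are needed.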
In order to run RecommenderJob and let Hadoop run these jobs, you need to combine all of this code, along with everything it depends on, into one JAR file. This is easily done by running mvn clean package from the core/ directory of the Mahout distribution, which produces a file like mahout-core-0.5-job.jar. Alternatively, use a precompiled job JAR from Mahout's distribution.
hadoop jar mahout-core-0.5-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-Dmapred.input.dir=input/input.txt \
-Dmapred.output.dir=output --usersFile input/users.txt --booleanData
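When the job chain finishes, the recommendations are written as text part files under the output directory, one line per user with its recommended itemID:score pairs, and can be inspected with something like:

bin/hadoop fs -cat output/part-r-00000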

 

NOTE

The above command does not run the example code from Mahout in Action Chapter 6; instead it uses the default implementations shipped with Mahout. I consider this a big drawback of the book. To run the Chapter 6 example code, you have to recreate a project based on the org.apache.mahout.cf.taste.hadoop.xxxx classes in the mahout-core project and alter some of the code.

 

OR 

Use the tool class WikipediaDataConverter from the example code to convert links-simple-sorted.txt into the default input format (userID,itemID), then run the commands above. This approach, however, hides everything you learned in Chapter 6 behind Mahout's default implementation, so the best way is still to recreate a new project and run the example code.
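The converter itself is not listed here, but conceptually all it has to do is flatten each "userID: itemID itemID ..." line into userID,itemID pairs. A minimal stand-in sketch (the class below is illustrative, not the actual WikipediaDataConverter API):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintWriter;

    public class LinksToCsv {
      public static void main(String[] args) throws Exception {
        // args[0]: links-simple-sorted.txt, args[1]: CSV output file
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             PrintWriter out = new PrintWriter(args[1])) {
          String line;
          while ((line = in.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon < 0) {
              continue; // skip malformed lines
            }
            String userID = line.substring(0, colon).trim();
            // every whitespace-separated number after the colon is a linked article
            for (String itemID : line.substring(colon + 1).trim().split("\\s+")) {
              if (!itemID.isEmpty()) {
                out.println(userID + ',' + itemID);
              }
            }
          }
        }
      }
    }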

 

------------------

How to alter the code is shown below:

org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob

 

//convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), 
                         TextInputFormat.class, ItemIDIndexMapper.class,
                         VarIntWritable.class, VarLongWritable.class, 
                         ItemIDIndexReducer.class, VarIntWritable.class, 
                         VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);

 =====>

//convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), 
                         TextInputFormat.class, WikipediaItemIDIndexMapper.class,
                         VarIntWritable.class, VarLongWritable.class, 
                         ItemIDIndexReducer.class, VarIntWritable.class, 
                         VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
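The stock ItemIDIndexMapper parses comma-separated userID,itemID[,pref] lines, so the Wikipedia variant must instead pull every link target out of a "userID: itemID itemID ..." line and map each item ID to Mahout's internal int index. A sketch of what WikipediaItemIDIndexMapper needs to do (the actual Chapter 6 listing may differ in details):

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
    import org.apache.mahout.math.VarIntWritable;
    import org.apache.mahout.math.VarLongWritable;

    public class WikipediaItemIDIndexMapper
        extends Mapper<LongWritable,Text,VarIntWritable,VarLongWritable> {

      private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        Matcher m = NUMBERS.matcher(value.toString());
        if (!m.find()) {
          return;             // no numbers on this line
        }
        // the first number is the source article (the "user"); the rest are items
        while (m.find()) {
          long itemID = Long.parseLong(m.group());
          context.write(new VarIntWritable(TasteHadoopUtils.idToIndex(itemID)),
                        new VarLongWritable(itemID));
        }
      }
    }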

 --

 

 //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   ToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   ToUserVectorsReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);
 =====>

 

 //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   WikipediaToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   WikipediaToUserVectorReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);
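WikipediaToItemPrefsMapper and WikipediaToUserVectorReducer are the translation classes from the Chapter 6 listings: the mapper turns each line of the link dump into (userID, itemID) preference pairs, and the reducer collapses each user's items into one sparse boolean vector. Roughly as follows (reconstructed from the book, so treat as approximate; in practice each class goes in its own file):

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.VarLongWritable;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    // Emits one (userID, itemID) pair per outgoing link on a line.
    public class WikipediaToItemPrefsMapper
        extends Mapper<LongWritable,Text,VarLongWritable,VarLongWritable> {

      private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        Matcher m = NUMBERS.matcher(value.toString());
        m.find();                                 // first number: the user
        VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
        VarLongWritable itemID = new VarLongWritable();
        while (m.find()) {                        // remaining numbers: the items
          itemID.set(Long.parseLong(m.group()));
          context.write(userID, itemID);
        }
      }
    }

    // Collects all of a user's item IDs into one sparse preference vector.
    // The index must match the one computed in the item-ID-index job, hence
    // TasteHadoopUtils.idToIndex rather than a plain (int) cast.
    class WikipediaToUserVectorReducer
        extends Reducer<VarLongWritable,VarLongWritable,VarLongWritable,VectorWritable> {

      @Override
      protected void reduce(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs,
                            Context context) throws IOException, InterruptedException {
        Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
        for (VarLongWritable itemPref : itemPrefs) {
          userVector.set(TasteHadoopUtils.idToIndex(itemPref.get()), 1.0f);
        }
        context.write(userID, new VectorWritable(userVector));
      }
    }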

Run the samples on Hadoop

Environment: Mahout 0.9, Hadoop 2.3.0

Build the job JAR against Hadoop 2; depending on how the Mahout POM names the Hadoop version property, one of the following applies:

mvn clean package -Dhadoop2.version=2.3.0 -DskipTests

mvn clean package -Dhadoop.version=2.3.0 -DskipTests
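If the build succeeds, the job JAR lands under core/target/ and can be used in the same invocation as before, e.g.:

hadoop jar core/target/mahout-core-0.9-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-Dmapred.input.dir=input/input.txt \
-Dmapred.output.dir=output --usersFile input/users.txt --booleanData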

References

https://www.ibm.com/developerworks/library/j-mahout/
