Mahout: distributed item-based algorithm 3

Running recommendations with Hadoop

The glue that binds together the various Mapper and Reducer components is org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It configures and invokes the series of MapReduce jobs discussed previously.

[Figure: the MapReduce jobs chained by RecommenderJob and their relationships]
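Since RecommenderJob implements Hadoop's Tool interface, the same chain of jobs can also be driven from Java code. Below is a minimal sketch, assuming the input and output paths used later in this article; the class name RunRecommenderJob is mine, not Mahout's.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

    public class RunRecommenderJob {
      public static void main(String[] args) throws Exception {
        // Equivalent to the "hadoop jar ... RecommenderJob ..." command shown below.
        ToolRunner.run(new Configuration(), new RecommenderJob(), new String[] {
            "--input", "input/input.txt",       // userID,itemID[,pref] lines
            "--output", "output",
            "--usersFile", "input/users.txt",   // only compute recommendations for these users
            "--booleanData", "true"             // treat preferences as associations, not ratings
        });
      }
    }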
Run the example with Wikipedia data

bin/hadoop fs -put links-simple-sorted.txt input/input.txt

bin/hadoop fs -put users.txt input/users.txt
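Note the format mismatch here: each line of the Wikipedia link dump links-simple-sorted.txt lists one source article followed by all of the articles it links to, whereas Mahout's jobs expect one userID,itemID pair per line. Roughly (IDs made up for illustration):

    # links-simple-sorted.txt: "fromArticleID: toArticleID toArticleID ..."
    3: 5 764 5543

    # Mahout's default CSV input: "userID,itemID"
    3,5
    3,764
    3,5543

This mismatch is exactly why the note and code changes below are needed.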
In order to run RecommenderJob and let Hadoop run these jobs, you need to combine all of this code, along with everything it depends on, into one JAR file. This is easily done by running mvn clean package from the core/ directory of the Mahout distribution, which produces a file like mahout-core-0.5-job.jar. Alternatively, use a precompiled job JAR from Mahout's distribution.
hadoop jar mahout-core-0.5-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-Dmapred.input.dir=input/input.txt \
-Dmapred.output.dir=output --usersFile input/users.txt --booleanData
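When the job chain finishes, the recommendations are written as text part files under the output directory, one line per user with its recommended itemID:score pairs, and can be inspected with something like:

bin/hadoop fs -cat output/part-r-00000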

 

NOTE

The above command does not run the example code from Mahout in Action Chapter 6; instead it uses the default implementations shipped with Mahout. I consider this a big drawback of the book. To run the Chapter 6 example code, you have to recreate a project based on the org.apache.mahout.cf.taste.hadoop.xxxx classes in the mahout-core project and alter some of the code.

 

OR 

Use the tool class WikipediaDataConverter from the example code to convert links-simple-sorted.txt into the default input format (userID,itemID), then run the commands above. This approach, however, hides everything you learned in Chapter 6 behind Mahout's default implementation, so the best way is still to recreate a new project and run the example code.
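The converter itself is not listed here, but conceptually all it has to do is flatten each "userID: itemID itemID ..." line into userID,itemID pairs. A minimal stand-in sketch (the class below is illustrative, not the actual WikipediaDataConverter API):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintWriter;

    public class LinksToCsv {
      public static void main(String[] args) throws Exception {
        // args[0]: links-simple-sorted.txt, args[1]: CSV output file
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             PrintWriter out = new PrintWriter(args[1])) {
          String line;
          while ((line = in.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon < 0) {
              continue; // skip malformed lines
            }
            String userID = line.substring(0, colon).trim();
            // every whitespace-separated number after the colon is a linked article
            for (String itemID : line.substring(colon + 1).trim().split("\\s+")) {
              if (!itemID.isEmpty()) {
                out.println(userID + ',' + itemID);
              }
            }
          }
        }
      }
    }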

 

------------------

How to alter the code is shown below:

org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob

 

//convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), 
                         TextInputFormat.class, ItemIDIndexMapper.class,
                         VarIntWritable.class, VarLongWritable.class, 
                         ItemIDIndexReducer.class, VarIntWritable.class, 
                         VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);

 =====>

//convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), 
                         TextInputFormat.class, WikipediaItemIDIndexMapper.class,
                         VarIntWritable.class, VarLongWritable.class, 
                         ItemIDIndexReducer.class, VarIntWritable.class, 
                         VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
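The stock ItemIDIndexMapper parses comma-separated userID,itemID[,pref] lines, so the Wikipedia variant must instead pull every link target out of a "userID: itemID itemID ..." line and map each item ID to Mahout's internal int index. A sketch of what WikipediaItemIDIndexMapper needs to do (the actual Chapter 6 listing may differ in details):

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
    import org.apache.mahout.math.VarIntWritable;
    import org.apache.mahout.math.VarLongWritable;

    public class WikipediaItemIDIndexMapper
        extends Mapper<LongWritable,Text,VarIntWritable,VarLongWritable> {

      private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        Matcher m = NUMBERS.matcher(value.toString());
        if (!m.find()) {
          return;             // no numbers on this line
        }
        // the first number is the source article (the "user"); the rest are items
        while (m.find()) {
          long itemID = Long.parseLong(m.group());
          context.write(new VarIntWritable(TasteHadoopUtils.idToIndex(itemID)),
                        new VarLongWritable(itemID));
        }
      }
    }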

 --

 

 //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   ToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   ToUserVectorsReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);
 =====>

 

 //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   WikipediaToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   WikipediaToUserVectorReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);
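WikipediaToItemPrefsMapper and WikipediaToUserVectorReducer are the translation classes from the Chapter 6 listings: the mapper turns each line of the link dump into (userID, itemID) preference pairs, and the reducer collapses each user's items into one sparse boolean vector. Roughly as follows (reconstructed from the book, so treat as approximate; in practice each class goes in its own file):

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.VarLongWritable;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    // Emits one (userID, itemID) pair per outgoing link on a line.
    public class WikipediaToItemPrefsMapper
        extends Mapper<LongWritable,Text,VarLongWritable,VarLongWritable> {

      private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        Matcher m = NUMBERS.matcher(value.toString());
        m.find();                                 // first number: the user
        VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
        VarLongWritable itemID = new VarLongWritable();
        while (m.find()) {                        // remaining numbers: the items
          itemID.set(Long.parseLong(m.group()));
          context.write(userID, itemID);
        }
      }
    }

    // Collects all of a user's item IDs into one sparse preference vector.
    // The index must match the one computed in the item-ID-index job, hence
    // TasteHadoopUtils.idToIndex rather than a plain (int) cast.
    class WikipediaToUserVectorReducer
        extends Reducer<VarLongWritable,VarLongWritable,VarLongWritable,VectorWritable> {

      @Override
      protected void reduce(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs,
                            Context context) throws IOException, InterruptedException {
        Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
        for (VarLongWritable itemPref : itemPrefs) {
          userVector.set(TasteHadoopUtils.idToIndex(itemPref.get()), 1.0f);
        }
        context.write(userID, new VectorWritable(userVector));
      }
    }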

Run the samples on Hadoop

Environment: Mahout 0.9, Hadoop 2.3.0

Build the job JAR against Hadoop 2; depending on how the Mahout POM names the Hadoop version property, one of the following applies:

mvn clean package -Dhadoop2.version=2.3.0 -DskipTests

mvn clean package -Dhadoop.version=2.3.0 -DskipTests
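If the build succeeds, the job JAR lands under core/target/ and can be used in the same invocation as before, e.g.:

hadoop jar core/target/mahout-core-0.9-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-Dmapred.input.dir=input/input.txt \
-Dmapred.output.dir=output --usersFile input/users.txt --booleanData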

References

https://www.ibm.com/developerworks/library/j-mahout/
