Hadoop 2.2 + Mahout 0.9 in Practice

Versions: Hadoop 2.2.0, Mahout 0.9.

We test with Mahout's org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.

First, a caveat: if you take the official Hadoop 2.2.0 and Mahout 0.9 downloads and invoke a Mahout algorithm, it will fail. A typical error looks like this:

 
 
    java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
        at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
        at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:73)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

This happens because stock Mahout 0.9 only supports Hadoop 1. A fix is described at https://issues.apache.org/jira/browse/MAHOUT-1329: essentially, modify the pom files so that Mahout is built against the Hadoop 2 dependencies, roughly along the lines sketched below.
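For orientation only, here is the kind of Maven profile such a patch adds to the Mahout parent pom. This sketch is my reconstruction from the build command in the next paragraph, not the actual MAHOUT-1329 patch; the real property names and dependency list may differ, so consult the JIRA before editing your pom:

    <profile>
      <id>hadoop2</id>
      <activation>
        <property><name>hadoop2</name></property>
      </activation>
      <dependencies>
        <!-- build against the Hadoop 2 client libraries instead of Hadoop 1 -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>${hadoop.2.version}</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>${hadoop.2.version}</version>
        </dependency>
      </dependencies>
    </profile>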

You can download the modified source package (http://download.csdn.net/detail/fansy1990/7165957) and build Mahout yourself (mvn clean install -Dhadoop2 -Dhadoop.2.version=2.2.0 -DskipTests), or simply download the pre-built jars (http://download.csdn.net/detail/fansy1990/7166017, http://download.csdn.net/detail/fansy1990/7166055).

Next, set up the Eclipse environment following this article: http://blog.csdn.net/fansy1990/article/details/22896249. Once the environment is ready, add the Mahout jars: download the jars linked above and import them into your Java project.

Then write the following Java code:

 
 
    package fz.hadoop2.util;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    /**
     * Provides a shared Configuration pointing at the Hadoop 2 cluster:
     * the HDFS NameNode and the YARN ResourceManager both run on node31.
     */
    public class Hadoop2Util {
        private static Configuration conf = null;

        private static final String YARN_RESOURCE = "node31:8032";
        private static final String DEFAULT_FS = "hdfs://node31:9000";

        public static Configuration getConf() {
            if (conf == null) {
                conf = new YarnConfiguration();
                conf.set("fs.defaultFS", DEFAULT_FS);
                conf.set("mapreduce.framework.name", "yarn");
                conf.set("yarn.resourcemanager.address", YARN_RESOURCE);
            }
            return conf;
        }
    }

 
 
    package fz.mahout.recommendations;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    import fz.hadoop2.util.Hadoop2Util;

    /**
     * Tests Mahout's org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.
     * Environment: Mahout 0.9, Hadoop 2.2.
     * @author fansy
     */
    public class RecommenderJobTest {
        Configuration conf = null;

        @Before
        public void setUp() {
            conf = Hadoop2Util.getConf();
            System.out.println("Begin to test...");
        }

        @Test
        public void testMain() throws Exception {
            String[] args = {
                "-i", "hdfs://node31:9000/input/user.csv",
                "-o", "hdfs://node31:9000/output/rec001",
                "-n", "3", "-b", "false", "-s", "SIMILARITY_EUCLIDEAN_DISTANCE",
                "--maxPrefsPerUser", "7", "--minPrefsPerUser", "2",
                "--maxPrefsInItemSimilarity", "7",
                "--outputPathForSimilarityMatrix", "hdfs://node31:9000/output/matrix/rec001",
                "--tempDir", "hdfs://node31:9000/output/temp/rec001"};
            ToolRunner.run(conf, new RecommenderJob(), args);
        }

        @After
        public void cleanUp() {
        }
    }
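A quick gloss of the options, for readers new to RecommenderJob: -i and -o are the HDFS input and output paths, -n 3 requests three recommendations per user, -b false says the input carries explicit preference values rather than boolean ones, and -s selects the similarity measure (Euclidean distance here). --maxPrefsPerUser, --minPrefsPerUser, and --maxPrefsInItemSimilarity bound how many preferences are considered per user and per item, --outputPathForSimilarityMatrix additionally saves the computed item-item similarity matrix, and --tempDir holds the intermediate output of the chained jobs.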

With the Mahout jars downloaded earlier, place them in one of Hadoop 2's lib directories (e.g. share/hadoop/mapreduce/lib; it does not have to be exactly this path, any of Hadoop's lib directories will do) so the cluster side can load the Mahout classes. Then simply run RecommenderJobTest.
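An untested alternative (my suggestion, not something the original setup verifies): rather than copying jars onto the cluster, you may be able to ship them with the job through Hadoop's generic -libjars option, which ToolRunner's GenericOptionsParser consumes before RecommenderJob sees the remaining arguments. For example, replacing the args array in testMain() with:

    // Hypothetical: ship the Mahout job jar via the distributed cache instead
    // of installing it on the cluster; the local jar path is a placeholder.
    String[] args = {
        "-libjars", "/path/to/mahout-core-0.9-job.jar",
        "-i", "hdfs://node31:9000/input/user.csv",
        "-o", "hdfs://node31:9000/output/rec001",
        "-n", "3", "-b", "false", "-s", "SIMILARITY_EUCLIDEAN_DISTANCE",
        "--maxPrefsPerUser", "7", "--minPrefsPerUser", "2",
        "--maxPrefsInItemSimilarity", "7",
        "--outputPathForSimilarityMatrix", "hdfs://node31:9000/output/matrix/rec001",
        "--tempDir", "hdfs://node31:9000/output/temp/rec001"};
    ToolRunner.run(conf, new RecommenderJob(), args);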

The input file, where each line is userID,itemID,preference, is as follows:

 
 
    1,101,5.0
    1,102,3.0
    1,103,2.5
    2,101,2.0
    2,102,2.5
    2,103,5.0
    2,104,2.0
    3,101,2.5
    3,104,4.0
    3,105,4.5
    3,107,5.0
    4,101,5.0
    4,103,3.0
    4,104,4.5
    4,106,4.0
    5,101,4.0
    5,102,3.0
    5,103,2.0
    5,104,4.0
    5,105,3.5
    5,106,4.0

The output file is written under hdfs://node31:9000/output/rec001 (its listing is not reproduced here).
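To inspect the result, here is a minimal sketch (my addition, assuming the standard part-file naming; the actual file name under /output/rec001 may differ) that prints the final recommendations, one line per user in the form userID [itemID:score,...]:

    package fz.mahout.recommendations;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import fz.hadoop2.util.Hadoop2Util;

    // Prints the text output of RecommenderJob to stdout.
    public class DumpRecommendations {
        public static void main(String[] args) throws Exception {
            Configuration conf = Hadoop2Util.getConf();
            Path part = new Path("hdfs://node31:9000/output/rec001/part-r-00000"); // assumed name
            FileSystem fs = part.getFileSystem(conf);
            try (BufferedReader reader =
                    new BufferedReader(new InputStreamReader(fs.open(part)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }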

Log of the final MapReduce job:

 
 
    2014-04-09 13:03:09,301 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
    2014-04-09 13:03:09,301 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.map.child.java.opts is deprecated. Instead, use mapreduce.map.java.opts
    2014-04-09 13:03:09,302 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
    2014-04-09 13:03:09,302 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.task.timeout is deprecated. Instead, use mapreduce.task.timeout
    2014-04-09 13:03:09,317 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(56)) - Connecting to ResourceManager at node31/192.168.0.31:8032
    2014-04-09 13:03:09,460 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
    2014-04-09 13:03:09,515 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(394)) - number of splits:1
    2014-04-09 13:03:09,531 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-04-09 13:03:09,547 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(477)) - Submitting tokens for job: job_1396479318893_0015
    2014-04-09 13:03:09,602 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(174)) - Submitted application application_1396479318893_0015 to ResourceManager at node31/192.168.0.31:8032
    2014-04-09 13:03:09,604 INFO [main] mapreduce.Job (Job.java:submit(1272)) - The url to track the job: http://node31:8088/proxy/application_1396479318893_0015/
    2014-04-09 13:03:09,604 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1317)) - Running job: job_1396479318893_0015
    2014-04-09 13:03:24,170 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1338)) - Job job_1396479318893_0015 running in uber mode : false
    2014-04-09 13:03:24,170 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - map 0% reduce 0%
    2014-04-09 13:03:32,299 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - map 100% reduce 0%
    2014-04-09 13:03:41,373 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - map 100% reduce 100%
    2014-04-09 13:03:42,404 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1356)) - Job job_1396479318893_0015 completed successfully
    2014-04-09 13:03:42,485 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1363)) - Counters: 43
        File System Counters
            FILE: Number of bytes read=306
            FILE: Number of bytes written=163713
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=890
            HDFS: Number of bytes written=192
            HDFS: Number of read operations=10
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=5798
            Total time spent by all reduces in occupied slots (ms)=6179
        Map-Reduce Framework
            Map input records=7
            Map output records=21
            Map output bytes=927
            Map output materialized bytes=298
            Input split bytes=131
            Combine input records=0
            Combine output records=0
            Reduce input groups=5
            Reduce shuffle bytes=298
            Reduce input records=21
            Reduce output records=5
            Spilled Records=42
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=112
            CPU time spent (ms)=1560
            Physical memory (bytes) snapshot=346509312
            Virtual memory (bytes) snapshot=1685782528
            Total committed heap usage (bytes)=152834048
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=572
        File Output Format Counters
            Bytes Written=192

Note: only this one collaborative-filtering program was tested; the other algorithms were not, so it is entirely possible that some of them still have problems on this version.

Share, grow, be happy.

When reposting, please credit the blog: http://blog.csdn.net/fansy1990
