Basic usage of Mahout itemCF

I. Testing itemCF

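Mahout version: 0.10.0

Mahout provides many algorithms, and itemCF (item-based collaborative filtering) is one of the more commonly used. This post records how to use it.

1. Data preparation. I use some behavior data I collected myself; there is not much of it, but it is enough to produce test results:

The three columns below are user_id, item_id, preference.

Store the following data on HDFS; I put it at /mahout/itemcf/data1/itemdata.data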

 
 
  0162381440670851711,4,7.0
  0162381440670851711,11,4.0
  0162381440670851711,32,1.0
  0162381440670851711,176,27.0
  0162381440670851711,183,11.0
  0162381440670851711,184,5.0
  0162381440670851711,207,9.0
  0162381440670851711,256,3.0
  0162381440670851711,258,4.0
  0162381440670851711,259,16.0
  0162381440670851711,260,8.0
  0162381440670851711,261,18.0
  0162381440670851711,301,1.0
  0162381440670851711,307,1.0
  0162381440670851711,477,1.0
  0162381440670851711,518,1.0
  0162381440670851711,549,3.0
  0162381440670851711,570,1.0
  0162381440670851711,826,2.0
  0357211441096952115,207,1.0
  0617721441096186493,184,1.0
  0617721441096186493,207,1.0
  1205421441071459451,5,1.0
  1214361441096861254,207,1.0
  1401731441095483081,258,1.0
  1401731441095483081,814,4.0
  1401731441095483081,826,1.0
  1917281441163686119,259,10.0
  1917281441163686119,260,1.0
  1917281441163686119,261,3.0
  1966141441163860798,176,1.0
  2294491441095342047,176,1.0
  2441031440670827430,4,13.0
  2441031440670827430,259,29.0
  2441031440670827430,261,14.0
  2441031440670827430,460,2.0
  2441031440670827430,477,6.0
  2441031440670827430,570,1.0
  2441031440670827430,577,6.0
  2441031440670827430,702,1.0
  2441031440670827430,758,2.0
  2441031440670827430,809,1.0
  2475791441161318569,176,1.0
  2987091441068878630,261,1.0
  3114261440726814722,549,1.0
  3445831441096810087,207,1.0
  3846061441096937902,207,1.0
  4266911441160164599,176,1.0
  4698311441097046150,176,2.0
  4698311441097046150,183,2.0
  4698311441097046150,184,4.0
  4698311441097046150,207,6.0
  4946291441097563245,183,1.0
  4956331440750398178,159,1.0
  4956331440750398178,160,1.0
  5307571441160362208,4,1.0
  5307571441160362208,176,1.0
  5719691441098504387,176,5.0
  5719691441098504387,184,1.0
  5719691441098504387,207,1.0
  5813281441095425044,184,2.0
  5813281441095425044,258,1.0
  5894601441095265604,184,1.0
  5981521441096106535,207,1.0
  6292291441096870187,207,1.0
  6533651441161410910,176,1.0
  6810691441096902907,207,1.0
  6836071440729632252,4,3.0
  6836071440729632252,49,1.0
  6836071440729632252,259,2.0
  6836071440729632252,570,1.0
  6836071440729632252,577,2.0
  6964141441160527746,176,1.0
  7495291441096796843,207,1.0
  7616681441095305067,183,1.0
  7616681441095305067,184,2.0
  7616681441095305067,258,2.0
  7616681441095305067,261,1.0
  7732211441095211112,183,1.0
  7732211441095211112,259,2.0
  7732211441095211112,260,9.0
  7732211441095211112,261,1.0
  7732211441095211112,632,6.0
  8211761441096060717,176,1.0
  8211761441096060717,183,1.0
  8305691441168039389,259,3.0
  8305691441168039389,260,2.0
  8305691441168039389,261,1.0
  8375281440837772178,527,1.0
  8432311440724457499,290,1.0
  8641451441097297246,183,1.0
  8641451441097297246,184,1.0
  8641451441097297246,207,1.0
  8641451441097297246,259,1.0
  8641451441097297246,263,1.0
  8641451441097297246,838,1.0
  8641451441097297246,839,1.0
  8641451441097297246,840,1.0
  8651081441095283643,176,2.0
  8651081441095283643,183,7.0
  8753221441095342356,176,1.0
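
Before uploading a file like this, it can help to check that every line really has the three expected columns. The sketch below is only an illustration: the function name parse_preference is made up for this post, and the 64-bit range check rests on an assumption, namely that the job reads IDs as long integers. The job output further down is consistent with that, since user 0162381440670851711 is printed there without its leading zero.

```python
# Hypothetical helper for sanity-checking input lines; not part of Mahout.
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def parse_preference(line):
    """Split one 'user_id,item_id,preference' line into typed values."""
    fields = line.strip().split(",")
    if len(fields) != 3:
        raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
    user_id, item_id = int(fields[0]), int(fields[1])
    # Assumption: IDs must fit in a signed 64-bit integer.
    for value in (user_id, item_id):
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError(f"ID outside 64-bit range: {line!r}")
    return user_id, item_id, float(fields[2])


if __name__ == "__main__":
    sample = [
        "0162381440670851711,4,7.0",
        "8753221441095342356,176,1.0",
    ]
    for row in sample:
        print(parse_preference(row))
```

Note that int() drops the leading zero of the first user ID, which matches how that user appears in the result listing.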

2. Run collaborative filtering with Mahout's built-in algorithm:

The command is as follows:

 
 
  bin/hadoop jar /home/lin/hadoop/mahout-distribution-0.10.0/mahout-examples-0.10.0-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -i /mahout/itemcf/data1 \
    -o /mahout/itemcf/result1 \
    -s SIMILARITY_LOGLIKELIHOOD \
    --tempDir /mahout/itemcf/temp1

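Here:

  -i         is the input path, i.e. the directory holding the test data given above;

  -o         is the output path for the results. Do not create this directory yourself: Mahout creates it automatically, and the job fails if it already exists;

  --tempDir  is where Mahout stores its own intermediate output. Mahout creates this path automatically as well, and the job fails if it already exists;

  -s         selects the similarity measure to use; choose one according to your needs.

The full help output is as follows: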
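
As a convenience, the same invocation can be assembled from a script. This is a minimal sketch, not an official wrapper: recommender_job_args and its parameters are names invented here, the jar path is simply the one used in this post, and only the flags shown in the command above are used. The list of similarity names is copied from the help text in this post.

```python
import shlex

# Predefined similarity names as listed in the job's help output.
PREDEFINED_SIMILARITIES = {
    "SIMILARITY_COOCCURRENCE",
    "SIMILARITY_LOGLIKELIHOOD",
    "SIMILARITY_TANIMOTO_COEFFICIENT",
    "SIMILARITY_CITY_BLOCK",
    "SIMILARITY_COSINE",
    "SIMILARITY_PEARSON_CORRELATION",
    "SIMILARITY_EUCLIDEAN_DISTANCE",
}

# Path taken from this post; adjust to your own installation.
DEFAULT_JAR = (
    "/home/lin/hadoop/mahout-distribution-0.10.0/"
    "mahout-examples-0.10.0-job.jar"
)


def recommender_job_args(input_dir, output_dir, temp_dir,
                         similarity="SIMILARITY_LOGLIKELIHOOD",
                         jar=DEFAULT_JAR, hadoop="bin/hadoop"):
    """Build the argument list for the RecommenderJob command (hypothetical helper)."""
    if similarity not in PREDEFINED_SIMILARITIES:
        raise ValueError(f"not a predefined similarity name: {similarity}")
    return [
        hadoop, "jar", jar,
        "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
        "-i", input_dir,
        "-o", output_dir,
        "-s", similarity,
        "--tempDir", temp_dir,
    ]


if __name__ == "__main__":
    args = recommender_job_args(
        "/mahout/itemcf/data1", "/mahout/itemcf/result1", "/mahout/itemcf/temp1"
    )
    # Print the command for inspection; pass `args` to subprocess.run to execute it.
    print(shlex.join(args))
```

Building the command as a list rather than a single string avoids quoting problems if a path ever contains spaces.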

  
  
  Job-Specific Options:
    --input (-i) input
        Path to job input directory.
    --output (-o) output
        The directory pathname for output.
    --similarityClassname (-s) similarityClassname
        Name of distributed similarity measures class to instantiate,
        alternatively use one of the predefined similarities
        ([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD,
        SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK,
        SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION,
        SIMILARITY_EUCLIDEAN_DISTANCE])
    --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem
        try to cap the number of similar items per item to this number
        (default: 100)
    --maxPrefs (-mppu) maxPrefs
        max number of preferences to consider per user or item, users or
        items with more preferences will be sampled down (default: 500)
    --minPrefsPerUser (-mp) minPrefsPerUser
        ignore users with less preferences than this (default: 1)
    --booleanData (-b) booleanData
        Treat input as without pref values
    --threshold (-tr) threshold
        discard item pairs with a similarity value below this
    --randomSeed randomSeed
        use this seed for sampling
    --help (-h)
        Print out help
    --tempDir tempDir
        Intermediate output directory
    --startPhase startPhase
        First phase to run
    --endPhase endPhase
        Last phase to run
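
To give a feel for what SIMILARITY_LOGLIKELIHOOD measures, here is a rough sketch of the log-likelihood ratio computed on a 2x2 co-occurrence table for two items: k11 users have both items, k12 only the first, k21 only the second, and k22 neither. The function names are my own, and the last step, mapping the ratio into [0, 1) with 1 - 1/(1 + LLR), is my understanding of how Mahout normalizes the score, so treat it as an assumption rather than a description of the actual implementation.

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of co-occurrence counts."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb small negative values from rounding.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


def llr_similarity(k11, k12, k21, k22):
    """Assumed normalization of the ratio into the range [0, 1)."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


if __name__ == "__main__":
    # Items that always appear together score higher than independent ones.
    print(llr_similarity(10, 0, 0, 10))
    print(llr_similarity(1, 1, 1, 1))
```

The score depends only on which users interacted with which items, not on the preference values themselves.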

3. Run the command above and wait for it to finish. The directory /mahout/itemcf/result1 then contains the following data:

  
  
  162381440670851711 [809:13.535571,702:13.535571,460:13.535571,758:13.535571,632:13.182321,577:12.929438,49:11.368558,307:10.562227,32:10.562227,518:10.562227]
  617721441096186493 [839:1.0,259:1.0,518:1.0,826:1.0,11:1.0,260:1.0,4:1.0,32:1.0,176:1.0,840:1.0]
  1401731441095483081 [11:1.0,570:1.0,518:1.0,307:1.0,260:1.0,259:1.0,549:1.0,32:1.0,207:1.0,184:1.0]
  1917281441163686119 [577:7.365086,702:6.5,809:6.5,758:6.5,460:6.5,184:5.9840446,176:5.981493,4:5.577299,570:5.3220325,477:4.9567957]
  2441031440670827430 [632:21.5,176:18.084661,183:15.684914,260:14.2175,207:13.510652,11:12.28147,307:12.28147,32:12.28147,518:12.28147,256:12.28147]
  4698311441097046150 [263:3.9337947,839:3.9337947,840:3.9337947,838:3.9337947,11:3.4747553,307:3.4747553,32:3.4747553,518:3.4747553,256:3.4747553,301:3.4747553]
  5307571441160362208 [826:1.0,259:1.0,518:1.0,307:1.0,11:1.0,260:1.0,549:1.0,32:1.0,207:1.0,184:1.0]
  5719691441098504387 [4:3.6454906,259:3.6147578,260:2.67091,261:2.6694102,183:2.517088,307:2.2876854,11:2.2876854,32:2.2876854,518:2.2876854,256:2.2876854]
  5813281441095425044 [207:1.8607497,259:1.6642486,183:1.5539461,301:1.4806436,11:1.4806436,307:1.4806436,32:1.4806436,518:1.4806436,256:1.4806436,549:1.4099455]
  6836071440729632252 [207:2.6088793,176:2.3617313,477:1.9966183,460:1.9945599,758:1.9945599,809:1.9945599,702:1.9945599,11:1.9926376,307:1.9926376,32:1.9926376]
  7616681441095305067 [826:1.5790755,207:1.5721571,549:1.535743,301:1.50748,307:1.50748,11:1.50748,32:1.50748,518:1.50748,256:1.50748,839:1.5]
  7732211441095211112 [826:3.7059078,549:3.7059078,307:3.3461132,256:3.3461132,518:3.3461132,11:3.3461132,301:3.3461132,32:3.3461132,570:3.1800203,477:3.1795032]
  8211761441096060717 [826:1.0,259:1.0,518:1.0,307:1.0,11:1.0,260:1.0,549:1.0,32:1.0,207:1.0,184:1.0]
  8305691441168039389 [577:2.2471673,4:2.083036,570:2.0549815,809:2.0,460:2.0,11:2.0,826:2.0,32:2.0,307:2.0,549:2.0]
  8641451441097297246 [11:1.0,632:1.0,518:1.0,826:1.0,260:1.0,570:1.0,549:1.0,32:1.0,307:1.0,477:1.0]
  8651081441095283643 [184:6.597979,258:6.1955295,260:6.1955295,826:5.5266876,549:5.5266876,477:5.5266876,259:4.662548,261:4.662548,11:4.626224,307:4.626224]

This gives the recommended items for each user. Each line holds a user ID followed by a list of item:score pairs.
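
To use these results elsewhere, each line can be turned into a user ID plus a list of (item_id, score) pairs. The parser below is a small sketch written against the lines shown in this post; parse_recommendations is a name invented here, and it assumes the user ID and the bracketed list are separated by whitespace.

```python
def parse_recommendations(line):
    """Parse 'user_id [item:score,item:score,...]' into (user_id, [(item_id, score), ...])."""
    user_part, items_part = line.strip().split(None, 1)
    body = items_part.strip().strip("[]")
    recommendations = []
    for pair in body.split(","):
        if not pair:
            continue  # tolerate an empty list such as "[]"
        item, score = pair.split(":")
        recommendations.append((int(item), float(score)))
    return int(user_part), recommendations


if __name__ == "__main__":
    line = "8305691441168039389 [577:2.2471673,4:2.083036,570:2.0549815,809:2.0]"
    user_id, recs = parse_recommendations(line)
    print(user_id)
    for item_id, score in recs:
        print(item_id, score)
```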

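Another frequently used Mahout algorithm is item similarity, which outputs the similarity between items. As described for -o and --tempDir above, the output and temporary directories must not already exist, so delete them or pick new paths first if the recommender job has already used them: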

 
 
  mahout itemsimilarity -i /mahout/itemcf/data1 -o /mahout/itemcf/result1 -s SIMILARITY_LOGLIKELIHOOD --tempDir /mahout/itemcf/temp1
 
 