Written Assignment
1. Set up a Mahout development environment with Maven and complete the simplest example from page 26 of the PPT. A description of the process and screenshots are required.
1.1 Development environment
- Win7 64bit
- Java 1.7.0_51
- Maven-3.2.1
- MyEclipse 2013 SR
- Mahout-0.8
- Hadoop-2.2.0
1.2 Building the Mahout development environment with Maven
1.2.1 Create a standard Java project with Maven
D:\MyEclipse Professional\java>cd D:\MyEclipse Professional\myMahout
D:\MyEclipse Professional\myMahout>mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.conan.mymahout -DartifactId=myMahout -DpackageName=org.conan.mymahout -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom <<<
[INFO]
[INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Batch mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Old (1.x) Archetype: maven-archetype-quickstart:1.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: org.conan.mymahout
[INFO] Parameter: packageName, Value: org.conan.mymahout
[INFO] Parameter: package, Value: org.conan.mymahout
[INFO] Parameter: artifactId, Value: myMahout
[INFO] Parameter: basedir, Value: D:\MyEclipse Professional\myMahout
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] project created from Old (1.x) Archetype in dir: D:\MyEclipse Professional\myMahout\myMahout
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:29 min
[INFO] Finished at: 2014-03-10T21:12:36+08:00
[INFO] Final Memory: 16M/108M
[INFO] ------------------------------------------------------------------------
1.2.3 Import the project into Eclipse
1.2.4 Add the Mahout dependencies by editing pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.conan.mymahout</groupId>
    <artifactId>myMahout</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>myMahout</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <mahout.version>0.8</mahout.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.mahout</groupId>
            <artifactId>mahout-core</artifactId>
            <version>${mahout.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.mahout</groupId>
            <artifactId>mahout-integration</artifactId>
            <version>${mahout.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jetty</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.cassandra</groupId>
                    <artifactId>cassandra-all</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>me.prettyprint</groupId>
                    <artifactId>hector-core</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
</project>
1.2.5 Download the dependencies
D:\MyEclipse Professional\myMahout\myMahout>mvn clean install
[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building myMahout 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ myMahout ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ myMahout ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory D:\MyEclipse Professional\myMahout\myMahout\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ myMahout ---
[INFO] Compiling 1 source file to D:\MyEclipse Professional\myMahout\myMahout\target\classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ myMahout ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory D:\MyEclipse Professional\myMahout\myMahout\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ myMahout ---
[INFO] Compiling 1 source file to D:\MyEclipse Professional\myMahout\myMahout\target\test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ myMahout ---
[INFO] Surefire report directory: D:\MyEclipse Professional\myMahout\myMahout\target\surefire-reports
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.pom (3 KB at 0.5 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-providers/2.12.4/surefire-providers-2.12.4.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-providers/2.12.4/surefire-providers-2.12.4.pom (3 KB at 3.1 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.jar (37 KB at 16.2 KB/sec)

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.conan.mymahout.AppTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ myMahout ---
[INFO] Building jar: D:\MyEclipse Professional\myMahout\myMahout\target\myMahout-1.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ myMahout ---
[INFO] Installing D:\MyEclipse Professional\myMahout\myMahout\target\myMahout-1.0-SNAPSHOT.jar to C:\Users\Administrator\.m2\repository\org\conan\mymahout\myMahout\1.0-SNAPSHOT\myMahout-1.0-SNAPSHOT.jar
[INFO] Installing D:\MyEclipse Professional\myMahout\myMahout\pom.xml to C:\Users\Administrator\.m2\repository\org\conan\mymahout\myMahout\1.0-SNAPSHOT\myMahout-1.0-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13.173 s
[INFO] Finished at: 2014-03-10T21:28:56+08:00
[INFO] Final Memory: 24M/178M
[INFO] ------------------------------------------------------------------------
D:\MyEclipse Professional\myMahout\myMahout>
Refresh the project in Eclipse:
1.3 Implementing user-based collaborative filtering (UserCF) with Mahout
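The actual Mahout-based code appears in the question 2 and 3 listings further down. To make the idea behind UserCF concrete, here is a minimal, self-contained plain-Java sketch. It is not Mahout code: the class name and toy data are made up, and the 1/(1 + distance) formula only loosely mirrors Mahout's EuclideanDistanceSimilarity.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative user-based CF similarity, NOT Mahout's implementation.
public class UserCFSketch {

    // Euclidean similarity over co-rated items: 1 / (1 + distance).
    // Returns 0 when the users share no rated items.
    public static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sum = 0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double d = e.getValue() - other;
                sum += d * d;
                common++;
            }
        }
        if (common == 0) return 0.0;
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    public static void main(String[] args) {
        // toy preferences: userId -> (itemId -> rating)
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, Map.of(101L, 5.0, 102L, 3.0, 103L, 2.5));
        prefs.put(2L, Map.of(101L, 2.0, 102L, 2.5, 103L, 5.0, 104L, 2.0));
        prefs.put(3L, Map.of(101L, 2.5, 104L, 4.0, 105L, 4.5, 107L, 5.0));

        // A UserCF recommender would rank other users by this similarity
        // to form user 1's neighborhood, then average their ratings.
        System.out.printf("sim(1,2)=%.4f sim(1,3)=%.4f%n",
                similarity(prefs.get(1L), prefs.get(2L)),
                similarity(prefs.get(1L), prefs.get(3L)));
    }
}
```

Note that with only one co-rated item, user 3 can look deceptively close to user 1; production similarity measures typically correct for the size of the overlap.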
2. Using the case-study dataset and Mahout, pick any algorithm and produce collaborative-filtering recommendations for any one female user, then explain whether the recommendation results are reasonable. The explanation may be written up as a separate document.
Console output (partial results only):
userEuclidean => uid:163,(279,5.500000)
itemEuclidean => uid:163,(374,9.454545)(264,9.000000)(852,8.927536)
userEuclideanNoPref => uid:163,(279,2.000000)(2,1.000000)(415,1.000000)
itemEuclideanNoPref => uid:163,(138,5.150000)(246,4.092857)(288,3.833333)
Looking at the recommendations for user uid=163: book 138 is recommended. Next, let's see which users gave book 138 high scores:
userid | bookid | score | sex | age |
152 | 138 | 8 | F | 26 |
172 | 138 | 4 | F | 56 |
Among them, user 152 gave book 973 a high score, and the target user 163 rated that same book even higher:
userid | bookid | score | sex | age |
152 | 973 | 8 | F | 26 |
163 | 973 | 9 | F | 32 |
So the recommendation is reasonable.
3. Continuing from question 2, add a filter condition that excludes male users so that only recommendation scores from female users are kept, then generate recommendations and explain whether the results are reasonable. Code, screenshots of the run, documentation of the code, and an explanation of the results are required.
package org.conan.mymahout.recommendation.book;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class BookFilterGenderResult {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        RecommenderBuilder rb1 = BookEvaluator.userEuclidean(dataModel);
        RecommenderBuilder rb2 = BookEvaluator.itemEuclidean(dataModel);
        RecommenderBuilder rb3 = BookEvaluator.userEuclideanNoPref(dataModel);
        RecommenderBuilder rb4 = BookEvaluator.itemEuclideanNoPref(dataModel);

        long uid = 152;
        System.out.print("userEuclidean =>");
        filterGender(uid, rb1, dataModel);
        System.out.print("itemEuclidean =>");
        filterGender(uid, rb2, dataModel);
        System.out.print("userEuclideanNoPref =>");
        filterGender(uid, rb3, dataModel);
        System.out.print("itemEuclideanNoPref =>");
        filterGender(uid, rb4, dataModel);
    }

    /**
     * Filter the recommendations by user gender.
     */
    public static void filterGender(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException, IOException {
        // Set<Long> userids = getMale("datafile/book/user.csv");
        Set<Long> userids = getFeMale("datafile/book/user.csv");

        // Collect the books that female users have rated
        Set<Long> bookids = new HashSet<Long>();
        for (long uids : userids) {
            LongPrimitiveIterator iter = dataModel.getItemIDsFromUser(uids).iterator();
            while (iter.hasNext()) {
                long bookid = iter.next();
                bookids.add(bookid);
            }
        }

        IDRescorer rescorer = new FilterRescorer(bookids);
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM, rescorer);
        RecommendFactory.showItems(uid, list, false);
    }

    /**
     * Get the IDs of male users.
     */
    public static Set<Long> getMale(String file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(new File(file)));
        Set<Long> userids = new HashSet<Long>();
        String s = null;
        while ((s = br.readLine()) != null) {
            String[] cols = s.split(",");
            if (cols[1].equals("M")) { // male user
                userids.add(Long.parseLong(cols[0]));
            }
        }
        br.close();
        return userids;
    }

    /**
     * Get the IDs of female users.
     */
    public static Set<Long> getFeMale(String file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(new File(file)));
        Set<Long> userids = new HashSet<Long>();
        String s = null;
        while ((s = br.readLine()) != null) {
            String[] cols = s.split(",");
            if (cols[1].equals("F")) { // female user
                userids.add(Long.parseLong(cols[0]));
            }
        }
        br.close();
        return userids;
    }
}

/**
 * Rescore the results: drop every item outside the allowed set.
 */
class FilterRescorer implements IDRescorer {
    // here: the IDs of books rated by female users
    final private Set<Long> userids;

    public FilterRescorer(Set<Long> userids) {
        this.userids = userids;
    }

    @Override
    public double rescore(long id, double originalScore) {
        return isFiltered(id) ? Double.NaN : originalScore;
    }

    @Override
    public boolean isFiltered(long id) {
        return !userids.contains(id);
    }
}
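The effect of the rescorer can be exercised in isolation: items whose score is rescored to NaN are excluded from the final recommendation list (Mahout's top-items selection skips NaN scores). The following is a tiny standalone sketch of that rule; the class name, method signatures, and sample IDs are made up for illustration and do not use Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Standalone illustration of the NaN-filtering rule used by FilterRescorer above.
public class FilterSketch {

    // Same rule as FilterRescorer: anything outside the allowed set is filtered.
    public static boolean isFiltered(Set<Long> allowedIds, long id) {
        return !allowedIds.contains(id);
    }

    public static double rescore(Set<Long> allowedIds, long id, double originalScore) {
        return isFiltered(allowedIds, id) ? Double.NaN : originalScore;
    }

    public static void main(String[] args) {
        // hypothetical "books rated by female users" set
        Set<Long> allowed = Set.of(96L, 106L, 138L);
        long[] candidates = {96, 202, 138};

        List<Long> kept = new ArrayList<>();
        for (long id : candidates) {
            // a recommender drops candidates whose rescored value is NaN
            if (!Double.isNaN(rescore(allowed, id, 5.0))) {
                kept.add(id);
            }
        }
        System.out.println(kept); // candidate 202 is filtered out
    }
}
```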
Run results:
userEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score: 0.11111108462015788
Recommender IR Evaluator: [Precision: 0.3010752688172043, Recall: 0.08542713567839195]
itemEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score: 1.3536954060693203
Recommender IR Evaluator: [Precision: 0.0, Recall: 0.0]
userEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score: 4.61812258478421
Recommender IR Evaluator: [Precision: 0.09045226130653267, Recall: 0.09296482412060306]
itemEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score: 2.625455679766278
Recommender IR Evaluator: [Precision: 0.6005025125628134, Recall: 0.6055276381909548]
userEuclidean => uid:99,
itemEuclidean => uid:99,(586,10.000000)(378,10.000000)(202,9.666667)
userEuclideanNoPref => uid:99,(616,1.000000)(307,1.000000)(552,1.000000)
itemEuclideanNoPref => uid:99,(96,3.392724)(860,3.250000)(375,3.200000)
Let's analyze the results of the itemEuclideanNoPref algorithm.
The top-ranked item is book 96. Drilling down one level, we query which users gave book 96 high scores:
userid | bookid | score | sex | age |
73 | 96 | 8 | F | 28 |
79 | 96 | 7 | F | 32 |
117 | 96 | 10 | F | 34 |
163 | 96 | 8 | F | 32 |
All of these users are female. Among them, user 117 also rated book 106, which the target user 99 scored very highly:
userid | bookid | score | sex | age |
99 | 106 | 10 | F | 37 |
117 | 106 | 7 | F | 34 |
So the recommendation is reasonable.
http://blog.fens.me/hadoop-mahout-mapreduce-itemcf/
This post is part of the Hadoop family series, which introduces the Hadoop family of products. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and more.
Since 2011, China has entered a turbulent big-data era in which the Hadoop family of software dominates the big-data-processing landscape. The open-source community and vendors alike are aligning all of their data software with Hadoop, which has grown from a niche domain into the de facto standard for big-data development. On top of the original Hadoop technology, the Hadoop family of products keeps innovating around the "big data" concept and driving technical progress.
As developers in the IT industry, we should keep pace, seize the opportunity, and rise together with Hadoop!
About the author:
- Zhang Dan (Conan), programmer: Java, R, PHP, JavaScript
- weibo: @Conan_Z
- blog: http://blog.fens.me
- email: bsspirit@gmail.com
Please credit the source when reposting:
http://blog.fens.me/hadoop-mahout-mapreduce-itemcf/
Preface
Mahout is a member of the Hadoop family and by lineage inherits the characteristics of Hadoop programs: it supports HDFS access and distributed MapReduce algorithms. As Mahout evolved, version 0.8 followed a major upgrade that began in 0.7: single-machine in-memory computation was removed from some algorithms, leaving only Hadoop-based MapReduce parallel computation.
This shows Mahout's determination to move toward big data and stay committed to parallelization! Within the Hadoop framework, Mahout may well become a star product of big data!
Contents
- Introduction to the Mahout development environment
- Mahout's distributed environment on Hadoop
- Implementing item-based collaborative filtering (ItemCF) with Mahout
- Template project uploaded to GitHub
1. Introduction to the Mahout development environment
In the article "Building a Mahout Project with Maven", we already configured a Maven-based Mahout development environment; here we continue with the development of a distributed Mahout program.
The Mahout version used in this article is 0.8.
Development environment:
- Win7 64bit
- Java 1.6.0_45
- Maven 3
- Eclipse Juno Service Release 2
- Mahout 0.8
- Hadoop 1.1.2
Find pom.xml and change the Mahout version to 0.8:
<mahout.version>0.8</mahout.version>
Then download the dependency libraries:
~ mvn clean install
Because the class org.conan.mymahout.cluster06.Kmeans.java is based on mahout-0.6, it will produce compile errors; we can comment out this file for now.
2. Mahout's distributed environment on Hadoop
As the figure above shows, we can develop either in Win7 or in Linux, and debug locally during development; the standard tools are Maven and Eclipse.
At run time, Mahout automatically ships the MapReduce algorithm packages to the Hadoop cluster. This development-and-run mode is close to a real production environment.
3. Implementing item-based collaborative filtering (ItemCF) with Mahout
Implementation steps:
- 1. Prepare the data file: item.csv
- 2. Java program: HdfsDAO.java
- 3. Java program: ItemCFHadoop.java
- 4. Run the program
- 5. Interpret the recommendation results
1). Prepare the data file: item.csv
Upload the test data to HDFS. For the single-machine in-memory experiment, see the article "Building a Mahout Project with Maven".
~ hadoop fs -mkdir /user/hdfs/userCF
~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /user/hdfs/userCF
~ hadoop fs -cat /user/hdfs/userCF/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
2). Java program: HdfsDAO.java
HdfsDAO.java is an HDFS utility class that implements the various HDFS commands through Hadoop's API; see the article "Hadoop Programming: Calling HDFS".
Here we will use a few methods of the HdfsDAO.java class:
HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
hdfs.rmr(inPath);
hdfs.mkdirs(inPath);
hdfs.copyFile(localFile, inPath);
hdfs.ls(inPath);
hdfs.cat(inFile);
3). Java program: ItemCFHadoop.java
To implement the distributed algorithm with Mahout, we follow the explanation given in Mahout in Action.
Implementation:
package org.conan.mymahout.recommendation;

import org.apache.hadoop.mapred.JobConf;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
import org.conan.mymahout.hdfs.HdfsDAO;

public class ItemCFHadoop {

    private static final String HDFS = "hdfs://192.168.1.210:9000";

    public static void main(String[] args) throws Exception {
        String localFile = "datafile/item.csv";
        String inPath = HDFS + "/user/hdfs/userCF";
        String inFile = inPath + "/item.csv";
        String outPath = HDFS + "/user/hdfs/userCF/result/";
        String outFile = outPath + "/part-r-00000";
        String tmpPath = HDFS + "/tmp/" + System.currentTimeMillis();

        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
        hdfs.rmr(inPath);
        hdfs.mkdirs(inPath);
        hdfs.copyFile(localFile, inPath);
        hdfs.ls(inPath);
        hdfs.cat(inFile);

        StringBuilder sb = new StringBuilder();
        sb.append("--input ").append(inPath);
        sb.append(" --output ").append(outPath);
        sb.append(" --booleanData true");
        sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
        sb.append(" --tempDir ").append(tmpPath);
        args = sb.toString().split(" ");

        RecommenderJob job = new RecommenderJob();
        job.setConf(conf);
        job.run(args);

        hdfs.cat(outFile);
    }

    public static JobConf config() {
        JobConf conf = new JobConf(ItemCFHadoop.class);
        conf.setJobName("ItemCFHadoop");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }
}
RecommenderJob.java essentially wraps the whole distributed parallel algorithm shown in the figure above. Without this wrapper, we would have to implement all eight MapReduce steps of the figure ourselves.
For an in-depth analysis of the algorithm above, see the article "Implementing the MapReduce Collaborative Filtering Algorithm in R".
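At its core, the pipeline that RecommenderJob parallelizes builds an item co-occurrence matrix and multiplies it by each user's preference vector. The following plain-Java sketch illustrates that idea in the boolean-data case (counts only, no ratings), using the same toy users and items as item.csv. It is an illustration of the principle, not the Mahout code; the class name and method are made up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Rough single-machine illustration of the co-occurrence idea behind
// RecommenderJob (booleanData variant): an unseen item's score is the
// number of times it co-occurs with the target user's items across all
// users' histories.
public class CooccurrenceSketch {

    public static Map<Long, Integer> score(Map<Long, Set<Long>> userItems, long targetUser) {
        Set<Long> seen = userItems.get(targetUser);
        Map<Long, Integer> scores = new HashMap<>();
        for (Set<Long> items : userItems.values()) {
            for (long candidate : items) {
                if (seen.contains(candidate)) continue; // only recommend unseen items
                for (long s : items) {
                    // each co-occurrence with a seen item adds 1 to the score
                    if (seen.contains(s)) scores.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // boolean preferences from the first three users of item.csv
        Map<Long, Set<Long>> data = new HashMap<>();
        data.put(1L, Set.of(101L, 102L, 103L));
        data.put(2L, Set.of(101L, 102L, 103L, 104L));
        data.put(3L, Set.of(101L, 104L, 105L, 107L));
        // item 104 co-occurs with user 1's items four times in total,
        // items 105 and 107 only once each
        System.out.println(score(data, 1L));
    }
}
```

The MapReduce version distributes exactly this computation: one set of jobs builds the co-occurrence (similarity) matrix, another multiplies it against the user vectors and selects the top items.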
4). Run the program
Console output:
Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
==========================================================
name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
==========================================================
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2013-10-14 10:26:35 org.apache.hadoop.util.NativeCodeLoader 警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-10-14 10:26:35 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 1
2013-10-14 10:26:35 org.apache.hadoop.io.compress.snappy.LoadSnappy 警告: Snappy native library not loaded
2013-10-14 10:26:36 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: job_local_0001
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output
2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getCompressor信息: Got brand-new compressor
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 0
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0001_m_000000_0' done.
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 1 sorted segments
2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getDecompressor信息: Got brand-new decompressor
2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 1 segments left of total size: 42 bytes
2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0001_r_000000_0 is allowed to commit now
2013-10-14 10:26:36 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/itemIDIndex
2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0001_r_000000_0' done.
2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%
2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_0001
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Counters: 19
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Bytes Written=187
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=3287330
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=916
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=3443292
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=645
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Bytes Read=229
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=46
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Map input records=21
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle bytes=0
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Spilled Records=14
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Map output bytes=84
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=376569856
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=116
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Combine input records=21
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Reduce input records=7
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=7
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Combine output records=7
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Reduce output records=7
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log信息: Map output records=21
2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 1
2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: job_local_0002
2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100
2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720
2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680
2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output
2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 0
2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0002_m_000000_0' done.
2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 1 sorted segments
2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 1 segments left of total size: 68 bytes
2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:37 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0002_r_000000_0 is allowed to commit now
2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0002_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/userVectors
2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce
2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0002_r_000000_0' done.
2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%
2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_0002
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Counters: 20
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: USERS=5
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Bytes Written=288
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=6574274
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=1374
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=6887592
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=1120
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Bytes Read=229
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=72
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Map input records=21
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle bytes=0
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Spilled Records=42
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Map output bytes=63
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=575930368
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=116
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Combine input records=0
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Reduce input records=21
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=5
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Combine output records=0
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Reduce output records=5
2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log信息: Map output records=21
2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 1
2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: job_local_0003
2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100
2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720
2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680
2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output
2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 0
2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0003_m_000000_0' done.
2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 1 sorted segments
2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 1 segments left of total size: 89 bytes
2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:38 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0003_r_000000_0 is allowed to commit now
2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0003_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/ratingMatrix
2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce
2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0003_r_000000_0' done.
2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%
2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_0003
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Counters: 21
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Bytes Written=335
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: USER_RATINGS_NEGLECTED=0
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: USER_RATINGS_USED=21
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=9861349
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=1950
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=10331958
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=1751
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Bytes Read=288
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=93
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Map input records=5
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle bytes=0
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Spilled Records=14
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Map output bytes=336
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=775290880
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=157
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Combine input records=21
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Reduce input records=7
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=7
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Combine output records=7
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Reduce output records=7
2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log信息: Map output records=21
2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 1
2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: job_local_0004
2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100
2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720
2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680
2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output
2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 0
2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting
2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0004_m_000000_0' done.
2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 1 sorted segments
2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 1 segments left of total size: 118 bytes
2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting
2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息:
2013-10-14 10:26:39 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0004_r_000000_0 is allowed to commit now
2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0004_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/weights
2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce
2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0004_r_000000_0' done.
2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%
2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_0004
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Counters: 20
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Bytes Written=381
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=13148476
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=2628
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=13780408
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=2551
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Bytes Read=335
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: ROWS=7
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=122
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Map input records=7
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle bytes=0
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Spilled Records=16
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Map output bytes=516
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=974651392
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=158
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Combine input records=24
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Reduce input records=8
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=8
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Combine output records=8
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Reduce output records=5
2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log信息: Map output records=24
2013-10-14
10:26:40 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 12013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: job_local_00052013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 1002013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/996147202013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/3276802013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 02013-10-14 10:26:40 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0005_m_000000_0' done.2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 1 sorted segments2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 1 segments left of total size: 121 bytes2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0005_r_000000_0 is done. 
And is in the process of commiting2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0005_r_000000_0 is allowed to commit now2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0005_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/pairwiseSimilarity2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0005_r_000000_0' done.2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_00052013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Counters: 212013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Bytes Written=3922013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=164355772013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=34882013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=172300102013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=34082013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Bytes Read=3812013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: PRUNED_COOCCURRENCES=02013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: COOCCURRENCES=572013-10-14 10:26:41 
org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=1252013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Map input records=52013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle bytes=02013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Spilled Records=142013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Map output bytes=7442013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=11740119042013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=1292013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Combine input records=212013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Reduce input records=72013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=72013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Combine output records=72013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Reduce output records=72013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log信息: Map output records=212013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 12013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: job_local_00062013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 1002013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/996147202013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/3276802013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 
02013-10-14 10:26:41 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0006_m_000000_0' done.2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 1 sorted segments2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 1 segments left of total size: 158 bytes2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0006_r_000000_0 is allowed to commit now2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0006_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/similarityMatrix2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0006_r_000000_0' done.2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_00062013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Counters: 192013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Bytes 
Written=5542013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=197227402013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=43422013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=206747722013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=43542013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Bytes Read=3922013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=1622013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Map input records=72013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle bytes=02013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Spilled Records=142013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Map output bytes=5992013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=13733724162013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=1402013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Combine input records=252013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Reduce input records=72013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=72013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Combine output records=72013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Reduce output records=72013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log信息: Map output records=252013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 12013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input 
paths to process : 12013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: job_local_00072013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 1002013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/996147202013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/3276802013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 02013-10-14 10:26:42 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0007_m_000000_0 is done. And is in the process of commiting2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0007_m_000000_0' done.2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 1002013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/996147202013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/3276802013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 02013-10-14 10:26:42 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0007_m_000001_0 is done. 
And is in the process of commiting2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0007_m_000001_0' done.2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 2 sorted segments2013-10-14 10:26:42 org.apache.hadoop.io.compress.CodecPool getDecompressor信息: Got brand-new decompressor2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 2 segments left of total size: 233 bytes2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0007_r_000000_0 is done. And is in the process of commiting2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0007_r_000000_0 is allowed to commit now2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0007_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/partialMultiply2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0007_r_000000_0' done.2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_00072013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Counters: 192013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Bytes 
Written=5722013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=345179132013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=87512013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=361826302013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=79342013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Bytes Read=02013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=2412013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map input records=122013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle bytes=02013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Spilled Records=562013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map output bytes=4532013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=25584599042013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=6652013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Combine input records=02013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Reduce input records=282013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=72013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Combine output records=02013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Reduce output records=72013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map output records=282013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 12013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: 
job_local_00082013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 1002013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/996147202013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/3276802013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 02013-10-14 10:26:43 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0008_m_000000_0 is done. And is in the process of commiting2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0008_m_000000_0' done.2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 1 sorted segments2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 1 segments left of total size: 206 bytes2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0008_r_000000_0 is done. 
And is in the process of commiting2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0008_r_000000_0 is allowed to commit now2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0008_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/userCF/result2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0008_r_000000_0' done.2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_00082013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Counters: 192013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Bytes Written=2172013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=262998022013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=73572013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=275664082013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=62692013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Bytes Read=5722013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=2102013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map input records=72013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle 
bytes=02013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Spilled Records=422013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map output bytes=9272013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=19714539522013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=1372013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Combine input records=02013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Reduce input records=212013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=52013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Combine output records=02013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Reduce output records=52013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map output records=21cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-000001 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]2 [106:1.560478,105:1.4795978,107:0.69935876]3 [103:1.2475469,106:1.1944525,102:1.1462644]4 [102:1.6462644,105:1.5277859,107:0.69935876]5 [107:1.1993587]
5). Interpreting the recommendation results
We can break the log output above into three parts:
a. Environment initialization
Initialize the HDFS data and working directories, and upload the data file.
Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
==========================================================
name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
==========================================================
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv
b. Algorithm execution
The eight MapReduce jobs shown in the figure above are executed one after another.
Job complete: job_local_0001
Job complete: job_local_0002
Job complete: job_local_0003
Job complete: job_local_0004
Job complete: job_local_0005
Job complete: job_local_0006
Job complete: job_local_0007
Job complete: job_local_0008
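Conceptually, what these eight jobs compute in distributed form is the classic item-based CF recipe: build an item co-occurrence matrix from the preference matrix, then multiply it by each user's preference vector to score unrated items. Below is a minimal single-machine sketch of that idea in plain Java. It is only an illustration with hypothetical data; it is not the actual MapReduce code, and it uses raw co-occurrence counts rather than Mahout's configurable similarity measure, so it will not reproduce the scores shown above.

```java
import java.util.*;

public class ItemCFSketch {
    // Score unrated items for a user:
    // score(item) = sum over rated items j of cooccurrence[item][j] * pref[j]
    public static Map<Integer, Double> recommend(
            Map<Integer, Map<Integer, Double>> prefs, int userId) {
        // 1. Build the item co-occurrence matrix from all users' preferences
        Map<Integer, Map<Integer, Integer>> cooc = new HashMap<>();
        for (Map<Integer, Double> userPrefs : prefs.values()) {
            for (int i : userPrefs.keySet()) {
                for (int j : userPrefs.keySet()) {
                    if (i == j) continue;
                    cooc.computeIfAbsent(i, k -> new HashMap<>()).merge(j, 1, Integer::sum);
                }
            }
        }
        // 2. Multiply the co-occurrence matrix by this user's preference vector
        Map<Integer, Double> userPrefs = prefs.get(userId);
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<Integer, Double> e : userPrefs.entrySet()) {
            Map<Integer, Integer> row = cooc.getOrDefault(e.getKey(), Collections.emptyMap());
            for (Map.Entry<Integer, Integer> c : row.entrySet()) {
                if (!userPrefs.containsKey(c.getKey())) { // only score unrated items
                    scores.merge(c.getKey(), c.getValue() * e.getValue(), Double::sum);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> prefs = new HashMap<>();
        prefs.put(1, new HashMap<>(Map.of(101, 5.0, 102, 3.0)));
        prefs.put(2, new HashMap<>(Map.of(101, 4.0, 103, 4.5)));
        // Item 103 is scored for user 1 via its co-occurrence with item 101
        System.out.println(recommend(prefs, 1)); // prints {103=5.0}
    }
}
```

In the distributed pipeline, step 1 roughly corresponds to the preparePreferenceMatrix and pairwiseSimilarity jobs, and step 2 to the partialMultiply and final aggregation jobs.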
c. Printing the recommendation results
This prints the computed recommendations so we can inspect them directly.
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
1 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
2 [106:1.560478,105:1.4795978,107:0.69935876]
3 [103:1.2475469,106:1.1944525,102:1.1462644]
4 [102:1.6462644,105:1.5277859,107:0.69935876]
5 [107:1.1993587]
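Each output line is a user ID followed by a bracketed, comma-separated list of itemID:score pairs, sorted by predicted preference. A small parser sketch in plain Java (the line format is taken from the output above; the class and method names are my own):

```java
import java.util.*;

public class ResultParser {
    // Parse one RecommenderJob output line, e.g. "1 [104:1.280239,106:1.1462644]"
    public static Map<Long, Float> parseLine(String line) {
        String[] parts = line.trim().split("\\s+", 2);          // "1" and "[...]"
        String body = parts[1].substring(1, parts[1].length() - 1); // strip [ and ]
        Map<Long, Float> recs = new LinkedHashMap<>();           // keep score order
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":");
            recs.put(Long.parseLong(kv[0]), Float.parseFloat(kv[1]));
        }
        return recs;
    }

    public static void main(String[] args) {
        Map<Long, Float> r =
            parseLine("1 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]");
        System.out.println(r.get(104L)); // prints 1.280239
    }
}
```

So for user 1 the top recommendation is item 104 with a predicted score of about 1.28.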
4. Uploading the template project to GitHub
https://github.com/bsspirit/maven_mahout_template/tree/mahout-0.8
You can download this project and use it as a starting point for your own development.
~ git clone https://github.com/bsspirit/maven_mahout_template
~ git checkout mahout-0.8
We have now completed the distributed implementation of the item-based collaborative filtering algorithm. Next we will look at the distributed implementation of KMeans in Mahout; see the article: Mahout distributed program development: KMeans clustering.
Please credit the source when reposting:
http://blog.fens.me/hadoop-mahout-mapreduce-itemcf/
To run the Taste algorithm, the prerequisites are:
1) JDK, version 1.6. Note that because the build is done from Eclipse, you should define the JAVA_HOME variable before setting the path value.
2) Maven, version 2.0.11 or above, with the m2eclipse Maven plugin installed in Eclipse.
3) Apache Mahout, version 0.5.
Installation steps from the Apache Mahout - Taste documentation:
- 4. Demo
- To build and run the demo, follow the instructions below, which are written for Unix-like
- operating systems:
- 1. Obtain a copy of the Mahout distribution, either from SVN or as a downloaded archive.
- 2. Download the "1 Million MovieLens Dataset" from http://www.grouplens.org/.
- 3. Unpack the archive and copy movies.dat and ratings.dat to
- trunk/taste-web/src/main/resources/org/apache/mahout/cf/taste/example/
- under the Mahout distribution directory.
- 4. Navigate to the directory where you unpacked the Mahout distribution, and navigate
- to trunk.
- 5. Run mvn install, which builds and installs Mahout core to your local repository.
- 6. cd taste-web
- 7. cp ../examples/target/grouplens.jar ./lib
- 8. Edit recommender.properties and fill in the recommender.class:
- recommender.class=org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender
- 9. mvn package
- 10. mvn jetty:run-war. You may need to give Maven more memory: in a bash shell,
- export MAVEN_OPTS=-Xmx1024M
- 11. Get recommendations by accessing the web application in your browser:
- http://localhost:8080/RecommenderServlet?userID=1
- This will produce a simple preference-item ID list which could be consumed by a client
- application. Get more useful human-readable output with the debug parameter:
- http://localhost:8080/RecommenderServlet?userID=1&debug=true
- Incidentally, Taste's web service interface may then be found at:
- http://localhost:8080/RecommenderService.jws
- Its WSDL file will be here…
- http://localhost:8080/RecommenderService.jws?wsdl
- … and you can even access it in your browser via a simple HTTP request:
- …/RecommenderService.jws?method=recommend&userID=1&howMany=10
I. Installing Maven on Windows
With the continual appearance of new Java frameworks such as Struts, Spring, and Hibernate, the number of project configuration files keeps growing, which is a real burden for developers. In practice, a MyEclipse project grows ever larger and depends on more and more third-party JARs, making it bloated and hard to manage, especially in large projects. To address this, the Apache open source community released Maven, which is well suited to large Java projects.
For an introduction to Maven, see "Maven: The Definitive Guide", available at http://www.juvenxu.com/mvn-def-guide/
Installation steps:
1. Download the package from http://maven.apache.org/download.html
2. Unpack it and add its bin directory to the Windows Path environment variable. Maven depends on the JDK, so install the JDK first and configure it in the environment variables.
2.1. Set JAVA_HOME (as the name suggests, the Java installation path). Then find Path and click Edit; Path is what lets the system recognize the java command from any directory, so append ".;%JAVA_HOME%\bin" to its value.
2.2. Create a new variable named M2_HOME with the value E:\maven\apache-maven-2.2.1 (note: this path does not include bin). Then append ;%M2_HOME%\bin to Path (note: here the bin directory is included).
3. Test the installation: Start -> Run -> cmd -> mvn -version
Note: if you get "mvn is not recognized as an internal or external command", the Path edit may have overwritten a previously set value; just append %SystemRoot%\system32; to Path.
4. Install the Maven plugin in Eclipse: http://she.iteye.com/blog/1217812 , http://www.cnblogs.com/freeliver54/archive/2011/09/07/2169527.html
5. Manage Eclipse plugins with links: http://blog.csdn.net/cfyme/article/details/6099056/
II. Building the Apache Mahout environment on Windows
Apache Mahout is an open source project of the Apache Software Foundation (ASF) whose primary goal is to create scalable machine learning algorithms, free for developers to use under the Apache license. The project is in its second year and currently has one public release. Mahout contains many implementations, including clustering, classification, collaborative filtering (CF), and evolutionary programming.
For details see:
1. Introducing Apache Mahout: http://www.ibm.com/developerworks/cn/java/j-mahout/
2. Maven 2.0: compile, test, deploy, run: http://www.ideagrace.com/html/doc/2006/06/14/00847.html
To start building:
1. Building a social recommendation engine with Apache Mahout: http://www.ibm.com/developerworks/cn/java/j-lo-mahout/
This article is derived from that one; concretely, it implements "installing Taste and a simple demo".
2. Setting up a Mahout environment with mvn: http://anqiang1900.blog.163.com/blog/static/1141888642010380255296/
In short: download the Mahout source from the official site, switch to its root directory in a DOS prompt, and run mvn install.
3. Building Mahout in Eclipse: http://www.cnblogs.com/dlts26/archive/2011/09/13/2174889.html
That is, import the Mahout source into Eclipse as a Maven project, then run mvn install in the mahout directory (if you did not do so in the previous step).
III. Running the Taste Webapp example from Apache Mahout
Taste is an efficient implementation of collaborative filtering provided by Apache Mahout: a scalable, high-performance recommendation engine implemented in Java.
1. Edit the pom.xml of the mahout-taste-webapp project and add a dependency on mahout-examples:
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>mahout-examples</artifactId>
<version>0.5</version>
</dependency>
2. Add the following line to recommender.properties in the mahout-taste-webapp project:
recommender.class=org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender
3. Download the data file from http://www.grouplens.org/node/73. I used the 1M Ratings Data Set (.tar.gz), which I have tested and verified; please verify other data files yourself. After unpacking, copy ratings.dat into /org/apache/mahout/cf/taste/example/grouplens/ under the mahout-taste-webapp project. Why this path? Have a look at the GroupLensDataModel class.
4. The preparation is now essentially done; cd to taste-web and run it:
mvn jetty:run-war
5. Visit http://localhost:8080/RecommenderServlet?userID=1 to see the result. The servlet supports other parameters as well; see the javadoc of RecommenderServlet.
For details see http://seanhe.iteye.com/blog/1124682
1. Problems when configuring Maven in Eclipse
When Eclipse starts, it warns that the JDK cannot be found. The fix:
Add the following two lines to the eclipse.ini file (-vm points to the location of javaw.exe, or simply to its bin directory):
-vm
D:\Development\Java\jdk1.5.0_16\bin\javaw.exe (note: these two lines go between -startup and -launcher.library)
2. Problems when building the Mahout environment on Windows:
2.1 Running "mvn install" in the mahout directory fails with the error:
Cannot run program "chmod": CreateProcess error=2
chmod is a Linux command; this error comes from running Cygwin + Hadoop on Windows.
In other words, if you compile Mahout on Windows, you must make sure Cygwin is installed correctly (installing Cygwin per the tutorials below is enough; the rest of the Hadoop configuration need not be completed in full!).
Here are a few good tutorials on installing a Hadoop cluster on Windows:
http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/
Download hadoop-0.19.1 from http://archive.apache.org/dist/hadoop/core/hadoop-0.19.1/
2.2 In Cygwin, running ssh localhost fails to connect with the error Connection closed by ::1
Solving this with Cygwin took nearly xxxx hours of combing through Chinese and English sources. Problem description: in Cygwin on Win7, running ssh localhost produces "Connection closed by 127.0.0.1". Solution:
1. Start -> Run -> services.msc
2. Right-click CYGWIN sshd -> Properties -> Log On tab -> select "This account" -> Browse -> Advanced -> Find Now -> choose your account name (it must have administrator rights) -> enter the password (required: an empty password is not accepted, and it must match your Windows login password) -> OK.
3. Restart the CYGWIN sshd service; after that, ssh localhost succeeds. The service now runs under your account. One drawback of this approach is that you have to set a login password on the machine.
For details see: http://blog.sina.com.cn/s/blog_4abbf0ae0100r8hh.html
3. Problems when running the Taste Webapp
Once Mahout is configured in Eclipse, you can run the taste-webapp algorithm in Mahout.
Steps 1 and 2 above were already configured earlier, so start directly from step 3 to configure the contents of mahout-taste-webapp.
The problem:
Entering http://localhost:8080/RecommenderServlet?userID=1 in the browser returns the error:
HTTP ERROR: 404
Problem accessing /RecommenderServlet. Reason:
Not Found
Powered by Jetty://
Looking closely at the output of step 7, mvn jetty:run-war, reveals this error:
WARN::FAILED taste-recommender: java.lang.OutOfMemoryError: Java heap space
This indicates that the Maven process ran out of heap memory.
The fix:
On Windows:
Find the file %M2_HOME%\bin\mvn.bat in the Maven installation directory; this is the script that launches Maven. In it you will see a line commented out:
@REM set MAVEN_OPTS=-Xdebug -Xnoagent -Djava.compiler=NONE…
It means you can set some Maven options there, so add a line below that comment:
set MAVEN_OPTS=-Xmx1024M
Alternatively, before running the mvn jetty:run-war command, execute:
F:\mahout-distribution-0.5\taste-web>set MAVEN_OPTS=-Xmx1024M
With the Maven option in effect, the OutOfMemoryError is resolved accordingly.