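Written Assignment
1. Set up a Mahout development environment with Maven and complete the simplest example, on slide 26 of the PPT. Include a description of the process and screenshots.
1.1 Development environment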
– Win7 64bit
– Java 1.7.0_51
– Maven-3.2.1
– MyEclipse 2013 SR
– Mahout-0.8
– Hadoop-2.2.0
1.2 Building the Mahout development environment with Maven
1.2.1 Create a standard Java project with Maven
D:\MyEclipse Professional\java>cd D:\MyEclipse Professional\myMahout
D:\MyEclipse Professional\myMahout>mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.conan.mymahout -DartifactId=myMahout -DpackageName=org.conan.mymahout -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom <<<
[INFO]
[INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Batch mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Old (1.x) Archetype: maven-archetype-quickstart:1.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: org.conan.mymahout
[INFO] Parameter: packageName, Value: org.conan.mymahout
[INFO] Parameter: package, Value: org.conan.mymahout
[INFO] Parameter: artifactId, Value: myMahout
[INFO] Parameter: basedir, Value: D:\MyEclipse Professional\myMahout
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] project created from Old (1.x) Archetype in dir: D:\MyEclipse Professional\myMahout\myMahout
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:29 min
[INFO] Finished at: 2014-03-10T21:12:36+08:00
[INFO] Final Memory: 16M/108M
[INFO] ------------------------------------------------------------------------
1.2.2 Import the project into Eclipse
1.2.3 Add the Mahout dependency by editing pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.conan.mymahout</groupId>
  <artifactId>myMahout</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>myMahout</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <mahout.version>0.8</mahout.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>${mahout.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-integration</artifactId>
      <version>${mahout.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.mortbay.jetty</groupId>
          <artifactId>jetty</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.cassandra</groupId>
          <artifactId>cassandra-all</artifactId>
        </exclusion>
        <exclusion>
          <groupId>me.prettyprint</groupId>
          <artifactId>hector-core</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>
1.2.4 Download the dependencies
D:\MyEclipse Professional\myMahout\myMahout>mvn clean install
[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building myMahout 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ myMahout ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ myMahout ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory D:\MyEclipse Professional\myMahout\myMahout\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ myMahout ---
[INFO] Compiling 1 source file to D:\MyEclipse Professional\myMahout\myMahout\target\classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ myMahout ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory D:\MyEclipse Professional\myMahout\myMahout\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ myMahout ---
[INFO] Compiling 1 source file to D:\MyEclipse Professional\myMahout\myMahout\target\test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ myMahout ---
[INFO] Surefire report directory: D:\MyEclipse Professional\myMahout\myMahout\target\surefire-reports
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.pom (3 KB at 0.5 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-providers/2.12.4/surefire-providers-2.12.4.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-providers/2.12.4/surefire-providers-2.12.4.pom (3 KB at 3.1 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.12.4/surefire-junit4-2.12.4.jar (37 KB at 16.2 KB/sec)
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running org.conan.mymahout.AppTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ myMahout ---
[INFO] Building jar: D:\MyEclipse Professional\myMahout\myMahout\target\myMahout-1.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ myMahout ---
[INFO] Installing D:\MyEclipse Professional\myMahout\myMahout\target\myMahout-1.0-SNAPSHOT.jar to C:\Users\Administrator\.m2\repository\org\conan\mymahout\myMahout\1.0-SNAPSHOT\myMahout-1.0-SNAPSHOT.jar
[INFO] Installing D:\MyEclipse Professional\myMahout\myMahout\pom.xml to C:\Users\Administrator\.m2\repository\org\conan\mymahout\myMahout\1.0-SNAPSHOT\myMahout-1.0-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13.173 s
[INFO] Finished at: 2014-03-10T21:28:56+08:00
[INFO] Final Memory: 24M/178M
[INFO] ------------------------------------------------------------------------
D:\MyEclipse Professional\myMahout\myMahout>
Refresh the project in Eclipse:
1.3 Implementing user-based collaborative filtering (UserCF) with Mahout
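To make the idea behind UserCF concrete, here is a minimal plain-Java sketch of textbook user-based collaborative filtering. It is not Mahout code and does not claim to match Mahout's implementation. The class name UserCfSketch, the similarity form 1 / (1 + distance), and the toy ratings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Textbook user-based CF on an in-memory rating map (illustration only, not Mahout code). */
final class UserCfSketch {

    /** Similarity in (0, 1] from the Euclidean distance over co-rated items; 0 if none are shared. */
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSquares = 0.0;
        int shared = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSquares += diff * diff;
                shared++;
            }
        }
        return shared == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    /** Estimates uid's rating of itemId as the similarity-weighted mean over the n most similar users. */
    static double estimate(Map<Long, Map<Long, Double>> ratings, long uid, long itemId, int n) {
        Map<Long, Double> target = ratings.get(uid);
        if (target == null) {
            return Double.NaN; // unknown user
        }
        final Map<Long, Double> sims = new HashMap<Long, Double>();
        for (Map.Entry<Long, Map<Long, Double>> e : ratings.entrySet()) {
            if (e.getKey() != uid) {
                sims.put(e.getKey(), similarity(target, e.getValue()));
            }
        }
        // Neighborhood: the n other users with the highest similarity.
        List<Long> neighbors = new ArrayList<Long>(sims.keySet());
        Collections.sort(neighbors, new Comparator<Long>() {
            public int compare(Long x, Long y) {
                return Double.compare(sims.get(y), sims.get(x));
            }
        });
        double weighted = 0.0;
        double total = 0.0;
        for (Long v : neighbors.subList(0, Math.min(n, neighbors.size()))) {
            Double rating = ratings.get(v).get(itemId);
            double sim = sims.get(v);
            if (rating != null && sim > 0.0) {
                weighted += sim * rating;
                total += sim;
            }
        }
        return total == 0.0 ? Double.NaN : weighted / total;
    }

    /** Made-up toy ratings, user -> (item -> score); not taken from the assignment data. */
    static Map<Long, Map<Long, Double>> toyRatings() {
        long[][] rows = {
            {1, 10, 5}, {1, 20, 3},
            {2, 10, 5}, {2, 20, 3}, {2, 30, 4},
            {3, 10, 1}, {3, 20, 5}, {3, 30, 2}
        };
        Map<Long, Map<Long, Double>> r = new HashMap<Long, Map<Long, Double>>();
        for (long[] row : rows) {
            Map<Long, Double> items = r.get(row[0]);
            if (items == null) {
                items = new HashMap<Long, Double>();
                r.put(row[0], items);
            }
            items.put(row[1], (double) row[2]);
        }
        return r;
    }
}
```

With the toy data, user 2 agrees with user 1 on every shared item, so with a neighborhood of size 1 the estimate for user 1 on item 30 is simply user 2's score for that item. A larger neighborhood pulls the estimate toward the less similar user 3.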
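2. Using the case-study dataset and Mahout, pick any algorithm, produce collaborative filtering recommendations for any one female user, and explain whether the recommendations are reasonable. The explanation may be written up as a separate document.
Console output (only part of the results is shown):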
userEuclidean =>uid:163,(279,5.500000)
itemEuclidean =>uid:163,(374,9.454545)(264,9.000000)(852,8.927536)
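userEuclideanNoPref =>uid:163,(279,2.000000)(2,1.000000)(415,1.000000)
itemEuclideanNoPref =>uid:163,(138,5.150000)(246,4.092857)(288,3.833333)
Looking at the recommendations for the user with uid=163, book 138 is recommended. Next, we look at which users gave book 138 a relatively high rating: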
userid | bookid | score | sex | age |
152 | 138 | 8 | F | 26 |
172 | 138 | 4 | F | 56 |
Among them, user 152 also rated book 973 highly.
userid | bookid | score | sex | age |
152 | 973 | 8 | F | 26 |
163 | 973 | 9 | F | 32 |
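Users 152 and 163 both rated book 973 highly (8 and 9), which suggests similar taste, and user 152 also gave book 138 a score of 8. Recommending book 138 to user 163 is therefore reasonable.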
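The manual check above can be sketched in plain Java: find users who rated the recommended book highly and who also share a highly rated book with the target user. The class RecommendationTrace, its method names, and the "high rating" threshold of 8 are assumptions for illustration. The four rating rows are the ones quoted in the tables above.

```java
import java.util.ArrayList;
import java.util.List;

/** Traces why a book was recommended by looking for like-minded raters (illustrative sketch). */
final class RecommendationTrace {

    /** One row of the rating table: user, book, score. */
    static final class Rating {
        final long userId;
        final long bookId;
        final int score;

        Rating(long userId, long bookId, int score) {
            this.userId = userId;
            this.bookId = bookId;
            this.score = score;
        }
    }

    /** True if the user gave the book a score of at least minScore. */
    static boolean likes(List<Rating> ratings, long userId, long bookId, int minScore) {
        for (Rating r : ratings) {
            if (r.userId == userId && r.bookId == bookId && r.score >= minScore) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the users who rated recommendedBook at least minScore and who also
     * share at least one other book that both they and the target rated at least minScore.
     */
    static List<Long> supporters(List<Rating> ratings, long target, long recommendedBook, int minScore) {
        List<Long> result = new ArrayList<Long>();
        for (Rating r : ratings) {
            if (r.userId == target || r.bookId != recommendedBook || r.score < minScore) {
                continue;
            }
            // r.userId liked the recommended book; look for a book both users liked.
            for (Rating other : ratings) {
                if (other.userId == r.userId && other.bookId != recommendedBook
                        && other.score >= minScore
                        && likes(ratings, target, other.bookId, minScore)) {
                    if (!result.contains(r.userId)) {
                        result.add(r.userId);
                    }
                    break;
                }
            }
        }
        return result;
    }

    /** The four rating rows quoted in the tables above. */
    static List<Rating> rowsFromText() {
        List<Rating> rows = new ArrayList<Rating>();
        rows.add(new Rating(152, 138, 8));
        rows.add(new Rating(172, 138, 4));
        rows.add(new Rating(152, 973, 8));
        rows.add(new Rating(163, 973, 9));
        return rows;
    }
}
```

Calling supporters(rowsFromText(), 163, 138, 8) yields only user 152: user 172 rated book 138 too low to count, while user 152 shares the highly rated book 973 with user 163.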
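3. Continuing from question 2, add a filter that excludes male users and keeps only the ratings from female users, then generate recommendations and explain whether the results are reasonable. Include the code, screenshots of the run, documentation of the code, and documentation explaining the results.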
package org.conan.mymahout.recommendation.book;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class BookFilterGenderResult {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        String file = "datafile/book/rating.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        // One builder per algorithm; each call also prints its evaluation scores
        RecommenderBuilder rb1 = BookEvaluator.userEuclidean(dataModel);
        RecommenderBuilder rb2 = BookEvaluator.itemEuclidean(dataModel);
        RecommenderBuilder rb3 = BookEvaluator.userEuclideanNoPref(dataModel);
        RecommenderBuilder rb4 = BookEvaluator.itemEuclideanNoPref(dataModel);

        // Target user; the run results shown in this section are for uid 99
        long uid = 99;
        System.out.print("userEuclidean =>");
        filterGender(uid, rb1, dataModel);
        System.out.print("itemEuclidean =>");
        filterGender(uid, rb2, dataModel);
        System.out.print("userEuclideanNoPref =>");
        filterGender(uid, rb3, dataModel);
        System.out.print("itemEuclideanNoPref =>");
        filterGender(uid, rb4, dataModel);
    }

    /**
     * Filter by user gender: recommend only books that female users have rated
     */
    public static void filterGender(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException, IOException {
        // Set<Long> userids = getMale("datafile/book/user.csv");
        Set<Long> userids = getFeMale("datafile/book/user.csv");

        // Collect the books that female users have rated
        Set<Long> bookids = new HashSet<Long>();
        for (long uids : userids) {
            LongPrimitiveIterator iter = dataModel.getItemIDsFromUser(uids).iterator();
            while (iter.hasNext()) {
                long bookid = iter.nextLong();
                bookids.add(bookid);
            }
        }

        // Any book outside this set is filtered out of the recommendations
        IDRescorer rescorer = new FilterRescorer(bookids);
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM, rescorer);
        RecommendFactory.showItems(uid, list, false);
    }

    /**
     * Get the IDs of male users
     */
    public static Set<Long> getMale(String file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(new File(file)));
        Set<Long> userids = new HashSet<Long>();
        String s = null;
        while ((s = br.readLine()) != null) {
            String[] cols = s.split(",");
            if (cols[1].equals("M")) { // male user
                userids.add(Long.parseLong(cols[0]));
            }
        }
        br.close();
        return userids;
    }

    /**
     * Get the IDs of female users
     */
    public static Set<Long> getFeMale(String file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(new File(file)));
        Set<Long> userids = new HashSet<Long>();
        String s = null;
        while ((s = br.readLine()) != null) {
            String[] cols = s.split(",");
            if (cols[1].equals("F")) { // female user
                userids.add(Long.parseLong(cols[0]));
            }
        }
        br.close();
        return userids;
    }
}

/**
 * Rescorer that filters out every book not in the allowed set
 */
class FilterRescorer implements IDRescorer {
    // IDs of the books that may be recommended (here: books rated by female users)
    final private Set<Long> bookids;

    public FilterRescorer(Set<Long> bookids) {
        this.bookids = bookids;
    }

    @Override
    public double rescore(long id, double originalScore) {
        return isFiltered(id) ? Double.NaN : originalScore;
    }

    @Override
    public boolean isFiltered(long id) {
        return !bookids.contains(id);
    }
}
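To show what the FilterRescorer above does to a candidate list, here is a dependency-free sketch. The Rescorer interface is a local stand-in that only mirrors the two methods of Mahout's IDRescorer. The topN method is an assumed, simplified picture of how a recommender might apply such a filter, not Mahout's actual code. All names in the block are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Shows the effect of an allow-list rescorer on candidate items (illustrative sketch). */
final class RescorerSketch {

    /** Local stand-in with the same two methods as Mahout's IDRescorer. */
    interface Rescorer {
        double rescore(long id, double originalScore);

        boolean isFiltered(long id);
    }

    /** Keeps only ids in the allowed set and leaves their scores unchanged. */
    static Rescorer allowOnly(final Set<Long> allowed) {
        return new Rescorer() {
            public double rescore(long id, double originalScore) {
                return isFiltered(id) ? Double.NaN : originalScore;
            }

            public boolean isFiltered(long id) {
                return !allowed.contains(id);
            }
        };
    }

    /** Returns up to howMany candidate ids, best score first, skipping filtered ones. */
    static List<Long> topN(Map<Long, Double> candidates, int howMany, Rescorer rescorer) {
        final Map<Long, Double> kept = new HashMap<Long, Double>();
        for (Map.Entry<Long, Double> e : candidates.entrySet()) {
            if (!rescorer.isFiltered(e.getKey())) {
                kept.put(e.getKey(), rescorer.rescore(e.getKey(), e.getValue()));
            }
        }
        List<Long> ids = new ArrayList<Long>(kept.keySet());
        Collections.sort(ids, new Comparator<Long>() {
            public int compare(Long a, Long b) {
                return Double.compare(kept.get(b), kept.get(a));
            }
        });
        return new ArrayList<Long>(ids.subList(0, Math.min(howMany, ids.size())));
    }
}
```

If the candidates are items 1, 2 and 3 with scores 5.0, 4.0 and 3.0 and only items 2 and 3 are allowed, topN with howMany = 2 returns items 2 and 3: the best-scoring item is dropped because it is not in the allowed set.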
Run results:
userEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.11111108462015788
Recommender IR Evaluator: [Precision:0.3010752688172043,Recall:0.08542713567839195]
itemEuclidean
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.3536954060693203
Recommender IR Evaluator: [Precision:0.0,Recall:0.0]
userEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:4.61812258478421
Recommender IR Evaluator: [Precision:0.09045226130653267,Recall:0.09296482412060306]
itemEuclideanNoPref
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:2.625455679766278
Recommender IR Evaluator: [Precision:0.6005025125628134,Recall:0.6055276381909548]
userEuclidean =>uid:99,
itemEuclidean =>uid:99,(586,10.000000)(378,10.000000)(202,9.666667)
userEuclideanNoPref =>uid:99,(616,1.000000)(307,1.000000)(552,1.000000)
itemEuclideanNoPref =>uid:99,(96,3.392724)(860,3.250000)(375,3.200000)
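The output above reports an average absolute difference score (lower is better) and precision and recall (higher is better) for each algorithm. The sketch below computes these quantities from their textbook definitions. It does not reproduce how Mahout's evaluators split the data or decide which items count as relevant. The class and method names are assumptions for illustration.

```java
import java.util.List;
import java.util.Set;

/** Textbook definitions of the three reported metrics (illustrative sketch). */
final class EvalMetricsSketch {

    /** Mean of |predicted - actual| over paired ratings. */
    static double averageAbsoluteDifference(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    /** Number of recommended items that are also relevant. */
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long id : recommended) {
            if (relevant.contains(id)) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of the recommended items that are relevant. */
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    /** Fraction of the relevant items that were recommended. */
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }
}
```

For example, if items 1, 2 and 3 are recommended and items 2, 3, 4 and 5 are relevant, two of the three recommendations are hits, so precision is 2/3 and recall is 2/4.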
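We analyze the result of the itemEuclideanNoPref algorithm.
The top-ranked recommendation is book 96. Tracing one step further, we look up which users rated book 96 highly:
userid | bookid | score | sex | age |
73 | 96 | 8 | F | 28 |
79 | 96 | 7 | F | 32 |
117 | 96 | 10 | F | 34 |
163 | 96 | 8 | F | 32 |
All of these users are female. Among them, user 117 also gave book 106 a fairly high rating.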
userid | bookid | score | sex | age |
99 | 106 | 10 | F | 37 |
117 | 106 | 7 | F | 34 |
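User 117 gave book 96 a score of 10 and, like user 99, rated book 106 (scores of 7 and 10). Recommending book 96 to user 99 is therefore reasonable.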