I. Raw Data
Download the data here.
A sample of the data follows (a few dozen lines):
Format: user_id item_id rating_value
For example:
23 387 5
This line records the fact that user 23 has rated item 387 with a score of 5.
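As a concrete illustration of the record format, here is a small hypothetical Java helper (the class Rating and its method parse are not from the original code) that splits one line into its three fields. It assumes, as in the sample, that all three fields are integers separated by whitespace.

```java
// Hypothetical helper: one line of the ratings file, "user_id item_id rating_value".
// Assumes all three fields are integers, as in the sample data.
final class Rating {
    final int userId;
    final int itemId;
    final int rating;

    Rating(int userId, int itemId, int rating) {
        this.userId = userId;
        this.itemId = itemId;
        this.rating = rating;
    }

    // Splits on any run of whitespace and rejects lines without exactly three fields.
    static Rating parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Rating(Integer.parseInt(f[0]), Integer.parseInt(f[1]), Integer.parseInt(f[2]));
    }
}
```

For example, Rating.parse("23 387 5") yields userId 23, itemId 387 and rating 5.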
1 100 4
1 101 5
1 102 3
1 10 3
1 103 5
1 104 2
1 105 5
1 106 5
1 107 5
1 108 5
1 109 3
1 110 4
1 111 5
1 112 4
1 113 5
1 11 4
1 114 5
1 115 5
1 116 5
1 117 5
1 118 5
1 119 3
1 120 5
1 121 2
1 122 4
1 123 3
1 124 5
1 12 5
1 125 4
1 126 5
1 127 5
1 128 5
1 129 5
1 130 5
1 131 4
1 132 5
1 133 5
1 134 5
1 13 5
1 135 3
1 136 4
1 137 5
1 138 5
1 139 4
1 140 5
1 141 4
1 142 4
1 14 3
1 143 5
1 144 4
II. Task Requirements
(1) For each item, list the users who rated it. Output format: key:[value1,value2,…,value]
(2) Compute the total score of each item. Output format: key value
III. Task Analysis
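The core question is how to turn the pairs that share a key,
<key,value1>
<key,value2>
into a single record:
<key,value1,value2,...value>
MapReduce does the grouping itself: in the shuffle phase, all values emitted under the same key are collected and handed to one reduce() call as an Iterable, so the reducer only has to walk that iterator and build the output string.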
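The grouping step can be simulated in a single process to see what the reducer receives. The sketch below is not Hadoop code: ShuffleSketch and groupByKey are hypothetical names, and a TreeMap stands in for the shuffle's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the shuffle: pairs <key, value> that share a key are
// collected into one list, and keys come out sorted (TreeMap uses String order).
final class ShuffleSketch {
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            // Create the value list the first time a key is seen, then append to it
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }
}
```

Fed the pairs <100043:,6596> and <100043:,4296>, it returns a single entry mapping 100043: to [6596, 4296], which is the group behind the first line of the sample output in section IV. Unlike this sketch, Hadoop does not guarantee any particular order of the values within a group.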
IV. Core Code
(1) Users who rated each item
Mapper.java
// Body of map(LongWritable key, Text value, Context context)
// Convert the line to a String and split "user_id item_id rating_value" on any run of whitespace
String[] words = value.toString().trim().split("\\s+");
// After trim() and split("\\s+") no field is empty, so checking the field count is enough
if (words.length >= 2) {
    String itemId = words[1];
    int userId = Integer.parseInt(words[0]);
    // Emit <item_id:, user_id>; the trailing ":" produces the key:[...] output format
    context.write(new Text(itemId + ":"), new IntWritable(userId));
}
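Because map() needs a Hadoop Context, one way to unit-test the parsing is to copy the same logic into a plain method. ItemUsersMapLogic and mapLine are hypothetical names for this sketch; it returns the <key, value> pair the mapper above would emit, or null when the line has fewer than two fields.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical plain-Java copy of the task (1) map step, testable without a Hadoop Context.
// Returns <item_id + ":", user_id>, or null when the line cannot be used.
final class ItemUsersMapLogic {
    static Map.Entry<String, Integer> mapLine(String line) {
        String[] words = line.trim().split("\\s+");
        if (words.length < 2) {
            return null; // blank or truncated line: the mapper emits nothing
        }
        return new SimpleEntry<>(words[1] + ":", Integer.parseInt(words[0]));
    }
}
```

For example, mapLine("23 387 5") returns the pair 387: with value 23.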
Reducer.java
// Body of reduce(Text key, Iterable<IntWritable> values, Context context)
// Walk the values iterator to collect every user id in this key group
StringBuilder list = new StringBuilder("[");
int count = 0;
for (IntWritable value : values) {
    if (count > 0) {
        list.append(',');
    }
    list.append(value.get());
    count++;
}
list.append(']');
// The key already ends with ":", so this writes key:[value1,value2,...,value] count
context.write(new Text(key.toString() + list), new IntWritable(count));
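The reducer's string building can likewise be checked outside Hadoop. In this hypothetical sketch (ItemUsersReduceLogic.formatGroup is not part of the original), a plain Iterable<Integer> stands in for Iterable<IntWritable>, and key and count are joined with a tab, which is the default separator of Hadoop's TextOutputFormat.

```java
// Hypothetical plain-Java copy of the task (1) reduce step: builds "key:[v1,v2,...]<TAB>count"
// for one key group. The key is expected to carry the trailing ":" added by the mapper.
final class ItemUsersReduceLogic {
    static String formatGroup(String key, Iterable<Integer> userIds) {
        StringBuilder list = new StringBuilder("[");
        int count = 0;
        for (int userId : userIds) {
            if (count > 0) {
                list.append(','); // comma only between elements
            }
            list.append(userId);
            count++;
        }
        list.append(']');
        // TextOutputFormat separates the output key and value with a tab by default
        return key + list + "\t" + count;
    }
}
```

For the group from the first sample line, formatGroup("100043:", List.of(6596, 4296)) returns 100043:[6596,4296] followed by a tab and 2.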
Sample output (item:[user ids] number of users):
100043:[6596,4296] 2
100044:[4297] 1
100045:[4297,7585,12497,7362] 4
100046:[4299] 1
100047:[10058,6192,4300,4686,15536,12207] 6
100048:[4300] 1
100049:[24249,42119,34686,13452,24010,12497,10262,4300,24504,16909,25568] 11
10004:[6760,1795,23529,115,776] 5
100050:[4301] 1
100051:[4301] 1
100052:[4301,30962,4586,15266,23094] 5
100053:[8806,31816,26823,14562,44326,10205,4301,28782,21679] 9
100054:[4301] 1
100119:[11513,4325] 2
10011:[2003,11310,2023,115,335,2227] 6
100120:[4327,17079,14116,22013] 4
100121:[19834,4327,9897] 3
100122:[4330,23465] 2
100123:[4330] 1
100124:[4331,12120] 2
100125:[4331,29524] 2
100126:[48165,4331,24377,48162] 4
100127:[17419,4331,20560,18372,35202,31975] 6
100128:[4331] 1
100129:[4331] 1
10012:[23476,115] 2
100130:[15779,36925,12675,4331,16239] 5
100131:[37964,4331,32528,14822] 4
100132:[7396,4331,39562,15027,10494] 5
100133:[4331] 1
100134:[21797,6031,4331,5061] 4
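The keys above are not in numeric order (10004: comes after 100049:) because Hadoop sorts Text keys byte by byte. At the sixth character, ':' (0x3A) is greater than '9' (0x39), so 10004: sorts after 100049:. Java's String.compareTo gives the same order for ASCII strings, which this sketch uses to reproduce it (KeyOrder and lexicographic are hypothetical names).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: lexicographic ordering of ASCII keys, matching how Text keys are sorted,
// as opposed to the numeric order one might expect from item ids.
final class KeyOrder {
    static List<String> lexicographic(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted); // String.compareTo: char by char, same as bytes for ASCII
        return sorted;
    }
}
```

The same rule explains the order in the output of task (2), where keys have no ":": 10004 is a prefix of 100040, so it comes first. If numeric order is wanted, one option (a suggestion, not what the code above does) is to emit the item id as an IntWritable key and append the ":" only when writing the output.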
(2) Total score of each item
// Body of map(LongWritable key, Text value, Context context) for task (2)
// Split "user_id item_id rating_value" on any run of whitespace
String[] words = value.toString().trim().split("\\s+");
// words[2] is read below, so all three fields must be present
if (words.length >= 3) {
    String itemId = words[1];
    int rating = Integer.parseInt(words[2]);
    // Emit <item_id, rating>
    context.write(new Text(itemId), new IntWritable(rating));
}
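The reducer for task (2) is not shown. The sample output is consistent with summing the ratings per item (for instance, item 10004 has five raters in the output of task (1) and a score of 21 below), so the plain-Java sketch here assumes a sum. ItemScoreSketch and totalScores are hypothetical names; the method runs the map and reduce steps in one process over a list of input lines.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of task (2), assuming the reducer sums the ratings of each item.
final class ItemScoreSketch {
    static Map<String, Integer> totalScores(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] words = line.trim().split("\\s+");
            if (words.length < 3) {
                continue; // skip blank or truncated lines, like the mapper above
            }
            // map: <item_id, rating>; reduce: add up all ratings with the same item_id
            totals.merge(words[1], Integer.parseInt(words[2]), Integer::sum);
        }
        return totals;
    }
}
```

In an actual Hadoop job, the built-in org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer is one way to perform this per-key sum.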
Sample output:
10004 21
100040 5
100041 5
100042 5
100043 8
100044 1
100045 20
100046 4
100047 24
100048 5
100049 47
10005 12
100050 3
100051 3
100052 20
100053 42
100084 7
100085 11
100086 5
100087 52
100088 27
100089 22
10009 3