1) Requirement:
Below is the friend-list data of a blog site. Before the colon is a user; after the colon are all of that user's friends (the friendship in this data is one-directional). Find which pairs of users have common friends, and who those common friends are.
A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J
First, "one-directional" data can be understood like this:
A follows B, C, D, F, E, O, while B follows A, C, E, K. The question is: whom do A and B both follow?
2) Analysis
Whether two people have common friends is independent of whether they follow each other or appear in each other's friend lists.
Without MapReduce, you would do this by hand: extract every user, enumerate all pairs of users, merge each pair's two friend lists, and keep the names that appear twice; those are the pair's common friends. In MapReduce, however, a mapper processes one line at a time and cannot see other lines. To relate data across lines, you must choose a key that makes the related records cluster together in the reduce phase.
So the problem takes two MR jobs. The first job emits each name after the colon as the key and the user before the colon as the value; its reduce phase then gathers, for each person x, everyone who follows x.
The second job takes each such group, forms every pairwise combination within the group as the key, and emits x as the value; after its reduce phase, each pair's value list is exactly the set of people whom both members of the pair follow, i.e. their common friends.
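To make the data flow concrete, here is a trace of one record through both jobs (records shown informally as <key,value>; pair keys assume the sorting done in the second mapper below):
Input line:            A:B,C,D,F,E,O
Job 1 map emits:       <B,A> <C,A> <D,A> <F,A> <E,A> <O,A>
Job 1 reduce, key B:   B  A,F,J,E,    (everyone who follows B, gathered from all input lines)
Job 2 map on that line, list sorted to A,E,F,J:
                       <A-E,B> <A-F,B> <A-J,B> <E-F,B> <E-J,B> <F-J,B>
Job 2 reduce, key A-E: A-E  B C D     (A and E both follow B, C and D)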
Implementation:
First Mapper
package com.lzz.mapreduce.blogfriends;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FriendsMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1. Read one line, e.g. A:B,C,D,F,E,O
        String line = value.toString();
        // 2. Split into the person and the friend list
        String[] fields = line.split(":");
        String person = fields[0];
        String[] friends = fields[1].split(",");
        // 3. Emit <friend, person>: invert the relation so the reducer
        //    can group everyone who follows the same friend
        for (String friend : friends) {
            context.write(new Text(friend), new Text(person));
        }
    }
}
First Reducer
package com.lzz.mapreduce.blogfriends;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FriendsReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // 1. Concatenate everyone who follows this key
        StringBuilder sb = new StringBuilder();
        for (Text person : values) {
            sb.append(person).append(",");
        }
        // 2. Emit <friend, follower1,follower2,...>
        context.write(key, new Text(sb.toString()));
    }
}
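For key B, for example, the values iterable delivers A, F, J and E (everyone who follows B), and the reducer writes the line B  A,F,J,E, as seen in the run result below. The trailing comma is harmless: Java's String.split(",") drops trailing empty strings, so the second job still parses the list cleanly.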
First Driver
package com.lzz.mapreduce.blogfriends;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FriendsDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Hardcoded local paths for testing; remove this line to take paths from the command line
        args = new String[] {"G:/input/blogfriends", "g:/output"};

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(FriendsDriver.class);
        job.setMapperClass(FriendsMapper.class);
        job.setReducerClass(FriendsReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
Run result of the first job:
A I,K,C,B,G,F,H,O,D,
B A,F,J,E,
C A,E,B,H,F,G,K,
D G,C,K,A,L,F,E,H,
E G,M,L,H,A,F,B,D,
F L,M,D,C,G,A,
G M,
H O,
I O,C,
J O,
K B,
L D,E,
M E,F,
O A,H,I,J,F,
Feed the run result of the first job into the second job. Note that the follower lists above are in arbitrary order, so naively combining them pairwise could emit both B-C and C-B as keys for the same pair, splitting its common friends across two groups; the second mapper therefore sorts each list first, so every pair gets one canonical key.
Second Mapper
package com.lzz.mapreduce.twoblogfriends;

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TwoFriendsMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One line of job-1 output: friend \t follower1,follower2,...
        String line = value.toString();
        String[] words = line.split("\t");
        String[] friends = words[1].split(",");
        // Sort so each pair has one canonical key (always B-C, never C-B)
        Arrays.sort(friends);
        // Emit every pair of followers, with the shared friend as the value
        for (int i = 0; i < friends.length - 1; i++) {
            for (int j = i + 1; j < friends.length; j++) {
                context.write(new Text(friends[i] + "-" + friends[j]), new Text(words[0]));
            }
        }
    }
}
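For the job-1 line B  A,F,J,E, the friends array sorts to [A, E, F, J], so this mapper emits <A-E,B>, <A-F,B>, <A-J,B>, <E-F,B>, <E-J,B> and <F-J,B>.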
Second Reducer
package com.lzz.mapreduce.twoblogfriends;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TwoFriendsReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Every value is a person whom both members of the pair follow
        StringBuilder sb = new StringBuilder();
        for (Text value : values) {
            sb.append(value).append(" ");
        }
        // Emit <pair, common friend list>
        context.write(key, new Text(sb.toString()));
    }
}
Second Driver
package com.lzz.mapreduce.twoblogfriends;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoFriendsDriver {
    public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {
        // The input is the output directory of the first job
        args = new String[] {"g:/output", "g:/output2"};

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(TwoFriendsDriver.class);
        job.setMapperClass(TwoFriendsMapper.class);
        job.setReducerClass(TwoFriendsReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
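The second job yields one line per pair that shares at least one common friend. A few expected lines, derived by hand from the input data (the order of values within a line depends on the shuffle):
A-B  C E
A-C  D F
A-E  B C D
For example, A follows B,C,D,F,E,O and B follows A,C,E,K, so the people A and B both follow are exactly C and E.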