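Finding Mutual Friends with MapReduce: An Analysis
Working backward through a MapReduce program
I. Problem
In each line, the part before the colon is a user and the part after it lists all of that user's friends (friendship in this data is one-directional):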
A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
......
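Find which pairs of users have mutual friends, and who those mutual friends are. In other words, list the friends that any two users have in common, for example: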
A-B:C,E
A-C:D,F
......
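II. Analysis
Bottom-up analysis (working backward from the result)
Assume that some MapReduce job produces the answer above when it finishes. The reduce output format can then be inferred as: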
key:A-B
value:C,E...
......
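// the ':' can be ignored; "..." means that, format-wise, there may be more data
Because reduce iterates over all values of a key and merges them, the reducer's output (a key with merged values) can be broken back down into the reducer's input (key, value) pairs: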
A-B:C
A-B:E
A-C:D
A-C:F
......
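// Split out all users who have C as a friend. These users arrive in no particular order, so they must be sorted lexicographically before they are paired; otherwise data such as A-B:C and B-A:E could appear, and the mutual friends of the two users could not be collected under one key.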
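The sorting requirement can be sketched in plain Java, without Hadoop. This is only an illustration under assumptions: the class name `PairKeys` and the method `pairKeys` are hypothetical and are not part of the listings in section V.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class PairKeys {
    // Build every "X-Y" key for the users who all list the same friend.
    static List<String> pairKeys(String[] users) {
        String[] sorted = users.clone();
        Arrays.sort(sorted); // fixed order, so "B-A" can never appear next to "A-B"
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            for (int j = i + 1; j < sorted.length; j++) {
                keys.add(sorted[i] + "-" + sorted[j]);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(pairKeys(new String[] {"B", "A"}));      // [A-B]
        System.out.println(pairKeys(new String[] {"D", "A", "C"})); // [A-C, A-D, C-D]
    }
}
```

Whatever order the users arrive in, each pair always produces the same key, so all of its mutual friends end up in a single reduce call.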
The reducer's input is the mapper's output, and a mapper works by splitting records, so the mapper's output can be merged back into the mapper's input:
C:A,B...
E:A,B...
D:A,C...
F:A,C...
......
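Read logically, C is considered a friend by both A and B (friendship is one-directional, so C may not list A or B as friends). Conversely,
A has friends C, E, D, F; B has friends C, E; … This already reveals the relationship in the original data: what follows the colon is the user's full friend list.
Now add one more MapReduce job in front. The later job's map input is the earlier job's reduce output, so the merging property of reduce can be reversed again to obtain the earlier reducer's input: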
C:A
C:B
E:A
E:B
D:A
F:A
......
The reducer's input is the mapper's output, so by the splitting property of map, the mapper's input (that is, the original data) can be obtained:
A:C,E,D,F,...
B:C,E,...
......
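At this point the original data has been derived from the result data. In other words, running the original data through the corresponding MapReduce jobs above yields the answer the problem asks for.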
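The two stages derived above can be checked in memory before any Hadoop code is written. The following is a minimal sketch under assumptions: the class `MutualFriendsSketch` and its methods are hypothetical names, sorted collections stand in for the shuffle, and only the four sample rows shown in the problem statement are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the two jobs; not Hadoop code.
class MutualFriendsSketch {
    // Stage 1: turn "user:friends" lines into friend -> users who list that friend.
    static Map<String, TreeSet<String>> invert(List<String> lines) {
        Map<String, TreeSet<String>> usersByFriend = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(":");
            if (parts.length < 2) {
                continue; // skip lines without a "user:friends" shape
            }
            for (String friend : parts[1].split(",")) {
                usersByFriend.computeIfAbsent(friend, k -> new TreeSet<>()).add(parts[0]);
            }
        }
        return usersByFriend;
    }

    // Stage 2: for each friend, pair up its users, then group the friends by pair.
    static Map<String, TreeSet<String>> mutualFriends(Map<String, TreeSet<String>> usersByFriend) {
        Map<String, TreeSet<String>> friendsByPair = new TreeMap<>();
        for (Map.Entry<String, TreeSet<String>> entry : usersByFriend.entrySet()) {
            // A TreeSet iterates in sorted order, so pair keys are always "smaller-larger".
            List<String> users = new ArrayList<>(entry.getValue());
            for (int i = 0; i < users.size(); i++) {
                for (int j = i + 1; j < users.size(); j++) {
                    String pair = users.get(i) + "-" + users.get(j);
                    friendsByPair.computeIfAbsent(pair, k -> new TreeSet<>()).add(entry.getKey());
                }
            }
        }
        return friendsByPair;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A:B,C,D,F,E,O", "B:A,C,E,K", "C:F,A,D,I", "D:A,E,F,L");
        Map<String, TreeSet<String>> result = mutualFriends(invert(lines));
        System.out.println(result.get("A-B")); // [C, E]
        System.out.println(result.get("A-C")); // [D, F]
    }
}
```

For these four rows the sketch reproduces the A-B and A-C lines given as the expected answer in the problem statement.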
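III. Summary
1. From the above, two properties of MapReduce programs can be summarized:
1.1. A map program always splits a record apart and assembles new records from the pieces; the inverse operation is to merge those pieces back into the original record.
1.2. A reduce program always combines the values and then joins them with the key for final output; the inverse operation is to separate the key from the merged value, split the value, and re-attach the key to each piece to restore the reducer's input.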
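Property 1.2 and its inverse can be illustrated with a small round trip in plain Java. This is a sketch only: `ReduceRoundTrip`, `merge`, and `unmerge` are hypothetical names, and ':' and ',' are assumed as separators to match the notation used in the analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class ReduceRoundTrip {
    // Forward: what a reducer does with one key and its grouped values.
    static String merge(String key, List<String> values) {
        return key + ":" + String.join(",", values);
    }

    // Backward: recover the reducer's input pairs from one output line.
    static List<String> unmerge(String line) {
        String[] keyValue = line.split(":");
        List<String> pairs = new ArrayList<>();
        for (String value : keyValue[1].split(",")) {
            pairs.add(keyValue[0] + ":" + value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(merge("A-B", Arrays.asList("C", "E"))); // A-B:C,E
        System.out.println(unmerge("A-B:C,E"));                    // [A-B:C, A-B:E]
    }
}
```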
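IV. Notes:
1. The splitting may already be done by the InputFormat as a preprocessing step, in which case the mapper does no splitting of its own and simply passes records through.
2. A reducer may discard its key. If instead the values are discarded, so that nothing is merged and the shuffle serves no purpose, it is better to skip the reduce phase where possible and avoid the heavy resource cost of shuffle and reduce.
V. Code: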
1. IsFriendMapper.java:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * description: mapper of the first job; inverts "user:friends" lines into (friend, user) pairs
 * author: bob yy
 * since: 1.8
 **/
public class IsFriendMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Text outKey = new Text();
    private Text outValue = new Text();

    /**
     * outKey: a friend of outValue; outValue: a user
     *
     * @param key     byte offset of the line within the input file
     * @param value   one line of input, e.g. "A:B,C,D"
     * @param context context used to emit the output pairs
     * @throws IOException          IOException
     * @throws InterruptedException InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] keyValue = value.toString().split(":");
        // skip blank or malformed lines that do not have the "user:friends" shape
        if (keyValue.length < 2) {
            return;
        }
        outValue.set(keyValue[0]);
        for (String user : keyValue[1].split(",")) {
            outKey.set(user);
            context.write(outKey, outValue);
        }
    }
}
2. IsFriendReducer.java:
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * description: reducer of the first job; merges all users who list the same friend
 * author: bob yy
 * since: 1.8
 **/
public class IsFriendReducer extends Reducer<Text, Text, Text, Text> {
    private Text outValue = new Text();

    /**
     * outKey: the shared friend; outValue: the comma-separated users who list that friend
     *
     * @param key     the friend
     * @param values  users who list the friend
     * @param context context used to emit the output pair
     * @throws IOException          IOException
     * @throws InterruptedException InterruptedException
     */
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        boolean isFirst = true;
        for (Text value : values) {
            // put a comma before every value except the first
            if (!isFirst) {
                sb.append(',');
            }
            isFirst = false;
            sb.append(value.toString());
        }
        outValue.set(sb.toString());
        context.write(key, outValue);
    }
}
3. MutualFriendMapper.java:
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * description: mapper of the second job; emits (user pair, mutual friend) for every two users who share a friend
 * author: bob yy
 * since: 1.8
 **/
public class MutualFriendMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Text outValue = new Text();
    private Text outKey = new Text();

    /**
     * outKey: two users joined by '-'; outValue: a friend the two users share
     *
     * @param key     byte offset of the line within the input file
     * @param value   one line of job 1 output: a friend, a tab, then the comma-separated users
     * @param context context used to emit the output pairs
     * @throws IOException          IOException
     * @throws InterruptedException InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // job 1 writes key and value separated by a tab (the TextOutputFormat default)
        String[] split = value.toString().split("\t");
        if (split.length < 2) {
            return; // skip malformed lines
        }
        outValue.set(split[0]);
        String[] users = split[1].split(",");
        // sort the users so that each pair always yields the same key (A-B, never B-A)
        Arrays.sort(users);
        // emit every pair of two distinct users
        for (int i = 0; i < users.length; i++) {
            for (int j = i + 1; j < users.length; j++) {
                outKey.set(users[i] + "-" + users[j]);
                context.write(outKey, outValue);
            }
        }
    }
}
4. MutualFriendReducer.java:
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * description: reducer of the second job; merges the mutual friends of each pair of users
 * author: bob yy
 * since: 1.8
 **/
public class MutualFriendReducer extends Reducer<Text, Text, Text, Text> {
    private Text outValue = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        boolean isFirst = true;
        for (Text value : values) {
            // put a comma before every value except the first
            if (!isFirst) {
                sb.append(',');
            }
            isFirst = false;
            sb.append(value.toString());
        }
        outValue.set(sb.toString());
        context.write(key, outValue);
    }
}
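5. MutualFriendDriver.java:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * description: a driver program of mutual friend
 * author: bob yy
 * since: 1.8
 **/
public class MutualFriendDriver {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path inputPath = new Path("J:\\data\\friends\\input");
        Path outputPath1 = new Path("J:\\data\\friends\\output1");
        Path outputPath2 = new Path("J:\\data\\friends\\output2");
        // The original listing called a helper, SimpleFileSystem.getLocalFileSystem(), that is not shown;
        // FileSystem.getLocal is used here as the standard Hadoop way to get the local file system.
        FileSystem fs = FileSystem.getLocal(new Configuration());
        // remove stale output directories, otherwise the jobs refuse to start
        if (fs.exists(outputPath1)) {
            fs.delete(outputPath1, true);
        }
        if (fs.exists(outputPath2)) {
            fs.delete(outputPath2, true);
        }
        Job job1 = Job.getInstance();
        Job job2 = Job.getInstance();
        /********************************/
        // set job1: user:friends -> friend <TAB> users
        job1.setMapperClass(IsFriendMapper.class);
        job1.setReducerClass(IsFriendReducer.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job1, inputPath);
        FileOutputFormat.setOutputPath(job1, outputPath1);
        /********************************/
        // set job2: reads the output of job1 and writes pair <TAB> mutual friends
        job2.setMapperClass(MutualFriendMapper.class);
        job2.setReducerClass(MutualFriendReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job2, outputPath1);
        FileOutputFormat.setOutputPath(job2, outputPath2);
        /********************************/
        job1.setJobName("is friend");
        job2.setJobName("mutual friend");
        // get job control
        JobControl jobControl = new JobControl("mutual friend");
        ControlledJob controlledJob1 = new ControlledJob(job1.getConfiguration());
        ControlledJob controlledJob2 = new ControlledJob(job2.getConfiguration());
        // job2 must not start until job1 has finished
        controlledJob2.addDependingJob(controlledJob1);
        jobControl.addJob(controlledJob1);
        jobControl.addJob(controlledJob2);
        // run the job control in a daemon thread
        Thread jobControlThread = new Thread(jobControl);
        jobControlThread.setDaemon(true);
        jobControlThread.start();
        // wait for completion; sleep between checks instead of spinning on the CPU
        while (!jobControl.allFinished()) {
            Thread.sleep(500);
        }
        System.out.println(jobControl.getSuccessfulJobList());
        System.out.println(jobControl.getFailedJobList());
        jobControl.stop();
    }
}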
The generality of the analysis above has not been proven; it is offered for reference only.