MapReduce Exercise (2)
1. Start the hadoop-1.2.1 cluster:
Master:
Slave:
2. Task requirements:
We have a batch of phone call records, each one recording a call from user A to user B.
Build an inverted index: for every user B, list all the users A who called B.
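Before wiring this into MapReduce, the inversion itself can be sketched in plain Java (no Hadoop needed). The sample records below are taken from the numbers appearing in this exercise's result data; the class and method names are illustrative:

```java
import java.util.*;

// Pure-Java sketch of the inverted index: each record is "callerA calleeB";
// for every callee we collect all callers, joined with "|" just like the
// job's final output. Sample data only; names here are illustrative.
public class CallIndexSketch {
    public static Map<String, String> invert(List<String> records) {
        Map<String, StringBuilder> grouped = new LinkedHashMap<>();
        for (String rec : records) {
            String[] parts = rec.split(" ");           // parts[0]=caller, parts[1]=callee
            grouped.computeIfAbsent(parts[1], k -> new StringBuilder())
                   .append(parts[0]).append("|");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : grouped.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "13614033692 10086",
                "17702449852 10086",
                "13614033692 110");
        System.out.println(invert(records));
        // callee 10086 maps to "13614033692|17702449852|"
    }
}
```

The MapReduce version splits this into two halves: the Map side emits `<callee, caller>` pairs, and the shuffle plus Reduce side does the grouping and "|" concatenation.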
3. Upload the data to be processed to HDFS:
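A typical upload looks like the following; the file and directory names here are illustrative, not necessarily the ones used in this run:

```
# Create an input directory on HDFS and upload the call log
hadoop fs -mkdir /user/hadoop/calllog_in
hadoop fs -put calllog.txt /user/hadoop/calllog_in
hadoop fs -ls /user/hadoop/calllog_in
```

The directory given to `-put` is what later becomes `args[0]` of the job's run method.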
4. MapReduce code:
Imported packages:
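Judging from the code in section 8, the job needs roughly these classes from the new `org.apache.hadoop.mapreduce` API (exact list may differ slightly from the original):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
```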
Error-line counter:
Map method:
Reduce method:
Run method:
Main method:
5. Results of the processing:
6. Job execution logs:
7. Shut down the cluster:
8. Appendix: partial code and results:
Map function:
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        try {
            // Each input line is "callerA calleeB", separated by a space.
            String[] lineSplit = line.split(" ");
            String anum = lineSplit[0];  // caller (user A)
            String bnum = lineSplit[1];  // callee (user B)
            // Emit <callee, caller> so the reducer groups callers by callee.
            context.write(new Text(bnum), new Text(anum));
        } catch (Exception e) {
            // Malformed line: count it and skip.
            context.getCounter(Counter.LINESKIP).increment(1);
            return;
        }
    }
}
Reduce function:
public static class Reduce extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Concatenate every caller of this callee, separated by "|".
        String valueString;
        String out = "";
        for (Text value : values) {
            valueString = value.toString();
            out += valueString + "|";
        }
        context.write(key, new Text(out));
    }
}
Run method:
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    Job job = new Job(conf, "xiaobaozi");
    job.setJarByClass(xiaobaozi.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.waitForCompletion(true);
    System.out.println("Job name: " + job.getJobName());
    System.out.println("Job successful: " + (job.isSuccessful() ? "yes" : "no"));
    System.out.println("Lines skipped: " + job.getCounters().findCounter(Counter.LINESKIP).getValue());
    return job.isSuccessful() ? 0 : 1;
}
enum Counter {
    LINESKIP,  // lines that failed to parse
}
public static void main(String[] args) throws Exception {
    // Launch the MapReduce job from main via ToolRunner.
    int res = ToolRunner.run(new Configuration(), new xiaobaozi(), args);
    System.exit(res);
}
10086 13614033692|13614033692|17702449852|
110 13614033692|17702449852|
119 18004024063|17702449852|
120 18004024063|13614033692|
17702449852 18004024063|