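I. Requirements
We have a set of order review records and need to separate the good reviews from the others (neutral and bad), writing the two groups into different folders. The data is shown in the figure below. The ninth field of each record holds the rating: 0 = good, 1 = neutral, 2 = bad. (Note that the code in this post reads the rating as zero-based index 9 of the tab-separated line, so check which column your data actually uses.)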
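The rating rule can be sketched in plain Java before any Hadoop code is involved. This is only an illustration under assumptions: `RatingFieldDemo` and `isGoodComment` are hypothetical names, the records are assumed to be tab-separated, and the sample lines are made up. The text above calls the rating the ninth field, while the job code in this post reads `split("\t")[9]`, so the index is passed in as a parameter rather than fixed.

```java
// Hypothetical helper for illustration; not part of the original job.
final class RatingFieldDemo {
    // Returns true when the column at ratingIndex (zero-based) holds "0", i.e. a good review.
    static boolean isGoodComment(String line, int ratingIndex) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty columns
        if (ratingIndex >= fields.length) {
            throw new IllegalArgumentException("record has only " + fields.length + " columns");
        }
        return "0".equals(fields[ratingIndex]);
    }

    public static void main(String[] args) {
        // Made-up ten-column records; the last column (index 9) is the rating.
        String good = "1\tu1\ta\tb\tc\td\te\tf\tg\t0";
        String bad = "2\tu2\ta\tb\tc\td\te\tf\tg\t2";
        System.out.println(isGoodComment(good, 9));
        System.out.println(isGoodComment(bad, 9));
    }
}
```

Comparing with `"0".equals(...)` keeps the check a string comparison, since the field arrives as text rather than as a number.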
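II. Analysis
The key point is that a single MapReduce program must write two kinds of results to different directories, depending on the content of each record. This kind of flexible output can be implemented with a custom OutputFormat.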
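III. Implementation
Key points:
1. Access external resources from within MapReduce
2. Write a custom OutputFormat, override its RecordWriter, and override write(), the method that actually emits the data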
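The core of point 2 is a writer that owns two output streams and picks one per record. The sketch below shows that idea with plain `java.io` only, so it runs without a Hadoop cluster. It is a stand-in, not the Hadoop API: `SplitWriterSketch` is a hypothetical class, the two-column sample records are made up, and the rating index is a constructor parameter chosen for the demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a RecordWriter: plain java.io, no Hadoop types.
final class SplitWriterSketch implements Closeable {
    private final OutputStream goodStream;
    private final OutputStream otherStream;
    private final int ratingIndex;

    SplitWriterSketch(OutputStream goodStream, OutputStream otherStream, int ratingIndex) {
        this.goodStream = goodStream;
        this.otherStream = otherStream;
        this.ratingIndex = ratingIndex;
    }

    // Routes one record: rating "0" goes to the good stream, everything else to the other stream.
    void write(String line) throws IOException {
        String[] fields = line.split("\t", -1);
        boolean good = ratingIndex < fields.length && "0".equals(fields[ratingIndex]);
        OutputStream target = good ? goodStream : otherStream;
        target.write((line + "\n").getBytes(StandardCharsets.UTF_8));
    }

    // Closes both streams, even if closing the first one fails.
    @Override
    public void close() throws IOException {
        try {
            goodStream.close();
        } finally {
            otherStream.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream good = new ByteArrayOutputStream();
        ByteArrayOutputStream other = new ByteArrayOutputStream();
        try (SplitWriterSketch writer = new SplitWriterSketch(good, other, 1)) {
            writer.write("order-1\t0");
            writer.write("order-2\t1");
            writer.write("order-3\t2");
        }
        System.out.print("good:\n" + good.toString("UTF-8"));
        System.out.print("other:\n" + other.toString("UTF-8"));
    }
}
```

In the real job the same two responsibilities, routing in write() and releasing both streams in close(), live in the RecordWriter returned by the custom OutputFormat.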
Step 1: Define a custom OutputFormat
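import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyOutPutFormat extends FileOutputFormat<Text, NullWritable> {
    @Override
    public RecordWriter<Text, NullWritable> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        Configuration conf = taskAttemptContext.getConfiguration();
        // Hard-coded local output files. Every task that creates a writer overwrites them,
        // so this layout only suits a job with a single output task.
        Path goodComment = new Path("file:///E:\\大数据\\数据\\goodComment\\goodComment.txt");
        Path badComment = new Path("file:///E:\\大数据\\数据\\badComment\\badComment.txt");
        // Resolve the file system from each path so the file:// scheme is honored
        // even when fs.defaultFS points to a different file system.
        FSDataOutputStream goodOutputStream = goodComment.getFileSystem(conf).create(goodComment);
        FSDataOutputStream badOutputStream = badComment.getFileSystem(conf).create(badComment);
        return new MyRecordWriter(goodOutputStream, badOutputStream);
    }

    public static class MyRecordWriter extends RecordWriter<Text, NullWritable> {
        private final FSDataOutputStream goodStream;
        private final FSDataOutputStream badStream;

        public MyRecordWriter(FSDataOutputStream goodOutputStream, FSDataOutputStream badOutputStream) {
            this.goodStream = goodOutputStream;
            this.badStream = badOutputStream;
        }

        @Override
        public void write(Text text, NullWritable nullWritable) throws IOException, InterruptedException {
            String line = text.toString();
            // Encode explicitly as UTF-8 instead of relying on the platform default charset.
            byte[] record = (line + "\r\n").getBytes(StandardCharsets.UTF_8);
            // Zero-based index 9 holds the rating; note that "0" is compared as a String.
            if ("0".equals(line.split("\t")[9])) {
                goodStream.write(record);
            } else {
                badStream.write(record);
            }
        }

        @Override
        public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
            // Close both streams, even if closing the first one throws.
            try {
                if (goodStream != null) {
                    goodStream.close();
                }
            } finally {
                if (badStream != null) {
                    badStream.close();
                }
            }
        }
    }
}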
Step 2: Write the MapReduce driver
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyOwnOutputFormatMain extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        int run = ToolRunner.run(configuration, new MyOwnOutputFormatMain(), args);
        System.exit(run);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = super.getConf();
        Job job = Job.getInstance(conf, MyOwnOutputFormatMain.class.getSimpleName());
        job.setJarByClass(MyOwnOutputFormatMain.class);

        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("file:///E:\\大数据\\数据"));

        job.setMapperClass(MyOwnMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);

        job.setOutputFormatClass(MyOutPutFormat.class);
        // An output directory must still be set. The job writes only the _SUCCESS marker file there;
        // the actual data file paths are defined inside the OutputFormat.
        MyOutPutFormat.setOutputPath(job, new Path("file:///E:\\大数据\\数据\\out"));
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        boolean b = job.waitForCompletion(true);
        return b ? 0 : 1;
    }

    public static class MyOwnMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // No parsing is needed here: the rating field is inspected in MyRecordWriter.write(),
            // so each input line is passed through unchanged as the output key.
            context.write(value, NullWritable.get());
        }
    }
}
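After the job finishes, one quick sanity check is to count the rating values found in each output file: goodComment.txt should contain only rating 0, and badComment.txt should contain none. The sketch below is one possible way to do that with the standard library. `OutputCheckSketch` and `countByRating` are hypothetical names, the demo runs on a made-up temporary file rather than the real output, and the rating index must match the column your data uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical post-run check; not part of the original job.
final class OutputCheckSketch {
    // Counts how many non-empty lines of the file carry each rating value at ratingIndex (zero-based).
    static Map<String, Integer> countByRating(Path file, int ratingIndex) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        // readAllLines accepts both \n and \r\n line endings.
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            String rating = ratingIndex < fields.length ? fields[ratingIndex] : "?";
            counts.merge(rating, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample standing in for badComment.txt; the rating is at index 1 here.
        Path sample = Files.createTempFile("badComment", ".txt");
        Files.write(sample, Arrays.asList("order-2\t1", "order-3\t2", "order-4\t2"), StandardCharsets.UTF_8);
        System.out.println(countByRating(sample, 1));
        Files.delete(sample);
    }
}
```

For the real output, pass the path of goodComment.txt or badComment.txt and the rating index used by the job.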