MapReduce实战之过滤日志及自定义日志输出路径案例（自定义OutputFormat）

最新推荐文章于 2024-05-16 17:17:19 发布

不稳定记忆

最新推荐文章于 2024-05-16 17:17:19 发布

阅读量294

点赞数

分类专栏： Hadoop 文章标签：过滤日志及自定义日志输出路径案例（自定义OutputForm

版权声明：本文为博主原创文章，遵循 CC 4.0 BY-SA 版权协议，转载请附上原文出处链接和本声明。

本文链接：https://blog.csdn.net/Faded1573606285/article/details/100674499

版权

Hadoop 专栏收录该内容

43 篇文章 0 订阅

订阅专栏

1）需求

过滤输入的log日志中是否包含atguigu

（1）包含atguigu的网站输出到e:/atguigu.log

（2）不包含atguigu的网站输出到e:/other.log

2）输入数据

http://www.baidu.com
http://www.google.com
http://cn.bing.com
http://www.atguigu.com
http://www.sohu.com
http://www.sina.com
http://www.sin2a.com
http://www.sin2desa.com
http://www.sindsafa.com

输出预期：

1:http://www.atguigu.com

2:

http://cn.bing.com
http://www.baidu.com
http://www.google.com
http://www.sin2a.com
http://www.sin2desa.com
http://www.sina.com
http://www.sindsafa.com
http://www.sohu.com

3）具体程序：

（1）自定义一个outputformat

package com.atguigu.mapreduce.outputformat;

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.RecordWriter;

import org.apache.hadoop.mapreduce.TaskAttemptContext;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FilterOutputFormat extends FileOutputFormat<Text, NullWritable>{

@Override

public RecordWriter<Text, NullWritable> getRecordWriter(TaskAttemptContext job)

throws IOException, InterruptedException {

// 创建一个RecordWriter

return new FilterRecordWriter(job);

}

}

（2）具体的写数据RecordWriter

package com.atguigu.mapreduce.outputformat;

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;

import org.apache.hadoop.fs.FileSystem;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.NullWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.RecordWriter;

import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class FilterRecordWriter extends RecordWriter<Text, NullWritable> {

FSDataOutputStream atguiguOut = null;

FSDataOutputStream otherOut = null;

public FilterRecordWriter(TaskAttemptContext job) {

// 1 获取文件系统

FileSystem fs;

try {

fs = FileSystem.get(job.getConfiguration());

// 2 创建输出文件路径

Path atguiguPath = new Path("e:/atguigu.log");

Path otherPath = new Path("e:/other.log");

// 3 创建输出流

atguiguOut = fs.create(atguiguPath);

otherOut = fs.create(otherPath);

} catch (IOException e) {

e.printStackTrace();

}

}

@Override

public void write(Text key, NullWritable value) throws IOException, InterruptedException {

// 判断是否包含“atguigu”输出到不同文件

if (key.toString().contains("atguigu")) {

atguiguOut.write(key.toString().getBytes());

} else {

otherOut.write(key.toString().getBytes());

}

}

@Override

public void close(TaskAttemptContext context) throws IOException, InterruptedException {

// 关闭资源

if (atguiguOut != null) {

atguiguOut.close();

}

if (otherOut != null) {

otherOut.close();

}

}

}

（3）编写FilterMapper

package com.atguigu.mapreduce.outputformat;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.NullWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Mapper;

public class FilterMapper extends Mapper<LongWritable, Text, Text, NullWritable>{

Text k = new Text();

@Override

protected void map(LongWritable key, Text value, Context context)

throws IOException, InterruptedException {

// 1 获取一行

String line = value.toString();

k.set(line);

// 3 写出

context.write(k, NullWritable.get());

}

}

（4）编写FilterReducer

package com.atguigu.mapreduce.outputformat;

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Reducer;

public class FilterReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

@Override

protected void reduce(Text key, Iterable<NullWritable> values, Context context)

throws IOException, InterruptedException {

String k = key.toString();

k = k + "\r\n";

context.write(new Text(k), NullWritable.get());

}

}

（5）编写FilterDriver

package com.atguigu.mapreduce.outputformat;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.NullWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FilterDriver {

public static void main(String[] args) throws Exception {

args = new String[] { "e:/input/inputoutputformat", "e:/output2" };

Configuration conf = new Configuration();

Job job = Job.getInstance(conf);

job.setJarByClass(FilterDriver.class);

job.setMapperClass(FilterMapper.class);

job.setReducerClass(FilterReducer.class);

job.setMapOutputKeyClass(Text.class);

job.setMapOutputValueClass(NullWritable.class);

job.setOutputKeyClass(Text.class);

job.setOutputValueClass(NullWritable.class);

// 要将自定义的输出格式组件设置到job中

job.setOutputFormatClass(FilterOutputFormat.class);

FileInputFormat.setInputPaths(job, new Path(args[0]));

// 虽然我们自定义了outputformat，但是因为我们的outputformat继承自fileoutputformat

// 而fileoutputformat要输出一个_SUCCESS文件，所以，在这还得指定一个输出目录

FileOutputFormat.setOutputPath(job, new Path(args[1]));

boolean result = job.waitForCompletion(true);

System.exit(result ? 0 : 1);

}

}

不稳定记忆

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
MapReduce实战之过滤日志及自定义日志输出路径案例（自定义OutputFormat）

1）需求过滤输入的log日志中是否包含atguigu （1）包含atguigu的网站输出到e:/atguigu.log （2）不包含atguigu的网站输出到e:/other.log2）输入数据http://www.baidu.comhttp://www.google.comhttp://cn.bing.comhttp://www.a...
复制链接

扫一扫

专栏目录

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。