Writing Multiple Output Files in MapReduce with MultipleOutputs

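This article shows how to use the MultipleOutputs class in MapReduce to write different key-value pairs to user-defined files. By specifying the name prefix of each output file (for example, writing data to a file named after its date), different data can be stored separately by category. The test data comes from ip-to-hosts.txt.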

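By default, MapReduce names its output files part-*.

MultipleOutputs can write different key-value pairs to different, user-defined files.

This is done through the third parameter of the write method:

public void write(KEYOUT key, VALUEOUT value, String baseOutputPath)

The baseOutputPath parameter specifies the name prefix of the output file. In this article the call is output.write(key, new IntWritable(total), key.toString()); so the key itself becomes the prefix. By using a different baseOutputPath for each key, the values of different keys end up in different files. For example, all data from the same day can be written to a file named after that date.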
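
To make the naming rule concrete, here is a minimal plain-Java sketch that does not call Hadoop at all. It assumes records carry a timestamp such as `2015-03-01 12:30:45` (a hypothetical format, not taken from the test data in this article) and mimics the file name derived from a baseOutputPath for a reduce task, which has the form `<baseOutputPath>-r-<five-digit partition number>`. The class and method names are made up for illustration.

```java
import java.util.Locale;

// Plain-Java illustration only; no Hadoop classes are used.
class BaseOutputPathSketch {

    // Hypothetical helper: use the date part of "yyyy-MM-dd HH:mm:ss" as the file name prefix.
    static String baseOutputPathFor(String timestamp) {
        int space = timestamp.indexOf(' ');
        return space < 0 ? timestamp : timestamp.substring(0, space);
    }

    // Mimics the reduce-side file name: prefix, "-r-", zero-padded partition number.
    static String reduceFileName(String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseOutputPath, partition);
    }

    public static void main(String[] args) {
        String base = baseOutputPathFor("2015-03-01 12:30:45");
        System.out.println(reduceFileName(base, 0)); // prints 2015-03-01-r-00000
    }
}
```

In a real reducer, the string returned by such a helper would be passed as the third argument of `write`.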

Test data: ip-to-hosts.txt

18.217.167.70	United States
206.96.54.107	United States
196.109.151.139	Mauritius
174.52.58.113	United States
142.111.216.8	Canada
162.100.49.185	United States
146.38.26.54	United States
36.35.107.36	China
95.214.95.13	Spain
2.96.191.111	United Kingdom
62.177.119.177	Czech Republic
21.165.189.3	United States
46.190.32.115	Greece
113.173.113.29	Vietnam
42.65.172.142	Taiwan
197.91.198.199	South Africa
68.165.71.27	United States
110.119.165.104	China
171.50.76.89	India
171.207.52.113	Singapore
40.174.30.170	United States
191.170.95.175	United States
17.81.129.101	United States
91.212.157.202	France
173.83.82.99	United States
129.75.56.220	United States
149.25.104.198	United States
103.110.22.19	Indonesia
204.188.117.122	United States
138.23.10.72	United States
172.50.15.32	United States
85.88.38.58	Belgium
49.15.14.6	India
19.84.175.5	United States
50.158.140.215	United States
161.114.120.34	United States
118.211.174.52	Australia
220.98.113.71	Japan
182.101.16.171	China
25.45.75.194	United Kingdom
168.16.162.99	United States
155.60.219.154	Australia
26.216.17.198	United States
68.34.157.157	United States
89.176.196.28	Czech Republic
173.11.51.134	United States
116.207.191.159	China
164.210.124.152	United States
168.17.158.38	United States
174.24.173.11	United States
143.64.173.176	United States
160.164.158.125	Italy
15.111.128.4	United States
22.71.176.163	United States
105.57.100.182	Morocco
111.147.83.42	China
137.157.65.89	Australia
Each line of this file has two fields, the IP address and the country that the IP address belongs to, separated by \t.
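
As a small illustration of this record format, the following plain-Java sketch extracts the country field from one line. The class and method names are hypothetical, and the guard against malformed lines is an addition for illustration; it is not part of the mapper shown in this article.

```java
import java.util.regex.Pattern;

// Plain-Java illustration of parsing one "ip<TAB>country" record.
class IpCountryLineSketch {

    private static final Pattern TAB = Pattern.compile("\t");

    // Returns the country field, or null when the line lacks a second tab-separated field.
    static String countryOf(String line) {
        String[] fields = TAB.split(line);
        if (fields.length < 2 || fields[1].trim().isEmpty()) {
            return null;
        }
        return fields[1].trim();
    }

    public static void main(String[] args) {
        System.out.println(countryOf("36.35.107.36\tChina")); // prints China
        System.out.println(countryOf("malformed line"));       // prints null
    }
}
```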


The code:

 public static class IPCountryReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private MultipleOutputs output;

        @Override
        protected void setup(Context context
        ) throws IOException, InterruptedException {
            output = new MultipleOutputs(context);
        }


        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context
        ) throws IOException, InterruptedException {
            int total = 0;
            for(IntWritable value: values) {
                total += value.get();
            }
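            // Header line for each file: with the default TextOutputFormat, a NullWritable value is written as the key alone.
            output.write(new Text("Output by MultipleOutputs"), NullWritable.get(), key.toString());
            // The country name (key.toString()) is the baseOutputPath, so each country gets its own file.
            output.write(key, new IntWritable(total), key.toString());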

        }

        @Override
        protected void cleanup(Context context
        ) throws IOException, InterruptedException {
            output.close();
        }
    }
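The MultipleOutputs instance is created in the reducer's setup method:
 output = new MultipleOutputs(context);
The reduce method then uses this output object to write the records to different files. The rest of the job class (driver and mapper) looks like this: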
   private Configuration conf;
    public static final String NAME = "named_output";


    public static void main(String[] args) throws Exception {
        // Hardcoded HDFS paths from the author's test cluster; they override any command-line arguments.
        args = new String[] {"hdfs://caozw:9100/user/hadoop/hadooprealword", "hdfs://caozw:9100/user/hadoop/hadooprealword/output"};
        // Propagate the job's result as the process exit code.
        System.exit(ToolRunner.run(new Configuration(), new NamedCountryOutputJob(), args));
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: named_output <input> <output>");
            return 1;
        }

        // Job.getInstance replaces the deprecated Job constructor on Hadoop 2.x and later.
        Job job = Job.getInstance(conf, "IP count by country to named files");
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(IPCountryMapper.class);
        job.setReducerClass(IPCountryReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setJarByClass(NamedCountryOutputJob.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Return 0 on success and 1 on failure, following the usual exit-code convention.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    public Configuration getConf() {
        return conf;
    }

    public static class IPCountryMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int country_pos = 1;
        private static final Pattern pattern = Pattern.compile("\\t");

        @Override
        protected void map(LongWritable key, Text value,
                           Context context) throws IOException, InterruptedException {
            String country = pattern.split(value.toString())[country_pos];
            context.write(new Text(country), new IntWritable(1));
        }
    }
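
To see what the job is expected to produce without a cluster, the mapper and reducer logic can be imitated in plain Java. The sketch below is only a local simulation under stated assumptions: it uses no Hadoop classes, assumes a single reducer (so every file name ends in `-r-00000`), and its class and method names are hypothetical. It maps each would-be output file name to the content the reducer would write for that country, including the header line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local simulation of the job's logic; not a replacement for running it on Hadoop.
class NamedOutputSimulation {

    // Returns a map from simulated output file name to that file's content.
    static Map<String, String> simulate(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue; // skip malformed lines (an added guard, not in the original mapper)
            }
            // The mapper emits (country, 1); the reducer sums the ones per country.
            counts.merge(fields[1], 1, Integer::sum);
        }
        Map<String, String> files = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // The country name is the baseOutputPath; partition 0 assumes a single reducer.
            String fileName = entry.getKey() + "-r-00000";
            String content = "Output by MultipleOutputs\n" + entry.getKey() + "\t" + entry.getValue();
            files.put(fileName, content);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "36.35.107.36\tChina",
                "95.214.95.13\tSpain",
                "110.119.165.104\tChina",
                "18.217.167.70\tUnited States");
        for (Map.Entry<String, String> file : simulate(sample).entrySet()) {
            System.out.println(file.getKey() + ":");
            System.out.println(file.getValue());
        }
    }
}
```

For the four sample lines, the simulation yields three files: `China-r-00000` with a count of 2, and `Spain-r-00000` and `United States-r-00000` with a count of 1 each.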

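Test results: each country's count should be written to its own file in the output directory, named with the country as its prefix (for example, China-r-00000).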

