Hadoop MultipleInputs: specifying a different InputFormat and Mapper for each input

Introduction to MultipleInputs

By default, the input to a MapReduce job may consist of multiple files, but all of them are processed by the same InputFormat and the same Mapper. This implies the files share the same format and their contents can all be handled by a single Mapper.

However, the input files may come in different data formats, in which case processing them all with the same Mapper is no longer appropriate.

MultipleInputs solves this problem: it lets you specify the InputFormat and Mapper to use for each input path.

The Reducer simply sees the aggregated map output; it cannot tell which mapper produced a given record.

Example

1. The files to process:

  • trade_info1.txt (tab-separated)
zhangsan@163.com    6000    0   2014-02-20
lisi@163.com    2000    0   2014-02-20
lisi@163.com    0   100 2014-02-20
zhangsan@163.com    3000    0   2014-02-20
wangwu@126.com  9000    0   2014-02-20
wangwu@126.com  0   200     2014-02-20
  
  
  • trade_info.txt (the same records, comma-separated)
zhangsan@163.com,6000,0,2014-02-20
lisi@163.com,2000,0,2014-02-20
lisi@163.com,0,100,2014-02-20
zhangsan@163.com,3000,0,2014-02-20
wangwu@126.com,9000,0,2014-02-20
wangwu@126.com,0,200,2014-02-20
  
  

2. The code:
The key lines for handling the multiple inputs:

MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, SumStepByToolMapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, SumStepByToolWithCommaMapper.class);
  
  

The two mappers split each input line with a different delimiter; this is the only difference between them:

    String line = value.toString();
    String[] fields = line.split("\t");
  
  
    String line = value.toString();
    String[] fields = line.split(",");
  
  

package mapreduce.mr;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import mapreduce.bean.InfoBeanMy;

public class SumStepByTool extends Configured implements Tool{

    public static class SumStepByToolMapper extends Mapper<LongWritable, Text, Text, InfoBeanMy>{

        private InfoBeanMy outBean = new InfoBeanMy();
        private Text k = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{

            String line = value.toString();
            String[] fields = line.split("\t");

            String account = fields[0];
            double income = Double.parseDouble(fields[1]);
            double expense = Double.parseDouble(fields[2]);

            outBean.setFields(account, income, expense);
            k.set(account);

            context.write(k, outBean);
        }
    }

    public static class SumStepByToolWithCommaMapper extends Mapper<LongWritable, Text, Text, InfoBeanMy>{

        private InfoBeanMy outBean = new InfoBeanMy();
        private Text k = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{

            String line = value.toString();
            String[] fields = line.split(",");

            String account = fields[0];
            double income = Double.parseDouble(fields[1]);
            double expense = Double.parseDouble(fields[2]);

            outBean.setFields(account, income, expense);
            k.set(account);

            context.write(k, outBean);
        }
    }

    public static class SumStepByToolReducer extends Reducer<Text, InfoBeanMy, Text, InfoBeanMy>{

        private InfoBeanMy outBean = new InfoBeanMy();
        @Override
        protected void reduce(Text key, Iterable<InfoBeanMy> values, Context context) throws IOException, InterruptedException{
            double income_sum = 0;
            double expense_sum = 0;

            for(InfoBeanMy infoBeanMy : values)
            {
                income_sum += infoBeanMy.getIncome();
                expense_sum += infoBeanMy.getExpense();
            }
            outBean.setFields("", income_sum, expense_sum);
            context.write(key, outBean);
        }

    }


    // Note: this partitioner is never registered in run() via job.setPartitionerClass(),
    // so the job actually runs with the default HashPartitioner. Enabling it would also
    // require at least 4 reduce tasks, since getPartition can return values 0 through 3.
    public static class SumStepByToolPartitioner extends Partitioner<Text, InfoBeanMy>{

        private static Map<String, Integer> accountMap = new HashMap<String, Integer>(); 

        static {
            accountMap.put("zhangsan", 1);
            accountMap.put("lisi", 2);
            accountMap.put("wangwu", 3);
        }

        @Override
        public int getPartition(Text key, InfoBeanMy value, int numPartitions) {
            String keyString = key.toString();
            String name = keyString.substring(0, keyString.indexOf("@"));
            Integer part = accountMap.get(name);
            if (part == null )
            {
                part = 0;
            }
            return part;
        }

    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.setInt("mapreduce.input.lineinputformat.linespermap", 2);
        Job job = Job.getInstance(conf);
        job.setJarByClass(this.getClass());
        job.setJobName("SumStepByTool");

        //job.setInputFormatClass(TextInputFormat.class); // the default input format
        //job.setInputFormatClass(KeyValueTextInputFormat.class); // treats the first field of each line as the key and the rest as the value
        //job.setInputFormatClass(NLineInputFormat.class);

//      job.setMapperClass(SumStepByToolMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(InfoBeanMy.class);

        job.setReducerClass(SumStepByToolReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(InfoBeanMy.class);
        job.setNumReduceTasks(3);

        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, SumStepByToolMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, SumStepByToolWithCommaMapper.class);
//      FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));


        return job.waitForCompletion(true) ? 0:-1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SumStepByTool(),args);
        System.exit(exitCode);
    }
}
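The listing imports mapreduce.bean.InfoBeanMy, which the post never shows. Below is a minimal sketch consistent with how it is used above: setFields(String, double, double), getIncome(), getExpense(), and serving as a map output value, which requires implementing Writable. The exact field layout and toString() format are assumptions.

package mapreduce.bean;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical reconstruction: only the members the job above actually calls.
public class InfoBeanMy implements Writable {

    private String account;
    private double income;
    private double expense;
    private double surplus;

    public void setFields(String account, double income, double expense) {
        this.account = account;
        this.income = income;
        this.expense = expense;
        this.surplus = income - expense;
    }

    public double getIncome() { return income; }
    public double getExpense() { return expense; }

    // Serialization order must match between write and readFields.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(account);
        out.writeDouble(income);
        out.writeDouble(expense);
        out.writeDouble(surplus);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        account = in.readUTF();
        income = in.readDouble();
        expense = in.readDouble();
        surplus = in.readDouble();
    }

    @Override
    public String toString() {
        // income <TAB> expense <TAB> surplus
        return income + "\t" + expense + "\t" + surplus;
    }
}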
  
  

Running the job:
It takes three arguments; the first two are the input paths and the last is the output path.

[root@hadoop1 tmp]# hadoop jar sortscore.jar mapreduce.mr.SumStepByTool /tradeinfoIn/trade_info1.txt /tradeinfoIn/trade_info.txt /tradeinfoOut/
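With three reduce tasks, the output directory contains part-r-00000 through part-r-00002; since the custom partitioner above is never registered, the default HashPartitioner decides which file each account lands in. Assuming the InfoBeanMy sketch above (income, expense, surplus), the totals across the part files would be:

lisi@163.com	4000.0	200.0	3800.0
wangwu@126.com	18000.0	400.0	17600.0
zhangsan@163.com	18000.0	0.0	18000.0

Both input files contain the same logical records, so each account's totals are exactly double those of a single file.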
  
  

Note

  • Without MultipleInputs, the input paths are set with FileInputFormat; once MultipleInputs is used, it takes over that role, but the output path is still set with FileOutputFormat, as the snippet below shows.
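For comparison, here is the single-input setup (what the commented-out lines in run() would do) next to the MultipleInputs setup used above:

// single input: one InputFormat and one Mapper for every file
FileInputFormat.setInputPaths(job, new Path(args[0]));
job.setMapperClass(SumStepByToolMapper.class);

// multiple inputs: the InputFormat and Mapper are chosen per path;
// the output path is still set through FileOutputFormat
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, SumStepByToolMapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, SumStepByToolWithCommaMapper.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));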