MapReduce Custom Grouping Comparator

Premise: sometimes we want keys that meet a certain condition to land in the same group, but because the key values themselves differ, they would normally not be grouped together.

Example: for each student, sort the math scores from the exams taken at different times after enrollment by exam time, with the effect shown below.

//Before sorting (comma-separated input, matching the mapper's split(","))
 stu1,time1,core1
 stu1,time2,core2
 stu1,time3,core3
 stu2,time1,core1
 stu2,time2,core3
 stu2,time3,core2
//After sorting (output)
 stu1 core3,core2,core1
 stu2 core2,core3,core1

Approach:

  • Note: a composite key type is defined in advance. The framework sorts map output by this key's compareTo (student ID ascending, exam time descending), and a separate grouping comparator then decides which of those sorted keys are handed to the same reduce() call. The key type is defined as follows:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Objects;

import org.apache.hadoop.io.WritableComparable;

/**
 * @description Custom composite key: student ID (first) + exam time (secend)
 * @author: LuoDeSong 694118297@qq.com
 * @create: 2019-06-18 14:49:55
 **/
public class OneKey implements WritableComparable<OneKey> {
    private String first;
    private String secend;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        OneKey myKey = (OneKey) o;
        return Objects.equals(first, myKey.first) &&
                Objects.equals(secend, myKey.secend);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, secend);
    }

    public OneKey() {
        super();
    }

    public OneKey(String first, String secend) {
        this.first = first;
        this.secend = secend;
    }

    public void setFirst(String first) {
        this.first = first;
    }

    public void setSecend(String secend) {
        this.secend = secend;
    }

    public String getFirst() {
        return first;
    }

    public String getSecend() {
        return secend;
    }

    @Override
    public int compareTo(OneKey o) { // Different students: ascending by student ID; same student: descending by exam time.
        int ans = this.first.compareTo(o.getFirst());
        if (ans != 0) {
            return ans;
        }
        return o.getSecend().compareTo(this.secend);
    }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.first);
        out.writeUTF(this.secend);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.first = in.readUTF();
        this.secend = in.readUTF();
    }
}
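To make the ordering concrete, here is a minimal, hypothetical sketch (not part of the original job; OneKeySortDemo is an illustrative name) that sorts a few OneKey instances using the compareTo above:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OneKeySortDemo {
    public static void main(String[] args) {
        List<OneKey> keys = new ArrayList<>();
        keys.add(new OneKey("stu1", "time1"));
        keys.add(new OneKey("stu2", "time3"));
        keys.add(new OneKey("stu1", "time3"));
        keys.add(new OneKey("stu2", "time1"));
        Collections.sort(keys); // uses OneKey.compareTo
        for (OneKey k : keys) {
            // Prints stu1 time3, stu1 time1, stu2 time3, stu2 time1:
            // students ascending, each student's exam times descending
            System.out.println(k.getFirst() + " " + k.getSecend());
        }
    }
}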
  • First: redefine the grouping comparator so that only the student ID decides group membership. Code as follows:
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

/**
 * @description: Grouping comparator that redefines which keys belong to the same reduce group
 * @author: LuoDeSong 694118297@qq.com
 * @create: 2019-06-18 16:10:18
 **/
public class OneComparator extends WritableComparator {
    public OneComparator() { // Register the key type whose grouping is being redefined; 'true' tells the parent to create key instances for comparison
        super(OneKey.class, true);
    }
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) { // For this problem: group only by the student-ID part of the composite key
        OneKey a1 = (OneKey)a;
        OneKey b1 = (OneKey) b;
        return a1.getFirst().compareTo(b1.getFirst());
    }
}
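As a quick, hypothetical check (again, not part of the original post): two keys with the same student ID but different exam times compare as equal under this comparator, which is exactly what sends their values into the same reduce() call:

public class OneComparatorDemo {
    public static void main(String[] args) {
        OneComparator grouper = new OneComparator();
        OneKey a = new OneKey("stu1", "time1");
        OneKey b = new OneKey("stu1", "time3");
        System.out.println(grouper.compare(a, b)); // 0: same group despite different times
    }
}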
  • Write the mapper:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * @description Mapper: splits each comma-separated input line into a (student, time) composite key and a score value
 * @author: LuoDeSong 694118297@qq.com
 * @create: 2019-06-18 15:02:08
 **/
public class OneMapper extends Mapper<LongWritable, Text, OneKey, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String lines = value.toString();
        String[] strings = lines.split(",");
        OneKey oneKey = new OneKey();
        oneKey.setFirst(strings[0]);
        oneKey.setSecend(strings[1]);
        context.write(oneKey, new Text(strings[2]));
    }
}
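For example, the input line "stu1,time2,core2" is split into the composite key OneKey(first="stu1", secend="time2") and the value "core2"; all of one student's records therefore sort next to each other in the map output, latest exam first.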
  • Write the reducer:
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * @description Reducer: concatenates one student's scores; within a group the values arrive sorted by exam time descending
 * @author: LuoDeSong 694118297@qq.com
 * @create: 2019-06-18 15:19:31
 **/
public class OneReducer extends Reducer<OneKey, Text, Text, Text> {
    @Override
    protected void reduce(OneKey key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for (Text text: values) {
            sb.append(text.toString()).append(",");
        }
        context.write(new Text(key.getFirst()), new Text(sb.substring(0, sb.length()-1)));
    }
}
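Because the grouping comparator merges all of one student's keys into a single group while the sort order already placed them newest-first, the values iterator yields scores in exam-time-descending order. A minimal sketch of the string join for stu1's group, assuming that arrival order (ReduceJoinDemo is an illustrative name):

import java.util.Arrays;
import java.util.List;

public class ReduceJoinDemo {
    public static void main(String[] args) {
        // stu1's scores as the values iterator would deliver them (time descending)
        List<String> values = Arrays.asList("core3", "core2", "core1");
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v).append(",");
        }
        System.out.println("stu1\t" + sb.substring(0, sb.length() - 1)); // stu1  core3,core2,core1
    }
}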
  • Write the driver:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OneDriver {
    /**
     *
     * @param args
     * @throws IOException
     * @throws ClassNotFoundException
     * @throws InterruptedException
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        String inputPath = "E:/Test-workspace/test/input/2-1.txt";
        String outputPath = "E:/Test-workspace/test/output/2-1";
        Configuration configuration = new Configuration();
        Job instance = Job.getInstance(configuration);
        instance.setJarByClass(OneDriver.class);
        // Set the custom grouping comparator
        instance.setGroupingComparatorClass(OneComparator.class);
        // instance.setNumReduceTasks(2); // optionally sets the number of reduce tasks

        instance.setMapperClass(OneMapper.class);
        instance.setReducerClass(OneReducer.class);

        instance.setMapOutputKeyClass(OneKey.class);
        instance.setMapOutputValueClass(Text.class);

        instance.setOutputKeyClass(Text.class);
        instance.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(instance, inputPath);
        FileOutputFormat.setOutputPath(instance, new Path(outputPath));

        boolean waitForCompletion = instance.waitForCompletion(true);
        System.exit(waitForCompletion ? 0 : 1);
    }
}
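To try this locally, the input file at E:/Test-workspace/test/input/2-1.txt would need to hold the comma-separated records from the example at the top of the post (an assumption; the post does not show the actual file contents):

stu1,time1,core1
stu1,time2,core2
stu1,time3,core3
stu2,time1,core1
stu2,time2,core3
stu2,time3,core2

The job should then write tab-separated results like the following to E:/Test-workspace/test/output/2-1/part-r-00000:

stu1	core3,core2,core1
stu2	core2,core3,core1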