MapReduce Assignment 2

This post works through a set of MapReduce exercises in big-data processing: summing salaries per department and job, MapReduce's core idea, when the partitioner runs, counting odd and even numbers, ranking the Top 10 movies by rating, merging file contents, and understanding input splits. Solving them illustrates how MapReduce works and some of its best practices.

The data is as follows:

empno,name,job,mgr,hiredate,sal,comm,deptno
7369,SMITH,CLERK,7902,1980-12-17,800,null,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02,2975,null,20
7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30
7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10
7788,SCOTT,ANALYST,7566,1987-04-19,3000,null,20
7839,KING,PRESIDENT,null,1981-11-17,5000,null,10
7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23,1100,null,20
7900,JAMES,CLERK,7698,1981-12-03,950,null,30
7902,FORD,ANALYST,7566,1981-12-02,3000,null,20
7934,MILLER,CLERK,7782,1982-01-23,1300,null,10

Question 0: problem statement

Compute the total salary for each job within each department.

In SQL:

SELECT deptno,job,SUM(sal) FROM emp GROUP BY deptno,job;
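Before writing the MR job, it helps to see what this query should produce. A minimal plain-Java sketch of the same group-by sum over the sample rows (no Hadoop involved; the class and method names are illustrative only):

```java
import java.util.*;

// A plain-Java equivalent of the GROUP BY query, with no Hadoop involved.
public class DeptJobSum {

    // Key format "deptno\tjob" mirrors what DeptJob.toString() emits.
    static Map<String, Integer> sumByDeptJob(List<String> lines) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            // f[7] = deptno, f[2] = job, f[5] = sal
            sums.merge(f[7] + "\t" + f[2], Integer.parseInt(f[5]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> emp = Arrays.asList(
            "7369,SMITH,CLERK,7902,1980-12-17,800,null,20",
            "7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30",
            "7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30",
            "7566,JONES,MANAGER,7839,1981-04-02,2975,null,20",
            "7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30",
            "7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30",
            "7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10",
            "7788,SCOTT,ANALYST,7566,1987-04-19,3000,null,20",
            "7839,KING,PRESIDENT,null,1981-11-17,5000,null,10",
            "7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30",
            "7876,ADAMS,CLERK,7788,1987-05-23,1100,null,20",
            "7900,JAMES,CLERK,7698,1981-12-03,950,null,30",
            "7902,FORD,ANALYST,7566,1981-12-02,3000,null,20",
            "7934,MILLER,CLERK,7782,1982-01-23,1300,null,10");
        // e.g. prints "20	ANALYST	6000" among the other groups
        sumByDeptJob(emp).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

The MR job below computes exactly these sums, just distributed across mappers and reducers.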

The MR program implementing this requirement must satisfy two conditions:

1. Departments 10 and 20 go into one partition; all other departments go into another.
2. k2 must be a new Hadoop type, DeptJob, composed of the department and the job.

The code:

package mr.emp;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class EmpDriver {
   
    static class DeptJob implements WritableComparable<DeptJob> {
   
        private int id;
        private String worker;

        public DeptJob() {
   
        }

        public DeptJob(int id, String worker) {
   
            this.id = id;
            this.worker = worker;
        }

        public int getId() {
   
            return id;
        }

        public void setId(int id) {
   
            this.id = id;
        }

        public String getWorker() {
   
            return worker;
        }

        public void setWorker(String worker) {
   
            this.worker = worker;
        }

        @Override
        public String toString() {
   
            return id + "\t" + worker;
        }

        //serialization: write each field of the object to the output stream
        @Override
        public void write(DataOutput dataOutput) throws IOException {
   
            dataOutput.writeInt(id);
            dataOutput.writeUTF(worker);
        }

        //deserialization: read the fields back in the same order they were written
        @Override
        public void readFields(DataInput dataInput) throws IOException {
   
            id = dataInput.readInt();
            worker = dataInput.readUTF();
        }

        @Override
        public int compareTo(DeptJob o) {
   
            if (id != o.id)
                return Integer.compare(id, o.id);  //avoids overflow, unlike id - o.id
            else return worker.compareTo(o.worker);
        }
    }
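Because write() and readFields() use the same java.io primitives that Hadoop wraps, DeptJob's wire format can be sanity-checked without a cluster. A standalone sketch using only the JDK (Hadoop is assumed absent here, so the Writable interfaces are not referenced; the class name is my own):

```java
import java.io.*;

// Round-trips the DeptJob wire format with plain java.io streams:
// write() emits a 4-byte big-endian int followed by a modified-UTF-8
// string, and readFields() must consume them in the same order.
public class DeptJobRoundTrip {

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(20);          // what DeptJob.write() does first
        out.writeUTF("ANALYST");   // and second

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        int id = in.readInt();        // mirrors DeptJob.readFields()
        String worker = in.readUTF();
        System.out.println(id + "\t" + worker);  // prints "20	ANALYST"
    }
}
```

If the read order ever diverges from the write order, deserialization silently produces garbage, which is the most common bug in hand-written Writables.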

    static class EmpMapper extends Mapper<LongWritable, Text, DeptJob, IntWritable> {
   
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
   
            String line = value.toString();
            String[] strings = line.split(",");

            //extract the department number (field 7) and job (field 2) as the composite key
            DeptJob k2 = new DeptJob(Integer.parseInt(strings[7]), strings[2]);
            //field 5 is the monthly salary
            IntWritable v2 = new IntWritable(Integer.parseInt(strings[5]));

            context.write(k2, v2);
        }
    }

    static class EmpReduce extends Reducer<DeptJob, IntWritable, Text, Text> {
   
        Text k3 = new Text();
        Text v3 = new Text();

        @Override
        protected void reduce(DeptJob key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
   
            int sum = 0;
            for (IntWritable value : values) {
   
                sum += value.get();
            }
            k3.set(key.toString());
            v3.set(String.valueOf(sum));
            context.write(k3, v3);
        }
    }

    static class EmpPartitioner extends Partitioner<DeptJob, IntWritable> {
   

        @Override
        public int getPartition(DeptJob deptJob, IntWritable intWritable, int i) {
   
            if (deptJob.getId() == 10 || deptJob.getId() == 20) {
   
                return 0;
            } else {
   
                return 1;
            }
        }
    }
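The effect of this rule can be checked in isolation. A small sketch (my own class name, no Hadoop types) replicating getPartition's branch:

```java
// Replicates EmpPartitioner's rule without Hadoop types: departments
// 10 and 20 share partition 0, every other department goes to partition 1,
// so the job must run with exactly two reduce tasks.
public class PartitionRule {

    static int getPartition(int deptno) {
        return (deptno == 10 || deptno == 20) ? 0 : 1;
    }

    public static void main(String[] args) {
        for (int d : new int[]{10, 20, 30, 40}) {
            System.out.println("dept " + d + " -> partition " + getPartition(d));
        }
    }
}
```

With the sample data, partition 0 therefore receives the dept 10 and 20 groups and partition 1 receives the dept 30 groups, each written to its own output file.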

    public static void main(String[] args) throws Exception {
   
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        //set the driver class
        job.setJarByClass(EmpDriver.class);

        //set the mapper and reducer classes
        job.setMapperClass(EmpMapper.class);
        job.setReducerClass(EmpReduce.class);

        //set the types of k2, v2, k3, v3
        job.setMapOutputKeyClass(DeptJob.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        //register the custom partitioner; two partitions require two reduce tasks
        job.setPartitionerClass(EmpPartitioner.class);
        job.setNumReduceTasks(2);

        //input and output paths come from the command line;
        //delete the output directory first if it already exists
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        Path output = new Path(args[1]);
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }
        FileOutputFormat.setOutputPath(job, output);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}