Maven-based MapReduce Program 3: Computing the Total Salary per Department (Optimized)

Hadoop_Liang

Category: Hadoop. Tags: MapReduce, Hadoop, Maven project

Article link: https://blog.csdn.net/qq_42881421/article/details/84133800

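This article builds on the program that computes the total salary per department. If you have not implemented that yet, see: Computing the Total Salary per Department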

Optimizations:

1. Use serialization

2. Implement a Partitioner

3. Use a Combiner on the map side

Using Serialization

This case builds on the original per-department salary total program.

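Serialization and deserialization:

Serialization is the process of converting a Java object into a byte sequence so that it can be sent over the network;

deserialization is the process of converting that byte sequence back into a Java object.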
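As a minimal sketch of these two definitions, the following plain-Java example (no Hadoop dependency; the class and method names are hypothetical) writes three values to a byte array with DataOutputStream and reads them back with DataInputStream. These streams implement the same DataOutput and DataInput interfaces that the methods of a Hadoop Writable receive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: round-trips a few fields through a byte array.
class SerializationRoundTrip {

    // Serialize: write the fields into a byte array in a fixed order.
    static byte[] serialize(int empno, String ename, int sal) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(empno);
            out.writeUTF(ename);
            out.writeInt(sal);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize: read the fields back in exactly the same order.
    static String deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int empno = in.readInt();
            String ename = in.readUTF();
            int sal = in.readInt();
            return empno + "," + ename + "," + sal;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(7654, "MARTIN", 1250);
        System.out.println(data.length + " bytes -> " + deserialize(data));
    }
}
```

The important point is the ordering: the reads must mirror the writes field by field, otherwise the bytes are interpreted as the wrong types.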

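MapReduce programming model and approach:

Compared with the original per-department salary total program, this case adds an Employee class. The Employee class code is as follows: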

package com.myXuliehua;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Employee implements Hadoop's Writable serialization interface
public class Employee implements Writable{
	
	// Columns: EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO
	// Types:   int, char, char, int, date, int, int, int
	// Sample:  7654, MARTIN, SALESMAN, 7698, 1981/9/28, 1250, 1400, 30
	
	// Fields derived from the columns above
	private int empno;
	private String ename;
	private String job;
	private int mgr;
	private String hiredate;
	private int sal;
	private int comm; // bonus
	private int deptno;
	

	// Serialization: turn the Java object into a byte stream that can be sent across machines
	public void write(DataOutput out) throws IOException {
		out.writeInt(this.empno);
		out.writeUTF(this.ename);
		out.writeUTF(this.job);
		out.writeInt(this.mgr);
		out.writeUTF(this.hiredate);
		out.writeInt(this.sal);
		out.writeInt(this.comm);
		out.writeInt(this.deptno);
		
	}
	// Deserialization: rebuild the Java object from the byte stream; fields are read in the same order they were written
	public void readFields(DataInput in) throws IOException {
		this.empno = in.readInt();
		this.ename = in.readUTF();
		this.job = in.readUTF();
		this.mgr = in.readInt();
		this.hiredate = in.readUTF();
		this.sal = in.readInt();
		this.comm = in.readInt();
		this.deptno = in.readInt();
	}
	// Other classes access the fields through getters and setters (IDE menu: Source --> Generate Getters and Setters)
	public int getEmpno() {
		return empno;
	}
	public void setEmpno(int empno) {
		this.empno = empno;
	}
	public String getEname() {
		return ename;
	}
	public void setEname(String ename) {
		this.ename = ename;
	}
	public String getJob() {
		return job;
	}
	public void setJob(String job) {
		this.job = job;
	}
	public int getMgr() {
		return mgr;
	}
	public void setMgr(int mgr) {
		this.mgr = mgr;
	}
	public String getHiredate() {
		return hiredate;
	}
	public void setHiredate(String hiredate) {
		this.hiredate = hiredate;
	}
	public int getSal() {
		return sal;
	}
	public void setSal(int sal) {
		this.sal = sal;
	}
	public int getComm() {
		return comm;
	}
	public void setComm(int comm) {
		this.comm = comm;
	}
	public int getDeptno() {
		return deptno;
	}
	public void setDeptno(int deptno) {
		this.deptno = deptno;
	}
	
}
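The article does not show how a mapper fills an Employee from an input line, so the following is only a sketch under assumptions: it assumes each record is one comma-separated line with the eight columns listed in the comment above, and that numeric columns such as MGR or COMM may be empty. The class name, the helper names, and the second sample row used in the usage note are hypothetical.

```java
// Hypothetical helper: extracts the two columns the salary total needs.
class EmployeeLineParser {

    // Parse an integer column; an empty column is treated as 0 (an assumption).
    static int parseIntOrZero(String column) {
        String trimmed = column.trim();
        return trimmed.isEmpty() ? 0 : Integer.parseInt(trimmed);
    }

    // Split one line into its eight columns and return {deptno, sal}.
    static int[] parseDeptnoAndSal(String line) {
        // The -1 limit keeps trailing empty columns instead of dropping them.
        String[] cols = line.split(",", -1);
        if (cols.length != 8) {
            throw new IllegalArgumentException("Expected 8 columns but got " + cols.length);
        }
        int sal = parseIntOrZero(cols[5]);     // SAL is the 6th column
        int deptno = parseIntOrZero(cols[7]);  // DEPTNO is the 8th column
        return new int[] {deptno, sal};
    }

    public static void main(String[] args) {
        int[] parsed = parseDeptnoAndSal("7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30");
        System.out.println("deptno=" + parsed[0] + ", sal=" + parsed[1]);
    }
}
```

In a real mapper the same parsed values would be passed to the Employee setters (setSal, setDeptno, and so on) before the object is written as the map output value. A row with empty MGR and COMM columns, such as the hypothetical "7839,KING,PRESIDENT,,1981/11/17,5000,,10", parses without a NumberFormatException because of the empty-column check.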

Implementing a Partitioner

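This case builds on the serialization case: it lets you control the number of output files so that the results are stored by partition.

By default a MapReduce job runs a single reduce task and therefore writes a single output file, for example part-r-00000.

With a Partitioner and several reduce tasks, each map output record is routed to a specific reduce task, and each reduce task writes its own output file, for example part-r-00000, part-r-00001, part-r-00002.

MapReduce programming model and approach:

Compared with the serialization case, this case adds one Partitioner class, SalaryTotalPartitioner. Its code is as follows: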

package com.myPatition;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;
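// Map output types: k2, v2 --> IntWritable, Employee
public class SalaryTotalPartitioner extends Partitioner<IntWritable, Employee>{

	@Override
	public int getPartition(IntWritable k2, Employee v2, int numPartitions) {
		
		// Partitioning rule: one partition per department.
		// The modulo keeps the result within [0, numPartitions);
		// the trailing comments assume 3 reduce tasks.
		if(v2.getDeptno() == 10) {
			// Department 10 goes to partition 1
			return 1%numPartitions;// 1%3=1
		}else if(v2.getDeptno() == 20){
			// Department 20 goes to partition 2
			return 2%numPartitions;// 2%3=2
		}else {
			// All other departments (30 in the sample data) go to partition 0
			return 3%numPartitions;// 3%3=0
		}			
	}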
}
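To see how the modulo behaves when the number of reduce tasks changes, the branching of getPartition can be mirrored in plain Java. This is only an illustrative sketch: the class PartitionDemo and the method partitionFor are hypothetical and take the department number directly instead of an Employee.

```java
// Hypothetical demo class mirroring the branching in SalaryTotalPartitioner.
class PartitionDemo {

    // Returns the partition index for a department, given the reduce task count.
    static int partitionFor(int deptno, int numPartitions) {
        if (deptno == 10) {
            return 1 % numPartitions;
        } else if (deptno == 20) {
            return 2 % numPartitions;
        } else {
            return 3 % numPartitions;
        }
    }

    public static void main(String[] args) {
        // Print the mapping for 3, 2 and 1 reduce tasks.
        for (int n : new int[] {3, 2, 1}) {
            System.out.println("reduce tasks=" + n
                    + ": dept 10 -> " + partitionFor(10, n)
                    + ", dept 20 -> " + partitionFor(20, n)
                    + ", dept 30 -> " + partitionFor(30, n));
        }
    }
}
```

With three reduce tasks every department gets its own partition. With fewer, several departments share a partition, which is still valid but no longer gives one file per department. The partitioner only takes effect once the driver registers it and raises the reduce task count; in the Hadoop Job API that is done with job.setPartitionerClass(SalaryTotalPartitioner.class) and job.setNumReduceTasks(3). The article does not show its driver, so the exact placement there is an assumption.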

Using a Combiner on the Map Side

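This case builds on the original per-department salary total program.

A Combiner runs a Reduce-style merge on the output of each map task before the shuffle, which reduces the amount of intermediate data sent over the network and improves efficiency. Note that a Combiner cannot be used in every situation. Computing an average is one example, because the average of partial averages is generally not the overall average.

MapReduce programming model and approach:

Using a Combiner is simple: only one line of code needs to be added to the Main class, as shown in step 2.1 of the figure above.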
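The caveat about averages can be checked with a small plain-Java calculation. The sketch below simulates two map tasks whose salaries (made-up values for illustration) are pre-aggregated before the final reduce step; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class: each inner array holds the values seen by one map task.
class CombinerDemo {

    // Sum with a combiner: each map task pre-sums, the reducer sums the partial sums.
    static int sumOfPartialSums(int[][] splits) {
        return Arrays.stream(splits).mapToInt(s -> Arrays.stream(s).sum()).sum();
    }

    // Sum without a combiner: the reducer sees every value.
    static int totalSum(int[][] splits) {
        return Arrays.stream(splits).flatMapToInt(Arrays::stream).sum();
    }

    // Naive average with a combiner: each map task pre-averages,
    // then the reducer averages the partial averages.
    static double averageOfPartialAverages(int[][] splits) {
        return Arrays.stream(splits)
                .mapToDouble(s -> Arrays.stream(s).average().orElse(0.0))
                .average()
                .orElse(0.0);
    }

    // Average without a combiner: computed over every value.
    static double trueAverage(int[][] splits) {
        return Arrays.stream(splits).flatMapToInt(Arrays::stream).average().orElse(0.0);
    }

    public static void main(String[] args) {
        int[][] splits = { {1000, 2000, 3000}, {5000} };
        System.out.println("sum:     " + sumOfPartialSums(splits) + " vs " + totalSum(splits));
        System.out.println("average: " + averageOfPartialAverages(splits) + " vs " + trueAverage(splits));
    }
}
```

The sums agree because addition can be regrouped freely, while the two averages differ because the map tasks saw different numbers of records. In the Hadoop Job API the single line mentioned above is a call to job.setCombinerClass(...); passing the job's own reducer class is a common choice for a sum, but which class the article passes is not shown, so treat that as an assumption.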

The full code for the three optimization cases is available here:

Link: https://pan.baidu.com/s/1grC-KLpM6oI2iCY7HkCKwA
Extraction code: 140m

Done! Enjoy it!
