Custom data types as values in Hadoop MapReduce

Serialization serves two purposes in a distributed environment: interprocess communication and persistent storage.
A custom data type must implement the Writable interface in order to be serializable.

As the Hadoop Javadoc for Writable puts it: "Any key or value type in the Hadoop Map-Reduce framework implements this interface."

Here is the source of the Writable interface:

public interface Writable {
  /** 
   * Serialize the fields of this object to <code>out</code>.
   * 
   * @param out <code>DataOutput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /** 
   * Deserialize the fields of this object from <code>in</code>.  
   * 
   * <p>For efficiency, implementations should attempt to re-use storage in the 
   * existing object where possible.</p>
   * 
   * @param in <code>DataInput</code> to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}
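
Note that Writable by itself is enough only for value types. Keys are additionally sorted during the shuffle, so a custom key type must implement WritableComparable instead. A minimal sketch, assuming a hypothetical key class EmployeeKey:

package com.demo;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Hypothetical key type: keys must also define a sort order,
// because the framework sorts them between map and reduce.
public class EmployeeKey implements WritableComparable<EmployeeKey> {
    private String name = "";

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.name = in.readUTF();
    }

    @Override
    public int compareTo(EmployeeKey other) { // defines the shuffle sort order
        return this.name.compareTo(other.name);
    }

    @Override
    public int hashCode() { // used by the default HashPartitioner
        return name.hashCode();
    }
}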

The example below demonstrates a custom data type used as the value.

1. Requirements

Suppose we have the following payroll records, and we need to compute, for each employee, the total of base salary, position salary, performance salary, and post allowance.
Record layout: date, department, name, position, base salary, position salary, performance salary, post allowance, overtime pay, bonus, travel allowance, meal allowance

[hadoop@hadoop1 ~]$ hdfs dfs -cat /salarysummary/input/salarybill.txt
18/06/09 01:17:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-01-01,研发部,张三,软件工程师,2800,9000,3200,1000,0,0,0,0
2015-02-01,研发部,张三,软件工程师,2810,9000,3200,1000,0,0,0,0
2015-03-01,研发部,张三,软件工程师,2820,9000,3200,1000,0,0,0,0
2015-04-01,研发部,张三,软件工程师,2830,9000,3200,1000,0,0,0,0
2015-05-01,研发部,张三,软件工程师,2840,9000,3200,1000,0,0,0,0
2015-01-01,研发部,李四,软件工程师,2800,9010,3200,1000,0,0,0,0
2015-02-01,研发部,李四,软件工程师,2800,9020,3200,1000,0,0,0,0
2015-03-01,研发部,李四,软件工程师,2800,9030,3200,1000,0,0,0,0
2015-04-01,研发部,李四,软件工程师,2800,9040,3200,1000,0,0,0,0
2015-05-01,研发部,李四,软件工程师,2800,9050,3200,1000,0,0,0,0
2015-01-01,研发部,王五,软件工程师,2800,9000,3210,1000,0,0,0,0
2015-02-01,研发部,王五,软件工程师,2800,9000,3220,1000,0,0,0,0
2015-03-01,研发部,王五,软件工程师,2800,9000,3230,1000,0,0,0,0
2015-04-01,研发部,王五,软件工程师,2800,9000,3240,1000,0,0,0,0
2015-05-01,研发部,王五,软件工程师,2800,9000,3250,1000,0,0,0,0

2. Analysis

1. Define a class SalaryBillDetail with fields for base salary, position salary, performance salary, and post allowance.
2. Have SalaryBillDetail implement the Writable interface so that it can be emitted as a value.
3. The map function's input key-value pair: (byte offset of the line, one payroll record).
4. The map function's output key-value pair: (name, SalaryBillDetail object).
5. The reduce function's input key-value pair: (name, [SalaryBillDetail object, …]).
6. The reduce function's output key-value pair: (name, SalaryBillDetail object). A worked example follows this list.
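
For instance, 张三 appears in five records, so the reducer receives five SalaryBillDetail values under that key and sums each field: base salary 2800 + 2810 + 2820 + 2830 + 2840 = 14100, position salary 5 × 9000 = 45000, performance salary 5 × 3200 = 16000, and post allowance 5 × 1000 = 5000, which matches the first line of the final output.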

3. Implementation

SalaryBillDetail.java:

package com.demo;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class SalaryBillDetail implements Writable {
    // Base salary, position salary, performance salary, post allowance.
    private long jbgz, zwgz, jxgz, gwjt;

    // The no-argument constructor is required: Hadoop instantiates the class
    // reflectively during deserialization and then calls readFields().
    public SalaryBillDetail() {
    }

    public SalaryBillDetail(long jbgz, long zwgz, long jxgz, long gwjt) {
        this.jbgz = jbgz;
        this.zwgz = zwgz;
        this.jxgz = jxgz;
        this.gwjt = gwjt;
    }

    @Override
    public void write(DataOutput out) throws IOException { // serialization
        out.writeLong(jbgz);
        out.writeLong(zwgz);
        out.writeLong(jxgz);
        out.writeLong(gwjt);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // deserialization
        // Fields must be read back in exactly the order write() wrote them.
        this.jbgz = in.readLong();
        this.zwgz = in.readLong();
        this.jxgz = in.readLong();
        this.gwjt = in.readLong();
    }

    @Override
    public String toString() {
        return this.jbgz + " " + this.zwgz + " " + this.jxgz + " " + this.gwjt;
    }

    public long getJbgz() {
        return jbgz;
    }
    public void setJbgz(long jbgz) {
        this.jbgz = jbgz;
    }
    public long getZwgz() {
        return zwgz;
    }
    public void setZwgz(long zwgz) {
        this.zwgz = zwgz;
    }
    public long getJxgz() {
        return jxgz;
    }
    public void setJxgz(long jxgz) {
        this.jxgz = jxgz;
    }
    public long getGwjt() {
        return gwjt;
    }
    public void setGwjt(long gwjt) {
        this.gwjt = gwjt;
    }

}
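
A quick way to catch a mismatched field order between write() and readFields() is a local round trip through plain java.io streams, with no cluster involved. A minimal sketch (the class name SerializationCheck is hypothetical):

package com.demo;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SerializationCheck {
    public static void main(String[] args) throws IOException {
        SalaryBillDetail original = new SalaryBillDetail(2800, 9000, 3200, 1000);

        // Serialize into an in-memory buffer, just as the framework would.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        // Deserialize into a fresh object via the no-arg constructor.
        SalaryBillDetail copy = new SalaryBillDetail();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(original); // 2800 9000 3200 1000
        System.out.println(copy);     // must print the same four values
    }
}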

SalaryBillMapper.java:

package com.demo;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SalaryBillMapper extends Mapper<LongWritable, Text, Text, SalaryBillDetail> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input: (byte offset of the line, one payroll record).
        String line = value.toString();
        String[] ss = line.split(",");

        // Fields 4-7 are base salary, position salary, performance salary, post allowance.
        SalaryBillDetail sbd = new SalaryBillDetail(Long.parseLong(ss[4]), Long.parseLong(ss[5]),
                Long.parseLong(ss[6]), Long.parseLong(ss[7]));

        // Output: (name, SalaryBillDetail).
        context.write(new Text(ss[2]), sbd);
    }
}
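
The mapper above allocates a new Text and a new SalaryBillDetail for every input record, which is correct but allocation-heavy on large inputs. A common variant reuses the output objects across map() calls; this is safe because context.write() serializes the current field values immediately. A sketch (the class name ReusingSalaryBillMapper is hypothetical):

package com.demo;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReusingSalaryBillMapper extends Mapper<LongWritable, Text, Text, SalaryBillDetail> {
    // Reused across map() calls to avoid per-record allocation.
    private final Text name = new Text();
    private final SalaryBillDetail sbd = new SalaryBillDetail();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] ss = value.toString().split(",");
        name.set(ss[2]);
        sbd.setJbgz(Long.parseLong(ss[4]));
        sbd.setZwgz(Long.parseLong(ss[5]));
        sbd.setJxgz(Long.parseLong(ss[6]));
        sbd.setGwjt(Long.parseLong(ss[7]));
        context.write(name, sbd); // safe: write() copies the values out right away
    }
}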

SalaryBillReducer.java:

package com.demo;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SalaryBillReducer extends Reducer<Text, SalaryBillDetail, Text, SalaryBillDetail> {
    @Override
    protected void reduce(Text key, Iterable<SalaryBillDetail> values, Context context)
            throws IOException, InterruptedException {
        // Accumulators for base salary, position salary, performance salary, post allowance.
        long jbgz = 0, zwgz = 0, jxgz = 0, gwjt = 0;
        for (SalaryBillDetail sbd : values) {
            jbgz += sbd.getJbgz();
            zwgz += sbd.getZwgz();
            jxgz += sbd.getJxgz();
            gwjt += sbd.getGwjt();
        }

        context.write(key, new SalaryBillDetail(jbgz, zwgz, jxgz, gwjt));
    }
}
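
One caveat about the values iterable: Hadoop typically reuses a single SalaryBillDetail instance, refilling it via readFields() on each iteration. Summing into plain long variables, as above, is safe; holding on to the references is not. A hypothetical sketch of the pattern to avoid:

// BROKEN: the framework reuses one object across iterations, so every
// list entry ends up pointing at the same instance with the last record's values.
List<SalaryBillDetail> all = new ArrayList<>();
for (SalaryBillDetail sbd : values) {
    all.add(sbd); // must deep-copy instead, e.g. new SalaryBillDetail(sbd.getJbgz(), ...)
}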

JobRunner.java:

package com.demo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobRunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(JobRunner.class);

        job.setMapperClass(SalaryBillMapper.class);
        job.setReducerClass(SalaryBillReducer.class);
        // Reusing the reducer as a combiner is safe here: the reduce function is a
        // commutative, associative sum, and its output types match its input types.
        job.setCombinerClass(SalaryBillReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(SalaryBillDetail.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(SalaryBillDetail.class);

        FileInputFormat.setInputPaths(job, new Path("hdfs://192.168.137.23:9000/salarysummary/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.137.23:9000/salarysummary/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
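
Hard-coding the HDFS URI works for a quick test but ties the driver to one cluster. A more portable sketch reads the paths from the command line via ToolRunner, so standard -D and -fs options also work (the class name SalaryJobTool is hypothetical; input and output paths arrive as args[0] and args[1]):

package com.demo;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SalaryJobTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "salary summary");
        job.setJarByClass(SalaryJobTool.class);
        job.setMapperClass(SalaryBillMapper.class);
        job.setCombinerClass(SalaryBillReducer.class);
        job.setReducerClass(SalaryBillReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(SalaryBillDetail.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new SalaryJobTool(), args));
    }
}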

4. Output (each line is produced by SalaryBillDetail.toString(), which the default TextOutputFormat calls when writing records):

张三  14100 45000 16000 5000
李四  14000 45150 16000 5000
王五  14000 45000 16150 5000