Getting Started with Hadoop, Part 12: The WordCount Processing Flow and MapReduce Data Types

Latest recommended article published on 2021-07-21 12:44:32

风行天下Num1

Category: Big Data, Miscellaneous

Article link: https://blog.csdn.net/nipanlong001/article/details/77938753

--==============WordCount processing flow=============================
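1. The input files are divided into splits. By default a split corresponds to one HDFS block, so a small file yields exactly one split. Each split is then read line by line into <key,value> pairs:
the key is the byte offset at which the line starts (offsets count the line-terminator characters), and the value is the text of the line.
2. Each <key,value> pair is handed to the map method, which tokenizes the line and emits a new <key,value> pair per word (e.g. <hadoop,1>).
3. The map output is sorted by key before it is written out. The sort is performed by the framework, not by the map method itself.
4. During the shuffle, the values that share a key are merged into one group for the reducer, giving new <key,value> pairs (e.g. <hadoop,list(1,1)>).
5. The reduce method sums the values of each <key,value> group and outputs the total (e.g. <hadoop,2>).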
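The five steps above can be traced in plain Java without a cluster. The following is a minimal sketch that only imitates the stages in memory, not Hadoop code: the class name `WordCountFlowSketch` and its methods `map`, `shuffle`, `reduce`, and `run` are hypothetical, and the `+ 1` for the line terminator assumes Unix `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the WordCount stages (no Hadoop dependency).
class WordCountFlowSketch {

    // Step 2: tokenize one line and emit a <word, 1> pair per token.
    // The offset key is accepted but ignored, as in WordCount.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Steps 3-4: sort by key and collect the values of equal keys,
    // e.g. <hadoop, [1, 1]>. A TreeMap keeps the keys in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Step 5: sum the grouped values of one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        long offset = 0; // Step 1: the key of each record is the line's byte offset
        for (String line : lines) {
            mapped.addAll(map(offset, line));
            // Assumption: each line ends with a single '\n' byte.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=2, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello hadoop", "hello world hadoop")));
    }
}
```

In a real job the sort and grouping happen inside the framework across machines; the TreeMap here only stands in for that behaviour on a single JVM.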
--==============Data types in MapReduce=============================
=》All data types implement the Writable interface, so they can be sent over the network and stored in files
LongWritable BooleanWritable FloatWritable
IntWritable ByteWritable DoubleWritable
Text NullWritable (used when the key or value is empty)
=》Sorting in MapReduce
Records are sorted by key, so a key type must be comparable (it implements WritableComparable, which extends Comparable)
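=》The Writable interface
write() serializes the object's fields to the output stream
readFields() deserializes the fields from the bytes of the input stream
=》The WritableComparable interface
Must be implemented by any type that is used as a key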
=》Comparing Java value objects
Override toString(), hashCode(), and equals()
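The reason hashCode() and equals() matter for a key can be shown in a few lines. Hadoop's default HashPartitioner chooses the reducer from the key's hash code, so two keys that are equal must also hash alike, otherwise they may be sent to different reducers. The sketch below is JDK-only: `IdNameKey` and `partitionFor` are hypothetical names, and the partition arithmetic mirrors what HashPartitioner does rather than calling Hadoop.

```java
import java.util.Objects;

// Hypothetical key type, for illustration only (no Hadoop dependency).
final class IdNameKey implements Comparable<IdNameKey> {
    private final int id;
    private final String name;

    IdNameKey(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // Two keys are equal when both fields match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof IdNameKey)) {
            return false;
        }
        IdNameKey that = (IdNameKey) other;
        return id == that.id && Objects.equals(name, that.name);
    }

    // Equal keys must return equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    // Text form used when the key is written to text output.
    @Override
    public String toString() {
        return id + "\t" + name;
    }

    // Sort order: id first, then name (name is assumed to be non-null).
    @Override
    public int compareTo(IdNameKey o) {
        int byId = Integer.compare(id, o.id);
        return byId != 0 ? byId : name.compareTo(o.name);
    }

    // Same arithmetic as the default HashPartitioner:
    // clear the sign bit, then take the remainder by the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Without the overrides, the inherited Object.hashCode() is identity-based, so two separately created keys with the same id and name would generally not agree on a partition.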
--==============Example: a custom data type in MapReduce=============================

package com.npl.hadoop.senier.hdfs;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

/*
 * Custom data type: an (id, name) pair that can be used as a MapReduce key or value.
 */
public class PaiWritable implements WritableComparable<PaiWritable> {
        private int id;
        private String name;

        public PaiWritable() {

        }

        public PaiWritable(int id, String name) {
                this.set(id, name);
        }

        public void set(int id, String name) {
                this.id = id;
                this.name = name;
        }

        public int getId() {
                return id;
        }

        public void setId(int id) {
                this.id = id;
        }

        public String getName() {
                return name;
        }

        public void setName(String name) {
                this.name = name;
        }

        @Override
        public String toString() {
                // "\t" is the tab escape; the original "/t" printed a literal slash and t
                return id + "\t" + name;
        }

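        // Writable's two methods: readFields() must read the fields in the
        // same order in which write() wrote them. With these two methods the
        // class can be used as a value. Note that writeUTF() throws a
        // NullPointerException if name is null.
        @Override
        public void write(DataOutput out) throws IOException {
                out.writeInt(id);
                out.writeUTF(name);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
                this.id = in.readInt();
                this.name = in.readUTF();
        }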
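        // compareTo() defines the sort order (id first, then name), which
        // allows this class to be used as a key
        @Override
        public int compareTo(PaiWritable o) {
                int comp = Integer.compare(this.getId(), o.getId());
                if (0 != comp) {
                        return comp;
                }
                return this.getName().compareTo(o.getName());
        }

        // hashCode() and equals() are overridden so that equal keys hash
        // alike; the default HashPartitioner routes keys by hashCode()
        @Override
        public int hashCode() {
                return 31 * id + (name == null ? 0 : name.hashCode());
        }

        @Override
        public boolean equals(Object obj) {
                if (!(obj instanceof PaiWritable)) {
                        return false;
                }
                PaiWritable other = (PaiWritable) obj;
                return id == other.id
                                && (name == null ? other.name == null : name.equals(other.name));
        }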

}
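
To check that write() and readFields() agree on the field order, a round trip through an in-memory byte buffer is enough. The sketch below is JDK-only: `PairRecord` is a hypothetical stand-in that copies the two serialization methods of PaiWritable without implementing the Hadoop interface, so it runs without Hadoop on the classpath. The helper `roundTrip` is likewise an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in with the same two fields and the same
// serialization logic as PaiWritable (no Hadoop dependency).
class PairRecord {
    int id;
    String name;

    // Write the fields in a fixed order: id, then name.
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
    }

    // Read the fields back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
    }

    // Serialize the record to bytes, then deserialize into a fresh object.
    static PairRecord roundTrip(PairRecord original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            PairRecord copy = new PairRecord();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If readFields() read the name before the id, the round trip would fail or return wrong values, which is the kind of mistake this check is meant to catch.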