1. In Hadoop, every key/value type must implement the Writable interface, which defines two methods: one for reading (deserialization) and one for writing (serialization).
Reference code:
package org.dragon.hadoop.mapreduce.app;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

/**
 * @author ZhuXY
 * @time 2016-3-10 3:49:55 PM
 */
public class DataWritable implements Writable {

    // telephone

    // upload
    private int upPackNum;
    private int upPayLoad;

    // download
    private int downPackNum;
    private int downPayLoad;

    public DataWritable() {
    }

    public void set(int upPackNum, int upPayLoad, int downPackNum,
            int downPayLoad) {
        this.upPackNum = upPackNum;
        this.upPayLoad = upPayLoad;
        this.downPackNum = downPackNum;
        this.downPayLoad = downPayLoad;
    }

    public int getUpPackNum() {
        return upPackNum;
    }

    public int getUpPayLoad() {
        return upPayLoad;
    }

    public int getDownPackNum() {
        return downPackNum;
    }

    public int getDownPayLoad() {
        return downPayLoad;
    }

    // Serialization: write each field to the output stream.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(upPackNum);
        out.writeInt(upPayLoad);
        out.writeInt(downPackNum);
        out.writeInt(downPayLoad);
    }

    /**
     * Deserialization: fields must be read in the same order
     * they were written.
     */
    @Override
    public void readFields(DataInput in) throws IOException {
        this.upPackNum = in.readInt();
        this.upPayLoad = in.readInt();
        this.downPackNum = in.readInt();
        this.downPayLoad = in.readInt();
    }

    @Override
    public String toString() {
        return upPackNum + "\t" + upPayLoad + "\t" + downPackNum + "\t"
                + downPayLoad;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + downPackNum;
        result = prime * result + downPayLoad;
        result = prime * result + upPackNum;
        result = prime * result + upPayLoad;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        DataWritable other = (DataWritable) obj;
        if (downPackNum != other.downPackNum)
            return false;
        if (downPayLoad != other.downPayLoad)
            return false;
        if (upPackNum != other.upPackNum)
            return false;
        if (upPayLoad != other.upPayLoad)
            return false;
        return true;
    }
}
2. Every key must also implement the Comparable interface, because MapReduce repeatedly sorts the key/value pairs during processing. By default the sort is on the key, so the compareTo() method has to be implemented. A key therefore has to implement both Writable and Comparable, and Hadoop provides a single public interface that combines them, called WritableComparable:
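For reference, the combined interface declares nothing of its own; in current Hadoop versions it is essentially just:

public interface WritableComparable<T> extends Writable, Comparable<T> {
}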
3. Because the type needs to be serialized, deserialized, and compared, the following Java methods should also be overridden:
① equals();
② hashCode();
③ toString()
For example, the implementation of the IntWritable type:
package org.apache.hadoop.io;

import java.io.*;

/** A WritableComparable for ints. */
public class IntWritable implements WritableComparable {
  private int value;

  public IntWritable() {}

  public IntWritable(int value) { set(value); }

  /** Set the value of this IntWritable. */
  public void set(int value) { this.value = value; }

  /** Return the value of this IntWritable. */
  public int get() { return value; }

  public void readFields(DataInput in) throws IOException {
    value = in.readInt();
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(value);
  }

  /** Returns true iff <code>o</code> is an IntWritable with the same value. */
  public boolean equals(Object o) {
    if (!(o instanceof IntWritable))
      return false;
    IntWritable other = (IntWritable) o;
    return this.value == other.value;
  }

  public int hashCode() {
    return value;
  }

  /** Compares two IntWritables. */
  public int compareTo(Object o) {
    int thisValue = this.value;
    int thatValue = ((IntWritable) o).value;
    return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
  }

  public String toString() {
    return Integer.toString(value);
  }
}
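To make the two roles concrete, here is a minimal round-trip sketch (the class name RoundTripDemo is just for illustration) that serializes an IntWritable with write() and restores it with readFields():

import java.io.*;
import org.apache.hadoop.io.IntWritable;

public class RoundTripDemo {
    public static void main(String[] args) throws IOException {
        IntWritable out = new IntWritable(42);

        // Serialize: write() emits the raw 4-byte form of the int.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        out.write(new DataOutputStream(buffer));

        // Deserialize: readFields() must consume bytes in the same order.
        IntWritable in = new IntWritable();
        in.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(in.get()); // prints 42
    }
}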
4. A data type must have a no-argument constructor, so that objects can conveniently be created via reflection.
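As a sketch of why this matters: Hadoop instantiates key/value classes through reflection helpers such as org.apache.hadoop.util.ReflectionUtils, roughly as follows (the demo class name is illustrative and it assumes the DataWritable class above is on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

import org.dragon.hadoop.mapreduce.app.DataWritable;

public class ReflectionDemo {
    public static void main(String[] args) {
        // This call fails at runtime unless DataWritable
        // has a public no-argument constructor.
        DataWritable value = ReflectionUtils.newInstance(
                DataWritable.class, new Configuration());
        System.out.println(value);
    }
}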
In a custom data type, it is advisable to declare the fields with Java's primitive types (int, long, ...); it is best not to use Hadoop's wrappers around the primitives (IntWritable, LongWritable, ...) as field types. The contrast between the recommended and the discouraged style is sketched below.
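A minimal sketch of the contrast (GoodWritable and BadWritable are illustrative names, not from the original post):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

// Recommended: fields are Java primitives, written and read directly.
class GoodWritable implements Writable {
    private int upPackNum;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(upPackNum);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        upPackNum = in.readInt();
    }
}

// Not recommended: fields are Hadoop wrapper types, which costs an extra
// object per field and an extra indirection on every read/write.
class BadWritable implements Writable {
    private IntWritable upPackNum = new IntWritable();

    @Override
    public void write(DataOutput out) throws IOException {
        upPackNum.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        upPackNum.readFields(in);
    }
}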
5. Problem:
When data that has already been written to disk needs to be sorted, the bytes must first be read back from disk and deserialized into objects, and only then can those objects be compared in memory.
Could the bytes be compared directly (without deserializing them first), skipping the deserialize-then-compare step? To make that possible, the Hadoop data type needs an implementation of the RawComparator interface.
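For reference, RawComparator extends java.util.Comparator with one extra method that compares two values directly on their serialized byte ranges; its definition in org.apache.hadoop.io is essentially:

package org.apache.hadoop.io;

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {

  /** Compare the serialized forms directly: b1[s1..s1+l1) vs b2[s2..s2+l2). */
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}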
Hadoop ships a general-purpose implementation of this interface for Writable data types: the WritableComparator class. A concrete data type only needs to extend this common class and override the corresponding compare() method where it wants type-specific behavior. Taking IntWritable as the example again:
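In the Hadoop source, this takes the form of a static inner Comparator class inside IntWritable, which decodes each int straight out of the byte arrays (no object creation, no deserialization) and registers itself once per JVM; it looks essentially like this:

/** A Comparator optimized for IntWritable. */
public static class Comparator extends WritableComparator {
  public Comparator() {
    super(IntWritable.class);
  }

  // Compare the raw bytes: decode each int in place and compare the values.
  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    int thisValue = readInt(b1, s1);
    int thatValue = readInt(b2, s2);
    return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
  }
}

static {  // register this comparator so Hadoop uses it for IntWritable keys
  WritableComparator.define(IntWritable.class, new Comparator());
}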