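1. An example from Hadoop: The Definitive Guide (《Hadoop权威指南》), which I tested myself.
Custom Writable type: TextPair
Purpose: stores a pair of Text objects. The code is as follows: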
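package testWritable;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// A Writable that holds a pair of Text objects and can be used as a MapReduce key.
public class TextPair implements WritableComparable<TextPair> {
    private Text first;
    private Text second;

    // Hadoop creates instances through the no-argument constructor
    // and then fills them in by calling readFields().
    public TextPair() {
        set(new Text(), new Text());
    }

    public TextPair(String first, String second) {
        set(new Text(first), new Text(second));
    }

    public TextPair(Text first, Text second) {
        set(first, second);
    }

    private void set(Text first, Text second) {
        this.first = first;
        this.second = second;
    }

    // Order by first; fall back to second when the first fields are equal.
    @Override
    public int compareTo(TextPair o) {
        int i = first.compareTo(o.first);
        if (i == 0) {
            return second.compareTo(o.second);
        }
        return i;
    }

    // Serialization: write both fields in a fixed order.
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        first.write(dataOutput);
        second.write(dataOutput);
    }

    // Deserialization: read both fields back in the same order.
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        first.readFields(dataInput);
        second.readFields(dataInput);
    }

    // Needed when TextPair is a key: the default HashPartitioner uses
    // hashCode() to choose the reduce partition.
    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof TextPair) {
            TextPair tp = (TextPair) o;
            return first.equals(tp.first) && second.equals(tp.second);
        }
        return false;
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }
}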
The TextPair class implements WritableComparable and provides three methods: compareTo, write, and readFields.
write performs serialization; readFields performs deserialization.
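To make the write/readFields pairing concrete, here is a minimal round-trip sketch. It is only an illustration: StringPairRoundTrip is a hypothetical stand-in for TextPair that stores plain Strings instead of Text, so that it compiles with the JDK alone, and the serialize and deserialize helpers are names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for TextPair: same write/readFields shape,
// but with String fields so no Hadoop classes are required.
class StringPairRoundTrip {
    private String first = "";
    private String second = "";

    StringPairRoundTrip() {
    }

    StringPairRoundTrip(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // Serialization: write the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    // Deserialization: read the fields back in exactly the same order,
    // overwriting whatever this instance held before.
    void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }

    // Turns a pair into its serialized bytes.
    static byte[] serialize(StringPairRoundTrip pair) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            pair.write(out);
        }
        return bytes.toByteArray();
    }

    // Creates an empty instance and fills it from serialized bytes.
    static StringPairRoundTrip deserialize(byte[] data) throws IOException {
        StringPairRoundTrip pair = new StringPairRoundTrip();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            pair.readFields(in);
        }
        return pair;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new StringPairRoundTrip("hadoop", "writable"));
        System.out.println(deserialize(data));
    }
}
```

The deserialize helper follows the same pattern the framework relies on: create an empty instance with the no-argument constructor, then let readFields populate it.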
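When TextPair is used as a key in MapReduce, the keys have to be sorted. By default, the byte stream is deserialized into objects, which are then compared with compareTo. Alternatively, the serialized bytes can be compared directly, without deserializing; this requires defining your own comparator that extends WritableComparator (see page 99 of 《Hadoop权威指南》 for details).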
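The idea of comparing serialized bytes directly can be sketched with the JDK alone. This is not the comparator from the book: the class RawPairCompareSketch and all of its method names are hypothetical, and the byte layout assumed here is the one produced by DataOutput.writeUTF (a 2-byte big-endian length followed by the encoded characters), not the layout that Text uses. In a real job the comparator would extend WritableComparator, as described above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: order two serialized string pairs without
// turning the bytes back into objects.
class RawPairCompareSketch {

    // Serializes two strings back to back with writeUTF.
    static byte[] serialize(String first, String second) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(first);
            out.writeUTF(second);
        }
        return bytes.toByteArray();
    }

    // Reads the 2-byte big-endian length prefix that writeUTF emits.
    static int fieldLength(byte[] b, int offset) {
        return ((b[offset] & 0xff) << 8) | (b[offset + 1] & 0xff);
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compareBytes(byte[] a, int aStart, int aLen,
                            byte[] b, int bStart, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aStart + i] & 0xff;
            int y = b[bStart + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    // Compares the first fields; only looks at the second fields on a tie.
    static int compare(byte[] a, byte[] b) {
        int aLen1 = fieldLength(a, 0);
        int bLen1 = fieldLength(b, 0);
        int cmp = compareBytes(a, 2, aLen1, b, 2, bLen1);
        if (cmp != 0) {
            return cmp;
        }
        int aOff2 = 2 + aLen1; // start of the second field's length prefix
        int bOff2 = 2 + bLen1;
        return compareBytes(a, aOff2 + 2, fieldLength(a, aOff2),
                            b, bOff2 + 2, fieldLength(b, bOff2));
    }
}
```

For plain ASCII strings, this ordering agrees with comparing the first strings and then the second strings with String.compareTo. The benefit is that no objects are created during the sort.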
2. Custom Writable: Record (member variables of type int and String)
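import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

class Record implements WritableComparable<Record> {
    // Package-private rather than private, so that the reducer in section 3
    // can copy the fields directly.
    int id;
    String name;

    // No-argument constructor, required so that Hadoop can instantiate the class.
    Record() {
        id = -1;
        name = "null";
    }

    // Records are ordered by id only; name does not take part in the comparison.
    @Override
    public int compareTo(Record o) {
        return Integer.compare(this.id, o.id);
    }

    // Serialization: the int first, then the String.
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(id);
        dataOutput.writeUTF(name);
    }

    // Deserialization: read the fields back in the order they were written.
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        id = dataInput.readInt();
        name = dataInput.readUTF();
    }

    @Override
    public String toString() {
        return id + "," + name;
    }
}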
3. A pitfall to watch for when using a custom Writable (shown in the code below)
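// Requires java.util.ArrayList, org.apache.hadoop.io.IntWritable
// and org.apache.hadoop.mapreduce.Reducer.
static class Reduce extends Reducer<IntWritable, Record, Record, IntWritable> {
    @Override
    protected void reduce(IntWritable key, Iterable<Record> values, Context context)
            throws IOException, InterruptedException {
        ArrayList<Record> array = new ArrayList<Record>();
        for (Record rec : values) {
            if (/* some condition */) {
                // Important: Hadoop reuses a single Record instance while iterating
                // over values, so array.add(rec) would store the same reference every
                // time and the elements of array would not keep the values seen during
                // the iteration. Always copy the fields into a new object first.
                Record record = new Record();
                record.id = rec.id;
                record.name = rec.name;
                array.add(record);
            }
        }
        for (Record rec : array) {
            // ... other operations
            context.write(rec, new IntWritable(1));
        }
    }
}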
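The effect of object reuse can be reproduced without a cluster. The sketch below is an illustration only: Rec is a hypothetical stand-in for Record, and reusingValues is a made-up iterable that returns one shared object and overwrites it on each step, which is an assumption modelled on how the values iterator in reduce behaves.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ValueReuseSketch {
    // Hypothetical mutable value, standing in for Record.
    static class Rec {
        int id;
        String name;
    }

    // Hypothetical iterable that hands out ONE shared Rec and overwrites
    // its fields on every call to next().
    static Iterable<Rec> reusingValues(int[] ids, String[] names) {
        Rec shared = new Rec();
        return () -> new Iterator<Rec>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < ids.length;
            }

            @Override
            public Rec next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.id = ids[next];
                shared.name = names[next];
                next++;
                return shared;
            }
        };
    }

    // Buggy: every list element is the same reference.
    static List<Rec> collectWithoutCopy(Iterable<Rec> values) {
        List<Rec> out = new ArrayList<>();
        for (Rec rec : values) {
            out.add(rec);
        }
        return out;
    }

    // Correct: each element is an independent copy of the current value.
    static List<Rec> collectWithCopy(Iterable<Rec> values) {
        List<Rec> out = new ArrayList<>();
        for (Rec rec : values) {
            Rec copy = new Rec();
            copy.id = rec.id;
            copy.name = rec.name;
            out.add(copy);
        }
        return out;
    }
}
```

With three input values, collectWithoutCopy returns a list whose three elements are the same object, holding whatever was written to it last, while collectWithCopy keeps the three values apart.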