Hadoop's serialization mechanism

[size=large]Hadoop does not use Java's Serialization mechanism[/size]
Doug Cutting explained it this way:
[quote]
Why didn’t I use Serialization when we first started Hadoop? Because it looked
big and hairy and I thought we needed something lean and mean, where we had
precise control over exactly how objects are written and read, since that is central
to Hadoop. With Serialization you can get some control, but you have to fight for
it.
The logic for not using RMI was similar. Effective, high-performance inter-process
communications are critical to Hadoop. I felt like we’d need to precisely control
how things like connections, timeouts and buffers are handled, and RMI gives you
little control over those.
[/quote]
In short: serialization is central to Hadoop, so Hadoop implements its own dedicated serialization mechanism. The same reasoning explains why it does not use RMI.

[size=large]Using Hadoop serialization[/size]
In the Hadoop framework, to make a class serializable you implement the two methods of the Writable interface:

public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
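A minimal sketch of that contract, written so it compiles without the Hadoop jars: `SimpleWritable` below is a hypothetical local interface with the same two signatures as `org.apache.hadoop.io.Writable`, and `Counter` and `roundTrip` are likewise illustrative names, not Hadoop APIs. It shows `write` pushing the fields out through a `DataOutput` and `readFields` pulling them back through a `DataInput` in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableContract {

    // Hypothetical stand-in with the same two signatures as Hadoop's Writable.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical example class: two fields, written and read in the same order.
    static class Counter implements SimpleWritable {
        long count;
        boolean enabled;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(count);      // 8 bytes
            out.writeBoolean(enabled); // 1 byte
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            count = in.readLong();      // same order as write()
            enabled = in.readBoolean();
        }
    }

    // Serializes src into a byte array, then fills dst from those bytes.
    static byte[] roundTrip(SimpleWritable src, SimpleWritable dst) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes));
        byte[] data = bytes.toByteArray();
        dst.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        return data;
    }

    public static void main(String[] args) throws IOException {
        Counter a = new Counter();
        a.count = 42L;
        a.enabled = true;
        Counter b = new Counter();
        byte[] data = roundTrip(a, b);
        // prints "9 bytes, count=42, enabled=true"
        System.out.println(data.length + " bytes, count=" + b.count + ", enabled=" + b.enabled);
    }
}
```

With the real Hadoop interface you would implement `org.apache.hadoop.io.Writable` instead; the method bodies stay the same.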

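This is considerably more work than implementing Java's Serializable. However, a comparison shows that Hadoop's serialization produces far less data than Java serialization does.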
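To get a feel for the size difference, the sketch below encodes the same int-plus-string record both ways using only `java.io`. It is a stand-in rather than a Hadoop benchmark: `AttributeData` is a hypothetical class, and `writeUTF` substitutes for Hadoop's `Text`, which uses its own variable-length length prefix, so Hadoop's exact byte count differs slightly. The Writable-style output holds only the values (4 bytes for the int, 2 + 4 bytes for "name"), while `ObjectOutputStream` also writes a stream header and a class descriptor.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SizeComparison {

    // Hypothetical record with the same shape as the Attribute example: an int and a String.
    static class AttributeData implements Serializable {
        private static final long serialVersionUID = 1L;
        int type;
        String name;

        AttributeData(int type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    // Java serialization: stream header and class descriptor
    // (class name, serialVersionUID, field names and types) plus the values.
    static int javaSerializedSize(AttributeData d) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(d);
        }
        return bytes.size();
    }

    // Writable style: only the field values, in a fixed order.
    static int writableStyleSize(AttributeData d) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(d.type);  // 4 bytes
        out.writeUTF(d.name);  // 2-byte length + modified UTF-8 bytes
        out.flush();
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        AttributeData d = new AttributeData(2, "name");
        System.out.println("Java serialization: " + javaSerializedSize(d) + " bytes");
        System.out.println("Writable style:     " + writableStyleSize(d) + " bytes"); // 10 bytes
    }
}
```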

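Inside these two methods you control exactly how the fields are written and read. If the class holds references to other objects, those objects must also implement Writable (alternatively, a referenced object need not implement Writable, as long as you handle storing its fields yourself).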
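A minimal sketch of the second option, where the referenced object does not implement Writable and the enclosing class writes its fields by hand. As before, `SimpleWritable` is a hypothetical stand-in for Hadoop's interface, and `Point`, `Shape` and `copy` are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class ManualNested {

    // Hypothetical stand-in with the same signatures as Hadoop's Writable.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // A plain class that does NOT implement SimpleWritable.
    static class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The enclosing class stores Point's fields itself.
    static class Shape implements SimpleWritable {
        String label = "";
        Point origin = new Point(0, 0);

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(label);
            out.writeInt(origin.x); // Point's fields written by hand
            out.writeInt(origin.y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            label = in.readUTF();
            int x = in.readInt();   // same order as write(): x, then y
            int y = in.readInt();
            origin = new Point(x, y);
        }
    }

    // Round-trips a Shape through a byte array into a fresh instance.
    static Shape copy(Shape src) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes));
        Shape dst = new Shape();
        dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return dst;
    }

    public static void main(String[] args) throws IOException {
        Shape s = new Shape();
        s.label = "corner";
        s.origin = new Point(3, 7);
        Shape t = copy(s);
        System.out.println(t.label + " (" + t.origin.x + ", " + t.origin.y + ")"); // corner (3, 7)
    }
}
```

The trade-off is that `Point` can no longer be serialized on its own; every class that holds one has to repeat this field-by-field logic.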
Here is a simple example.
Class Attribute:

package siat.miner.etl.instance;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class Attribute implements Writable {

    public static final int ATTRIBUTE_TYPE_STRING = 1;  // string type
    public static final int ATTRIBUTE_TYPE_NOMINAL = 2; // nominal type
    public static final int ATTRIBUTE_TYPE_REAL = 3;    // real type

    private IntWritable type;
    private Text name;

    public IntWritable getType() {
        return type;
    }

    public void setType(int type) {
        this.type = new IntWritable(type);
    }

    public Text getName() {
        return name;
    }

    public void setName(String name) {
        this.name = new Text(name);
    }

    // No-arg constructor: readFields() fills in an existing instance,
    // so the fields must be non-null before it is called.
    public Attribute() {
        this(0, "");
    }

    public Attribute(int type, String name) {
        this.type = new IntWritable(type);
        this.name = new Text(name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Read the fields in exactly the same order write() wrote them
        type.readFields(in);
        name.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Each field is itself a Writable, so it serializes itself
        type.write(out);
        name.write(out);
    }
}

Class TestA:

package siat.miner.etl.test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

import siat.miner.etl.instance.Attribute;

public class TestA implements Writable {

    private Attribute a;   // nested object that itself implements Writable
    private IntWritable b;

    public static void main(String[] args) throws IOException {
        Attribute a = new Attribute(Attribute.ATTRIBUTE_TYPE_NOMINAL, "name");
        TestA ta = new TestA(a, new IntWritable(1));

        // Serialize ta into an in-memory byte array
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream oos = new DataOutputStream(bos);
        ta.write(oos);
        oos.flush();

        // Deserialize the bytes into a fresh, empty instance
        TestA tb = new TestA();
        tb.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));

        // Show the restored values
        System.out.println(tb.a.getType() + " " + tb.a.getName() + " " + tb.b);
    }

    public TestA(Attribute a, IntWritable b) {
        this.a = a;
        this.b = b;
    }

    // No-arg constructor, used to create an empty instance before readFields()
    public TestA() {
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Create fresh field objects, then read them in the order write() used
        a = new Attribute();
        a.readFields(in);
        b = new IntWritable();
        b.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        a.write(out);
        b.write(out);
    }
}


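As you can see, Hadoop's serialization mechanism uses Java's DataInput and DataOutput to serialize primitive types, and leaves it to the user to handle the serialization of their own classes.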
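To make that concrete, here is a simplified, hypothetical `IntBox` class loosely in the spirit of Hadoop's `IntWritable` (it is not the Hadoop source): a boxed primitive whose `write` and `readFields` simply delegate to `DataOutput.writeInt` and `DataInput.readInt`. User classes such as `Attribute` are then built by composing wrappers like this one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class IntBoxDemo {

    // Hypothetical simplified wrapper, modeled loosely on Hadoop's IntWritable.
    static class IntBox {
        private int value;

        IntBox() {
        }

        IntBox(int value) {
            this.value = value;
        }

        int get() {
            return value;
        }

        // Delegates to DataOutput: 4 bytes, big-endian.
        void write(DataOutput out) throws IOException {
            out.writeInt(value);
        }

        // Delegates to DataInput: reads the same 4 bytes back.
        void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }
    }

    static byte[] serialize(IntBox box) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        box.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    static IntBox deserialize(byte[] data) throws IOException {
        IntBox box = new IntBox();
        box.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        return box;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new IntBox(258));
        System.out.println(Arrays.toString(data));   // [0, 0, 1, 2]
        System.out.println(deserialize(data).get()); // 258
    }
}
```

Only the four value bytes are written; no type information travels with them, which is why the reader must know in advance which class to instantiate and in what order to read.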