hadoop 如何自定义类型

记录一下hadoop 数据类型章节的笔记,以便后期使用,本文是边学习边记录,持续更新中

[size=large][b]Hadoop 常用自带的数据类型和Java数据类型配比如下[/b][/size]

[table]
|[color=red]Hadoop类型[/color]|[color=red]Java类型[/color]|[color=red]描述[/color]|
|BooleanWritable|boolean|布尔型|
|IntWritable|int|整型|
|FloatWritable|float|浮点float|
|DoubleWritable|double|浮点型double|
|ByteWritable|byte|整数类型byte|
|Text|String|字符串型|
|ArrayWritable|Array|数组型|
[/table]
[/table]

在此首先明确定义下序列化
[size=medium][i][b]参考百度百科
序列化 (Serialization)将对象的状态信息转换为可以存储或传输的形式的过程。在序列化期间,对象将其当前状态写入到临时或持久性存储区。以后,可以通过从存储区中读取或反序列化对象的状态,重新创建该对象。[/b][/i][/size]

Hadoop自定义类型必须实现的一个接口 Writable 代码如下

[code="java"]public interface Writable {

void write(DataOutput out) throws IOException;

void readFields(DataInput in) throws IOException;
}
[/code]

write 方法:Serialize the fields of this object to out

readFields:Deserialize the fields of this object from in

实现该接口后,还需要手动实现一个静态方法,在该方法中返回自定义类型的无参构造方法

for example
 public static MyWritable read(DataInput in) throws IOException {
MyWritable w = new MyWritable();
w.readFields(in);
return w;
}



官方完成例子


public class MyWritable implements Writable {
// Some data
private int counter;
private long timestamp;

public void write(DataOutput out) throws IOException {
out.writeInt(counter);
out.writeLong(timestamp);
}

public void readFields(DataInput in) throws IOException {
counter = in.readInt();
timestamp = in.readLong();
}

public static MyWritable read(DataInput in) throws IOException {
MyWritable w = new MyWritable();
w.readFields(in);
return w;
}
}




WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface.


如果该自定义类型作为key,那么需要实现 WritableComparable 接口,这个接口实现了两个接口 ,分别为 Comparable<T>, Writable

类似上一段代码 主要新增 compareTo 方法 代码如下

  public int compareTo(MyWritableComparable w) {
int thisValue = this.value;
int thatValue = ((IntWritable)o).value;
return (thisValue < thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
}



[size=large][color=red]特殊的类型 NullWritable[/color][/size]

NullWritable是Writable的一个特殊类,序列化的长度为0,实现方法为空实现,不从数据流中读数据,也不写入数据,只充当占位符,如在MapReduce中,如果你不需要使用键或值,你就可以将键或值声明为NullWritable,NullWritable是一个不可变的单实例类型。

[size=large][color=red]特殊的类型 ObjectWritable[/color][/size]
ObjectWritable 是对java 基本类型的一个通用封装:用于客户端与服务器间传输的Writable对象,也是对RPC传输对象的封装,因为RPC上交换的信息只能是JAVA的基础数据类型,String或者Writable类型,而ObjectWritable是对其子类的抽象封装
ObjectWritable会往流里写入如下信息:

对象类名,对象自己的串行化结果

其序列化和反序列化方法如下:

public void readFields(DataInput in) throws IOException {
readObject(in, this, this.conf);
}

public void write(DataOutput out) throws IOException {
writeObject(out, instance, declaredClass, conf);
}

public static void writeObject(DataOutput out, Object instance,
Class declaredClass,
Configuration conf) throws IOException {
//对象为空则抽象出内嵌数据类型NullInstance
if (instance == null) { // null
instance = new NullInstance(declaredClass, conf);
declaredClass = Writable.class;
}
//先写入类名
UTF8.writeString(out, declaredClass.getName()); // always write declared
/*
* 封装的对象为数组类型,则逐个序列化(序列化为length+对象的序列化内容)
* 采用了迭代
*/

if (declaredClass.isArray()) { // array
int length = Array.getLength(instance);
out.writeInt(length);
for (int i = 0; i < length; i++) {
writeObject(out, Array.get(instance, i),
declaredClass.getComponentType(), conf);
}
//为String类型直接写入
} else if (declaredClass == String.class) { // String
UTF8.writeString(out, (String)instance);

}//基本数据类型写入
else if (declaredClass.isPrimitive()) { // primitive type

if (declaredClass == Boolean.TYPE) { // boolean
out.writeBoolean(((Boolean)instance).booleanValue());
} else if (declaredClass == Character.TYPE) { // char
out.writeChar(((Character)instance).charValue());
} else if (declaredClass == Byte.TYPE) { // byte
out.writeByte(((Byte)instance).byteValue());
} else if (declaredClass == Short.TYPE) { // short
out.writeShort(((Short)instance).shortValue());
} else if (declaredClass == Integer.TYPE) { // int
out.writeInt(((Integer)instance).intValue());
} else if (declaredClass == Long.TYPE) { // long
out.writeLong(((Long)instance).longValue());
} else if (declaredClass == Float.TYPE) { // float
out.writeFloat(((Float)instance).floatValue());
} else if (declaredClass == Double.TYPE) { // double
out.writeDouble(((Double)instance).doubleValue());
} else if (declaredClass == Void.TYPE) { // void
} else {
throw new IllegalArgumentException("Not a primitive: "+declaredClass);
}
//枚举类型写入
} else if (declaredClass.isEnum()) { // enum
UTF8.writeString(out, ((Enum)instance).name());
//hadoop的Writable类型写入
} else if (Writable.class.isAssignableFrom(declaredClass)) { // Writable
UTF8.writeString(out, instance.getClass().getName());
((Writable)instance).write(out);

} else {
throw new IOException("Can't write: "+instance+" as "+declaredClass);
}
}

public static Object readObject(DataInput in, Configuration conf)
throws IOException {
return readObject(in, null, conf);
}

/** Read a {<a href="http://my.oschina.net/link1212" class="referer" target="_blank">@link</a> Writable}, {<a href="http://my.oschina.net/link1212" class="referer" target="_blank">@link</a> String}, primitive type, or an array of
* the preceding. */
@SuppressWarnings("unchecked")
public static Object readObject(DataInput in, ObjectWritable objectWritable, Configuration conf)
throws IOException {
//获取反序列化的名字
String className = UTF8.readString(in);
//假设为基本数据类型
Class<?> declaredClass = PRIMITIVE_NAMES.get(className);
/*
* 判断是否为基本数据类型,不是则为空,则为Writable类型,
* 对于Writable类型从Conf配置文件中读取类名,
* 在这里只是获取类名,而并没有反序列化对象
*/

if (declaredClass == null) {
try {
declaredClass = conf.getClassByName(className);
} catch (ClassNotFoundException e) {
throw new RuntimeException("readObject can't find class " + className, e);
}
}
//基本数据类型
Object instance;
//为基本数据类型,逐一反序列化
if (declaredClass.isPrimitive()) { // primitive types

if (declaredClass == Boolean.TYPE) { // boolean
instance = Boolean.valueOf(in.readBoolean());
} else if (declaredClass == Character.TYPE) { // char
instance = Character.valueOf(in.readChar());
} else if (declaredClass == Byte.TYPE) { // byte
instance = Byte.valueOf(in.readByte());
} else if (declaredClass == Short.TYPE) { // short
instance = Short.valueOf(in.readShort());
} else if (declaredClass == Integer.TYPE) { // int
instance = Integer.valueOf(in.readInt());
} else if (declaredClass == Long.TYPE) { // long
instance = Long.valueOf(in.readLong());
} else if (declaredClass == Float.TYPE) { // float
instance = Float.valueOf(in.readFloat());
} else if (declaredClass == Double.TYPE) { // double
instance = Double.valueOf(in.readDouble());
} else if (declaredClass == Void.TYPE) { // void
instance = null;
} else {
throw new IllegalArgumentException("Not a primitive: "+declaredClass);
}

} else if (declaredClass.isArray()) { // array
int length = in.readInt();
instance = Array.newInstance(declaredClass.getComponentType(), length);
for (int i = 0; i < length; i++) {
Array.set(instance, i, readObject(in, conf));
}

} else if (declaredClass == String.class) { // String类型的反序列化
instance = UTF8.readString(in);
} else if (declaredClass.isEnum()) { // enum的反序列化
instance = Enum.valueOf((Class<? extends Enum>) declaredClass, UTF8.readString(in));
} else { // Writable
Class instanceClass = null;
String str = "";
try {
//剩下的从Conf对象中获取类型Class
str = UTF8.readString(in);
instanceClass = conf.getClassByName(str);
} catch (ClassNotFoundException e) {
throw new RuntimeException("readObject can't find class " + str, e);
}
/*
* 带用了WritableFactories工厂去new instanceClass(实现了Writable接口)对象出来
* 在调用实现Writable对象自身的反序列化方法
*/

Writable writable = WritableFactories.newInstance(instanceClass, conf);
writable.readFields(in);
instance = writable;

if (instanceClass == NullInstance.class) { // null
declaredClass = ((NullInstance)instance).declaredClass;
instance = null;
}
}
//最后存储反序列化后待封装的ObjectWritable对象
if (objectWritable != null) { // store values
objectWritable.declaredClass = declaredClass;
objectWritable.instance = instance;
}

return instance;

}


[size=large][color=red]特殊的类型 GenericWritable[/color][/size]
例如一个reduce中的输入从多个map中获,然而各个map的输出value类型都不同,这就需要 GenericWritable 类型 map端用法如下
 context.write(new Text(str), new MyGenericWritable(new LongWritable(1)));
context.write(new Text(str), new MyGenericWritable(new Text("1")));

在reduce 中用法如下

for (MyGenericWritable time : values){
//获取MyGenericWritable对象
Writable writable = time.get();
//如果当前是LongWritable类型
if (writable instanceof LongWritable){

count += ((LongWritable) writable).get();
}
//如果当前是Text类型
if (writable instanceof Text){
count += Long.parseLong(((Text)writable).toString());
}
}


自定义MyGenericWritable如下

class MyGenericWritable extends GenericWritable{

//无参构造函数
public MyGenericWritable() {

}

//有参构造函数
public MyGenericWritable(Text text) {
super.set(text);
}

//有参构造函数
public MyGenericWritable(LongWritable longWritable) {
super.set(longWritable);
}


@Override
protected Class<? extends Writable>[] getTypes() {

return new Class[]{LongWritable.class,Text.class};
}
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值