Using Protocol Buffers with LZO in Hadoop (Part 1)
1. Protocol Buffers
First, a brief introduction to Protocol Buffers. The official Protocol Buffers site describes them as follows:
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
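In short, Protocol Buffers serialize structured data flexibly and efficiently.
Next comes the tutorial for using them from Java: Java Protocol Buffers
(1) Define the message format in a .proto file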
package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}
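This is an example .proto file, and it is fairly easy to read. Two points are worth noting. If java_outer_classname is not set, the outer class name is derived from the .proto file name converted to camel case (for example, my_proto.proto gives MyProto). Each field is marked required, optional, or repeated; repeated means the field may appear any number of times, including zero.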
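The numbers after each "=" in the .proto file are field numbers, and they identify the fields in the binary encoding. The following is a minimal hand-rolled sketch of how name (field 1) and id (field 2) of Person end up on the wire. The class name WireFormatSketch and its helper methods are hypothetical and exist only for illustration; the sketch assumes a non-negative id and ignores every other field. Real code should use the classes generated by protoc instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: a hand-written encoder for two fields of Person.
// This is not the protobuf library and not the generated code.
class WireFormatSketch {
    static final int WIRETYPE_VARINT = 0;            // used for int32
    static final int WIRETYPE_LENGTH_DELIMITED = 2;  // used for string

    // Base-128 varint: 7 bits per byte, lowest group first,
    // high bit set on every byte except the last.
    // Assumption: value is non-negative.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Every field starts with a key: (field number << 3) | wire type.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, (fieldNumber << 3) | wireType);
    }

    // Encodes only "required string name = 1" and "required int32 id = 2".
    static byte[] encodePerson(String name, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        writeTag(out, 1, WIRETYPE_LENGTH_DELIMITED);
        writeVarint(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        writeTag(out, 2, WIRETYPE_VARINT);
        writeVarint(out, id);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodePerson("John", 1234);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        // prints: 0a 04 4a 6f 68 6e 10 d2 09
        System.out.println(hex.toString().trim());
    }
}
```

For name "John" and id 1234 the result is 9 bytes: one key byte and one length byte followed by the four UTF-8 bytes of the name, then one key byte and a two-byte varint for the id. Field names never appear in the output, which is one reason the format is smaller than XML.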
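(2) Compile your Protocol Buffers
1. First, download and install the protocol buffer compiler (protoc); see the download and installation page.
2. Run the following command: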
protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto
You will then get the generated class
com/example/tutorial/AddressBookProtos.java
under $DST_DIR.
(3) Use the Java protocol buffer API to read and write messages
// required string name = 1;
public boolean hasName();
public String getName();
// required int32 id = 2;
public boolean hasId();
public int getId();
// optional string email = 3;
public boolean hasEmail();
public String getEmail();
// repeated .tutorial.Person.PhoneNumber phone = 4;
public List<PhoneNumber> getPhoneList();
public int getPhoneCount();
public PhoneNumber getPhone(int index);
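These are the accessors generated for each field of Person; every .proto type is mapped to a corresponding Java type (string to String, int32 to int, a repeated field to a List).
Person.Builder has the same getters, plus setters and clear methods: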
// required string name = 1;
public boolean hasName();
public java.lang.String getName();
public Builder setName(String value);
public Builder clearName();
// required int32 id = 2;
public boolean hasId();
public int getId();
public Builder setId(int value);
public Builder clearId();
// optional string email = 3;
public boolean hasEmail();
public String getEmail();
public Builder setEmail(String value);
public Builder clearEmail();
// repeated .tutorial.Person.PhoneNumber phone = 4;
public List<PhoneNumber> getPhoneList();
public int getPhoneCount();
public PhoneNumber getPhone(int index);
public Builder setPhone(int index, PhoneNumber value);
public Builder addPhone(PhoneNumber value);
public Builder addAllPhone(Iterable<PhoneNumber> value);
public Builder clearPhone();
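The builder methods listed above are typically chained and finished with build(). The sketch below shows that calling pattern. So that it compiles on its own without protoc, it uses a hand-written stand-in (the hypothetical class PersonBuilderSketch) that copies only a few of the generated method names for name, id and email; it is not the generated code. With the real generated class you would import com.example.tutorial.AddressBookProtos.Person and write the same chain. One deliberate difference: the stand-in throws IllegalStateException when a required field is missing, whereas the generated build() throws UninitializedMessageException.

```java
// Illustration only: a hand-written stand-in mirroring part of the
// generated Person / Person.Builder API. Not produced by protoc.
class PersonBuilderSketch {

    // Messages are immutable once built.
    static final class Person {
        private final String name;   // required
        private final Integer id;    // required
        private final String email;  // optional, null when unset

        private Person(Builder b) {
            this.name = b.name;
            this.id = b.id;
            this.email = b.email;
        }

        static Builder newBuilder() {
            return new Builder();
        }

        String getName() { return name; }
        int getId() { return id; }
        boolean hasEmail() { return email != null; }
        // An unset optional string reads back as the default "".
        String getEmail() { return email == null ? "" : email; }

        static final class Builder {
            private String name;
            private Integer id;
            private String email;

            // Each setter returns the builder so calls can be chained.
            Builder setName(String value) { this.name = value; return this; }
            Builder setId(int value) { this.id = value; return this; }
            Builder setEmail(String value) { this.email = value; return this; }
            Builder clearEmail() { this.email = null; return this; }

            Person build() {
                // Stand-in check for required fields (see lead-in).
                if (name == null || id == null) {
                    throw new IllegalStateException("missing required field");
                }
                return new Person(this);
            }
        }
    }

    // Same call chain you would write against the generated class.
    static Person example() {
        return Person.newBuilder()
                .setId(1234)
                .setName("John Doe")
                .setEmail("jdoe@example.com")
                .build();
    }

    public static void main(String[] args) {
        Person john = example();
        System.out.println(john.getName() + " " + john.getId() + " " + john.getEmail());
    }
}
```

With the real generated class, a built message is usually written with person.writeTo(outputStream) and read back with Person.parseFrom(inputStream).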