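Avro (pronounced roughly [ævrə]) began as a subproject of Hadoop.
Its development was led by Doug Cutting, the creator of Hadoop (and also of Lucene, Nutch, and other projects; hats off).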
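Apache Avro
Introduction
Apache Avro™ is a data serialization system.
Avro provides:
- Rich data structures
- A compact, fast, binary data format
- A container file for storing persistent data
- Remote procedure call (RPC)
- Simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization that is only worth implementing for statically typed languages.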
Similar frameworks
- Thrift (originally from Facebook)
- Google Protocol Buffers
Avro Schema
Schemas are defined in JSON, while the data itself is usually encoded in a binary format.
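To give a feel for what "binary format" means here: the Avro specification writes `int` and `long` values with a zig-zag mapping followed by a variable-length (base-128) encoding, so small magnitudes take a single byte. The following is a minimal hand-rolled sketch of that idea using only the JDK; the class and method names (`ZigZagVarintSketch`, `encodeLong`, `hex`) are made up for illustration and are not part of the Avro library.

```java
import java.io.ByteArrayOutputStream;

// Sketch of Avro's int/long binary encoding: zig-zag mapping, then a varint.
// Hypothetical helper class, not part of the Avro API.
class ZigZagVarintSketch {

    // Encode a long: zig-zag first, then 7 payload bits per byte.
    static byte[] encodeLong(long n) {
        // Zig-zag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            // Low 7 bits with the continuation bit set
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        // Final byte, continuation bit clear
        out.write((int) z);
        return out.toByteArray();
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, -1, 1, -2, 2, -64, 64}) {
            System.out.println(n + " -> " + hex(encodeLong(n)));
        }
    }
}
```

Running `main` should print `00`, `01`, `02`, `03`, `04`, `7f` and `80 01` for the seven inputs, which agrees with the examples given in the Avro specification.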
Avro data types
- Primitive types:
Type | Example |
---|---|
null | "null" |
boolean | "boolean" |
int | "int" |
long | "long" |
float | "float" |
double | "double" |
bytes | "bytes" |
string | "string" |
- Complex types
Type | Example
record:
{
  "type": "record",
  "name": "WeatherRecord",
  "doc": "A weather reading.",
  "fields": [
    {"name": "year", "type": "int"},
    {"name": "temperature", "type": "int"},
    {"name": "stationId", "type": "string"}
  ]
}
array:
{
  "type": "array",
  "items": "long"
}
map:
{
  "type": "map",
  "values": "string"
}
enum:
{
  "type": "enum",
  "name": "Cutlery",
  "doc": "An eating utensil.",
  "symbols": ["KNIFE", "FORK", "SPOON"]
}
fixed: a fixed number of 8-bit unsigned bytes
{
  "type": "fixed",
  "name": "Md5Hash",
  "size": 16
}
union: written as a JSON array of the allowed schemas
[
  "null",
  "string",
  {"type": "map", "values": "string"}
]
In summary, the complex types are record, enum, array, map, union, and fixed.
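As a rough illustration of how values of some of these complex types are laid out in Avro's binary encoding: per the specification, an enum value is written as the zero-based index of its symbol, an array or map as one or more counted blocks closed by a zero count, and a union as the zero-based index of the chosen branch followed by the value. The sketch below hand-rolls this with the JDK only. `ComplexTypeEncodingSketch` and its methods are hypothetical names for illustration, not Avro API calls, and the sketch always emits a single block per array or map.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hand-rolled sketch of Avro's binary layout for a few complex types.
// All names are hypothetical; this is not the Avro library.
class ComplexTypeEncodingSketch {

    // int/long: zig-zag mapping followed by a base-128 varint.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // string: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // enum: zero-based position of the symbol in the schema's symbol list.
    static byte[] encodeEnum(List<String> symbols, String symbol) {
        int index = symbols.indexOf(symbol);
        if (index < 0) throw new IllegalArgumentException("unknown symbol: " + symbol);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, index);
        return out.toByteArray();
    }

    // array of long: one block (count, then items), closed by a zero count.
    static byte[] encodeLongArray(long... items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (items.length > 0) {
            writeLong(out, items.length);
            for (long item : items) writeLong(out, item);
        }
        writeLong(out, 0);
        return out.toByteArray();
    }

    // union ["null", "string", {"type": "map", "values": "string"}]:
    // branch index first, then the value encoded according to that branch.
    static byte[] encodeUnion(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0);                       // branch 0: null, no payload
        } else if (value instanceof String) {
            writeLong(out, 1);                       // branch 1: string
            writeString(out, (String) value);
        } else if (value instanceof Map) {
            writeLong(out, 2);                       // branch 2: map of strings
            Map<?, ?> map = (Map<?, ?>) value;
            if (!map.isEmpty()) {
                writeLong(out, map.size());
                for (Map.Entry<?, ?> e : map.entrySet()) {
                    writeString(out, (String) e.getKey());
                    writeString(out, (String) e.getValue());
                }
            }
            writeLong(out, 0);                       // end of map
        } else {
            throw new IllegalArgumentException("value matches no union branch");
        }
        return out.toByteArray();
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> cutlery = Arrays.asList("KNIFE", "FORK", "SPOON");
        System.out.println("enum SPOON   -> " + hex(encodeEnum(cutlery, "SPOON")));
        System.out.println("array [3,27] -> " + hex(encodeLongArray(3, 27)));
        System.out.println("union null   -> " + hex(encodeUnion(null)));
        System.out.println("union \"a\"    -> " + hex(encodeUnion("a")));
        System.out.println("union {k=v}  -> " + hex(encodeUnion(Collections.singletonMap("k", "v"))));
    }
}
```

Note that none of the outputs contain type names or field names: the reader needs the schema to make sense of the bytes, which is a large part of why the format is compact.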
// user.avsc
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
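To make the user.avsc schema above more concrete, here is a small sketch of how a single User datum would be laid out in Avro's binary encoding: the fields are written in declaration order, with no field names, and each union field is prefixed with the index of its branch. This uses only the JDK; `UserRecordEncodingSketch` and `encodeUser` are hypothetical names, and the result is just the raw datum, not a complete container file (which additionally carries a header with the schema and sync markers).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: binary layout of one User record from user.avsc.
// Hypothetical helper class, not generated by avro-tools and not Avro API.
class UserRecordEncodingSketch {

    // int/long: zig-zag mapping followed by a base-128 varint.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // string: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is simply its fields, concatenated in schema order.
    static byte[] encodeUser(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);                  // "name": string

        if (favoriteNumber != null) {            // "favorite_number": ["int", "null"]
            writeLong(out, 0);                   // branch 0 = int
            writeLong(out, favoriteNumber);
        } else {
            writeLong(out, 1);                   // branch 1 = null, no payload
        }

        if (favoriteColor != null) {             // "favorite_color": ["string", "null"]
            writeLong(out, 0);                   // branch 0 = string
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);                   // branch 1 = null, no payload
        }
        return out.toByteArray();
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeUser("Ben", 7, "red")));
        System.out.println(hex(encodeUser("Charlie", null, "blue")));
    }
}
```

For ("Ben", 7, "red") this yields 11 bytes: `06 42 65 6e 00 0e 00 06 72 65 64`. A JSON rendering of the same record would spend more than that on the field names alone.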
Using Avro in Java
Generate the DTO classes with avro-tools
java -jar /path/to/avro-tools-1.8.1.jar compile schema <schema file> <destination>
Creating instances
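// 1) No-arg constructor plus setters; favorite_color is left null
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);

// 2) All-args constructor, in schema field order
User user2 = new User("Ben", 7, "red");

// 3) Builder; every field is set explicitly, including the null one
User user3 = User.newBuilder()
    .setName("Charlie")
    .setFavoriteColor("blue")
    .setFavoriteNumber(null)
    .build();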
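Serialization
// Imports needed by this snippet
import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

// Serialize user1, user2 and user3 to disk
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<>(userDatumWriter);
// create() stores the schema in the header of users.avro
dataFileWriter.create(user1.getSchema(), new File("users.avro"));
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();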
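Deserialization
// Imports needed by this snippet
import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

// Deserialize Users from disk
DatumReader<User> userDatumReader = new SpecificDatumReader<>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<>(new File("users.avro"), userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);
}
// Release the underlying file handle
dataFileReader.close();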
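One quick sanity check on the container file mentioned earlier: according to the Avro specification, an object container file begins with the four magic bytes `O`, `b`, `j` and `1` (the byte value 1, not the digit). The sketch below tests for that prefix with plain JDK I/O. `AvroMagicCheckSketch` and its methods are hypothetical helpers, and the default path `users.avro` is simply the file name used in the snippets above; this checks only the magic bytes and does not validate the rest of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: detect an Avro object container file by its 4-byte magic prefix.
// Hypothetical helper class, not part of the Avro library.
class AvroMagicCheckSketch {

    // 'O', 'b', 'j' followed by the byte value 1
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // True if the given bytes start with the Avro container magic.
    static boolean hasAvroMagic(byte[] header) {
        if (header.length < MAGIC.length) return false;
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) return false;
        }
        return true;
    }

    // Read the first four bytes of a file and compare them with the magic.
    static boolean isAvroContainerFile(Path path) throws IOException {
        byte[] header = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(path)) {
            int off = 0;
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n < 0) return false;   // file shorter than the magic
                off += n;
            }
        }
        return hasAvroMagic(header);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "users.avro");
        if (!Files.exists(path)) {
            System.out.println(path + " not found");
        } else {
            System.out.println(path + " looks like an Avro container file: " + isAvroContainerFile(path));
        }
    }
}
```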