Defining a schema
Avro schemas are defined using JSON. A schema is composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
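}
This schema defines a record representing a user. At minimum, a record definition must include its type ("type": "record"), a name ("name": "User"), and fields. We can also define a namespace ("namespace": "example.avro"), which together with the name attribute forms the full name of the record (example.avro.User).
Fields are defined as an array of objects, each of which defines a name and a type. A type written as a JSON array, such as ["int", "null"], is a union: the field may hold either an int or null.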
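To make the relationship between namespace, name, and fields concrete, here is a minimal sketch that parses the same schema and inspects it through the Avro Java API. It assumes the Avro library is on the classpath; the class name SchemaInspect and the helper describeFields are made up for this illustration, and the schema is inlined as a string only to keep the sketch self-contained.

```java
import org.apache.avro.Schema;

import java.util.ArrayList;
import java.util.List;

public class SchemaInspect {
    // Same schema as in the text, inlined so no file is needed (illustrative choice).
    static final String USER_SCHEMA =
        "{\"namespace\": \"example.avro\","
      + " \"type\": \"record\","
      + " \"name\": \"User\","
      + " \"fields\": ["
      + "   {\"name\": \"name\", \"type\": \"string\"},"
      + "   {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "   {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
      + " ]}";

    // Parse the JSON text into a Schema object.
    static Schema parse() {
        return new Schema.Parser().parse(USER_SCHEMA);
    }

    // Hypothetical helper: one "fieldName: TYPE" string per field, in declaration order.
    static List<String> describeFields(Schema schema) {
        List<String> out = new ArrayList<>();
        for (Schema.Field field : schema.getFields()) {
            out.add(field.name() + ": " + field.schema().getType());
        }
        return out;
    }

    public static void main(String[] args) {
        Schema schema = parse();
        // The full name is the namespace joined to the name with a dot.
        System.out.println(schema.getFullName());
        for (String line : describeFields(schema)) {
            System.out.println(line);
        }
    }
}
```

The two nullable fields report the type UNION, which reflects the ["int", "null"] and ["string", "null"] declarations in the schema.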
Serializing and deserializing with code generation
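Compiling the schema
Code generation allows us to automatically create classes based on the schema. Once the relevant classes have been generated, there is no need to use the schema directly in our programs.
java -jar /path/to/avro-tools-1.7.3.jar compile schema <schema file> <destination>
The generated sources are placed in the destination folder, in a package based on the schema's namespace. For the schema above, saved as user.avsc, passing user.avsc as the schema file and . as the destination generates the User class in package example.avro under the current directory.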
Creating Users
Once code generation is complete, we can create users with the following demo code.
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null
// Alternate constructor
User user2 = new User("Ben", 7, "red");
// Construct via builder
User user3 = User.newBuilder()
    .setName("Charlie")
    .setFavoriteColor("blue")
    .setFavoriteNumber(null)
    .build();
Serializing
// Serialize user1, user2 and user3 to disk
File file = new File("users.avro");
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
// Write the schema into the file header, then append the records
dataFileWriter.create(user1.getSchema(), file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();
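A DatumWriter converts Java objects into an in-memory serialized format. SpecificDatumWriter is used with generated classes and extracts the schema from the specified generated type. DataFileWriter writes the serialized records, together with the schema, to the file.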
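The division of labour between DatumWriter and DataFileWriter is easier to see when a DatumWriter is used on its own. The following is a minimal sketch, assuming the Avro library is on the classpath. It swaps in GenericDatumWriter and GenericRecord in place of the generated User class so that it compiles without running code generation; the class name DatumWriterSketch and its helpers are made up for this illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class DatumWriterSketch {
    // Same User schema as in the text, inlined to keep the sketch self-contained.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "   {\"name\": \"name\", \"type\": \"string\"},"
      + "   {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "   {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}]}");

    // Hypothetical helper: a record equivalent to user1 (favorite color left null).
    static GenericRecord sampleUser() {
        GenericRecord user = new GenericData.Record(SCHEMA);
        user.put("name", "Alyssa");
        user.put("favorite_number", 256);
        return user;
    }

    // Encode one record to raw Avro binary in memory, without any file container.
    static byte[] encode(GenericRecord record) {
        try {
            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(SCHEMA);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            writer.write(record, encoder);
            encoder.flush(); // the encoder buffers, so flush before reading the bytes
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode(sampleUser());
        System.out.println(bytes.length + " bytes of record data, with no schema attached");
    }
}
```

The bytes produced this way contain only the field values, so a reader needs the schema from somewhere else. DataFileWriter addresses that by storing the schema in the file alongside the records.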
Deserializing
// Deserialize Users from disk
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);
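}
SpecificDatumReader converts in-memory serialized items into instances of the generated class, in this case User. DataFileReader reads the file on disk. Passing the user object to next() lets the reader reuse that object instead of allocating a new one for every record.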
Serializing and deserializing without code generation
Creating users
Schema schema = new Schema.Parser().parse(new File("user.avsc"));
GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null
GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");
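Since we are not using code generation, we use GenericRecord in place of User. GenericRecord uses the schema to verify that we only set valid fields. If we try to set a field that does not exist, for example user1.put("favorite_animal", "cat"), an AvroRuntimeException is thrown when the program runs.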
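The field check described above can be observed directly. This is a minimal sketch, assuming the Avro library is on the classpath; the class name FieldCheckSketch and the helper putIsRejected are made up for this illustration.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class FieldCheckSketch {
    // Same User schema as in the text, inlined to keep the sketch self-contained.
    static final String USER_SCHEMA =
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "   {\"name\": \"name\", \"type\": \"string\"},"
      + "   {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "   {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}]}";

    // Hypothetical helper: returns true if put() refuses the given field name.
    static boolean putIsRejected(String fieldName, Object value) {
        Schema schema = new Schema.Parser().parse(USER_SCHEMA);
        GenericRecord user = new GenericData.Record(schema);
        try {
            user.put(fieldName, value);
            return false; // the field exists in the schema
        } catch (AvroRuntimeException e) {
            return true;  // the field name is not part of the schema
        }
    }

    public static void main(String[] args) {
        System.out.println("name rejected: " + putIsRejected("name", "Alyssa"));
        System.out.println("favorite_animal rejected: " + putIsRejected("favorite_animal", "cat"));
    }
}
```

Note that this check concerns field names only. A value of the wrong type for an existing field is accepted by put() and is reported later, when the record is written.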
Serializing
// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.close();
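Because the data file carries its own schema, it can be read back without handing the schema to the reader. The sketch below is one way to check this; it is not taken from the original walkthrough. It assumes the Avro library is on the classpath, writes to a temporary file instead of users.avro, and uses try-with-resources so the writer and reader are closed even if an append fails. The class name RoundTripSketch and the helper roundTrip are made up for this illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class RoundTripSketch {
    // Same User schema as in the text, inlined to keep the sketch self-contained.
    static final String USER_SCHEMA =
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "   {\"name\": \"name\", \"type\": \"string\"},"
      + "   {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "   {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}]}";

    // Hypothetical helper: writes one user, reads the file back, returns the names found.
    static List<String> roundTrip() {
        try {
            Schema schema = new Schema.Parser().parse(USER_SCHEMA);
            GenericRecord user = new GenericData.Record(schema);
            user.put("name", "Ben");
            user.put("favorite_number", 7);
            user.put("favorite_color", "red");

            File file = File.createTempFile("users", ".avro");
            file.deleteOnExit();

            // The writer is closed automatically, which also flushes pending data.
            try (DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                    new GenericDatumWriter<GenericRecord>(schema))) {
                writer.create(schema, file);
                writer.append(user);
            }

            List<String> names = new ArrayList<>();
            // No schema is passed to the reader: it is taken from the file header.
            try (DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                    file, new GenericDatumReader<GenericRecord>())) {
                System.out.println("schema found in file: " + reader.getSchema().getFullName());
                for (GenericRecord record : reader) {
                    // String values come back as Avro's Utf8 type, hence toString().
                    names.add(record.get("name").toString());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```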
Deserializing
// Deserialize users from disk
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);
GenericRecord user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);