[Notes] Introduction to Avro and the official example

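[b]Apache Avro[/b] is a language-neutral data serialization system. It was created to address a shortcoming of Hadoop's Writable types: their lack of language portability. Avro data is self-describing and always relies on a schema. It supports both loading a schema dynamically at runtime (dynamic mapping) and mapping through classes generated from the schema.
[i]From the official site:[/i]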
[quote]Apache Avro™ is a data serialization system. Avro provides:
[*]Rich data structures.
[*]A compact, fast, binary data format.
[*]A container file, to store persistent data.
[*]Remote procedure call (RPC).
[*]Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation as an optional optimization, only worth implementing for statically typed languages.
[/quote]
[b]Official example:[/b]
[i]Dependency[/i]
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>${avro.version}</version>
</dependency>

[i]Plugin[/i]

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>${avro.version}</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

[i]Schema (src/main/avro/user.avsc):[/i]

{"namespace": "com.sanss.hadoop.demos.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

[*][b]Specific Java Mapping[/b]
[i]Generate the Java classes:[/i]

mvn clean compile

[i]Create objects[/i]

User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null

// Alternate constructor
User user2 = new User("Ben", 7, "red");

// Construct via builder
User user3 = User.newBuilder()
        .setName("Charlie")
        .setFavoriteColor("blue")
        .setFavoriteNumber(null)
        .build();

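[i]Serialize[/i]

import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

// Serialize to disk
File file = new File("users.avro");
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
// try-with-resources closes the writer, so no explicit close() call is needed
try (DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter)) {
    dataFileWriter.create(User.SCHEMA$, file);
    dataFileWriter.append(user1);
    dataFileWriter.append(user2);
    dataFileWriter.append(user3);
}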
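
The users.avro file written above is an Avro object container file. According to the Avro specification, such a file begins with the four bytes 'O', 'b', 'j', 1, followed by metadata that embeds the writer's schema. As a rough sanity check that needs no Avro classes, the sketch below only tests for that header. The class and method names (AvroMagicCheck, looksLikeAvroContainer) are made up for illustration and are not part of the Avro API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class AvroMagicCheck {
    // Header of an Avro object container file: ASCII "Obj" followed by the byte 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the file begins with the Avro container-file magic bytes. */
    public static boolean looksLikeAvroContainer(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[MAGIC.length];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than the header
                }
                read += n;
            }
            return Arrays.equals(head, MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path: the file produced by the serialization step.
        Path file = Paths.get(args.length > 0 ? args[0] : "users.avro");
        if (!Files.exists(file)) {
            System.out.println(file + " does not exist");
            return;
        }
        System.out.println(file + " -> " + looksLikeAvroContainer(file));
    }
}
```

A matching header does not prove the file is valid; it only rules out obviously wrong inputs before handing the file to DataFileReader.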

[i]Deserialize[/i]

import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

// Deserialize Users from disk
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
try (DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader)) {
    User user = null;
    while (dataFileReader.hasNext()) {
        // Reuse user object by passing it to next(). This saves us from
        // allocating and garbage collecting many objects for files with
        // many items.
        user = dataFileReader.next(user);
        System.out.println(user);
    }
}

[i]Output[/i]

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}

[*][b]Generic Java Mapping[/b]
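[i]Create objects[/i]

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Note: getResource() only finds user.avsc if the file is on the classpath
// (for example under src/main/resources); src/main/avro is not a resource
// directory by default.
Schema schema = new Schema.Parser().parse(new File(
        GenericJavaMappingDemo.class.getClassLoader()
                .getResource("user.avsc").toURI()));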
GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null

GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");

[i]Serialize[/i]

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.DatumWriter;

// Serialize users to disk
File file = new File("users.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
// try-with-resources closes the writer, so no explicit close() call is needed
try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter)) {
    dataFileWriter.create(schema, file);
    dataFileWriter.append(user1);
    dataFileWriter.append(user2);
}

[i]Deserialize[/i]

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.DatumReader;

// Deserialize users from disk
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader)) {
    GenericRecord user = null;
    while (dataFileReader.hasNext()) {
        // Reuse user object by passing it to next(). This saves us from
        // allocating and garbage collecting many objects for files with
        // many items.
        user = dataFileReader.next(user);
        System.out.println(user);
    }
}

[i]Output[/i]

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}

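[*][b]About schemas:[/b]
Avro relies on schemas, which are defined in JSON. The primitive types are [b]null, boolean, int, long, float, double, bytes, string[/b]; the complex types are [b]record, enum, array, map, union, fixed[/b]. Avro can generate code from a schema to represent its data types (the Specific Java mapping), or it can map data dynamically at runtime (the Generic Java mapping). (The Reflect Java mapping is not recommended.)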
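[table]
|Type|Description|
|null|The absence of a value|
|boolean|A binary value (true or false)|
|int|32-bit signed integer|
|long|64-bit signed integer|
|float|Single-precision (32-bit) IEEE 754 floating-point number|
|double|Double-precision (64-bit) IEEE 754 floating-point number|
|bytes|Sequence of 8-bit unsigned bytes|
|string|Sequence of Unicode characters|
|record|A collection of named fields of any type, represented by a JSON object|
|enum|A set of named values|
|array|An ordered collection of objects; all objects in an array must share the same schema|
|map|An unordered collection of key/value pairs; keys must be strings, values may be of any type but must all share the same schema|
|union|A union of schemas, represented by a JSON array in which each element is a schema|
|fixed|A fixed number of 8-bit unsigned bytes|
[/table]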
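
To make the primitive and union rows above more concrete, here is a sketch of how the Avro specification lays these values out in binary: int and long use variable-length zig-zag coding, a string is its UTF-8 byte length followed by the bytes, null occupies zero bytes, and a union writes the zero-based index of the chosen branch before the value. The code hand-encodes the user1 record from the example using only the JDK. AvroBinarySketch and its methods are hypothetical names for illustration; real code would use Avro's own encoders rather than this.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class AvroBinarySketch {
    /** Writes an int or long as a zig-zag encoded variable-length integer. */
    public static void writeLong(ByteArrayOutputStream out, long value) {
        // Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small magnitudes stay short.
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // 7 payload bits, high bit means "more follows"
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    /** Writes a string as its UTF-8 length followed by the UTF-8 bytes. */
    public static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Encodes {"name": "Alyssa", "favorite_number": 256, "favorite_color": null}. */
    public static byte[] encodeAlyssa() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "Alyssa"); // name: string
        writeLong(out, 0);          // favorite_number: branch 0 of ["int", "null"]
        writeLong(out, 256);        // ... followed by the int value itself
        writeLong(out, 1);          // favorite_color: branch 1 of ["string", "null"]
        return out.toByteArray();   // the null value itself contributes no bytes
    }

    /** Formats bytes as space-separated lowercase hex pairs. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeAlyssa()));
    }
}
```

Running main prints 0c 41 6c 79 73 73 61 00 80 04 02, which is 11 bytes for the whole record. No field names or type tags appear in the bytes, which is why a reader always needs the writer's schema to decode Avro data.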