avro是apache下hadoop的子项目,拥有序列化、反序列化、RPC功能。序列化的效率比jdk更高,与Google的protobuffer相当,比facebook开源Thrift(后由apache管理了)更优秀。
因为avro采用schema,如果是序列化大量类型相同的对象,那么只需要保存一份类的结构信息+数据,大大减少网络通信或者数据存储量
例子:
新建一个maven工程
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.jv</groupId>
<artifactId>avro</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>avro</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<compiler-plugin.version>2.3.2</compiler-plugin.version>
<avro.version>1.7.5</avro.version>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.10</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.6.4</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.7.5</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-ipc</artifactId>
<version>1.7.5</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${compiler-plugin.version}</version>
</plugin>
<plugin>
<groupId>org.apache.avro</groupId>
<artifactId>avro-maven-plugin</artifactId>
<version>1.7.5</version>
<executions>
<execution>
<id>schemas</id>
<phase>generate-sources</phase>
<goals>
<!-- 编译avsc文件 -->
<goal>schema</goal>
<!-- 编译avpr文件 -->
<goal>protocol</goal>
<!-- 编译avdl文件 -->
<goal>idl-protocol</goal>
</goals>
<configuration>
<sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
<outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
因为maven可能不认识<execution>中的配置,所以在maven设置中需要忽略这部分校验,否则pom.xml会报错
这段配置的是代码生成插件,需要给工程创建一个src/main/avro 源文件目录
具体步骤为:
编写模式文件user.avsc:
{
"namespace": "com.jv.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "username","type": "string"},
{"name": "age","type": ["int", "null"]},
{"name": "address","type": ["string", "null"]}
]
}
namespace:命名空间,在使用插件生成代码的时候,User类的包名就是它
type:有 records, enums, arrays, maps, unions , fixed 取值,records是相当于普通的class
name:类名,类的全名有namespace+name构成
doc:注释
aliases:取的别名,其他地方使用可以使用别名来引用
fields:属性
name:属性名
type:属性类型,可以是用["int","null"]或者["int",1]执行默认值
default:也可以使用该字段指定默认值
doc:注释
根据模式定义生成代码
按照图中圈出来的步骤操作
观察console中是否输出SUCCESS,是则说明成功了
测试代码
package com.jv.test;
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import com.jv.avro.User;
public class TestAvro {
public static void main(String[] args) throws IOException {
//实例化代码方式1
User user1 = new User();
user1.setUsername(new String("Messi"));
user1.setAddress("Barcelona");
user1.setAge(30);
//实例化代码方式3
User user2 = new User(new String("Messi"),30,"巴塞罗那");
//实例化代码方式3
User user3 = new User().newBuilder().setUsername("Havi").setAge(34).setAddress("卡塔尔").build();
//序列化对象并保存到文件中
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
dataFileWriter.create(user1.getSchema(), new File("users.avro"));
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();
//从文件中反序列化对象输出
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<User>(new File("users.avro"), userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
user = dataFileReader.next(user);
System.out.println(user);
}
}
}
运行后的输出结果