I. Introduction to Avro
----------------------------------------------
1. A data serialization system.
2. Provides rich data structures; a compact, fast binary data format; a container file for storing persistent data; and remote procedure call (RPC).
3. Simple integration with dynamic languages; code generation is not required to read or write data files, nor to implement RPC protocols.
4. Cross-language.
5. Compressible and splittable.
6. Self-describing: each data file embeds the schema it was written with, so data and structure travel together; schemas themselves are defined in JSON.
II. Avro Data Types
---------------------------------------------
1. Primitive data types
null     no value.
boolean  a binary value (true or false).
int      32-bit signed integer.
long     64-bit signed integer.
float    single-precision (32-bit) IEEE 754 floating-point number.
double   double-precision (64-bit) IEEE 754 floating-point number.
bytes    sequence of 8-bit unsigned bytes.
string   Unicode character sequence.
2. Complex data types
a.Records, a collection type similar to a table structure, with the following attributes (see the example below):
name
namespace
type
fields
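For example, a minimal record schema (the field names here are only illustrative):
{
"type" : "record",
"namespace" : "data",
"name" : "Person",
"fields" : [
{ "name" : "id", "type" : "int" },
{ "name" : "username", "type" : "string" }
]
}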
b.Enums
{
"type" : "enum",
"name" : "Numbers",
"namespace": "data",
"symbols" : [ "ONE", "TWO" , "THREE", "FOUR" ]
}
c.Arrays
{
"type" : "array",
"items" : "int"
}
d.Maps
{"type" : "map", "values" : "int"}
e.Unions, written as a JSON array of possible types; in the record below the field "experience" can hold either an int or null
{
"type" : "record",
"namespace" : "tutorialspoint",
"name" : "empdetails",
"fields" :[
{ "name" : "experience", "type": ["int", "null"] },
{ "name" : "age", "type": "int" }
]
}
f.Fixed
{
"type" : "fixed",
"name" : "bdata",
"size" : 1048576
}
III. Serialization and Deserialization with Avro Using Compiled Classes
-------------------------------------------------
1. Design a schema for the data and create the schema file employee.avsc (a table named Employee with two fields, name and age)
{
"type" : "record",
"namespace" : "tsAvro",
"name" : "Employee",
"fields" : [
{ "name" : "name" , "type" : "string" },
{ "name" : "age" , "type" : "int" }
]
}
2. Compile employee.avsc
$cmd> java -jar D:/Packages/avro/avro-tools-1.8.0.jar compile schema D:/Packages/avro/employee.avsc D:/Packages/avro
3. Create a new Java project and import the generated classes into it
4. Add the Maven dependency
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.8.1</version>
</dependency>
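Optionally, instead of running avro-tools by hand as in step 2, the avro-maven-plugin can generate the classes during the build. A minimal sketch, assuming version 1.8.1 and .avsc files placed under src/main/avro (the directories are assumptions; adjust to your project layout):
<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.8.1</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>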
5. Create objects
Employee e1 = new Employee();
e1.setAge(11);
e1.setName("tom1");
//e2 and e3 ...
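The generated class also provides a builder, which checks that all fields have been set before the object is built. A small sketch, assuming the Employee class generated in step 2:
Employee e1 = Employee.newBuilder()
        .setName("tom1")
        .setAge(11)
        .build();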
6. Serialize objects e1, e2, e3 to the local disk
//Imports used below: java.io.File, org.junit.Test, org.apache.avro.io.DatumWriter,
//org.apache.avro.specific.SpecificDatumWriter, org.apache.avro.file.DataFileWriter,
//plus the generated class tsAvro.Employee
@Test
public void tsAvroWrite() throws Exception {
    //1. Create objects e1, e2, e3
    Employee e1 = new Employee();
    e1.setAge(11);
    e1.setName("tom1");
    Employee e2 = new Employee();
    e2.setAge(12);
    e2.setName("tom2");
    Employee e3 = new Employee();
    e3.setAge(13);
    e3.setName("tom3");
    //2. Create a SpecificDatumWriter, which converts the Java objects into Avro's serialized form
    DatumWriter<Employee> empDatumWriter = new SpecificDatumWriter<Employee>(Employee.class);
    //3. Build a file writer that writes the serialized objects to disk
    DataFileWriter<Employee> empFileWriter = new DataFileWriter<Employee>(empDatumWriter);
    //4. Create the serialized file on the local disk
    empFileWriter.create(e1.getSchema(), new File("D:\\Packages\\avro\\emp.avro"));
    //5. Serialize the objects into the file
    empFileWriter.append(e1);
    empFileWriter.append(e2);
    empFileWriter.append(e3);
    //6. Serialization finished; close the writer
    empFileWriter.close();
    System.out.println("data successfully serialized");
}
7. Deserialize the local file back into Java objects
//Additional imports: org.apache.avro.io.DatumReader, org.apache.avro.specific.SpecificDatumReader,
//org.apache.avro.file.DataFileReader
@Test
public void tsAvroRead() throws Exception {
    //1. Create a SpecificDatumReader, which turns the on-disk data back into Java objects
    DatumReader<Employee> empDatumReader = new SpecificDatumReader<Employee>(Employee.class);
    //2. Create the file reader
    DataFileReader<Employee> dataFileReader = new DataFileReader<Employee>(new File("D:\\Packages\\avro\\emp.avro"), empDatumReader);
    //3. Read and deserialize; next(e) reuses the passed-in object and returns the populated record
    Employee e = null;
    while (dataFileReader.hasNext()) {
        e = dataFileReader.next(e);
        System.out.println(e.getName());
    }
    dataFileReader.close();
}
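DataFileReader also implements Iterable, so the read loop can be written as a for-each. A short sketch under the same setup (without the reuse parameter a new object is allocated for each record):
DataFileReader<Employee> reader = new DataFileReader<Employee>(
        new File("D:\\Packages\\avro\\emp.avro"),
        new SpecificDatumReader<Employee>(Employee.class));
for (Employee emp : reader) {
    System.out.println(emp.getName() + " : " + emp.getAge());
}
reader.close();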
IV. Serialization and Deserialization with Avro Using the Schema Parser (No Code Generation)
-------------------------------------------------------
/**
 * Serialization using a schema file (non-compiled, generic mode)
 * Additional imports: org.apache.avro.Schema, org.apache.avro.generic.GenericData,
 * org.apache.avro.generic.GenericRecord, org.apache.avro.generic.GenericDatumWriter
 */
@Test
public void tsAvroWriteBySchema() throws Exception {
    //1. Parse the local schema file (*.avsc)
    Schema schema = new Schema.Parser().parse(new File("D:\\Packages\\avro\\employee.avsc"));
    //2. Build generic records from the schema; these play the role of the generated Employee class,
    //   i.e. this is effectively the process of creating Employee objects
    GenericRecord e1 = new GenericData.Record(schema);
    e1.put("name", "tom1");
    e1.put("age", 11);
    GenericRecord e2 = new GenericData.Record(schema);
    e2.put("name", "tom2");
    e2.put("age", 12);
    GenericRecord e3 = new GenericData.Record(schema);
    e3.put("name", "tom3");
    e3.put("age", 13);
    //3. Create a GenericDatumWriter, which converts the records into Avro's serialized form
    DatumWriter<GenericRecord> empDatumWriter = new GenericDatumWriter<GenericRecord>(schema);
    //4. Build a file writer that writes the serialized records to disk
    DataFileWriter<GenericRecord> empFileWriter = new DataFileWriter<GenericRecord>(empDatumWriter);
    //5. Create the serialized file on the local disk
    empFileWriter.create(schema, new File("D:\\Packages\\avro\\emp1.avro"));
    //6. Serialize the records into the file
    empFileWriter.append(e1);
    empFileWriter.append(e2);
    empFileWriter.append(e3);
    //7. Serialization finished; close the writer
    empFileWriter.close();
    System.out.println("data successfully serialized");
}
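As an alternative to GenericData.Record plus put(), GenericRecordBuilder (org.apache.avro.generic.GenericRecordBuilder) verifies at build() time that every field is set or has a default. A minimal sketch using the same parsed schema; the record e4 is just an illustrative extra:
GenericRecord e4 = new GenericRecordBuilder(schema)
        .set("name", "tom4")
        .set("age", 14)
        .build();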
/**
 * Deserialization using a schema file (non-compiled, generic mode)
 * Additional import: org.apache.avro.generic.GenericDatumReader
 */
@Test
public void tsAvroReadBySchema() throws Exception {
    //1. Parse the local schema file (*.avsc)
    Schema schema = new Schema.Parser().parse(new File("D:\\Packages\\avro\\employee.avsc"));
    //2. Create a GenericDatumReader, which turns the on-disk data back into generic records
    DatumReader<GenericRecord> empDatumReader = new GenericDatumReader<GenericRecord>(schema);
    //3. Create the file reader
    DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(new File("D:\\Packages\\avro\\emp1.avro"), empDatumReader);
    //4. Read and deserialize; next(e) reuses the passed-in record where possible
    GenericRecord e = null;
    while (dataFileReader.hasNext()) {
        e = dataFileReader.next(e);
        System.out.println(e.get("name"));
    }
    dataFileReader.close();
}
V. Using avro-tools.jar
-------------------------------------------------
1. tojson: reads a file serialized with DataFileWriter and prints it as JSON
java -jar avro-tools-1.8.0.jar tojson a.avro
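For the emp.avro written in section III, the output would look roughly like this (one JSON record per line):
{"name":"tom1","age":11}
{"name":"tom2","age":12}
{"name":"tom3","age":13}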
2. compile: compiles a schema file into source code for easier use
java -jar avro-tools-1.8.0.jar compile (schema|protocol) input... outputdir