Hadoop: the Apache Avro data serialization system

fantasticqiang

Article link: https://blog.csdn.net/fantasticqiang/article/details/80784505

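Background

Avro was created by Doug Cutting, the founder of Hadoop, to make serialization in Hadoop portable across programming languages. It is a data format that can be processed from many languages (C, C#, C++, Java, PHP, Python, Ruby).

Avro schemas are usually written in JSON, and the data itself is encoded in a binary format. Avro data files are self-describing: the schema is stored in the file together with the data. The files are compressible and splittable.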

Avro types

Primitive types

The primitive types are the basic data types: null, boolean, int, long, float, double, bytes, and string.
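
To make the binary side of the primitive types concrete: in Avro's binary encoding, int and long values are written as zig-zag variable-length integers, so small magnitudes (positive or negative) take a single byte. The class below is a minimal standalone sketch of that scheme using only the JDK. The class name `ZigZagVarint` and its methods are hypothetical, chosen for illustration; they are not part of the Avro library.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch (not Avro library code) of zig-zag variable-length integers. */
public class ZigZagVarint {

    /** Encodes a long as zig-zag varint bytes. */
    public static byte[] encode(long value) {
        // Zig-zag step: map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
        long zz = (value << 1) ^ (value >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits per byte, low-order group first; a set high bit means more bytes follow.
        while ((zz & ~0x7FL) != 0) {
            out.write((int) ((zz & 0x7F) | 0x80));
            zz >>>= 7;
        }
        out.write((int) zz);
        return out.toByteArray();
    }

    /** Decodes bytes produced by encode back into the original long. */
    public static long decode(byte[] bytes) {
        long zz = 0;
        int shift = 0;
        for (byte b : bytes) {
            zz |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
        }
        // Undo the zig-zag step.
        return (zz >>> 1) ^ -(zz & 1);
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, -1, 1, -64, 64, 2018}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(v)) {
                hex.append(String.format("%02x ", b));
            }
            System.out.println(v + " -> " + hex.toString().trim());
        }
    }
}
```

Running `main` prints the encoded bytes for a few sample values; for example, 64 needs two bytes while -64 still fits in one.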

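Complex types

The complex types are array, map, record (a collection of named fields of any type), enum (an enumeration), fixed (a fixed number of 8-bit unsigned bytes), and union (a union of schemas, written as a JSON array in which each element is a schema). Here is an example of a record type: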

// namespace is optional; when the avro-tools jar generates classes from the schema, it becomes the Java package path
{
  "namespace": "com.lqq.avro",
  "type": "record",
  "name": "WeatherRecord",
  "doc": "A weather reading.",
  "fields": [
    {"name": "year", "type": "int"},
    {"name": "temperature", "type": "int"},
    {"name": "stationId", "type": "string"}
  ]
}
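
As a rough illustration of what "binary encoding" means for this schema: a record is encoded as its field values in declaration order, with no field names in the data; an int is a zig-zag varint, and a string is its UTF-8 byte length followed by the bytes. The sketch below hand-encodes one `WeatherRecord` with JDK classes only. The class `WeatherRecordBytes` and its helpers are hypothetical names for this example, and the output is just the record body, not a complete Avro container file (which also carries a header with the schema and sync markers).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch (not Avro library code) of a WeatherRecord's binary layout. */
public class WeatherRecordBytes {

    // int/long: zig-zag, then 7 bits per byte with a continuation flag in the high bit.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long zz = (value << 1) ^ (value >> 63);
        while ((zz & ~0x7FL) != 0) {
            out.write((int) ((zz & 0x7F) | 0x80));
            zz >>>= 7;
        }
        out.write((int) zz);
    }

    // string: byte length written as a long, followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Encodes the fields in schema order: year, temperature, stationId. */
    public static byte[] encode(int year, int temperature, String stationId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, year);
        writeLong(out, temperature);
        writeString(out, stationId);
        return out.toByteArray();
    }

    /** Formats bytes as space-separated lowercase hex. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // year=2018, temperature=20, stationId="CN"
        System.out.println(toHex(encode(2018, 20, "CN"))); // prints "c4 1f 28 04 43 4e"
    }
}
```

The whole record takes six bytes, which is why the schema has to travel with the data (or be known to the reader): the bytes alone say nothing about field names or types.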

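Using Avro

Define the schema and compile it to generate the Java class

// weather.avsc is the schema file; "." writes the generated class files under the current directory
>java -jar avro-tools-1.8.0.jar compile schema weather.avsc .
// the generated "com/lqq/avro" folder contains the class file we need

Add the "avro" dependency to the Maven pom.xml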

<!-- https://mvnrepository.com/artifact/org.apache.avro/avro -->
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.8.0</version>
</dependency>

Copy the generated class into the project and it is ready to use.

Serializing objects to a file with Avro

// Generated class; its package comes from the schema namespace
import com.lqq.avro.WeatherRecord;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.Test;

import java.io.File;

public class TestAvro {

    @Test
    public void write2File() throws Exception {
        // Datum writer that serializes instances of the generated WeatherRecord class
        SpecificDatumWriter<WeatherRecord> datumWriter = new SpecificDatumWriter<WeatherRecord>(WeatherRecord.class);
        // File writer that stores records in an Avro data file
        DataFileWriter<WeatherRecord> fileWriter = new DataFileWriter<WeatherRecord>(datumWriter);
        // Create the object to write
        WeatherRecord w1 = new WeatherRecord();
        w1.setYear(2018);
        w1.setTemperature(20);
        w1.setStationId("CN");

        // Write the schema to the file, then serialize the same record to disk four times
        fileWriter.create(w1.getSchema(), new File("/home/hadoop/test/avro/out.avro"));
        fileWriter.append(w1);
        fileWriter.append(w1);
        fileWriter.append(w1);
        fileWriter.append(w1);
        // Close the stream
        fileWriter.close();
    }
}

Reading objects from an Avro file

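    // Add to TestAvro; also needs imports for org.apache.avro.file.DataFileReader,
    // org.apache.avro.specific.SpecificDatumReader and java.util.Iterator
    @Test
    public void read() throws Exception {
        // Datum reader that deserializes into the generated WeatherRecord class
        SpecificDatumReader<WeatherRecord> weatherDatumReader = new SpecificDatumReader<WeatherRecord>(WeatherRecord.class);
        // Open the Avro data file for reading
        DataFileReader<WeatherRecord> dataReader = new DataFileReader<WeatherRecord>(new File("/home/hadoop/test/avro/out.avro"), weatherDatumReader);
        Iterator<WeatherRecord> it = dataReader.iterator();
        while (it.hasNext()) {
            WeatherRecord w = it.next();
            System.out.println(w.getYear() + " : " + w.getStationId() + " : " + w.getTemperature());
        }
        dataReader.close();
    }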

Read result

2018 : CN : 20
2018 : CN : 20
2018 : CN : 20
2018 : CN : 20
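
Because the schema is stored inside the data file, a file can also be written and read without any generated class. The sketch below is one possible way to do this with Avro's generic API (`GenericRecord`, `GenericDatumWriter`, `GenericDatumReader`), assuming the same `avro` 1.8.0 dependency as above is on the classpath. The class name `GenericWeatherDemo`, the `roundTrip` helper, and the temporary file are assumptions made for this example, not part of the original post.

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

/** Hypothetical demo: write and read a weather record without generated classes. */
public class GenericWeatherDemo {

    // Same schema as weather.avsc, inlined so the example is self-contained.
    static final String SCHEMA_JSON =
        "{\"namespace\": \"com.lqq.avro\", \"type\": \"record\", \"name\": \"WeatherRecord\","
        + " \"fields\": ["
        + "{\"name\": \"year\", \"type\": \"int\"},"
        + "{\"name\": \"temperature\", \"type\": \"int\"},"
        + "{\"name\": \"stationId\", \"type\": \"string\"}]}";

    /** Writes one record to the file, reads it back, and returns a short summary. */
    public static String roundTrip(File file) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Fields are set by name instead of through generated setters.
        GenericRecord record = new GenericData.Record(schema);
        record.put("year", 2018);
        record.put("temperature", 20);
        record.put("stationId", "CN");

        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            writer.append(record);
        }

        // The reader is given no schema: it uses the one embedded in the file header.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            Schema embedded = reader.getSchema();
            GenericRecord first = reader.next();
            return embedded.getFullName() + " " + first.get("year") + " " + first.get("stationId");
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("weather", ".avro");
        System.out.println(roundTrip(file));
        file.delete();
    }
}
```

The generated-class approach shown earlier gives typed getters and setters; the generic approach trades that for not needing a code-generation step, which can be convenient when the schema is only known at runtime.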
