1. Environment preparation:
Download the Avro jars from the Avro website (the latest release at the time of writing, 1.7.4, is used here): avro-1.7.4.jar and avro-tools-1.7.4.jar. Also download the Jackson JSON jars, jackson-core-asl and jackson-mapper-asl. Put all four files under ${HADOOP_HOME}/lib (here /usr/local/hadoop/lib, which also keeps them handy for later Hadoop projects).
2. Define the schema:
Create a file named user.avsc with the following content:
```json
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
```
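The union types `["int", "null"]` and `["string", "null"]` declare optional fields: in the class that avro-tools generates in the next step, they become nullable boxed types, and a builder is provided alongside plain setters. As a rough illustration only (the class name UserSketch is mine; the real example.avro.User is produced by the code generator, not written by hand), the generated class has roughly this shape:

```java
// Hand-written approximation of the class avro-tools generates from user.avsc.
// "type": "string" is a required field; the ["int","null"] and
// ["string","null"] unions become fields that may legally be null.
public class UserSketch {
    private String name;             // required
    private Integer favoriteNumber;  // nullable
    private String favoriteColor;    // nullable

    public void setName(String v) { this.name = v; }
    public void setFavoriteNumber(Integer v) { this.favoriteNumber = v; }
    public void setFavoriteColor(String v) { this.favoriteColor = v; }
    public String getName() { return name; }
    public Integer getFavoriteNumber() { return favoriteNumber; }
    public String getFavoriteColor() { return favoriteColor; }

    // The generated class also exposes a fluent builder, used by Test.java below.
    public static Builder newBuilder() { return new Builder(); }

    public static class Builder {
        private final UserSketch u = new UserSketch();
        public Builder setName(String v) { u.setName(v); return this; }
        public Builder setFavoriteNumber(Integer v) { u.setFavoriteNumber(v); return this; }
        public Builder setFavoriteColor(String v) { u.setFavoriteColor(v); return this; }
        public UserSketch build() { return u; }
    }
}
```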
3. Compile the schema:
Run the following command in the current directory:
```
java -jar ${HADOOP_HOME}/lib/avro-tools-1.7.4.jar compile schema user.avsc .
```
This generates example/avro/User.java in the current directory (creating the example/avro directories along the way).
4. Write a test program:
Create a file named Test.java with the following content:
```java
/**
 * @Author wzw
 * @Date 2013.07.17
 */
import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import example.avro.User;

public class Test {
    public static void main(String[] args) {
        // Construct with setters
        User user1 = new User();
        user1.setName("Arway");
        user1.setFavoriteNumber(3);
        user1.setFavoriteColor("green");

        // Construct with the all-args constructor
        User user2 = new User("Ben", 7, "red");

        // Construct with the builder
        User user3 = User.newBuilder()
                .setName("Charlie")
                .setFavoriteColor("blue")
                .setFavoriteNumber(100)
                .build();

        // Serialize user1, user2 and user3 to disk
        File file = new File("users.avro");
        DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
        DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
        try {
            dataFileWriter.create(user1.getSchema(), file);
            dataFileWriter.append(user1);
            dataFileWriter.append(user2);
            dataFileWriter.append(user3);
            dataFileWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Deserialize users from disk
        DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
        try {
            DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
            User user = null;
            while (dataFileReader.hasNext()) {
                // Reuse the user object by passing it to next(). This saves
                // us from allocating and garbage collecting many objects for
                // files with many items.
                user = dataFileReader.next(user);
                System.out.println(user);
            }
            dataFileReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
5. Write the compile script:
Create a file named compile.sh with the following content, paying attention to the classpath:
```bash
#!/usr/bin/env bash
javac -classpath /usr/local/hadoop/lib/avro-1.7.4.jar:/usr/local/hadoop/lib/avro-tools-1.7.4.jar:/usr/local/hadoop/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/lib/jackson-mapper-asl-1.9.13.jar example/avro/User.java Test.java
```
6. Write the run script:
Create a file named run.sh with the following content, paying attention to the classpath (it includes User.jar and the current directory "." so that Test.class and the packaged User class can both be found):
```bash
#!/usr/bin/env bash
java -classpath /usr/local/hadoop/lib/avro-1.7.4.jar:/usr/local/hadoop/lib/avro-tools-1.7.4.jar:/usr/local/hadoop/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/lib/jackson-mapper-asl-1.9.13.jar:User.jar:. Test
```
7. Test:
(1) Compile:
Run compile.sh to compile example/avro/User.java and Test.java into their class files.
(2) Package the User class files:
jar cvf User.jar example
(3) Run:
Run run.sh and check the program output.
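If everything compiled and packaged correctly, run.sh prints each deserialized record. The generated class's toString() renders records as JSON, so the output should look roughly like this (exact field ordering follows the schema):

```
{"name": "Arway", "favorite_number": 3, "favorite_color": "green"}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": 100, "favorite_color": "blue"}
```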
(4) Observe the effect of Avro serialization:
Add a for loop to the write section of Test.java so that it writes many users (say, 100) to users.avro, redirect the output of run.sh to a plain-text file users.plain, and compare the sizes of users.avro and users.plain:
```
-rw-r--r-- 1 hadoop hadoop  245 2013-07-17 17:18 user.avsc
-rw-r--r-- 1 hadoop hadoop 5486 2013-07-17 18:39 User.jar
-rw-r--r-- 1 hadoop hadoop 1737 2013-07-17 19:11 users.avro
-rw-r--r-- 1 hadoop hadoop 6892 2013-07-17 19:12 users.plain
```
The listing above gives a direct feel for Avro's serialization: users.avro is roughly a quarter of the size of the plain-text users.plain.
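Part of that size difference comes from Avro's binary encoding: field names are not stored at all (the schema carries them), strings are written as length-prefixed UTF-8, and ints are written as zigzag-encoded variable-length values per the Avro specification, so small numbers occupy one byte instead of four. A minimal sketch of that int encoding (the class name ZigZagDemo is my own, not part of the Avro library):

```java
import java.io.ByteArrayOutputStream;

// Sketch of Avro's variable-length zigzag int encoding (per the Avro spec),
// illustrating why small values occupy a single byte on disk.
public class ZigZagDemo {
    // Map a signed int to an unsigned one so small magnitudes stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    public static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Emit 7 bits per byte, low-order group first; the high bit of each
    // byte marks "more bytes follow".
    public static byte[] encodeInt(int n) {
        int v = zigZag(n);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // favorite_number 3 fits in one byte; 100 needs two --
        // both smaller than the four bytes of a plain Java int.
        System.out.println(encodeInt(3).length);
        System.out.println(encodeInt(100).length);
    }
}
```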
References:
http://avro.apache.org/docs/1.7.4/gettingstartedjava.html
http://blog.csdn.net/zhumin726/article/details/8467805
wzw0114
2013.07.17