I. Create one table in each of the three formats:
create table rcfile (name string,age int,addr string,`desc` string) row format delimited fields terminated by ',' stored as rcfile;
create table orcfile (name string,age int,addr string,`desc` string) row format delimited fields terminated by ',' stored as orc;
create table parquetfile (name string,age int,addr string,`desc` string) row format delimited fields terminated by ',' stored as parquet;
II. Use a shell script to generate 10 million rows with fields separated by ",", then load them into the textfile table using LOAD DATA with OVERWRITE.
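The generator script itself is not shown in this step. Below is a minimal sketch of one way to produce such a file, assuming synthetic values: the function name `gen_rows`, the output path and the placeholder values (`name_1`, `addr_7`, ...) are hypothetical and are not the data used in the original test. It uses awk because a pure bash loop is very slow at this scale.

```bash
#!/usr/bin/env bash
# Hypothetical generator for the textfile table's input; the column values
# are synthetic placeholders, not the data used in the original test.
set -euo pipefail

# gen_rows ROWS OUTFILE
# Writes ROWS lines of the form name,age,addr,desc to OUTFILE.
gen_rows() {
  local rows="$1" out="$2"
  # awk is much faster than a bash while-loop when writing millions of lines.
  awk -v n="$rows" 'BEGIN {
    srand(42)                         # fixed seed so repeated runs match
    for (i = 1; i <= n; i++) {
      age = int(rand() * 80) + 10     # age in the range 10..89
      # No value may contain a comma: "," is the field delimiter.
      printf "name_%d,%d,addr_%d,desc_%d\n", i, age, i % 1000, i
    }
  }' > "$out"
}
```

For this test it would be called as `gen_rows 10000000 /tmp/textfile_data.csv`, followed in Hive by `LOAD DATA LOCAL INPATH '/tmp/textfile_data.csv' OVERWRITE INTO TABLE lijie.textfile;` (the path is a placeholder).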
III. Insert the data into each of the three tables:
insert into rcfile select * from lijie.textfile;
insert into orcfile select * from lijie.textfile;
insert into parquetfile select * from lijie.textfile;
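Steps I and III repeat the same statement with only the table name and the storage keyword changing, so they can be emitted from a single loop and run as one script once lijie.textfile has been loaded. A minimal sketch; the function name `gen_hql` is hypothetical, and the `desc` column is backquoted because DESC is a Hive keyword.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the CREATE TABLE and INSERT statements for
# each storage format so they can be run together as one HiveQL script.
set -euo pipefail

gen_hql() {
  local pair table storage
  # Each pair is table_name:stored_as_keyword.
  for pair in rcfile:rcfile orcfile:orc parquetfile:parquet; do
    table="${pair%%:*}"
    storage="${pair##*:}"
    # Single quotes keep the backquotes around `desc` literal.
    printf 'create table %s (name string,age int,addr string,`desc` string) ' "$table"
    printf "row format delimited fields terminated by ',' stored as %s;\n" "$storage"
    printf 'insert into %s select * from lijie.textfile;\n' "$table"
  done
}
```

Usage: `gen_hql > formats.hql` and then `hive -f formats.hql`. The loop creates and fills each table in turn rather than creating all three first, which gives the same end state.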
IV. Run the tests
1. select * from xxfile (xxfile stands for rcfile, orcfile and parquetfile in turn)
rcfile Time taken: 47.604 seconds, Fetched: 13756317 row(s)
orcfile Time taken: 2.563 seconds, Fetched: 13756317 row(s)
parquetfile Time taken: 43.454 seconds, Fetched: 13756317 row(s)
Conclusion (query time): orcfile < parquetfile < rcfile
2. select name,addr from xxfile
rcfile Time taken: 36.937 seconds, Fetched: 13756317 row(s)
orcfile Time taken: 2.514 seconds, Fetched: 13756317 row(s)
parquetfile Time taken: 43.454 seconds, Fetched: 13756317 row(s)
Conclusion (query time): orcfile < rcfile < parquetfile
3. select max(name) from xxfile
rcfile Time taken: 34.375 seconds, Fetched: 1 row(s)
orcfile Time taken: 30.073 seconds, Fetched: 1 row(s)
parquetfile Time taken: 38.352 seconds, Fetched: 1 row(s)
Conclusion (query time): orcfile < rcfile < parquetfile
4. select count(1) from xxfile
rcfile Time taken: 32.261 seconds, Fetched: 1 row(s)
orcfile Time taken: 28.959 seconds, Fetched: 1 row(s)
parquetfile Time taken: 32.265 seconds, Fetched: 1 row(s)
Conclusion (query time): orcfile < rcfile ≈ parquetfile
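Each conclusion line compares the "Time taken" values by hand. A small sketch that sorts the measured times per query makes the ordering easy to recheck; the numbers are copied from the results above, and the helper names `timings` and `rank_by_time` are hypothetical.

```bash
#!/usr/bin/env bash
# Rank the three formats by "Time taken" for each query, fastest first.
set -euo pipefail

# Reads lines of "query_no format seconds" and prints one ranked line per query.
rank_by_time() {
  # C locale so the decimal point is parsed consistently by sort -n.
  LC_ALL=C sort -k1,1n -k3,3n | awk '
    $1 != q { if (q != "") print line; q = $1; line = "query " $1 ":" }
    { line = line " " $2 "(" $3 "s)" }
    END { if (q != "") print line }'
}

# Measurements reported in section IV.
timings() {
  cat <<'EOF'
1 rcfile 47.604
1 orcfile 2.563
1 parquetfile 43.454
2 rcfile 36.937
2 orcfile 2.514
2 parquetfile 43.454
3 rcfile 34.375
3 orcfile 30.073
3 parquetfile 38.352
4 rcfile 32.261
4 orcfile 28.959
4 parquetfile 32.265
EOF
}

timings | rank_by_time
# query 1: orcfile(2.563s) parquetfile(43.454s) rcfile(47.604s)
# query 2: orcfile(2.514s) rcfile(36.937s) parquetfile(43.454s)
# query 3: orcfile(30.073s) rcfile(34.375s) parquetfile(38.352s)
# query 4: orcfile(28.959s) rcfile(32.261s) parquetfile(32.265s)
```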
V. Summary
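Total rows: 13756317
Columns: name, age, addr, desc
orcfile gave the best query performance and was the fastest format in all four tests. rcfile was slightly faster than parquetfile in tests 2, 3 and 4 (practically a tie in test 4), while parquetfile beat rcfile on select *.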