Today I'd like to share a small benchmark of external data loading with Greenplum and Deepgreen.
First, the necessary prerequisites:
1) Greenplum 4.3 and Deepgreen 16.x are installed
2) The xdrive and gpfdist environments are already set up
3) Prepare the test file: number.csv, containing 100 million rows. For example: for ((i=1; i<=100000000; i++)); do echo '1,2,3' >> number.csv; done (note the <=, so exactly 100 million rows are written). Resulting file size: 573M
4) Put the test file into local HDFS and serve it via local gpfdist, respectively:
hdfs dfs -put /home/hadoop/number.csv /home/hadoop/input
gpfdist -d /home/hadoop -p 8081 &
5) Create two external tables, one for each access method (xdrive and gpfdist):
create external table number_xdrive(n1 int,n2 int,n3 int) location ('xdrive://localhost:50000/dw/number.csv') format 'csv';
create external table number_gpfdist(n1 int,n2 int,n3 int) location ('gpfdist://localhost:8081/number.csv') format 'csv';
6) Run a select ... limit 10 against each table to confirm the data is accessible.
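As a quick sanity check of the file-generation step, here is a scaled-down version of the same loop (1,000 rows instead of 100 million; the /tmp path is an assumption) that also confirms the bytes-per-row math behind the 573M figure:

```shell
# Scaled-down version of the generator from step 3 (1,000 rows; /tmp path is an assumption)
sample=/tmp/number_sample.csv
for ((i=1; i<=1000; i++)); do echo '1,2,3'; done > "$sample"

# Each row is "1,2,3" plus a newline = 6 bytes, so 100 million rows
# is about 600,000,000 bytes, i.e. roughly 572 MiB, matching the 573M
# observed for the full file.
wc -l < "$sample"   # row count
wc -c < "$sample"   # byte count
```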
Test scenarios and timing comparison:
1. count test:
1) Deepgreen Xdrive
2) Deepgreen gpfdist
3) Greenplum gpfdist
2. select * test:
1) Deepgreen Xdrive
2) Deepgreen gpfdist
3) Greenplum gpfdist
3. group by test:
1) Deepgreen Xdrive
2) Deepgreen gpfdist
3) Greenplum gpfdist
4. Query with a where condition:
1) Deepgreen Xdrive & gpfdist
2) Greenplum gpfdist
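The four scenarios above correspond to simple SQL shapes: count(*), select *, group by, and a where filter. Purely as an illustration of what each one computes, here are coreutils/awk equivalents run directly against a small sample of the CSV (the file path and the choice of n1 as the grouping/filter column are assumptions; the actual benchmark runs these as SQL against number_xdrive and number_gpfdist):

```shell
# Small sample standing in for number.csv (path is an assumption)
f=/tmp/number_sample.csv
for ((i=1; i<=1000; i++)); do echo '1,2,3'; done > "$f"

# 1. count test:      select count(*) from number_xdrive;
wc -l < "$f"

# 2. select * test:   stream every row back (discarded here)
cat "$f" > /dev/null

# 3. group by test:   e.g. select n1, count(*) from number_xdrive group by n1;
awk -F, '{c[$1]++} END {for (k in c) print k, c[k]}' "$f"

# 4. where filter:    e.g. select count(*) from number_xdrive where n1 = 1;
awk -F, '$1 == 1' "$f" | wc -l
```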