参考文档:
Hive基准测试神器-hive-testbench_shining_yyds的博客-CSDN博客
GitHub - hortonworks/hive-testbench
hive tpcds-benchmark 测试_houzhizhen的博客-CSDN博客
1. 构造测试数据
1.1
数据单位为G,最小的数据大小为2,构造数据的命令如下:
cd hive-testbench
./tpcds-setup.sh 2
默认数据格式为ORC,想要切换格式可以执行
cd hive-testbench
FORMAT=textfile ./tpcds-setup.sh 2
or
cd hive-testbench
FORMAT=parquet ./tpcds-setup.sh 2
如果数据是on bos的则需要提前新建一个数据库再执行tpcds-setup.sh,
create database tpcds_bos_orc_3 location "bos://test/**/tpcds_bos_orc_3.db"
原代码是:
DATABASE=tpcds_hdfs_${FORMAT}_${SCALE}
改造之后增加了一个location的参数:BOS_OR_HDFS,部分代码如下:参数:
SCALE:数据量大小
BOS_OR_HDFS:存储位置 hdfs or bos
DIR:存储路径一般默认即可,hdfs 路径就是/tmp/tpcds-generate;bos的路径就是建集群时的location路径 bos://test/**/tpcds_bos_orc_3.db
# Get the parameters.
SCALE=$1
BOS_OR_HDFS=$2
DIR=$3
if [ "X$BUCKET_DATA" != "X" ]; then
BUCKETS=13
RETURN_BUCKETS=13
else
BUCKETS=1
RETURN_BUCKETS=1
fi
if [ "X$DEBUG_SCRIPT" != "X" ]; then
set -x
fi
i=1
total=24
DATABASE=tpcds_${BOS_OR_HDFS}_${FORMAT}_${SCALE}
MAX_REDUCERS=2500 # maximum number of useful reducers for any scale
REDUCERS=$((test ${SCALE} -gt ${MAX_REDUCERS} && echo ${MAX_REDUCERS}) || echo ${SCALE})
echo before for t in ${DIMS}
所以最后的构造数据命令为:
cd hive-testbench
FORMAT=parquet ./tpcds-setup.sh 3 bos/hdfs
2. 执行查询测试
2.1 执行单个sql
hive -i /path/hive-testbench/sample-queries-tpcds/testbench.settings -e "use tpcds_bos_orc_3; source /home/hive/hive-testbench/sample-queries-tpcds/query{}.sql"
执行sql的日志可以查看ls sample-queries-tpcds/ 对应的query*.sql.log
2.2 批量执行sql
可以通过runSuite.pl 文件然后重定向到一个日志文件
这里也修改了下执行文件,$scale代表数据库
源码:
my @queries = glob '*.sql';
my $db = {
'tpcds' => "tpcds_bin_partitioned_orc_$scale",
'tpch' => "tpch_flat_orc_$scale"
};
修改为:
my @queries = glob '*.sql';
my $db = {
#'tpcds' => "tpcds_bin_partitioned_orc_$scale",
'tpcds' => "$scale",
'tpch' => "tpch_flat_orc_$scale"
};
perl runSuite.pl tpcds tpcds_hdfs_orc_2 orc "/opt/bmr/hive/bin/beeline -u jdbc:hive2://172.18.0.20:10000" > /tmp/tpcds.log 2>&1
最后的结果文件可以查看/tmp/tpcds.log
输出的结果类似如下:
[hive@bmr-master-89044c9 hive-testbench]$ perl runSuite.pl tpcds tpcds_hdfs_orc_2 orc "/opt/bmr/hive/bin/beeline -u jdbc:hive2://172.18.0.20:10000"
filename,status,time,rows
query10.sql,failed,65
query11.sql,success,249,100
query12.sql,success,98,100
query13.sql,success,79,1
query14.sql,failed,16
query15.sql,success,76,100
...