使用TPC-DS对kudu进行基准测试
目录
文章目录
使用TPC-DS工具对kudu列式存储数据库进行一个完整的自动化基准测试。
前置条件:CDH整套环境已经搭建好,登录Impala Daemon节点执行。
操作步骤:
1.安装TPC-DS工具
2.初始化测试目录,使用TPC-DS工具生成测试数据dat文件和查询语句。
3.在HDFS新建目录,将测试数据dat文件上传到HDFS中。
4.将TPC-DS提供的DDL语句做语法兼容改造。
5.准备impala外部表和kudu表的DDL、COUNT、LIMIT语句,整理成SQL文件。准备统计分析(computer status table_name)的SQL语句。
6.调用start_tpc_import.sh进行impala外部表的建表,执行kudu内部表ddl,将impala外部表数据导入到kudu内部表,调用统计分析SQL,收集执行结果信息到tpc_ds_test/result目录查看,具体的SQL文件执行日志在tpc_ds_test/logs目录查看。
7.整理TPC-DS提供的SQL基准查询语句,由于有些语法不兼容kudu,需要进行手动调整,最后整理到单独SQL文件中。
8.调用start_kudu_query.sh,执行手动调整过后的TPC—DS的SQL基准查询。收集执行结果信息到tpc_ds_test/result目录查看,具体的SQL文件执行日志在tpc_ds_test/logs目录查看。
自动化测试脚本地址:https://github.com/linmeibao/tpc_ds_test
TPC-DS介绍
TPC-DS是一个面向决策支持系统(decision support system)的包含多维度常规应用模型的决策支持基准,包括查询(queries)与数据维护。此基准对被测系统(System Under Test’s, SUT)在决策支持系统层面上的表现进行的评估具有代表性。
此基准体现决策支持系统以下特性:
1.测试大规模数据
2.对实际商业问题进行解答
3.执行需求多样或复杂的查询(如临时查询,报告,迭代OLAP,数据挖掘)
4.以高CPU和IO负载为特征
5.通过数据库维护对OLTP数据库资源进行周期同步
6.解决大数据问题,如关系型数据库(RDBMS),或基于Hadoop/Spark的系统
基准结果用来测量,较为复杂的多用户决策中,单一用户模型下的查询响应时间,多用户模型下的查询吞吐量,以及数据维护表现。
TPC-DS采用星型、雪花型等多维数据模式。它包含7张事实表,17张纬度表平均每张表含有18列。其工作负载包含99个SQL查询,覆盖SQL99和2003的核心部分以及OLAP。这个测试集包含对大数据集的统计、报表生成、联机查询、数据挖掘等复杂应用,测试用的数据和值是有倾斜的,与真实数据一致。可以说TPC-DS是与真实场景非常接近的一个测试集,也是难度较大的一个测试集。
kudu介绍
Kudu 是一个针对 Apache Hadoop 平台而开发的列式存储管理器。Kudu 共享 Hadoop 生态系统应用的常见技术特性: 它在 commodity hardware(商品硬件)上运行,horizontally scalable(水平可扩展),并支持 highly available(高可用)性操作。
安装TPC-DS工具
1.下载安装包,http://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
。必须输入邮箱,他会发下载地址到邮箱中,点击下载地址即可下载。
2.在linux服务器进行解压:unzip 8fb68f0a-33f8-40be-96b4-322eb3e88af5-tpc-ds-tool.zip
drwxr-xr-x 10 hsy staff 320B 5 25 16:13 v2.13.0rc1
➜ cd v2.13.0rc1 && ll
total 40
-rw-r--r-- 1 hsy staff 17K 5 25 15:27 EULA.txt
drwxr-xr-x 131 hsy staff 4.1K 5 25 15:28 answer_sets
drwxr-xr-x 109 hsy staff 3.4K 5 25 15:27 query_templates
drwxr-xr-x 17 hsy staff 544B 5 25 15:27 query_variants
drwxr-xr-x 4 hsy staff 128B 5 25 15:27 specification
drwxr-xr-x 70 hsy staff 2.2K 5 25 15:27 tests
drwxr-xr-x 364 hsy staff 11K 5 25 15:26 tools
3.安装TPC-DS编译依赖环境
yum -y install gcc gcc-c++ expect
4.进入tools目录进行编译:make
初始化目录,提前准备sh脚本,使用TPC-DS工具生成测试数据和基准查询语句
1.新建一个测试目录,新建tpc_test_init.sh,进行初始化目录,这里使用/root/目录
vim /root/tpc_test_init.sh
将👇的脚本复制到tpc_test_init.sh中
#!/bin/bash
# 防止误操作,先注释,手动更改一下测试目录,再把#basedir="/root/"的注释打开
#basedir="/root/"
cd $basedir
mkdir -p tpc_ds_test/{bin,logs,query_data,result,sql}
cd tpc_ds_test
mkdir -p logs/{ext,import,kudu,query}
mkdir -p sql/{ext,import,kudu,query}
touch ${basedir}tpc_ds_test/bin/batch_upload_to_hdfs.sh
touch ${basedir}tpc_ds_test/bin/start_tpc_import.sh
touch ${basedir}tpc_ds_test/bin/start_kudu_query.sh
chmod +x ${basedir}tpc_ds_test/bin/batch_upload_to_hdfs.sh
chmod +x ${basedir}tpc_ds_test/bin/start_tpc_import.sh
chmod +x ${basedir}tpc_ds_test/bin/start_kudu_query.sh
echo "Directory initialization is complete..."
2.执行初始化
chmod +x tpc_test_init.sh && sh tpc_test_init.sh
3.手动将TPC-DS的工具包移动到 /root/tpc_ds_test 下,解压
cp /root/8fb68f0a-33f8-40be-96b4-322eb3e88af5-tpc-ds-tool.zip /root/tpc_ds_test
unzip /root/tpc_ds_test/8fb68f0a-33f8-40be-96b4-322eb3e88af5-tpc-ds-tool.zip
5.工具说明
dsdgen 生成数据
[root@test4 tools]# ./dsdgen -h
dsdgen Population Generator (Version 2.13.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2020
USAGE: dsdgen [options]
Note: When defined in a parameter file (using -p), parmeters should
use the form below. Each option can also be set from the command
line, using a form of '-param [optional argument]'
Unique anchored substrings of options are also recognized, and
case is ignored, so '-sc' is equivalent to '-SCALE'
General Options
===============
ABREVIATION = <s> -- build table with abreviation <s>
DIR = <s> -- generate tables in directory <s>
HELP = <n> -- display this message
PARAMS = <s> -- read parameters from file <s>
QUIET = [Y|N] -- disable all output to stdout/stderr
SCALE = <n> -- volume of data to generate in GB
TABLE = <s> -- build only table <s>
UPDATE = <n> -- generate update data set <n>
VERBOSE = [Y|N] -- enable verbose output
PARALLEL = <n> -- build data in <n> separate chunks
CHILD = <n> -- generate <n>th chunk of the parallelized data
RELEASE = [Y|N] -- display the release information
_FILTER = [Y|N] -- output data to stdout
VALIDATE = [Y|N] -- produce rows for data validation
Advanced Options
===============
DELIMITER = <s> -- use <s> as output field separator
DISTRIBUTIONS = <s> -- read distributions from file <s>
FORCE = [Y|N] -- over-write data files without prompting
SUFFIX = <s> -- use <s> as output file suffix
TERMINATE = [Y|N] -- end each record with a field delimiter
VCOUNT = <n> -- set number of validation rows to be produced
VSUFFIX = <s> -- set file suffix for data validation
RNGSEED = <n> -- set RNG seed
dsqgen 生成查询语句
[root@test4 tools]# ./dsqgen -h
qgen2 Population Generator (Version 2.13.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2020
USAGE: qgen2 [options]
Note: When defined in a parameter file (using -p), parmeters should
use the form below. Each option can also be set from the command
line, using a form of '-param [optional argument]'
Unique anchored substrings of options are also recognized, and
case is ignored, so '-sc' is equivalent to '-SCALE'
General Options
===============
FILE = <s> -- read parameters from file <s>
VERBOSE = [Y|N] -- enable verbose output
HELP = [Y|N] -- display this message
OUTPUT_DIR = <s> -- write query streams into directory <s>
QUIET = [Y|N] -- suppress all output (for scripting)
STREAMS = <n> -- generate <n> query streams/versions
INPUT = <s> -- read template names from <s>
SCALE = <n> -- assume a database of <n> GB
LOG = <s> -- write parameter log to <s>
QUALIFY = [Y|N] -- generate qualification queries in ascending order
Advanced Options
===============
DISTRIBUTIONS = <s> -- read distributions from file <s>
PATH_SEP = <s> -- use <s> to separate path elements
RNGSEED = <n> -- seed the RNG with <n>
RELEASE = [Y|N] -- display QGEN release info
TEMPLATE = <s> -- build queries from template <s> ONLY
COUNT = <n> -- generate <n> versions per stream (used with TEMPLATE)
DEBUG = [Y|N] -- minor debugging outptut
FILTER = [Y|N] -- write generated queries to stdout
DIALECT = <s> -- include query dialect defintions found in <s>.tpl
DIRECTORY = <s> -- look in <s> for templates
6.创建数据存放目录,可自行定义,这里使用/root/test_datas
mkdir /root/test_datas
7.生成测试数据
产生1G测试数据
cd /root/tpc_ds_test/v2.13.0rc1/tools && ./dsdgen -DIR /root/test_datas -SCALE 1 -TERMINATE N
并行产生1G数据
cd /root/tpc_ds_test/v2.13.0rc1/tools
./dsdgen -DIR /root/test_datas -SCALE 10 -parallel 4 -child 1
如果数据目录未创建(mkdir /root/test_datas)就执行生成数据命令,则会出现以下报错。所以需要将目录提前创建好。
[root@test4 tpc_ds_test]# cd /root/tpc_ds_test/v2.13.0rc1/tools && ./dsdgen -DIR /root/test_datas -SCALE 1 -TERMINATE N
dsdgen Population Generator (Version 2.13.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2020
Warning: This scale factor is valid for QUALIFICATION ONLY
ERROR: Failed to open output file!
File: print.c
Line: 490
8.生成查询语句
- dialect这里选择netezza,SQL查询分页语法与要测试的kudu语句一致(可选:oracel、netezza、sqlserver、db2)netezza分页语法是limit所以这里选择它)
cd /root/tpc_ds_test/v2.13.0rc1/tools
./dsqgen -output_dir /root/tpc_ds_test/query_data -input ../query_templates/templates.lst -scale 1 -dialect netezza -directory ../query_templates
- /root/tpc_ds_test/query_data在目录初始化的时候已经生成了
- 查看/root/tpc_ds_test/query_data目录下生成的query_0.sql
TPC—DS生成测试数据文件批量上传至HDFS
1.复制下面的脚本到 batch_upload_to_hdfs.sh 进行批量上传文件到HDFS
batch_upload_to_hdfs.sh在初始化目录时已经创建完成
cd /root/tpc_ds_test/bin && vim batch_upload_to_hdfs.sh
#!/bin/bash
#文件存放的目录
data_src_dir=/root/test_datas/
#文件上传到hdfs的根路径
hdfs_root_dir=/tmp/test/
file_dir=""
#创建测试数据目录
hdfs dfs -mkdir $hdfs_root_dir
ls $data_src_dir | while read fileName
do
file_dir=${fileName%.*}
echo "${file_dir}..."
hdfs dfs -mkdir $hdfs_root_dir$file_dir
echo "${fileName}..."
hdfs dfs -put $data_src_dir$fileName $hdfs_root_dir$file_dir/$fileName
done
echo "The upload to HDFS is complete..."
2.执行脚本
sh batch_upload_to_hdfs.sh
3.查看目录,是否上传成功
hdfs dfs -ls h /tmp/test/
[root@test4 bin]# hdfs dfs -ls h /tmp/test/
ls: `h': No such file or directory
Found 27 items
drwxr-xr-x - root supergroup 0 2020-06-10 01:28 /tmp/test/call_center
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/catalog_page
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/catalog_returns
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/catalog_sales
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/customer
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/customer_address
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/customer_demographics
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/date_dim
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/dbgen_version
drwxr-xr-x - root supergroup 0 2020-06-10 01:29 /tmp/test/handled
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/handled_data
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/household_demographics
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/income_band
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/inventory
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/item
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/promotion
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/reason
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/ship_mode
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/store
drwxr-xr-x - root supergroup 0 2020-06-10 01:30 /tmp/test/store_returns
drwxr-xr-x - root supergroup 0 2020-06-10 01:31 /tmp/test/store_sales
drwxr-xr-x - root supergroup 0 2020-06-10 01:31 /tmp/test/time_dim
drwxr-xr-x - root supergroup 0 2020-06-10 01:31 /tmp/test/warehouse
drwxr-xr-x - root supergroup 0 2020-06-10 01:31 /tmp/test/web_page
drwxr-xr-x - root supergroup 0 2020-06-10 01:31 /tmp/test/web_returns
drwxr-xr-x - root supergroup 0 2020-06-10 01:31 /tmp/test/web_sales
drwxr-xr-x - root supergroup 0 2020-06-10 01:31 /tmp/test/web_site
[root@test4 bin]# hdfs dfs -ls h /tmp/test/call_center
ls: `h': No such file or directory
Found 1 items
-rw-r--r-- 3 root supergroup 1891 2020-06-10 01:28 /tmp/test/call_center/call_center.dat
[root@test4 bin]# hdfs dfs -ls h /tmp/test/catalog_page
ls: `h': No such file or directory
Found 1 items
-rw-r--r-- 3 root supergroup 1631792 2020-06-10 01:29 /tmp/test/catalog_page/catalog_page.dat
[root@test4 bin]#
TPC-DS提供DDL语句与impala和kudu建表语法有差异,需要手动改造
TPC-DS已经提前准备好了创建表相关的SQL文件,文件位于tools目录下。
但是由于各种数据库之间语法差异,无法直接使用,需要手动改造。
- tpcds.sql 创建25张表的sql语句
- tpcds_ri.sql 创建表与表之间关系的sql语句
- tpcds_source.sql
- 主要是用到tpcds.sql文件
测试数据已经导入到HDFS,可以开始创建impala外部表,随后使用“INSERT INTO kudu表 SELECT * FROM 外部表”的方式,将测试数据导入到kudu当中。
1.以下为改造后兼容impala语法的外部表DDL
use tpc_test;
DROP TABLE IF EXISTS ext_dbgen_version;
create external table ext_dbgen_version
(
dv_version varchar(30) ,
dv_create_date TIMESTAMP ,
dv_create_time varchar(30) ,
dv_cmdline_args varchar(200)
)row format delimited fields terminated by '|'
location '/tmp/test/dbgen_version';
DROP TABLE IF EXISTS ext_customer_address;
create external table ext_customer_address
(
ca_address_sk integer ,
ca_address_id char(16) ,
ca_street_number char(10) ,
ca_street_name varchar(60) ,
ca_street_type char(15) ,
ca_suite_number char(10) ,
ca_city varchar(60) ,
ca_county varchar(30) ,
ca_state char(2) ,
ca_zip char(10) ,
ca_country varchar(20) ,
ca_gmt_offset decimal(5,2) ,
ca_location_type char(20)
)row format delimited fields terminated by '|'
location '/tmp/test/customer_address';
DROP TABLE IF EXISTS ext_customer_demographics;
create external table ext_customer_demographics
(
cd_demo_sk integer ,
cd_gender char(1) ,
cd_marital_status char(1) ,
cd_education_status char(20) ,
cd_purchase_estimate integer ,
cd_credit_rating char(10) ,
cd_dep_count integer ,
cd_dep_employed_count integer ,
cd_dep_college_count integer
)row format delimited fields terminated by '|'
location '/tmp/test/customer_demographics';
DROP TABLE IF EXISTS ext_date_dim;
create external table ext_date_dim
(
d_date_sk integer ,
d_date_id char(16) ,
d_date TIMESTAMP ,
d_month_seq integer ,
d_week_seq integer ,
d_quarter_seq integer ,
d_year integer ,
d_dow integer ,
d_moy integer ,
d_dom integer ,
d_qoy integer ,
d_fy_year integer ,
d_fy_quarter_seq integer ,
d_fy_week_seq integer ,
d_day_name char(9) ,
d_quarter_name char(6) ,
d_holiday char(1) ,
d_weekend char(1) ,
d_following_holiday char(1) ,
d_first_dom integer ,
d_last_dom integer ,
d_same_day_ly integer ,
d_same_day_lq integer ,
d_current_day char(1) ,
d_current_week char(1) ,
d_current_month char(1) ,
d_current_quarter char(1) ,
d_current_year char(1)
)row format delimited fields terminated by '|'
location '/tmp/test/date_dim';
DROP TABLE IF EXISTS ext_warehouse;
create external table ext_warehouse
(
w_warehouse_sk integer ,
w_warehouse_id char(16) ,
w_warehouse_name varchar(20) ,
w_warehouse_sq_ft integer ,
w_street_number char(10) ,
w_street_name varchar(60) ,
w_street_type char(15) ,
w_suite_number char(10) ,
w_city varchar(60) ,
w_county varchar(30) ,
w_state char(2) ,
w_zip char(10) ,
w_country varchar(20) ,
w_gmt_offset decimal(5,2)
)row format delimited fields terminated by '|'
location '/tmp/test/warehouse';
DROP TABLE IF EXISTS ext_ship_mode;
create external table ext_ship_mode
(
sm_ship_mode_sk integer ,
sm_ship_mode_id char(16) ,
sm_type char(30) ,
sm_code char(10) ,
sm_carrier char(20) ,
sm_contract char(20)
)row format delimited fields terminated by '|'
location '/tmp/test/ship_mode';
DROP TABLE IF EXISTS ext_time_dim;
create external table ext_time_dim
(
t_time_sk integer ,
t_time_id char(16) ,
t_time integer ,
t_hour integer ,
t_minute integer ,
t_second integer ,
t_am_pm char(2) ,
t_shift char(20) ,
t_sub_shift char(20) ,
t_meal_time char(20)
)row format delimited fields terminated by '|'
location '/tmp/test/time_dim';
DROP TABLE IF EXISTS ext_reason;
create external table ext_reason
(
r_reason_sk integer ,
r_reason_id char(16) ,
r_reason_desc char(100)
)row format delimited fields terminated by '|'
location '/tmp/test/reason';
DROP TABLE IF EXISTS ext_income_band;
create external table ext_income_band
(
ib_income_band_sk integer ,
ib_lower_bound integer ,
ib_upper_bound integer
)row format delimited fields terminated by '|'
location '/tmp/test/income_band';
DROP TABLE IF EXISTS ext_item;
create external table ext_item
(
i_item_sk integer ,
i_item_id char(16) ,
i_rec_start_date TIMESTAMP ,
i_rec_end_date TIMESTAMP ,
i_item_desc varchar(200) ,
i_current_price decimal(7,2) ,
i_wholesale_cost decimal(7,2) ,
i_brand_id integer ,
i_brand char(50) ,
i_class_id integer ,
i_class char(50) ,
i_category_id integer ,
i_category char(50) ,
i_manufact_id integer ,
i_manufact char(50) ,
i_size char(20) ,
i_formulation char(20) ,
i_color char(20) ,
i_units char(10) ,
i_container char(10) ,
i_manager_id integer ,
i_product_name char(50)
)row format delimited fields terminated by '|'
location '/tmp/test/item';
DROP TABLE IF EXISTS ext_store;
create external table ext_store
(
s_store_sk integer ,
s_store_id char(16) ,
s_rec_start_date TIMESTAMP ,
s_rec_end_date TIMESTAMP ,
s_closed_date_sk integer ,
s_store_name varchar(50) ,
s_number_employees integer ,
s_floor_space integer ,
s_hours char(20) ,
s_manager varchar(40) ,
s_market_id integer ,
s_geography_class varchar(100) ,
s_market_desc varchar(100) ,
s_market_manager varchar(40) ,
s_division_id integer ,
s_division_name varchar(50) ,
s_company_id integer ,
s_company_name varchar(50) ,
s_street_number varchar(10) ,
s_street_name varchar(60) ,
s_street_type char(15) ,
s_suite_number char(10) ,
s_city varchar(60) ,
s_county varchar(30) ,
s_state char(2) ,
s_zip char(10) ,
s_country varchar(20) ,
s_gmt_offset decimal(5,2) ,
s_tax_precentage decimal(5,2)
)row format delimited fields terminated by '|'
location '/tmp/test/store';
DROP TABLE IF EXISTS ext_call_center;
create external table ext_call_center
(
cc_call_center_sk integer ,
cc_call_center_id char(16) ,
cc_rec_start_date TIMESTAMP ,
cc_rec_end_date TIMESTAMP ,
cc_closed_date_sk integer ,
cc_open_date_sk integer ,
cc_name varchar(50) ,
cc_class varchar(50) ,
cc_employees integer ,
cc_sq_ft integer ,
cc_hours char(20) ,
cc_manager varchar(40) ,
cc_mkt_id integer ,
cc_mkt_class char(50) ,
cc_mkt_desc varchar(100) ,
cc_market_manager varchar(40) ,
cc_division integer ,
cc_division_name varchar(50) ,
cc_company integer ,
cc_company_name char(50) ,
cc_street_number char(10) ,
cc_street_name varchar(60) ,
cc_street_type char(15) ,
cc_suite_number char(10) ,
cc_city varchar(60) ,
cc_county varchar(30) ,
cc_state char(2) ,
cc_zip char(10) ,
cc_country varchar(20) ,
cc_gmt_offset decimal(5,2) ,
cc_tax_percentage decimal(5,2)
)row format delimited fields terminated by '|'
location '/tmp/test/call_center';
DROP TABLE IF EXISTS ext_customer;
create external table ext_customer
(
c_customer_sk integer ,
c_customer_id char(16) ,
c_current_cdemo_sk integer ,
c_current_hdemo_sk integer ,
c_current_addr_sk integer ,
c_first_shipto_date_sk integer ,
c_first_sales_date_sk integer ,
c_salutation char(10) ,
c_first_name char(20) ,
c_last_name char(30) ,
c_preferred_cust_flag char(1) ,
c_birth_day integer ,
c_birth_month integer ,
c_birth_year integer ,
c_birth_country varchar(20) ,
c_login char(13) ,
c_email_address char(50) ,
c_last_review_date char(10)
)row format delimited fields terminated by '|'
location '/tmp/test/customer';
DROP TABLE IF EXISTS ext_web_site;
create external table ext_web_site
(
web_site_sk integer ,
web_site_id char(16) ,
web_rec_start_date TIMESTAMP ,
web_rec_end_date TIMESTAMP ,
web_name varchar(50) ,
web_open_date_sk integer ,
web_close_date_sk integer ,
web_class varchar(50) ,
web_manager varchar(40) ,
web_mkt_id integer ,
web_mkt_class varchar(50) ,
web_mkt_desc varchar(100) ,
web_market_manager varchar(40) ,
web_company_id integer ,
web_company_name char(50) ,
web_street_number char(10) ,
web_street_name varchar(60) ,
web_street_type char(15) ,
web_suite_number char(10) ,
web_city varchar(60) ,
web_county varchar(30) ,
web_state char(2) ,
web_zip char(10) ,
web_country varchar(20) ,
web_gmt_offset decimal(5,2) ,
web_tax_percentage decimal(5,2)
)row format delimited fields terminated by '|'
location '/tmp/test/web_site';
DROP TABLE IF EXISTS ext_store_returns;
create external table ext_store_returns
(
sr_returned_date_sk integer ,
sr_return_time_sk integer ,
sr_item_sk integer ,
sr_customer_sk integer ,
sr_cdemo_sk integer ,
sr_hdemo_sk integer ,
sr_addr_sk integer ,
sr_store_sk integer ,
sr_reason_sk integer ,
sr_ticket_number integer ,
sr_return_quantity integer ,
sr_return_amt decimal(7,2) ,
sr_return_tax decimal(7,2) ,
sr_return_amt_inc_tax decimal(7,2) ,
sr_fee decimal(7,2) ,
sr_return_ship_cost decimal(7,2) ,
sr_refunded_cash decimal(7,2) ,
sr_reversed_charge decimal(7,2) ,
sr_store_credit decimal(7,2) ,
sr_net_loss decimal(7,2)
)row format delimited fields terminated by '|'
location '/tmp/test/store_returns';
DROP TABLE IF EXISTS ext_household_demographics;
create external table ext_household_demographics
(
hd_demo_sk integer ,
hd_income_band_sk integer ,
hd_buy_potential char(15) ,
hd_dep_count integer ,
hd_vehicle_count integer
)row format delimited fields terminated by '|'
location '/tmp/test/household_demographics';
DROP TABLE IF EXISTS ext_web_page;
create external table ext_web_page
(
wp_web_page_sk integer ,
wp_web_page_id char(16) ,
wp_rec_start_date TIMESTAMP ,
wp_rec_end_date TIMESTAMP ,
wp_creation_date_sk integer ,
wp_access_date_sk integer ,
wp_autogen_flag char(1) ,
wp_customer_sk integer ,
wp_url varchar(100) ,
wp_type char(50) ,
wp_char_count integer ,
wp_link_count integer ,
wp_image_count integer ,
wp_max_ad_count integer
)row format delimited fields terminated by '|'
location '/tmp/test/web_page';
DROP TABLE IF EXISTS ext_promotion;
create external table ext_promotion
(
p_promo_sk integer ,
p_promo_id char(16) ,
p_start_date_sk integer ,
p_end_date_sk integer ,
p_item_sk integer ,
p_cost decimal(15,2) ,
p_response_target integer ,
p_promo_name char(50) ,
p_channel_dmail char(1) ,
p_channel_email char(1) ,
p_channel_catalog char(1) ,
p_channel_tv char(1) ,
p_channel_radio char(1) ,
p_channel_press char(1) ,
p_channel_event char(1) ,
p_channel_demo char(1) ,
p_channel_details varchar(100) ,
p_purpose char(15) ,
p_discount_active char(1)
)row format delimited fields terminated by '|'
location '/tmp/test/promotion';
DROP TABLE IF EXISTS ext_catalog_page;
create external table ext_catalog_page
(
cp_catalog_page_sk integer ,
cp_catalog_page_id char(16) ,
cp_start_date_sk integer ,
cp_end_date_sk integer ,
cp_department varchar(50) ,
cp_catalog_number integer ,
cp_catalog_page_number integer ,
cp_description varchar(100) ,
cp_type varchar(100)
)row format delimited fields terminated by '|'
location '/tmp/test/catalog_page';
DROP TABLE IF EXISTS ext_inventory;
create external table ext_inventory
(
inv_date_sk integer ,
inv_item_sk integer ,
inv_warehouse_sk integer ,
inv_quantity_on_hand integer
)row format delimited fields terminated by '|'
location '/tmp/test/inventory';
DROP TABLE IF EXISTS ext_catalog_returns;
create external table ext_catalog_returns
(
cr_returned_date_sk integer ,
cr_returned_time_sk integer ,
cr_item_sk integer ,
cr_refunded_customer_sk integer ,
cr_refunded_cdemo_sk integer ,
cr_refunded_hdemo_sk integer ,
cr_refunded_addr_sk integer ,
cr_returning_customer_sk integer ,
cr_returning_cdemo_sk integer ,
cr_returning_hdemo_sk integer ,
cr_returning_addr_sk integer ,
cr_call_center_sk integer ,
cr_catalog_page_sk integer ,
cr_ship_mode_sk integer ,
cr_warehouse_sk integer ,
cr_reason_sk integer ,
cr_order_number integer ,
cr_return_quantity integer ,
cr_return_amount decimal(7,2) ,
cr_return_tax decimal(7,2) ,
cr_return_amt_inc_tax decimal(7,2) ,
cr_fee decimal(7,2) ,
cr_return_ship_cost decimal(7,2) ,
cr_refunded_cash decimal(7,2) ,
cr_reversed_charge decimal(7,2) ,
cr_store_credit decimal(7,2) ,
cr_net_loss decimal(7,2)
)row format delimited fields terminated by '|'
location '/tmp/test/catalog_returns';
DROP TABLE IF EXISTS ext_web_returns;
create external table ext_web_returns
(
wr_returned_date_sk integer ,
wr_returned_time_sk integer ,
wr_item_sk integer ,
wr_refunded_customer_sk integer ,
wr_refunded_cdemo_sk integer ,
wr_refunded_hdemo_sk integer ,
wr_refunded_addr_sk integer ,
wr_returning_customer_sk integer ,
wr_returning_cdemo_sk integer ,
wr_returning_hdemo_sk integer ,
wr_returning_addr_sk integer ,
wr_web_page_sk integer ,
wr_reason_sk integer ,
wr_order_number integer ,
wr_return_quantity integer ,
wr_return_amt decimal(7,2) ,
wr_return_tax decimal(7,2) ,
wr_return_amt_inc_tax decimal(7,2) ,
wr_fee decimal(7,2) ,
wr_return_ship_cost decimal(7,2) ,
wr_refunded_cash decimal(7,2) ,
wr_reversed_charge decimal(7,2) ,
wr_account_credit decimal(7,2) ,
wr_net_loss decimal(7,2)
)row format delimited fields terminated by '|'
location '/tmp/test/web_returns';
DROP TABLE IF EXISTS ext_web_sales;
create external table ext_web_sales
(
ws_sold_date_sk integer ,
ws_sold_time_sk integer ,
ws_ship_date_sk integer ,
ws_item_sk integer ,
ws_bill_customer_sk integer ,
ws_bill_cdemo_sk integer ,
ws_bill_hdemo_sk integer ,
ws_bill_addr_sk integer ,
ws_ship_customer_sk integer ,
ws_ship_cdemo_sk integer ,
ws_ship_hdemo_sk integer ,
ws_ship_addr_sk integer ,
ws_web_page_sk integer ,
ws_web_site_sk integer ,
ws_ship_mode_sk integer ,
ws_warehouse_sk integer ,
ws_promo_sk integer ,
ws_order_number integer ,
ws_quantity integer ,
ws_wholesale_cost decimal(7,2) ,
ws_list_price decimal(7,2) ,
ws_sales_price decimal(7,2) ,
ws_ext_discount_amt decimal(7,2) ,
ws_ext_sales_price decimal(7,2) ,
ws_ext_wholesale_cost decimal(7,2) ,
ws_ext_list_price decimal(7,2) ,
ws_ext_tax decimal(7,2) ,
ws_coupon_amt decimal(7,2) ,
ws_ext_ship_cost decimal(7,2) ,
ws_net_paid decimal(7,2) ,
ws_net_paid_inc_tax decimal(7,2) ,
ws_net_paid_inc_ship decimal(7,2) ,
ws_net_paid_inc_ship_tax decimal(7,2) ,
ws_net_profit decimal(7,2)
)row format delimited fields terminated by '|'
location '/tmp/test/web_sales';
DROP TABLE IF EXISTS ext_catalog_sales;
create external table ext_catalog_sales
(
cs_sold_date_sk integer ,
cs_sold_time_sk integer ,
cs_ship_date_sk integer ,
cs_bill_customer_sk integer ,
cs_bill_cdemo_sk integer ,
cs_bill_hdemo_sk integer ,
cs_bill_addr_sk integer ,
cs_ship_customer_sk integer ,
cs_ship_cdemo_sk integer ,
cs_ship_hdemo_sk integer ,
cs_ship_addr_sk integer ,
cs_call_center_sk integer ,
cs_catalog_page_sk integer ,
cs_ship_mode_sk integer ,
cs_warehouse_sk integer ,
cs_item_sk integer ,
cs_promo_sk integer ,
cs_order_number integer ,
cs_quantity integer ,
cs_wholesale_cost decimal(7,2) ,
cs_list_price decimal(7,2) ,
cs_sales_price decimal(7,2) ,
cs_ext_discount_amt decimal(7,2) ,
cs_ext_sales_price decimal(7,2) ,
cs_ext_wholesale_cost decimal(7,2) ,
cs_ext_list_price decimal(7,2) ,
cs_ext_tax decimal(7,2) ,
cs_coupon_amt decimal(7,2) ,
cs_ext_ship_cost decimal(7,2) ,
cs_net_paid decimal(7,2) ,
cs_net_paid_inc_tax decimal(7,2) ,
cs_net_paid_inc_ship decimal(7,2) ,
cs_net_paid_inc_ship_tax decimal(7,2) ,
cs_net_profit decimal(7,2)
)row format delimited fields terminated by '|'
location '/tmp/test/catalog_sales';
DROP TABLE IF EXISTS ext_store_sales;
create external table ext_store_sales
(
ss_sold_date_sk integer ,
ss_sold_time_sk integer ,
ss_item_sk integer ,
ss_customer_sk integer ,
ss_cdemo_sk integer ,
ss_hdemo_sk integer ,
ss_addr_sk integer ,
ss_store_sk integer ,
ss_promo_sk integer ,
ss_ticket_number integer ,
ss_quantity integer ,
ss_wholesale_cost decimal(7,2) ,
ss_list_price decimal(7,2) ,
ss_sales_price decimal(7,2) ,
ss_ext_discount_amt decimal(7,2) ,
ss_ext_sales_price decimal(7,2) ,
ss_ext_wholesale_cost decimal(7,2) ,
ss_ext_list_price decimal(7,2) ,
ss_ext_tax decimal(7,2) ,
ss_coupon_amt decimal(7,2) ,
ss_net_paid decimal(7,2) ,
ss_net_paid_inc_tax decimal(7,2) ,
ss_net_profit decimal(7,2)
)row format delimited fields terminated by '|'
location '/tmp/test/store_sales';
2.以下为改造后兼容impala语法的kudu表DDL(暂时默认每个表都是3个分区,分区键为主键)
use tpc_test;
DROP TABLE IF EXISTS dbgen_version;
create table dbgen_version
(
dv_version STRING ,
dv_create_date TIMESTAMP ,
dv_create_time STRING ,
dv_cmdline_args STRING ,
PRIMARY KEY(dv_version)
)
PARTITION BY HASH(dv_version) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS customer_address;
create table customer_address
(
ca_address_sk integer ,
ca_address_id STRING ,
ca_street_number STRING ,
ca_street_name STRING ,
ca_street_type STRING ,
ca_suite_number STRING ,
ca_city STRING ,
ca_county STRING ,
ca_state STRING ,
ca_zip STRING ,
ca_country STRING ,
ca_gmt_offset decimal(5,2) ,
ca_location_type STRING ,
primary key (ca_address_sk)
)
PARTITION BY HASH(ca_address_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS customer_demographics;
create table customer_demographics
(
cd_demo_sk integer ,
cd_gender STRING ,
cd_marital_status STRING ,
cd_education_status STRING ,
cd_purchase_estimate integer ,
cd_credit_rating STRING ,
cd_dep_count integer ,
cd_dep_employed_count integer ,
cd_dep_college_count integer ,
primary key (cd_demo_sk)
)
PARTITION BY HASH(cd_demo_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS date_dim;
create table date_dim
(
d_date_sk integer ,
d_date_id STRING ,
d_date TIMESTAMP ,
d_month_seq integer ,
d_week_seq integer ,
d_quarter_seq integer ,
d_year integer ,
d_dow integer ,
d_moy integer ,
d_dom integer ,
d_qoy integer ,
d_fy_year integer ,
d_fy_quarter_seq integer ,
d_fy_week_seq integer ,
d_day_name STRING ,
d_quarter_name STRING ,
d_holiday STRING ,
d_weekend STRING ,
d_following_holiday STRING ,
d_first_dom integer ,
d_last_dom integer ,
d_same_day_ly integer ,
d_same_day_lq integer ,
d_current_day STRING ,
d_current_week STRING ,
d_current_month STRING ,
d_current_quarter STRING ,
d_current_year STRING ,
primary key (d_date_sk)
)
PARTITION BY HASH(d_date_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS warehouse;
create table warehouse
(
w_warehouse_sk integer ,
w_warehouse_id STRING ,
w_warehouse_name STRING ,
w_warehouse_sq_ft integer ,
w_street_number STRING ,
w_street_name STRING ,
w_street_type STRING ,
w_suite_number STRING ,
w_city STRING ,
w_county STRING ,
w_state STRING ,
w_zip STRING ,
w_country STRING ,
w_gmt_offset decimal(5,2) ,
primary key (w_warehouse_sk)
)
PARTITION BY HASH(w_warehouse_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS ship_mode;
create table ship_mode
(
sm_ship_mode_sk integer ,
sm_ship_mode_id STRING ,
sm_type STRING ,
sm_code STRING ,
sm_carrier STRING ,
sm_contract STRING ,
primary key (sm_ship_mode_sk)
)
PARTITION BY HASH(sm_ship_mode_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS time_dim;
create table time_dim
(
t_time_sk integer ,
t_time_id STRING ,
t_time integer ,
t_hour integer ,
t_minute integer ,
t_second integer ,
t_am_pm STRING ,
t_shift STRING ,
t_sub_shift STRING ,
t_meal_time STRING ,
primary key (t_time_sk)
)
PARTITION BY HASH(t_time_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS reason;
create table reason
(
r_reason_sk integer ,
r_reason_id STRING ,
r_reason_desc STRING ,
primary key (r_reason_sk)
)
PARTITION BY HASH(r_reason_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS income_band;
create table income_band
(
ib_income_band_sk integer ,
ib_lower_bound integer ,
ib_upper_bound integer ,
primary key (ib_income_band_sk)
)
PARTITION BY HASH(ib_income_band_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS item;
create table item
(
i_item_sk integer ,
i_item_id STRING ,
i_rec_start_date TIMESTAMP ,
i_rec_end_date TIMESTAMP ,
i_item_desc STRING ,
i_current_price decimal(7,2) ,
i_wholesale_cost decimal(7,2) ,
i_brand_id integer ,
i_brand STRING ,
i_class_id integer ,
i_class STRING ,
i_category_id integer ,
i_category STRING ,
i_manufact_id integer ,
i_manufact STRING ,
i_size STRING ,
i_formulation STRING ,
i_color STRING ,
i_units STRING ,
i_container STRING ,
i_manager_id integer ,
i_product_name STRING ,
primary key (i_item_sk)
)
PARTITION BY HASH(i_item_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS store;
create table store
(
s_store_sk integer ,
s_store_id STRING ,
s_rec_start_date TIMESTAMP ,
s_rec_end_date TIMESTAMP ,
s_closed_date_sk integer ,
s_store_name STRING ,
s_number_employees integer ,
s_floor_space integer ,
s_hours STRING ,
s_manager STRING ,
s_market_id integer ,
s_geography_class STRING ,
s_market_desc STRING ,
s_market_manager STRING ,
s_division_id integer ,
s_division_name STRING ,
s_company_id integer ,
s_company_name STRING ,
s_street_number STRING ,
s_street_name STRING ,
s_street_type STRING ,
s_suite_number STRING ,
s_city STRING ,
s_county STRING ,
s_state STRING ,
s_zip STRING ,
s_country STRING ,
s_gmt_offset decimal(5,2) ,
s_tax_precentage decimal(5,2) ,
primary key (s_store_sk)
)
PARTITION BY HASH(s_store_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS call_center;
create table call_center
(
cc_call_center_sk integer ,
cc_call_center_id STRING ,
cc_rec_start_date TIMESTAMP ,
cc_rec_end_date TIMESTAMP ,
cc_closed_date_sk integer ,
cc_open_date_sk integer ,
cc_name STRING ,
cc_class STRING ,
cc_employees integer ,
cc_sq_ft integer ,
cc_hours STRING ,
cc_manager STRING ,
cc_mkt_id integer ,
cc_mkt_class STRING ,
cc_mkt_desc STRING ,
cc_market_manager STRING ,
cc_division integer ,
cc_division_name STRING ,
cc_company integer ,
cc_company_name STRING ,
cc_street_number STRING ,
cc_street_name STRING ,
cc_street_type STRING ,
cc_suite_number STRING ,
cc_city STRING ,
cc_county STRING ,
cc_state STRING ,
cc_zip STRING ,
cc_country STRING ,
cc_gmt_offset decimal(5,2) ,
cc_tax_percentage decimal(5,2) ,
primary key (cc_call_center_sk)
)
PARTITION BY HASH(cc_call_center_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS customer;
create table customer
(
c_customer_sk integer ,
c_customer_id STRING ,
c_current_cdemo_sk integer ,
c_current_hdemo_sk integer ,
c_current_addr_sk integer ,
c_first_shipto_date_sk integer ,
c_first_sales_date_sk integer ,
c_salutation STRING ,
c_first_name STRING ,
c_last_name STRING ,
c_preferred_cust_flag STRING ,
c_birth_day integer ,
c_birth_month integer ,
c_birth_year integer ,
c_birth_country STRING ,
c_login STRING ,
c_email_address STRING ,
c_last_review_date STRING ,
primary key (c_customer_sk)
)
PARTITION BY HASH(c_customer_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS web_site;
create table web_site
(
web_site_sk integer ,
web_site_id STRING ,
web_rec_start_date TIMESTAMP ,
web_rec_end_date TIMESTAMP ,
web_name STRING ,
web_open_date_sk integer ,
web_close_date_sk integer ,
web_class STRING ,
web_manager STRING ,
web_mkt_id integer ,
web_mkt_class STRING ,
web_mkt_desc STRING ,
web_market_manager STRING ,
web_company_id integer ,
web_company_name STRING ,
web_street_number STRING ,
web_street_name STRING ,
web_street_type STRING ,
web_suite_number STRING ,
web_city STRING ,
web_county STRING ,
web_state STRING ,
web_zip STRING ,
web_country STRING ,
web_gmt_offset decimal(5,2) ,
web_tax_percentage decimal(5,2) ,
primary key (web_site_sk)
)
PARTITION BY HASH(web_site_sk) PARTITIONS 3
STORED AS KUDU
;
--ERROR: ImpalaRuntimeException: Error creating Kudu table 'impala::tpc_test.store_returns'
--CAUSED BY: ImpalaRuntimeException: Kudu PRIMARY KEY columns must be specified as the first columns in the table (expected leading columns ('sr_item_sk', 'sr_ticket_number') but found ('sr_item_sk', 'sr_returned_date_sk'))
-- 字段顺序有要求
DROP TABLE IF EXISTS store_returns;
create table store_returns
(
sr_item_sk integer ,
sr_ticket_number integer ,
sr_returned_date_sk integer ,
sr_return_time_sk integer ,
sr_customer_sk integer ,
sr_cdemo_sk integer ,
sr_hdemo_sk integer ,
sr_addr_sk integer ,
sr_store_sk integer ,
sr_reason_sk integer ,
sr_return_quantity integer ,
sr_return_amt decimal(7,2) ,
sr_return_tax decimal(7,2) ,
sr_return_amt_inc_tax decimal(7,2) ,
sr_fee decimal(7,2) ,
sr_return_ship_cost decimal(7,2) ,
sr_refunded_cash decimal(7,2) ,
sr_reversed_charge decimal(7,2) ,
sr_store_credit decimal(7,2) ,
sr_net_loss decimal(7,2) ,
primary key (sr_item_sk, sr_ticket_number)
)
PARTITION BY HASH(sr_item_sk,sr_ticket_number) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS household_demographics;
create table household_demographics
(
hd_demo_sk integer ,
hd_income_band_sk integer ,
hd_buy_potential STRING ,
hd_dep_count integer ,
hd_vehicle_count integer ,
primary key (hd_demo_sk)
)
PARTITION BY HASH(hd_demo_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS web_page;
create table web_page
(
wp_web_page_sk integer ,
wp_web_page_id STRING ,
wp_rec_start_date TIMESTAMP ,
wp_rec_end_date TIMESTAMP ,
wp_creation_date_sk integer ,
wp_access_date_sk integer ,
wp_autogen_flag STRING ,
wp_customer_sk integer ,
wp_url STRING ,
wp_type STRING ,
wp_char_count integer ,
wp_link_count integer ,
wp_image_count integer ,
wp_max_ad_count integer ,
primary key (wp_web_page_sk)
)
PARTITION BY HASH(wp_web_page_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS promotion;
create table promotion
(
p_promo_sk integer ,
p_promo_id STRING ,
p_start_date_sk integer ,
p_end_date_sk integer ,
p_item_sk integer ,
p_cost decimal(15,2) ,
p_response_target integer ,
p_promo_name STRING ,
p_channel_dmail STRING ,
p_channel_email STRING ,
p_channel_catalog STRING ,
p_channel_tv STRING ,
p_channel_radio STRING ,
p_channel_press STRING ,
p_channel_event STRING ,
p_channel_demo STRING ,
p_channel_details STRING ,
p_purpose STRING ,
p_discount_active STRING ,
primary key (p_promo_sk)
)
PARTITION BY HASH(p_promo_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS catalog_page;
create table catalog_page
(
cp_catalog_page_sk integer ,
cp_catalog_page_id STRING ,
cp_start_date_sk integer ,
cp_end_date_sk integer ,
cp_department STRING ,
cp_catalog_number integer ,
cp_catalog_page_number integer ,
cp_description STRING ,
cp_type STRING ,
primary key (cp_catalog_page_sk)
)
PARTITION BY HASH(cp_catalog_page_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS inventory;
create table inventory
(
inv_date_sk integer ,
inv_item_sk integer ,
inv_warehouse_sk integer ,
inv_quantity_on_hand integer ,
primary key (inv_date_sk, inv_item_sk, inv_warehouse_sk)
)
PARTITION BY HASH(inv_date_sk, inv_item_sk, inv_warehouse_sk) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS catalog_returns;
create table catalog_returns
(
cr_item_sk integer ,
cr_order_number integer ,
cr_returned_date_sk integer ,
cr_returned_time_sk integer ,
cr_refunded_customer_sk integer ,
cr_refunded_cdemo_sk integer ,
cr_refunded_hdemo_sk integer ,
cr_refunded_addr_sk integer ,
cr_returning_customer_sk integer ,
cr_returning_cdemo_sk integer ,
cr_returning_hdemo_sk integer ,
cr_returning_addr_sk integer ,
cr_call_center_sk integer ,
cr_catalog_page_sk integer ,
cr_ship_mode_sk integer ,
cr_warehouse_sk integer ,
cr_reason_sk integer ,
cr_return_quantity integer ,
cr_return_amount decimal(7,2) ,
cr_return_tax decimal(7,2) ,
cr_return_amt_inc_tax decimal(7,2) ,
cr_fee decimal(7,2) ,
cr_return_ship_cost decimal(7,2) ,
cr_refunded_cash decimal(7,2) ,
cr_reversed_charge decimal(7,2) ,
cr_store_credit decimal(7,2) ,
cr_net_loss decimal(7,2) ,
primary key (cr_item_sk, cr_order_number)
)
PARTITION BY HASH(cr_item_sk, cr_order_number) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS web_returns;
create table web_returns
(
wr_item_sk integer ,
wr_order_number integer ,
wr_returned_date_sk integer ,
wr_returned_time_sk integer ,
wr_refunded_customer_sk integer ,
wr_refunded_cdemo_sk integer ,
wr_refunded_hdemo_sk integer ,
wr_refunded_addr_sk integer ,
wr_returning_customer_sk integer ,
wr_returning_cdemo_sk integer ,
wr_returning_hdemo_sk integer ,
wr_returning_addr_sk integer ,
wr_web_page_sk integer ,
wr_reason_sk integer ,
wr_return_quantity integer ,
wr_return_amt decimal(7,2) ,
wr_return_tax decimal(7,2) ,
wr_return_amt_inc_tax decimal(7,2) ,
wr_fee decimal(7,2) ,
wr_return_ship_cost decimal(7,2) ,
wr_refunded_cash decimal(7,2) ,
wr_reversed_charge decimal(7,2) ,
wr_account_credit decimal(7,2) ,
wr_net_loss decimal(7,2) ,
primary key (wr_item_sk, wr_order_number)
)
PARTITION BY HASH(wr_item_sk, wr_order_number) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS web_sales;
create table web_sales
(
ws_item_sk integer ,
ws_order_number integer ,
ws_sold_date_sk integer ,
ws_sold_time_sk integer ,
ws_ship_date_sk integer ,
ws_bill_customer_sk integer ,
ws_bill_cdemo_sk integer ,
ws_bill_hdemo_sk integer ,
ws_bill_addr_sk integer ,
ws_ship_customer_sk integer ,
ws_ship_cdemo_sk integer ,
ws_ship_hdemo_sk integer ,
ws_ship_addr_sk integer ,
ws_web_page_sk integer ,
ws_web_site_sk integer ,
ws_ship_mode_sk integer ,
ws_warehouse_sk integer ,
ws_promo_sk integer ,
ws_quantity integer ,
ws_wholesale_cost decimal(7,2) ,
ws_list_price decimal(7,2) ,
ws_sales_price decimal(7,2) ,
ws_ext_discount_amt decimal(7,2) ,
ws_ext_sales_price decimal(7,2) ,
ws_ext_wholesale_cost decimal(7,2) ,
ws_ext_list_price decimal(7,2) ,
ws_ext_tax decimal(7,2) ,
ws_coupon_amt decimal(7,2) ,
ws_ext_ship_cost decimal(7,2) ,
ws_net_paid decimal(7,2) ,
ws_net_paid_inc_tax decimal(7,2) ,
ws_net_paid_inc_ship decimal(7,2) ,
ws_net_paid_inc_ship_tax decimal(7,2) ,
ws_net_profit decimal(7,2) ,
primary key (ws_item_sk, ws_order_number)
)
PARTITION BY HASH(ws_item_sk, ws_order_number) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS catalog_sales;
create table catalog_sales
(
cs_item_sk integer ,
cs_order_number integer ,
cs_sold_date_sk integer ,
cs_sold_time_sk integer ,
cs_ship_date_sk integer ,
cs_bill_customer_sk integer ,
cs_bill_cdemo_sk integer ,
cs_bill_hdemo_sk integer ,
cs_bill_addr_sk integer ,
cs_ship_customer_sk integer ,
cs_ship_cdemo_sk integer ,
cs_ship_hdemo_sk integer ,
cs_ship_addr_sk integer ,
cs_call_center_sk integer ,
cs_catalog_page_sk integer ,
cs_ship_mode_sk integer ,
cs_warehouse_sk integer ,
cs_promo_sk integer ,
cs_quantity integer ,
cs_wholesale_cost decimal(7,2) ,
cs_list_price decimal(7,2) ,
cs_sales_price decimal(7,2) ,
cs_ext_discount_amt decimal(7,2) ,
cs_ext_sales_price decimal(7,2) ,
cs_ext_wholesale_cost decimal(7,2) ,
cs_ext_list_price decimal(7,2) ,
cs_ext_tax decimal(7,2) ,
cs_coupon_amt decimal(7,2) ,
cs_ext_ship_cost decimal(7,2) ,
cs_net_paid decimal(7,2) ,
cs_net_paid_inc_tax decimal(7,2) ,
cs_net_paid_inc_ship decimal(7,2) ,
cs_net_paid_inc_ship_tax decimal(7,2) ,
cs_net_profit decimal(7,2) ,
primary key (cs_item_sk, cs_order_number)
)
PARTITION BY HASH(cs_item_sk, cs_order_number) PARTITIONS 3
STORED AS KUDU
;
DROP TABLE IF EXISTS store_sales;
create table store_sales
(
ss_item_sk integer ,
ss_ticket_number integer ,
ss_sold_date_sk integer ,
ss_sold_time_sk integer ,
ss_customer_sk integer ,
ss_cdemo_sk integer ,
ss_hdemo_sk integer ,
ss_addr_sk integer ,
ss_store_sk integer ,
ss_promo_sk integer ,
ss_quantity integer ,
ss_wholesale_cost decimal(7,2) ,
ss_list_price decimal(7,2) ,
ss_sales_price decimal(7,2) ,
ss_ext_discount_amt decimal(7,2) ,
ss_ext_sales_price decimal(7,2) ,
ss_ext_wholesale_cost decimal(7,2) ,
ss_ext_list_price decimal(7,2) ,
ss_ext_tax decimal(7,2) ,
ss_coupon_amt decimal(7,2) ,
ss_net_paid decimal(7,2) ,
ss_net_paid_inc_tax decimal(7,2) ,
ss_net_profit decimal(7,2) ,
primary key (ss_item_sk, ss_ticket_number)
)
PARTITION BY HASH(ss_item_sk, ss_ticket_number) PARTITIONS 3
STORED AS KUDU
;
impala外部表和kudu表的创建-外部表数据导入到kudu表中
1.进入impala-shell窗口,先新建tpc_test数据库:create database tpc_test;
➜ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to test2:21000
Server version: impala version 2.8.0-cdh5.11.2 RELEASE (build f89269c4b96da14a841e94bdf6d4d48821b0d658)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.8.0-cdh5.11.2 (f89269c) built on Fri Aug 18 14:04:44 PDT 2017)
The HISTORY command lists all shell commands in chronological order.
***********************************************************************************
[test2:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name | comment |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default | Default Hive database |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.01s
[test4:21000] default> create database tpc_test;
Query: create database tpc_test
+----------------------------+
| summary |
+----------------------------+
| Database has been created. |
+----------------------------+
Fetched 1 row(s) in 0.40s
[test4:21000] default>
2.新建外部表ddl文件
vim /root/tpc_ds_test/sql/ext/ext_ddl.sql
将上面👆的外部表DDL复制进去
3.新建kudu表ddl文件
vim /root/tpc_ds_test/sql/kudu/kudu_ddl.sql
将上面👆的kudu表DDL复制进去
4.新建简易COUNT查询SQL
vim /root/tpc_ds_test/sql/ext/ext_count_query.sql
将下面👇的SQL复制进去
use tpc_test;
SELECT COUNT(1) FROM ext_call_center ;
SELECT COUNT(1) FROM ext_catalog_page ;
SELECT COUNT(1) FROM ext_catalog_returns ;
SELECT COUNT(1) FROM ext_catalog_sales ;
SELECT COUNT(1) FROM ext_customer ;
SELECT COUNT(1) FROM ext_customer_address ;
SELECT COUNT(1) FROM ext_customer_demographics ;
SELECT COUNT(1) FROM ext_date_dim ;
SELECT COUNT(1) FROM ext_dbgen_version ;
SELECT COUNT(1) FROM ext_household_demographics;
SELECT COUNT(1) FROM ext_income_band ;
SELECT COUNT(1) FROM ext_inventory ;
SELECT COUNT(1) FROM ext_item ;
SELECT COUNT(1) FROM ext_promotion ;
SELECT COUNT(1) FROM ext_reason ;
SELECT COUNT(1) FROM ext_ship_mode ;
SELECT COUNT(1) FROM ext_store ;
SELECT COUNT(1) FROM ext_store_returns ;
SELECT COUNT(1) FROM ext_store_sales ;
SELECT COUNT(1) FROM ext_time_dim ;
SELECT COUNT(1) FROM ext_warehouse ;
SELECT COUNT(1) FROM ext_web_page ;
SELECT COUNT(1) FROM ext_web_returns ;
SELECT COUNT(1) FROM ext_web_sales ;
SELECT COUNT(1) FROM ext_web_site ;
5.新建简易limit查询SQL,主要验证外部表数据是否能够被正常查询
vim /root/tpc_ds_test/sql/ext/ext_limit_query.sql
将下面👇的SQL复制进去
use tpc_test;
SELECT * FROM ext_call_center limit 2;
SELECT * FROM ext_catalog_page limit 2;
SELECT * FROM ext_catalog_returns limit 2;
SELECT * FROM ext_catalog_sales limit 2;
SELECT * FROM ext_customer limit 2;
SELECT * FROM ext_customer_address limit 2;
SELECT * FROM ext_customer_demographics limit 2;
SELECT * FROM ext_date_dim limit 2;
SELECT * FROM ext_dbgen_version limit 2;
SELECT * FROM ext_household_demographics limit 2;
SELECT * FROM ext_income_band limit 2;
SELECT * FROM ext_inventory limit 2;
SELECT * FROM ext_item limit 2;
SELECT * FROM ext_promotion limit 2;
SELECT * FROM ext_reason limit 2;
SELECT * FROM ext_ship_mode limit 2;
SELECT * FROM ext_store limit 2;
SELECT * FROM ext_store_returns limit 2;
SELECT * FROM ext_store_sales limit 2;
SELECT * FROM ext_time_dim limit 2;
SELECT * FROM ext_warehouse limit 2;
SELECT * FROM ext_web_page limit 2;
SELECT * FROM ext_web_returns limit 2;
SELECT * FROM ext_web_sales limit 2;
SELECT * FROM ext_web_site limit 2;
6.准备导数SQL
vim /root/tpc_ds_test/sql/import/import_test_data.sql
将下面👇的SQL复制进去
use tpc_test;
INSERT INTO call_center SELECT * FROM ext_call_center ;
INSERT INTO catalog_page SELECT * FROM ext_catalog_page ;
INSERT INTO catalog_returns SELECT cr_item_sk , cr_order_number , cr_returned_date_sk , cr_returned_time_sk , cr_refunded_customer_sk , cr_refunded_cdemo_sk , cr_refunded_hdemo_sk , cr_refunded_addr_sk , cr_returning_customer_sk , cr_returning_cdemo_sk , cr_returning_hdemo_sk , cr_returning_addr_sk , cr_call_center_sk , cr_catalog_page_sk , cr_ship_mode_sk , cr_warehouse_sk , cr_reason_sk , cr_return_quantity , cr_return_amount , cr_return_tax , cr_return_amt_inc_tax , cr_fee , cr_return_ship_cost , cr_refunded_cash , cr_reversed_charge , cr_store_credit , cr_net_loss FROM ext_catalog_returns;
INSERT INTO catalog_sales SELECT cs_item_sk , cs_order_number , cs_sold_date_sk , cs_sold_time_sk , cs_ship_date_sk , cs_bill_customer_sk , cs_bill_cdemo_sk , cs_bill_hdemo_sk , cs_bill_addr_sk , cs_ship_customer_sk , cs_ship_cdemo_sk , cs_ship_hdemo_sk , cs_ship_addr_sk , cs_call_center_sk , cs_catalog_page_sk , cs_ship_mode_sk , cs_warehouse_sk , cs_promo_sk , cs_quantity , cs_wholesale_cost , cs_list_price , cs_sales_price , cs_ext_discount_amt , cs_ext_sales_price , cs_ext_wholesale_cost , cs_ext_list_price , cs_ext_tax , cs_coupon_amt , cs_ext_ship_cost , cs_net_paid , cs_net_paid_inc_tax , cs_net_paid_inc_ship , cs_net_paid_inc_ship_tax , cs_net_profit FROM ext_catalog_sales;
INSERT INTO customer SELECT * FROM ext_customer ;
INSERT INTO customer_address SELECT * FROM ext_customer_address ;
INSERT INTO customer_demographics SELECT * FROM ext_customer_demographics ;
INSERT INTO date_dim SELECT * FROM ext_date_dim ;
INSERT INTO dbgen_version SELECT * FROM ext_dbgen_version ;
INSERT INTO household_demographics SELECT * FROM ext_household_demographics;
INSERT INTO income_band SELECT * FROM ext_income_band ;
INSERT INTO inventory SELECT * FROM ext_inventory ;
INSERT INTO item SELECT * FROM ext_item ;
INSERT INTO promotion SELECT * FROM ext_promotion ;
INSERT INTO reason SELECT * FROM ext_reason ;
INSERT INTO ship_mode SELECT * FROM ext_ship_mode ;
INSERT INTO store SELECT * FROM ext_store ;
INSERT INTO store_returns SELECT sr_item_sk , sr_ticket_number , sr_returned_date_sk , sr_return_time_sk , sr_customer_sk , sr_cdemo_sk , sr_hdemo_sk , sr_addr_sk , sr_store_sk , sr_reason_sk , sr_return_quantity , sr_return_amt , sr_return_tax , sr_return_amt_inc_tax , sr_fee , sr_return_ship_cost , sr_refunded_cash , sr_reversed_charge , sr_store_credit , sr_net_loss FROM ext_store_returns;
INSERT INTO store_sales SELECT ss_item_sk , ss_ticket_number , ss_sold_date_sk , ss_sold_time_sk , ss_customer_sk , ss_cdemo_sk , ss_hdemo_sk , ss_addr_sk , ss_store_sk , ss_promo_sk , ss_quantity , ss_wholesale_cost , ss_list_price , ss_sales_price , ss_ext_discount_amt , ss_ext_sales_price , ss_ext_wholesale_cost , ss_ext_list_price , ss_ext_tax , ss_coupon_amt , ss_net_paid , ss_net_paid_inc_tax , ss_net_profit FROM ext_store_sales;
INSERT INTO time_dim SELECT * FROM ext_time_dim ;
INSERT INTO warehouse SELECT * FROM ext_warehouse ;
INSERT INTO web_page SELECT * FROM ext_web_page ;
INSERT INTO web_returns SELECT wr_item_sk , wr_order_number , wr_returned_date_sk , wr_returned_time_sk , wr_refunded_customer_sk , wr_refunded_cdemo_sk , wr_refunded_hdemo_sk , wr_refunded_addr_sk , wr_returning_customer_sk , wr_returning_cdemo_sk , wr_returning_hdemo_sk , wr_returning_addr_sk , wr_web_page_sk , wr_reason_sk , wr_return_quantity , wr_return_amt , wr_return_tax , wr_return_amt_inc_tax , wr_fee , wr_return_ship_cost , wr_refunded_cash , wr_reversed_charge , wr_account_credit , wr_net_loss FROM ext_web_returns;
INSERT INTO web_sales SELECT ws_item_sk , ws_order_number , ws_sold_date_sk , ws_sold_time_sk , ws_ship_date_sk , ws_bill_customer_sk , ws_bill_cdemo_sk , ws_bill_hdemo_sk , ws_bill_addr_sk , ws_ship_customer_sk , ws_ship_cdemo_sk , ws_ship_hdemo_sk , ws_ship_addr_sk , ws_web_page_sk , ws_web_site_sk , ws_ship_mode_sk , ws_warehouse_sk , ws_promo_sk , ws_quantity , ws_wholesale_cost , ws_list_price , ws_sales_price , ws_ext_discount_amt , ws_ext_sales_price , ws_ext_wholesale_cost , ws_ext_list_price , ws_ext_tax , ws_coupon_amt , ws_ext_ship_cost , ws_net_paid , ws_net_paid_inc_tax , ws_net_paid_inc_ship , ws_net_paid_inc_ship_tax , ws_net_profit FROM ext_web_sales;
INSERT INTO web_site SELECT * FROM ext_web_site ;
7.创建kudu_analyze.sql,用于统计分析Impala|kudu的表
vim /root/tpc_ds_test/sql/kudu/kudu_analyze.sql
use tpc_test;
compute stats call_center ;
compute stats catalog_page ;
compute stats catalog_returns ;
compute stats catalog_sales ;
compute stats customer_address ;
compute stats customer_demographics ;
compute stats customer ;
compute stats date_dim ;
compute stats household_demographics ;
compute stats income_band ;
compute stats inventory ;
compute stats item ;
compute stats promotion ;
compute stats reason ;
compute stats ship_mode ;
compute stats store_returns ;
compute stats store_sales ;
compute stats store ;
compute stats time_dim ;
compute stats warehouse ;
compute stats web_page ;
compute stats web_returns ;
compute stats web_sales ;
compute stats web_site ;
8.新建简易COUNT查询SQL
vim /root/tpc_ds_test/sql/kudu/kudu_count_query.sql
将下面👇的SQL复制进去
use tpc_test;
SELECT COUNT(1) FROM call_center ;
SELECT COUNT(1) FROM catalog_page ;
SELECT COUNT(1) FROM catalog_returns ;
SELECT COUNT(1) FROM catalog_sales ;
SELECT COUNT(1) FROM customer ;
SELECT COUNT(1) FROM customer_address ;
SELECT COUNT(1) FROM customer_demographics ;
SELECT COUNT(1) FROM date_dim ;
SELECT COUNT(1) FROM dbgen_version ;
SELECT COUNT(1) FROM household_demographics;
SELECT COUNT(1) FROM income_band ;
SELECT COUNT(1) FROM inventory ;
SELECT COUNT(1) FROM item ;
SELECT COUNT(1) FROM promotion ;
SELECT COUNT(1) FROM reason ;
SELECT COUNT(1) FROM ship_mode ;
SELECT COUNT(1) FROM store ;
SELECT COUNT(1) FROM store_returns ;
SELECT COUNT(1) FROM store_sales ;
SELECT COUNT(1) FROM time_dim ;
SELECT COUNT(1) FROM warehouse ;
SELECT COUNT(1) FROM web_page ;
SELECT COUNT(1) FROM web_returns ;
SELECT COUNT(1) FROM web_sales ;
SELECT COUNT(1) FROM web_site ;
9.新建简易limit查询SQL,主要验证内部表数据是否能够被正常查询
vim /root/tpc_ds_test/sql/kudu/kudu_limit_query.sql
将下面👇的SQL复制进去
use tpc_test;
SELECT * FROM call_center limit 2;
SELECT * FROM catalog_page limit 2;
SELECT * FROM catalog_returns limit 2;
SELECT * FROM catalog_sales limit 2;
SELECT * FROM customer limit 2;
SELECT * FROM customer_address limit 2;
SELECT * FROM customer_demographics limit 2;
SELECT * FROM date_dim limit 2;
SELECT * FROM dbgen_version limit 2;
SELECT * FROM household_demographics limit 2;
SELECT * FROM income_band limit 2;
SELECT * FROM inventory limit 2;
SELECT * FROM item limit 2;
SELECT * FROM promotion limit 2;
SELECT * FROM reason limit 2;
SELECT * FROM ship_mode limit 2;
SELECT * FROM store limit 2;
SELECT * FROM store_returns limit 2;
SELECT * FROM store_sales limit 2;
SELECT * FROM time_dim limit 2;
SELECT * FROM warehouse limit 2;
SELECT * FROM web_page limit 2;
SELECT * FROM web_returns limit 2;
SELECT * FROM web_sales limit 2;
SELECT * FROM web_site limit 2;
10.准备start_tpc_import.sh导数脚本。
start_tpc_import.sh在初始化目录时已经创建完成
cd /root/tpc_ds_test/bin && vim start_tpc_import.sh
脚本执行内容:
1.新建impala外部表ddl。
2.执行impala外部表COUNT和limit查询。
3.执行kudu内部表ddl。
4.将impala外部表数据导入到kudu内部表。
5.统计分析kudu的内部表,执行compute stats table_name。
5.执行kudu内部表COUNT和limit查询。
6.收集执行结果信息和耗时,tpc_ds_test/result目录查看结果,tpc_ds_test/logs目录查看执行过程日志。
将下面👇的脚本复制进去
#!/bin/bash
basedir="/root/tpc_ds_test/"
ext_dir="${basedir}sql/ext/"
kudu_dir="${basedir}sql/kudu/"
import_dir="${basedir}sql/import/"
ext_log_dir="${basedir}logs/ext/"
kudu_log_dir="${basedir}logs/kudu/"
import_log_dir="${basedir}logs/import/"
rm -rf "${ext_log_dir}"*
rm -rf "${kudu_log_dir}"*
rm -rf "${import_log_dir}"*
echo "impala-shell -f ${ext_dir}ext_ddl.sql..."
impala-shell -f ${ext_dir}ext_ddl.sql > ${ext_log_dir}ext_ddl.log 2>&1
echo "impala-shell -f ${ext_dir}ext_count_query.sql -B ..."
impala-shell -f ${ext_dir}ext_count_query.sql -B > ${ext_log_dir}ext_count_query.log 2>&1
echo "impala-shell -f ${ext_dir}ext_limit_query.sql..."
impala-shell -f ${ext_dir}ext_limit_query.sql > ${ext_log_dir}ext_limit_query.log 2>&1
echo "impala-shell -f ${kudu_dir}kudu_ddl.sql..."
impala-shell -f ${kudu_dir}kudu_ddl.sql > ${kudu_log_dir}kudu_ddl.log 2>&1
echo "impala-shell -f ${import_dir}import_test_data.sql..."
impala-shell -f ${import_dir}import_test_data.sql > ${import_log_dir}import_test_data.log 2>&1
echo "impala-shell -f ${kudu_dir}kudu_analyze.sql -B ..."
impala-shell -f ${kudu_dir}kudu_analyze.sql -B > ${kudu_log_dir}kudu_analyze.log 2>&1
echo "impala-shell -f ${kudu_dir}kudu_count_query.sql -B ..."
impala-shell -f ${kudu_dir}kudu_count_query.sql -B > ${kudu_log_dir}kudu_count_query.log 2>&1
echo "impala-shell -f ${kudu_dir}kudu_limit_query.sql..."
impala-shell -f ${kudu_dir}kudu_limit_query.sql > ${kudu_log_dir}kudu_limit_query.log 2>&1
start_str="Query:"
start_flag=""
step="1"
condition=""
content=""
ext_count_result_file="${basedir}result/ext_count_result.csv"
rm -rf $ext_count_result_file
echo "generate ${basedir}result/ext_count_result.csv..."
echo "SQL语句,数据量,执行时长" >> $ext_count_result_file
while read -r line || [[ -n ${line} ]]
do
#echo "$line"
if [[ "${start_flag}" == "" ]];
then
condition=$(echo $line | grep -i "${start_str}")
#echo "condition ============== $condition"
if [[ "$condition" != "" ]] && [[ "$condition" == *"SELECT"* ]];
then
start_flag="1"
step=$[$step+1]
content=${line}
fi
else
if [ $step -eq 4 ]
then
content=${content},${line}
step=$[$step+1]
elif [ $step -eq 5 ]
then
content=${content},${line}
#echo "content===$content"
echo "${content}" >> $ext_count_result_file
start_flag=""
content=""
step=1
else
step=$[$step+1]
fi
fi
done < ${ext_log_dir}ext_count_query.log
kudu_count_result_file="${basedir}result/kudu_count_result.csv"
rm -rf $kudu_count_result_file
echo "generate ${basedir}result/kudu_count_result.csv..."
echo "SQL语句,数据量,执行时长" >> $kudu_count_result_file
while read -r line || [[ -n ${line} ]]
do
#echo "$line"
if [[ "${start_flag}" == "" ]];
then
condition=$(echo $line | grep -i "${start_str}")
#echo "condition ============== $condition"
if [[ "$condition" != "" ]] && [[ "$condition" == *"SELECT"* ]];
then
start_flag="1"
step=$[$step+1]
content=${line}
fi
else
if [ $step -eq 4 ]
then
content=${content},${line}
step=$[$step+1]
elif [ $step -eq 5 ]
then
content=${content},${line}
# echo "content===$content"
echo "${content}" >> $kudu_count_result_file
start_flag=""
content=""
step=1
else
step=$[$step+1]
fi
fi
done < ${kudu_log_dir}kudu_count_query.log
import_test_data_result_file="${basedir}result/import_test_data_result.csv"
rm -rf $import_test_data_result_file
echo "generate ${basedir}result/import_test_data_result.csv..."
echo "SQL语句,执行时间,执行时长" >> $import_test_data_result_file
while read -r line || [[ -n ${line} ]]
do
#echo "$line"
if [[ "${start_flag}" == "" ]];
then
condition=$(echo $line | grep -i "${start_str}")
#echo "condition ============== $condition"
if [[ "$condition" != "" ]] && [[ "$condition" == *"SELECT"* ]];
then
start_flag="1"
step=$[$step+1]
sql=${line//,/,}
content=${sql}
fi
else
if [ $step -eq 2 ]
then
content=${content},${line}
step=$[$step+1]
elif [ $step -eq 4 ]
then
exec_time=${line//,/,}
content=${content},${exec_time}
#echo "content===$content"
echo "${content}" >> $import_test_data_result_file
start_flag=""
content=""
step=1
else
step=$[$step+1]
fi
fi
done < ${import_log_dir}import_test_data.log
kudu_analyze_result_file="${basedir}result/kudu_analyze_result.csv"
rm -rf $kudu_analyze_result_file
echo "generate ${basedir}result/kudu_analyze_result.csv..."
echo "SQL语句,更新内容,执行时长" >> $kudu_analyze_result_file
while read -r line || [[ -n ${line} ]]
do
#echo "$line"
if [[ "${start_flag}" == "" ]];
then
condition=$(echo $line | grep -i "${start_str}")
#echo "condition ============== $condition"
if [[ "$condition" != "" ]] && [[ "$condition" == *"SELECT"* ]];
then
start_flag="1"
step=$[$step+1]
content=${line}
fi
else
if [ $step -eq 2 ]
then
content=${content},${line}
step=$[$step+1]
elif [ $step -eq 3 ]
then
content=${content},${line}
# echo "content===$content"
echo "${content}" >> $kudu_analyze_result_file
start_flag=""
content=""
step=1
else
step=$[$step+1]
fi
fi
done < ${kudu_log_dir}kudu_analyze.log
11.执行start_tpc_import.sh
cd /root/tpc_ds_test/bin && sh start_tpc_import.sh
执行结果查看:/root/tpc_ds_test/result
ext_count_result.csv
1.25张外部表(impala外部表)
2.对所有外部表做COUNT操作的详细结果
import_test_data_result.csv
1.将25张外部表导入到25张kudu内部表的详细结果
kudu_count_result.csv
1.25张内部表(kudu内部表)
2.对所有内部表做COUNT操作的详细结果
kudu_analyze_result
1.对25张kudu表进行 compute status table_name 结果和耗时
执行日志查看:/root/tpc_ds_test/logs
[root@test4 logs]# pwd
/root/tpc_ds_test/logs
[root@test4 logs]# ll
total 4
drwxr-xr-x 2 root root 6 Jun 9 19:30 ext
drwxr-xr-x 2 root root 6 Jun 9 19:30 import
drwxr-xr-x 2 root root 62 Jun 9 19:33 kudu
drwxr-xr-x 2 root root 4096 Jun 9 11:02 query
数据导入成功进行TPC—DS提供的基准查询
前面我们使用了dsqgen工具生成了基准查询SQL语句,SQL文件:/root/tpc_ds_test/query_data/query_0.sql
打开后发现,有很多SQL语法与kudu不兼容。经过手动改造,整理出来61个可用的,其他的暂时先注释。
将SQL内容重新给复制到query_0.sql文件中。
参考👇:
内容过多,参考附件 tpc_ds_test/query_data/query_0.sql
1.准备start_kudu_query.sh执行基准查询脚本。
start_kudu_query.sh在初始化目录时已经创建完成
cd /root/tpc_ds_test/bin && vim start_kudu_query.sh
脚本执行内容:
1.分割query_0.sql的每个SQL。进行单独的查询。
2.收集每个SQL的执行结果和耗时信息。tpc_ds_test/result目录查看结果,tpc_ds_test/logs目录查看执行过程日志。
将下面👇的脚本复制进去
#!/bin/bash
basedir="/root/tpc_ds_test/"
# 分隔查询的sql文件
file_name="${basedir}query_data/query_0.sql"
# 开头字符串1
start_str1="SELECT"
# 开头字符串2
start_str2="WITH"
# 结尾字符串
end_str=";"
# 获取到SQL语句开头的标识
start_flag=""
condition=""
# SQL查询语句目录
query_dir="${basedir}sql/query/"
query_file=""
query_prefix="${query_dir}query_s_"
query_suffix=".sql"
# SQL查询语句日志目录
query_log_dir="${basedir}logs/query/"
# 结果文件
query_result_file="${basedir}result/query_result.csv"
i=0
rm $query_result_file
# 是否清理SQL查询语句目录
clear=true
if [[ "${clear}" == true ]];
then
echo "clear..."
rm -rf "${query_dir}"*
fi
while read -r line || [[ -n ${line} ]]
do
if [[ "${start_flag}" == "" ]];
then
condition=$(echo $line | grep -i "${start_str1}\|${start_str2}")
if [[ "$condition" != "" ]] && [[ $condition != *"--"* ]];
then
# echo "START-- $line"
start_flag="1"
query_file=$query_prefix$i$query_suffix
echo "query_file $query_file"
echo "use tpc_test;" >> $query_file
echo "${line}" >> $query_file
i=$[$i+1]
fi
else
condition=$(echo $line | grep -i "${end_str}")
if [[ "$condition" != "" ]] && [[ $condition != *"--"* ]];
then
# echo "END-- $line"
echo "${line}" >> $query_file
start_flag=""
else
# echo "CONTENT $line"
echo "${line}" >> $query_file
fi
fi
done < $file_name
echo "文件名,SQL,执行时长" >> $query_result_file
for file in $(ls $query_dir)
do
echo "fileName:$file"
sql=$(cat $query_dir$file | xargs echo -n)
sql=${sql//,/,}
# echo "SQL===============${sql}"
log_file=${file%.*}.log
impala-shell -f $query_dir$file > $query_log_dir$log_file 2>&1
# echo "$query_dir$file > $query_log_dir${file%.*}.log 2>&1"
exec_info=$(cat $query_log_dir${file%.*}.log | grep Fetched)
echo "$exec_info"
# exec_info="Fetched"
echo "$query_dir$file,$sql,$exec_info" >> $query_result_file
done
2.执行start_kudu_query.sh
cd /root/tpc_ds_test/bin && sh start_kudu_query.sh
执行结果查看:/root/tpc_ds_test/result
query_result.csv
1.TPC-DS自带的61个复杂的SQL查询。
2.每个SQL的执行详细信息以及查询的结果和耗时。
参考链接:
TPC-DS测试
TPC-DS在大数据中的使用
impala 建表,外部文件数据导入impala中_数据库
TPC-DS性能测试及使用方法_大数据_u011563666的博客-CSDN博客
如何进行TPS-DS测试_数据库_huangmr的专栏-CSDN博客