Hive: Submitting Hive Queries via Shell Script
Following the previous post, Mac: Standalone Hive Installation and Testing, which ended by loading a local file into a Hive table, this post cleans that table. The cleaning HiveQL is wrapped in a shell script and submitted from the command line.
Create the target table
The fields produced by the cleaning job are defined as follows:
use hive_test;
CREATE EXTERNAL TABLE fact_XXX_log(
    time_stamp string COMMENT 'timestamp',
    dim1 string COMMENT 'dim1',
    type string COMMENT 'type',
    query_id string COMMENT 'query_id',
    error_stack string COMMENT 'error stack',
    trace_log string COMMENT 'trace_log',
    result_meta string COMMENT 'result meta',
    extend_info string COMMENT 'extended info'
)
COMMENT 'cleaned log fact table'
PARTITIONED BY (
    `dt` string COMMENT 'day'
)
row format delimited fields terminated by '\t'
STORED AS textfile;
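After creating the table, it can be useful to sanity-check the definition and, once data has been written, the partitions. The statements below are standard Hive commands added here for illustration; they are not part of the original post:

use hive_test;
-- show the full table definition, including partition columns and storage location
DESCRIBE FORMATTED fact_XXX_log;
-- list the dt partitions that exist after the cleaning job has run
SHOW PARTITIONS fact_XXX_log;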
Cleaning SQL script
hive_test.sh
#!/bin/sh
echo "---- start exec hive sql"
hive -e "SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE hive_test.fact_XXX_log
PARTITION (dt)
select
xxxxxx -- column list omitted here
from
hive_test.system_log
;"
echo "---- end hive sql"
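For anything longer than a one-off statement, keeping the HiveQL in its own file and submitting it with hive -f is usually cleaner than a long -e string, and --hivevar lets you parameterize the target date. A minimal sketch under that assumption (the file name cleaning.hql and the dt variable are illustrative, not from the original post):

#!/bin/sh
# cleaning.hql holds the same SET + INSERT OVERWRITE statements as above,
# referencing the date as ${hivevar:dt} where needed
echo "---- start exec hive sql"
hive --hivevar dt=2018-01-04 -f cleaning.hql
echo "---- end hive sql"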
Submit the SQL script from the local command line
Run the script locally:
/shell$ ls
hive_test.sh
@localhost:~/Documents/book/shell$ sh hive_test.sh
The execution log is as follows:
---- start exec hive sql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding
....
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:
Query ID = xxxx_20180104111556_9381fc76-f0ba-4420-8207-bd4ca3cec77a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1514527790034_0002, Tracking URL = http://localhost:8088/proxy/application_1514527790034_0002/
Kill Command = xxxxxxxp job -kill job_1514527790034_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2018-01-04 11:16:26,465 Stage-1 map = 0%, reduce = 0%
2018-01-04 11:16:33,038 Stage-1 map = 50%, reduce = 0%
2018-01-04 11:16:37,179 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1514527790034_0002
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1514527790034_0003, Tracking URL = http://localhost:8088/proxy/application_1514527790034_0003/
Kill Command = xxxxxxxxadoop-2.7.4/bin/hadoop job -kill job_1514527790034_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2018-01-04 11:17:10,803 Stage-3 map = 0%, reduce = 0%
2018-01-04 11:17:17,681 Stage-3 map = 100%, reduce = 0%
Ended Job = job_1514527790034_0003
Loading data to table hive_test.fact_XXX_log partition (dt=null)
Loaded : 1/1 partitions.
Time taken to load dynamic partitions: 0.2 seconds
Time taken for adding to write entity : 0.001 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 HDFS Read: 30906 HDFS Write: 6438 SUCCESS
Stage-Stage-3: Map: 1 HDFS Read: 10016 HDFS Write: 6240 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 82.54 seconds
---- end hive sql
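Once the job reports OK, you can confirm that the dynamic partitions were actually written and spot-check a few rows. These verification queries are an addition for illustration, not part of the original log:

hive -e "SHOW PARTITIONS hive_test.fact_XXX_log;
SELECT * FROM hive_test.fact_XXX_log LIMIT 10;"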