Automating data imports with a Linux shell script: a scheduled daily load into the Hive data warehouse

Last edited by pig2 on 2018-4-29 10:31

An automated script for the scheduled daily import into the Hive data warehouse

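Create a shell script that creates a temporary table, loads the data into it, and then moves the data into the final partitioned table: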

[mw_shl_code=bash,true]#!/bin/sh
# upload logs to hdfs

# Date of the logs to import, formatted as YYYYMMDD (requires GNU date)
yesterday=$(date --date='1 days ago' +%Y%m%d)

# Step 1: create the temporary staging table
hive -e "
use stage;
create table tracklog_tmp (
dateday string,
datetime string,
ip string,
cookieid string,
userid string,
logserverip string,
referer string,
requesturl string,
remark1 string,
remark2 string,
alexaflag string,
ua string,
wirelessflag string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';"

# Step 2: for each log server directory, load yesterday's files into the
# staging table (overwriting the previous load), then append them to
# yesterday's partition of the tracklog table
hive -e "
use stage;
set hive.enforce.bucketing=true;
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
load data local inpath '/diskg/logs/tracklog_192.168.1.1/${yesterday}/${yesterday}????.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/logs/tracklog_192.168.1.2/${yesterday}/${yesterday}????.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/logs/tracklog_192.168.1.3/${yesterday}/${yesterday}????.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/logs/trackloguc_192.168.1.1/${yesterday}/${yesterday}????.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/logs/trackloguc_192.168.1.2/${yesterday}/${yesterday}????.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
load data local inpath '/diskg/logs/trackloguc_192.168.1.3/${yesterday}/${yesterday}????.dat' overwrite into table tracklog_tmp;
insert into table tracklog PARTITION (day='${yesterday}') select * from tracklog_tmp;
"

# Step 3: drop the temporary staging table
hive -e "
use stage;
drop table tracklog_tmp;"[/mw_shl_code]
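The six load/insert pairs in the script differ only in the source directory, so the statement list could also be generated in a loop. The following is a minimal sketch rather than the original author's method: the function name `build_import_sql` and the `LOG_ROOT` variable are hypothetical names introduced here for illustration. The function only prints the HiveQL text and does not call `hive` itself.

```shell
#!/bin/sh
# Sketch: generate the repeated load/insert statements instead of writing
# them out by hand. LOG_ROOT and build_import_sql are illustrative names
# that do not appear in the original script.

LOG_ROOT="${LOG_ROOT:-/diskg/logs}"

# Print the HiveQL for one day (format YYYYMMDD) to stdout.
build_import_sql() {
    day="$1"
    printf '%s\n' "use stage;"
    for prefix in tracklog trackloguc; do
        for host in 192.168.1.1 192.168.1.2 192.168.1.3; do
            # The ???? wildcard stays literal here because it is quoted.
            src="${LOG_ROOT}/${prefix}_${host}/${day}/${day}????.dat"
            printf '%s\n' "load data local inpath '${src}' overwrite into table tracklog_tmp;"
            printf '%s\n' "insert into table tracklog PARTITION (day='${day}') select * from tracklog_tmp;"
        done
    done
}
```

Under these assumptions, the second step of the script could be run as `hive -e "$(build_import_sql "$yesterday")"`. The `set` statements from the original script are not emitted by this sketch and would still need to be prepended. Taking the day as an argument also makes it possible to re-import an earlier date by hand.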

Add a scheduled job to crontab

crontab -e

Add the following lines, which run the script every day at 07:25:

#import tracklog

25  07 * * * /opt/bin/hive_opt/import_tracklog.sh
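A job started by cron runs unattended, so a failed Hive load can easily go unnoticed. One possible way to keep a record is to wrap the job so that its output and exit status are appended to a log file. This is a sketch under stated assumptions: `run_logged` is a hypothetical helper name, and the log path in the usage note is a placeholder, neither of which comes from the original post.

```shell
#!/bin/sh
# Sketch: run a command, append its output to a log file, and preserve
# its exit status. run_logged is an illustrative helper name.

run_logged() {
    log_file="$1"
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') start: $*" >> "$log_file"
    # Capture both stdout and stderr of the job in the log.
    "$@" >> "$log_file" 2>&1
    status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') exit status: $status" >> "$log_file"
    return "$status"
}
```

For example, a small wrapper script could call `run_logged /path/to/import_tracklog.log /opt/bin/hive_opt/import_tracklog.sh`, and the crontab entry would then point at the wrapper instead of the import script. A simpler alternative is to append `>> /path/to/import_tracklog.log 2>&1` directly to the crontab line, at the cost of losing the timestamps and the recorded exit status.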
