Hive Data Warehouse ETL Process

  1. Synchronize industrial chain data to HDFS
hadoop distcp hdfs://10.xx.xx.x:8020/aidata/industrial_chain/ hdfs://10.xx.xx.xx:8020/data/aidata/industrial_chain/
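Once distcp finishes, it is worth checking that the source and target directories match. A minimal sanity check (a sketch, using the same masked namenode addresses and paths as above):

# Compare byte totals and file/directory counts between the source and target clusters.
hadoop fs -du -s hdfs://10.xx.xx.x:8020/aidata/industrial_chain/ \
                 hdfs://10.xx.xx.xx:8020/data/aidata/industrial_chain/
hadoop fs -count hdfs://10.xx.xx.x:8020/aidata/industrial_chain/ \
                 hdfs://10.xx.xx.xx:8020/data/aidata/industrial_chain/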
  2. Create the ODS table
CREATE TABLE IF NOT EXISTS ods.ods_table(
  `content` string)
PARTITIONED BY (dt string, ic string)
row format delimited fields terminated by '\001';
  3. Load the data into ODS
LOAD DATA INPATH '/data/path/{yyyyMMdd}/{ic}' 
OVERWRITE INTO TABLE ods.ods_table PARTITION(dt='{yyyyMMdd}',ic='{ic}');
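To confirm the load worked, it helps to list the table's partitions and count the rows in the new one. A minimal check, with {yyyyMMdd} and {ic} standing for the same placeholders used in the LOAD statement:

# List partitions and count rows in the newly loaded partition (placeholders as above).
hive -e "SHOW PARTITIONS ods.ods_table;"
hive -e "SELECT COUNT(*) FROM ods.ods_table WHERE dt='{yyyyMMdd}' AND ic='{ic}';"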
  4. Process ODS data into DWD
CREATE TABLE IF NOT EXISTS dwd.dwd_table (
id bigint comment "unique identifier",
code string comment "code",
score string comment "score"
) COMMENT "dwd" partitioned by (ic string)
row format delimited fields terminated by '\001';

INSERT OVERWRITE TABLE dwd.dwd_table PARTITION(ic)
SELECT
GET_JSON_OBJECT(a.content, '$.source.id') as id,  
REPLACE(GET_JSON_OBJECT(a.content, '$.fields.nodeCode[0]'),'-','') AS code,
GET_JSON_OBJECT(a.content, '$.score') as score,
a.ic
FROM ods.ods_table a 
WHERE a.dt='{yyyyMMdd}';
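Because this INSERT writes to a dynamic partition (ic), Hive must allow nonstrict dynamic-partition mode; the full script below sets hive.exec.dynamic.partition.mode=nonstrict explicitly. One way to run this step from the command line, assuming the two statements above are saved in a file named dwd_insert.sql (an illustrative name):

# Run the DWD statements with dynamic partitioning allowed.
# dwd_insert.sql is a placeholder name for a file holding the CREATE TABLE and INSERT above.
hive --hiveconf hive.exec.dynamic.partition.mode=nonstrict \
     --hiveconf hive.execution.engine=mr \
     -f dwd_insert.sql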
  5. Complete shell script
#!/bin/bash

anynowtime="date +'%Y-%m-%d %H:%M:%S'"
NOW="echo [\`$anynowtime\`][PID:$$]"
if [ -n "$1" ]; then
  start_dt="$1"
else
  start_dt=$(date +"%Y%m%d" -d "-1day")
fi
if [ -n "$2" ]; then
  end_dt="$2"
else
  end_dt="$1"
fi

##### Call this when the script starts; it prints the current timestamp and PID.
function job_start() {
  echo "$(eval $NOW) job_start"
}

##### Call this on the success branch of the script; it prints the current timestamp and PID.
function job_success() {
  MSG="$*"
  echo "$(eval $NOW) job_success:[$MSG]"
  exit 0
}

##### Call this on the failure branch of the script; it prints the current timestamp and PID.
function job_fail() {
  MSG="$*"
  echo "$(eval $NOW) job_fail:[$MSG]"
  exit 1
}

job_start
###### On the job platform, success or failure of the script is judged solely by the exit status of the last statement it executes:
###### an exit status of 0 means the script succeeded; any non-zero value means it failed.
###### Write your job logic below.
function job() {
  for ic in `hadoop fs -ls -C hdfs://10.xx.xx.x:8020/path/$1/output`; do
    echo "当前日期:"$1 "当前分区:"${ic##*/} "aidata路径:"$ic
    hadoop distcp $ic hdfs://10.xx.xx.xx:8020/data/aidata/industrial_chain/$1/${ic##*/}
    sql=$(
    cat <<-EOF
        set hive.execution.engine=mr;

        CREATE TABLE IF NOT EXISTS ods.ods_table(
        content string) PARTITIONED BY (dt string, ic string)
        row format delimited fields terminated by '\001';

        LOAD DATA INPATH '/data/path/$1/${ic##*/}' 
        OVERWRITE INTO TABLE ods.ods_table PARTITION(dt='$1',ic='${ic##*/}');        
EOF
    )
    hive -e "$sql"
    if [ $? -eq 0 ]; then
      echo "success ==>> $1 $ic"
    else
      echo "$sql"
      echo "error ==>> $1 $ic"
      exit 1
    fi
  done
  dwd_sql=$(
      cat <<-EOF
        set hive.execution.engine=mr;
        set hive.exec.dynamic.partition.mode=nonstrict;

        CREATE TABLE IF NOT EXISTS dwd.dwd_table (
        id bigint comment "企业唯一标识",
        code string comment "编码",
        score string comment "评分"
        ) COMMENT "dwd" partitioned by (ic string)
        row format delimited fields terminated by '\001';
        
        INSERT OVERWRITE TABLE dwd.dwd_table partition(ic)
        SELECT
        GET_JSON_OBJECT(a.content, '\$.source.id') as id,  
        REPLACE(GET_JSON_OBJECT(a.content, '\$.fields.nodeCode[0]'),'-','') AS code,        
        GET_JSON_OBJECT(a.content, '\$.score') as score,
        a.ic
        FROM ods.ods_table a 
        WHERE a.dt='$1' GROUP BY a.content, a.ic

EOF
  )
    hive -e "$dwd_sql"
    if [ $? -eq 0 ]; then
        echo "success dwd_sql ==>> $1"
    else
        echo "$dwd_sql"
        echo "error ==>> $1"
        exit 1
    fi
}

while [[ "$start_dt" -le "$end_dt" ]]; do
  echo $start_dt
  job $start_dt
  start_dt=$(date -d "${start_dt} +1 day" +%Y%m%d)  # advance to the next day
done

if [ $? -eq 0 ]; then
  job_success
else
  job_fail
fi
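
Assuming the script above is saved as run_etl.sh (the name is illustrative), it can be run for yesterday by default, for a single day, or back-filled over a date range:

# No arguments: start_dt defaults to yesterday and end_dt to start_dt.
bash run_etl.sh

# A single day.
bash run_etl.sh 20240101

# Back-fill an inclusive date range, one day per iteration of the while loop.
bash run_etl.sh 20240101 20240107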