[E-commerce Data Warehouse] Building the Application Data Store (ADS) Layer (user statistics, behavior funnel analysis, retention rate, users active 7 days in a row, per-minute online users, visitor statistics, path analysis, etc.)

一 User statistics

This requirement is a combined user statistics report that contains several metrics, each explained below.

Metric | Description | Field
New users | Number of newly registered users | new_user_count
New ordering users | Number of users who placed an order for the first time | new_order_user_count
Total order amount | Total final amount of all orders | order_final_amount
Ordering users | Total number of users who placed orders | order_user_count
Non-ordering users | Number of users who were active but did not place an order | no_order_user_count

1 Table creation statement

DROP TABLE IF EXISTS ads_user_total;
CREATE EXTERNAL TABLE `ads_user_total` (
  `dt` STRING COMMENT '统计日期',
  `new_user_count` BIGINT COMMENT '新注册用户数',
  `new_order_user_count` BIGINT COMMENT '新增下单用户数',
  `order_final_amount` DECIMAL(16,2) COMMENT '下单总金额',
  `order_user_count` BIGINT COMMENT '下单用户数',
  `no_order_user_count` BIGINT COMMENT '未下单用户数(具体指活跃用户中未下单用户)'
) COMMENT '用户统计'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_total/';

2 Data loading

insert overwrite table ads_user_total
select * from ads_user_total
union
-- user statistics
select
    '2020-06-14',
    sum(if(login_date_first='2020-06-14',1,0)) new_user_count,
    sum(if(order_date_first='2020-06-14',1,0)) new_order_user_count,
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count,
    sum(if(login_date_last='2020-06-14' and order_final_amount=0,1,0)) no_order_user_count
from dwt_user_topic
where dt='2020-06-14';

二 User change statistics

This requirement contains two metrics, the number of churned users and the number of returning users, explained below.

Metric | Description | Field
Churned users | A user who was active before but has not been active recently is considered churned. Here we count users who were active exactly 7 days ago (that day only) but have not been active during the last 7 days. | user_churn_count
Returning users | A previously active user who went inactive for a while (churned) and is active again today is considered a returning user. Here we count the total number of returning users. | user_back_count

1 Table creation statement

DROP TABLE IF EXISTS ads_user_change;
CREATE EXTERNAL TABLE `ads_user_change` (
  `dt` STRING COMMENT '统计日期',
  `user_churn_count` BIGINT COMMENT '流失用户数',
  `user_back_count` BIGINT COMMENT '回流用户数'
) COMMENT '用户变动统计'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_change/';

2 Data loading

Churned users: users whose last active date is exactly 7 days ago.

Returning users: users whose last active date is today and whose previous active date was 8 or more days ago.
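
As a quick check of the 8-day threshold, the sketch below joins two made-up snapshots: user 101 was previously active 9 days before 2020-06-14 and counts as returning, while user 102 was active just 1 day before and does not. The user ids and dates are assumptions for illustration only, not part of the actual load:

select count(*) user_back_count
from
(
    select '101' user_id, '2020-06-14' login_date_last union all
    select '102', '2020-06-14'
) t1
join
(
    select '101' user_id, '2020-06-05' login_date_previous union all
    select '102', '2020-06-13'
) t2
on t1.user_id=t2.user_id
where datediff(login_date_last,login_date_previous)>=8;
-- expected: user_back_count = 1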

insert overwrite table ads_user_change
select * from ads_user_change
union
select
    churn.dt,
    user_churn_count,
    user_back_count
from
(
    select
        '2020-06-14' dt,
        count(*) user_churn_count
    -- churned users
    from dwt_user_topic
    where dt='2020-06-14'
    and login_date_last=date_add('2020-06-14',-7)
)churn
-- returning users
join
(
    select
        '2020-06-14' dt,
        count(*) user_back_count
    from
    (
        select
            user_id,
            login_date_last
        from dwt_user_topic
        where dt='2020-06-14'
        and login_date_last='2020-06-14'
    )t1
    join
    (
        select
            user_id,
            login_date_last login_date_previous
        from dwt_user_topic
        where dt=date_add('2020-06-14',-1)
    )t2
    on t1.user_id=t2.user_id
    where datediff(login_date_last,login_date_previous)>=8
)back
on churn.dt=back.dt;

三 User behavior funnel analysis

Funnel analysis is a data analysis model that reflects, stage by stage, how users convert through a business process from start to finish. Because every stage is laid out, it is immediately clear which stage has a problem.

The funnel chart is shown below:

[Figure: funnel chart]

This requirement counts the number of users at each stage of a complete purchase flow.

1 Table creation statement

DROP TABLE IF EXISTS ads_user_action;
CREATE EXTERNAL TABLE `ads_user_action` (
  `dt` STRING COMMENT '统计日期',
  `home_count` BIGINT COMMENT '浏览首页人数',
  `good_detail_count` BIGINT COMMENT '浏览商品详情页人数',
  `cart_count` BIGINT COMMENT '加入购物车人数',
  `order_count` BIGINT COMMENT '下单人数',
  `payment_count` BIGINT COMMENT '支付人数'
) COMMENT '漏斗分析'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_action/';

2 Data loading

with
tmp_page as
(
    select
        '2020-06-14' dt,
        sum(if(array_contains(pages,'home'),1,0)) home_count,
        sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    from
    (
        -- for each visitor, which of the home page and product detail page they viewed
        select
            mid_id,
            collect_set(page.page_id) pages
        from dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
        where dt='2020-06-14'
        and page.page_id in ('home','good_detail')
        group by mid_id
    )t1
),
tmp_cop as
(
    -- count users who added to cart, placed orders, and paid
    select
        '2020-06-14' dt,
        sum(if(cart_last_1d_count>0,1,0)) cart_count,
        sum(if(order_last_1d_count>0,1,0)) order_count,
        sum(if(payment_last_1d_count>0,1,0)) payment_count
    from dwt_user_topic
    where dt='2020-06-14'
)
insert overwrite table ads_user_action
select * from ads_user_action
union
select
    tmp_page.dt,
    home_count,
    good_detail_count,
    cart_count,
    order_count,
    payment_count
from tmp_page
join tmp_cop
on tmp_page.dt=tmp_cop.dt;
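
As a standalone illustration of the collect_set + array_contains counting used in tmp_page, the sketch below runs on made-up visitor rows (mid_1 viewed both pages, mid_2 only the home page); it is illustrative only and not part of the actual load:

select
    sum(if(array_contains(pages,'home'),1,0)) home_count,
    sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
from
(
    select
        mid_id,
        collect_set(page_id) pages
    from
    (
        select 'mid_1' mid_id, 'home' page_id union all
        select 'mid_1', 'good_detail' union all
        select 'mid_2', 'home'
    ) t1
    group by mid_id
) t2;
-- expected: home_count = 2, good_detail_count = 1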

四 User retention rate

Retention analysis usually covers new-user retention and active-user retention.

New-user retention asks how many of the users added on a given day show later activity; active-user retention asks how many of the users active on a given day show later activity.

Retention is an important measure of how much value a product delivers to its users.

Here we compute, for each day, the 1-day through 7-day new-user retention rates. The new-user retention rate is the ratio of retained users to new users: for example, if 100 users registered on 2020-06-14 and 80 of them were active 1 day later (on 2020-06-15), the retention count is 80 and the retention rate is 80%.

[Figure: retention rate illustration]
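
As a sanity check of the retention formula, the sketch below runs the same aggregation on three made-up users who all registered on 2020-06-13, two of whom were active on 2020-06-14, so the expected 1-day retention rate is 66.67% (the rows are assumptions, purely illustrative):

select
    login_date_first create_date,
    datediff('2020-06-14',login_date_first) retention_day,
    sum(if(login_date_last='2020-06-14',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='2020-06-14',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from
(
    select '2020-06-13' login_date_first, '2020-06-14' login_date_last union all
    select '2020-06-13', '2020-06-14' union all
    select '2020-06-13', '2020-06-13'
) t
group by login_date_first;
-- expected: retention_day = 1, retention_count = 2, new_user_count = 3, retention_rate = 66.67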

1 Table creation statement

DROP TABLE IF EXISTS ads_user_retention;
CREATE EXTERNAL TABLE ads_user_retention (
  `dt` STRING COMMENT '统计日期',
  `create_date` STRING COMMENT '用户新增日期',
  `retention_day` BIGINT COMMENT '截至当前日期留存天数',
  `retention_count` BIGINT COMMENT '留存用户数量',
  `new_user_count` BIGINT COMMENT '新增用户数量',
  `retention_rate` DECIMAL(16,2) COMMENT '留存率'
) COMMENT '用户留存率'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_retention/';

2 Data loading

This requirement has to consider which retention rates need to be computed on any given day.

Hint: the retention rates computed on 2020-06-15 are:

the 1-day retention rate for 2020-06-13

the 2-day retention rate for 2020-06-12

the 3-day retention rate for 2020-06-11

the 4-day retention rate for 2020-06-10

the 5-day retention rate for 2020-06-09

the 6-day retention rate for 2020-06-08

the 7-day retention rate for 2020-06-07

insert overwrite table ads_user_retention
select * from ads_user_retention
union
select
    '2020-06-14',
    login_date_first create_date,
    datediff('2020-06-14',login_date_first) retention_day,
    sum(if(login_date_last='2020-06-14',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='2020-06-14',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from dwt_user_topic
where dt='2020-06-14'
and login_date_first>=date_add('2020-06-14',-7)
and login_date_first<'2020-06-14'
group by login_date_first;

五 Users active on 3 consecutive days within the last 7 days

This requirement counts the users who were active on 3 or more consecutive days within the last 7 days.

1 Table creation statement

DROP TABLE IF EXISTS ads_user_continuity;
CREATE EXTERNAL TABLE ads_user_continuity (
  `dt` STRING COMMENT '统计日期',
  `user_count` BIGINT COMMENT '最近7天内连续3天活跃用户数'
) COMMENT '最近7天内连续3天活跃用户数'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_continuity/';

2 Data loading

insert overwrite table ads_user_continuity
select * from ads_user_continuity
union
select
    '2020-06-14' dt,
    count(distinct(user_id))
from
(
    select
        user_id
    from
    (
        select
            user_id,
            date_sub(dt,rk) diff
        from
        (
            -- rank each user's active dates in chronological order
            select
                user_id,
                dt,
                rank() over (partition by user_id order by dt) rk
            from dws_user_action_daycount
            where dt>=date_add("2020-06-14",-6)
        )t1
    )t2
    group by user_id,diff
    having count(*)>=3
)t3;
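
The statement above relies on the date_sub(dt, rk) trick: for consecutive dates, subtracting each date's rank yields the same value, so grouping by that difference turns every streak into one group. The sketch below shows the effect on made-up data (user 1001 and the four dates are assumptions, not real rows):

select
    user_id,
    diff,
    count(*) consecutive_days
from
(
    select
        user_id,
        date_sub(dt,rk) diff
    from
    (
        select
            user_id,
            dt,
            rank() over (partition by user_id order by dt) rk
        from
        (
            select '1001' user_id, '2020-06-10' dt union all
            select '1001', '2020-06-11' union all
            select '1001', '2020-06-12' union all
            select '1001', '2020-06-14'
        ) t1
    ) t2
) t3
group by user_id,diff;
-- expected: one group with consecutive_days = 3 (06-10 to 06-12) and one with consecutive_days = 1 (06-14)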

3 Alternative approach (lead-based)

select
    count(distinct(user_id))
from
(
    select
        user_id,
        dt,
        lead(dt,2,'1970-01-01') over(partition by user_id order by dt) lead2
    from dws_user_action_daycount
    where dt>=date_add('2020-06-14',-6)
)t1
where datediff(lead2,dt)=2;

If a one-day gap still counts as consecutive (for example, 2020-06-10, 2020-06-11, 2020-06-13 is also treated as consecutive), the SQL is as follows:

select
    count(distinct(user_id))
from
(
    select
        user_id,
        dt,
        lead(dt,2,'1970-01-01') over(partition by user_id order by dt) lead2
    from dws_user_action_daycount
    where dt>=date_add('2020-06-14',-6)
)t1
where datediff(lead2,dt) in (2,3);

六 Online users per minute

1 Table creation statement

DROP TABLE IF EXISTS ads_user_online_count_min;
CREATE EXTERNAL TABLE ads_user_online_count_min (
  `dt` STRING COMMENT '统计日期',
  `mins` STRING COMMENT '分钟,要求格式为yyyy-MM-dd HH:mm',
  `user_online_count` BIGINT COMMENT '在线用户数'
) COMMENT '每分钟在线用户数'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_online_count_min/';

2 Data loading

To count the online users of each minute, we first need to know in which minutes each user was online.

insert overwrite table ads_user_online_count_min
select * from ads_user_online_count_min
union
select
    '2020-06-14' dt,
    mins,
    count(*) cnt
from
(
    select
        user_id,
        from_unixtime(session_start_timestamp+60*pos,'yyyy-MM-dd HH:mm') mins
    from
    (
        select
            session_id,
            user_id,
            unix_timestamp(from_unixtime(session_start_timestamp,'yyyy-MM-dd HH:mm'),'yyyy-MM-dd HH:mm') session_start_timestamp,
            unix_timestamp(from_unixtime(session_end_timestamp,'yyyy-MM-dd HH:mm'),'yyyy-MM-dd HH:mm') session_end_timestamp
        from
        (
            select
                session_id,
                user_id,
                cast(min(page_start_time)/1000 as bigint) session_start_timestamp,
                cast(max(page_end_time)/1000 as bigint) session_end_timestamp
            from
            (
                -- each user's online time, i.e. derive start_time and end_time for each page view
                select
                    user_id,
                    ts page_start_time,
                    ts+during_time page_end_time,
                    concat(user_id,'-',last_value(session_start_point,true) over(partition by user_id order by ts)) session_id
                from
                (
                    select
                        user_id,
                        ts,
                        during_time,
                        if(last_page_id is null,ts,null) session_start_point
                    from dwd_page_log
                    where dt='2020-06-14'
                )t1
            )t2
            group by session_id,user_id
        )t3
    )t4
    -- split page views that span several minutes into one row per minute
    -- posexplode: explodes an array into rows and adds an index column
    -- repeat('x',y): repeats the string x y times
    lateral view posexplode(split(repeat(",",cast((session_end_timestamp-session_start_timestamp)/60 as int)),",")) tmp as pos,item
)t5
group by mins;
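
The repeat/posexplode step above is the core trick, so here is a standalone sketch on made-up values (the user_id and timestamps are assumptions): a session lasting from 10:00 to 10:02 expands into one row per minute.

select
    user_id,
    from_unixtime(session_start_timestamp+60*pos,'yyyy-MM-dd HH:mm') mins
from
(
    select
        '1001' user_id,
        unix_timestamp('2020-06-14 10:00','yyyy-MM-dd HH:mm') session_start_timestamp,
        unix_timestamp('2020-06-14 10:02','yyyy-MM-dd HH:mm') session_end_timestamp
) t
lateral view posexplode(split(repeat(",",cast((session_end_timestamp-session_start_timestamp)/60 as int)),",")) tmp as pos,item;
-- expected: three rows, 2020-06-14 10:00 / 10:01 / 10:02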

七 Visitor statistics

This requirement is a combined visitor statistics report that contains several metrics, each explained below. Unless otherwise noted, the time window of every metric is the most recent day.

Metric | Description | Field
Daily active visitors | Number of visitors | uv_count
Total page dwell time | Total duration of all page-view records, in seconds | duration_sec
Average dwell time | Average dwell time per session, in seconds | avg_duration_sec
Total page views | Total number of page-view records | page_count
Total sessions | Total number of sessions | sv_count
Bounces | Number of sessions that viewed only one page | bounce_count
Bounce rate | Proportion of sessions that viewed only one page | bounce_rate

1 Table creation statement

DROP TABLE IF EXISTS ads_visit_stats;
CREATE EXTERNAL TABLE ads_visit_stats (
  `dt` STRING COMMENT '统计日期',
  `is_new` STRING COMMENT '新老标识,1:新,0:老',
  `channel` STRING COMMENT '渠道',
  `uv_count` BIGINT COMMENT '日活(访问人数)',
  `duration_sec` BIGINT COMMENT '页面停留总时长',
  `avg_duration_sec` BIGINT COMMENT '一次会话,页面停留平均时长,单位为秒',
  `page_count` BIGINT COMMENT '页面总浏览数',
  `avg_page_count` BIGINT COMMENT '一次会话,页面平均浏览数',
  `sv_count` BIGINT COMMENT '会话次数',
  -- sessions that contain only one page
  `bounce_count` BIGINT COMMENT '跳出数',
  `bounce_rate` DECIMAL(16,2) COMMENT '跳出率'
) COMMENT '访客统计'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_visit_stats/';

2 Data loading

Approach: the key point of this requirement is splitting page-view records into sessions. The overall implementation consists of the following steps (a small session-splitting sketch follows the list):

Step 1: split all page-view records into sessions.

Step 2: compute the dwell time and the number of pages viewed for each session.

Step 3: compute the metrics listed above.
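
Below is a minimal session-splitting sketch on made-up page-log rows (the mid_id, page_id and ts values are assumptions for illustration): the record whose last_page_id is null marks the start of a session, and last_value(..., true) carries that start timestamp forward to every later page view of the same device.

select
    mid_id,
    page_id,
    ts,
    concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id order by ts)) session_id
from
(
    select 'mid_1' mid_id, cast(null as string) last_page_id, 'home' page_id, 1000 ts union all
    select 'mid_1', 'home', 'good_detail', 2000 union all
    select 'mid_1', cast(null as string), 'home', 9000
) t;
-- expected: the first two rows share session_id mid_1-1000, the third row starts session mid_1-9000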

insert overwrite table ads_visit_stats
select * from ads_visit_stats
union
select
    '2020-06-14' dt,
    is_new,
    channel,
    count(distinct(mid_id)) uv_count,
    cast(sum(duration)/1000 as bigint) duration_sec,
    cast(avg(duration)/1000 as bigint) avg_duration_sec,
    sum(page_count) page_count,
    cast(avg(page_count) as bigint) avg_page_count,
    count(*) sv_count,
    sum(if(page_count=1,1,0)) bounce_count,
    cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
from
(
    select
        session_id,
        mid_id,
        is_new,
        channel,
        count(*) page_count,
        sum(during_time) duration
    from
    (
        select
            t1.mid_id,
            channel,
            is_new,
            last_page_id,
            page_id,
            during_time,
            concat(t1.mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by t1.mid_id order by ts)) session_id
        from
        (
            select
                mid_id,
                channel,
                last_page_id,
                page_id,
                during_time,
                dt,
                ts
            from dwd_page_log
            where dt='2020-06-14'
        )t1
        left join
        (
            select
                mid_id,
                if(visit_date_first='2020-06-14','1','0') is_new
            from dwt_visitor_topic
            where dt='2020-06-14'
        )t2
        on t1.mid_id=t2.mid_id
    )t3
    group by session_id,mid_id,is_new,channel
)t6
group by is_new,channel;

八 Path analysis

User path analysis is, as the name suggests, the analysis of the paths users take through an app or website. To measure the effect of site optimization or marketing campaigns, and to understand user behavior preferences, the conversion data along these paths is analyzed regularly.

This requirement counts users' page-visit paths. Such paths are usually visualized with a Sankey diagram; as shown in the figure below, it faithfully reproduces users' paths, including page jumps and the order of visits.

A Sankey diagram needs the count of every kind of page jump; each jump is represented by a source/target pair, where source is the page the jump starts from and target is the page it goes to.

[Figure: Sankey diagram of page-visit paths]

1 Table creation statement

DROP TABLE IF EXISTS ads_page_path;
CREATE EXTERNAL TABLE ads_page_path
(
    `dt` STRING COMMENT '统计日期',
    `source` STRING COMMENT '跳转起始页面ID',
    `target` STRING COMMENT '跳转终到页面ID',
    `path_count` BIGINT COMMENT '跳转次数'
)  COMMENT '页面浏览路径'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_page_path/';

2 Data loading

What this requirement counts is exactly the number of each kind of jump, so in principle a group-by on source/target followed by count() is enough. Two points need attention:

First: in a Sankey diagram the source of a link must not be null, but the target may be null.

Second: the flow shown by a Sankey diagram must not contain cycles.

A jump from source to target at step 1 is not the same as a jump from source to target at step 2, so they are counted separately; the step prefix below serves exactly this purpose (a small example follows).
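
The effect of the prefix can be seen on a tiny made-up path (home -> good_detail -> home, an assumption for illustration). Without the prefix the two edges form a cycle, which a Sankey diagram cannot draw; with the prefix every node is tied to its step, so no cycle remains:

select
    concat('step-',step,':',source) source,
    concat('step-',step+1,':',target) target
from
(
    select 1 step, 'home' source, 'good_detail' target union all
    select 2, 'good_detail', 'home'
) t;
-- expected edges: step-1:home -> step-2:good_detail and step-2:good_detail -> step-3:home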

insert overwrite table ads_page_path
select * from ads_page_path
union
select
    '2020-06-14',
    source,
    target,
    count(*)
from
(
    select
        concat('step-',step,':',source) source,
        concat('step-',step+1,':',target) target
    from
    (
        select
            page_id source,
            -- lead: take the next page_id in the session; null when there is no next page
            lead(page_id,1,null) over (partition by session_id order by ts) target,
            -- number the steps so the same jump at different positions can be told apart
            row_number() over (partition by session_id order by ts) step
        from
        (
            select
                last_page_id,
                page_id,
                ts,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id order by ts)) session_id
            from dwd_page_log
            where dt='2020-06-14'
        )t1
    )t2
)t3
group by source,target;

九 Handling HDFS corrupt blocks

# Cause of the error: the cluster was shut down forcefully
# Leave safe mode
hdfs dfsadmin -safemode leave
# Check whether the HDFS file system has corrupt files
hdfs fsck /
# Delete all corrupt files that were found
hdfs fsck / -delete
# Refresh and you are done

十 ADS layer business data import script

1 Create the script

Create a script named dwt_to_ads.sh in the /home/hzy/bin directory:

vim dwt_to_ads.sh

2 Script content

#!/bin/bash

APP=gmall

# If a date is passed as the second argument, use it; otherwise use the day before today
if [ -n "$2" ] ;then
    do_date=$2
else 
    do_date=`date -d "-1 day" +%F`
fi

ads_activity_stats="
insert overwrite table ${APP}.ads_activity_stats
select * from ${APP}.ads_activity_stats
union
select
    '$do_date' dt,
    t4.activity_id,
    activity_name,
    start_date,
    order_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        activity_id,
        activity_name,
        date_format(start_time,'yyyy-MM-dd') start_date
    from ${APP}.dim_activity_rule_info
    group by activity_id,activity_name,start_time
)t4
left join
(
    select
        activity_id,
        sum(order_count) order_count,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(order_reduce_amount) reduce_amount,
        cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
    from ${APP}.dwt_activity_topic
    where dt='$do_date'
    group by activity_id
)t5
on t4.activity_id=t5.activity_id;
"
ads_coupon_stats="
insert overwrite table ${APP}.ads_coupon_stats
select * from ${APP}.ads_coupon_stats
union
select
    '$do_date' dt,
    t1.id,
    coupon_name,
    start_date,
    rule_name,
    get_count,
    order_count,
    expire_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        id,
        coupon_name,
        date_format(start_time,'yyyy-MM-dd') start_date,
        case
            when coupon_type='3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
            when coupon_type='3202' then concat('满',condition_num,'件打', (1-benefit_discount)*10,'折')
            when coupon_type='3203' then concat('减',benefit_amount,'元')
        end rule_name
    from ${APP}.dim_coupon_info
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
)t1
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        order_reduce_amount reduce_amount,
        cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
    from ${APP}.dwt_coupon_topic
    where dt='$do_date'
)t2
on t1.id=t2.coupon_id;
"
ads_order_by_province="
insert overwrite table ${APP}.ads_order_by_province
select * from ${APP}.ads_order_by_province
union
select
    '$do_date' dt,
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_last_1d_count,
    order_last_1d_final_amount
from
(
    select
        province_id,
        order_last_1d_count,
        order_last_1d_final_amount
    from ${APP}.dwt_area_topic
    where dt='$do_date'
)t1
left join
(
    select
        id,
        province_name,
        area_code,
        iso_code,
        iso_3166_2
    from ${APP}.dim_base_province
)t2
on t1.province_id=t2.id;
"
ads_order_spu_stats="
insert overwrite table ${APP}.ads_order_spu_stats
select * from ${APP}.ads_order_spu_stats
union
select
    '$do_date' dt,
    spu_id,
    spu_name,
    tm_id,
    tm_name,
    category3_id,
    category3_name,
    category2_id,
    category2_name,
    category1_id,
    category1_name,
    sum(order_last_1d_count),
    sum(order_last_1d_final_amount)
from
(
    select
        sku_id,
        order_last_1d_count,
        order_last_1d_final_amount
    from ${APP}.dwt_sku_topic
    where dt='$do_date'
)t1
left join
(
    select
        id,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name
    from ${APP}.dim_sku_info
    where dt='$do_date'
)t2
on t1.sku_id=t2.id
group by spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
"
ads_order_total="
insert overwrite table ${APP}.ads_order_total
select * from ${APP}.ads_order_total
union
select
    '$do_date',
    sum(order_last_1d_count),
    sum(order_last_1d_final_amount) order_final_amount,
    sum(if(order_last_1d_final_amount>0,1,0)) order_user_count
from ${APP}.dwt_user_topic
where dt='$do_date';
"
ads_page_path="
insert overwrite table ${APP}.ads_page_path
select * from ${APP}.ads_page_path
union
select
    '$do_date',
    source,
    target,
    count(*)
from
(
    select
        concat('step-',step,':',source) source,
        concat('step-',step+1,':',target) target
    from
    (
        select
            page_id source,
            lead(page_id,1,null) over (partition by session_id order by ts) target,
            row_number() over (partition by session_id order by ts) step
        from
        (
            select
                last_page_id,
                page_id,
                ts,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id order by ts)) session_id
            from ${APP}.dwd_page_log
            where dt='$do_date'
        )t1
    )t2
)t3
group by source,target;
"
ads_repeat_purchase="
insert overwrite table ${APP}.ads_repeat_purchase
select * from ${APP}.ads_repeat_purchase
union
select
    '$do_date' dt,
    tm_id,
    tm_name,
    cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
from
(
    select
        user_id,
        tm_id,
        tm_name,
        sum(order_count) order_count
    from
    (
        select
            user_id,
            sku_id,
            count(*) order_count
        from ${APP}.dwd_order_detail
        where dt>=date_add('$do_date',-29)
        group by user_id,sku_id
    )t1
    left join
    (
        select
            id,
            tm_id,
            tm_name
        from ${APP}.dim_sku_info
        where dt='$do_date'
    )t2
    on t1.sku_id=t2.id
    group by user_id,tm_id,tm_name
)t3
group by tm_id,tm_name;
"
ads_user_action="
with
tmp_page as
(
    select
        '$do_date' dt,
        sum(if(array_contains(pages,'home'),1,0)) home_count,
        sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    from
    (
        select
            mid_id,
            collect_set(page.page_id) pages
        from ${APP}.dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
        where dt='$do_date'
        and page.page_id in ('home','good_detail')
        group by mid_id
    )t1
),
tmp_cop as
(
    select
        '$do_date' dt,
        sum(if(cart_last_1d_count>0,1,0)) cart_count,
        sum(if(order_last_1d_count>0,1,0)) order_count,
        sum(if(payment_last_1d_count>0,1,0)) payment_count
    from ${APP}.dwt_user_topic
    where dt='$do_date'
)
insert overwrite table ${APP}.ads_user_action
select * from ${APP}.ads_user_action
union
select
    tmp_page.dt,
    home_count,
    good_detail_count,
    cart_count,
    order_count,
    payment_count
from tmp_page
join tmp_cop
on tmp_page.dt=tmp_cop.dt;
"
ads_user_change="
insert overwrite table ${APP}.ads_user_change
select * from ${APP}.ads_user_change
union
select
    churn.dt,
    user_churn_count,
    user_back_count
from
(
    select
        '$do_date' dt,
        count(*) user_churn_count
    from ${APP}.dwt_user_topic
    where dt='$do_date'
    and login_date_last=date_add('$do_date',-7)
)churn
join
(
    select
        '$do_date' dt,
        count(*) user_back_count
    from
    (
        select
            user_id,
            login_date_last
        from ${APP}.dwt_user_topic
        where dt='$do_date'
        and login_date_last='$do_date'
    )t1
    join
    (
        select
            user_id,
            login_date_last login_date_previous
        from ${APP}.dwt_user_topic
        where dt=date_add('$do_date',-1)
    )t2
    on t1.user_id=t2.user_id
    where datediff(login_date_last,login_date_previous)>=8
)back
on churn.dt=back.dt;
"
ads_user_retention="
insert overwrite table ${APP}.ads_user_retention
select * from ${APP}.ads_user_retention
union
select
    '$do_date',
    login_date_first create_date,
    datediff('$do_date',login_date_first) retention_day,
    sum(if(login_date_last='$do_date',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='$do_date',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from ${APP}.dwt_user_topic
where dt='$do_date'
and login_date_first>=date_add('$do_date',-7)
and login_date_first<'$do_date'
group by login_date_first;
"
ads_user_total="
insert overwrite table ${APP}.ads_user_total
select * from ${APP}.ads_user_total
union
select
    '$do_date',
    sum(if(login_date_first='$do_date',1,0)) new_user_count,
    sum(if(order_date_first='$do_date',1,0)) new_order_user_count,
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count,
    sum(if(login_date_last='$do_date' and order_final_amount=0,1,0)) no_order_user_count
from ${APP}.dwt_user_topic
where dt='$do_date';
"
ads_user_continuity="
insert overwrite table ${APP}.ads_user_continuity
select * from ${APP}.ads_user_continuity
union
select
    '$do_date' dt,
    count(distinct(user_id))
from
(
    select
        user_id
    from
    (
        select
            user_id,
            date_sub(dt,rk) diff
        from
        (
            select
                user_id,
                dt,
                rank() over (partition by user_id order by dt) rk
            from ${APP}.dws_user_action_daycount
            where dt>=date_add('$do_date',-6)
        )t1
    )t2
    group by user_id,diff
    having count(*)>=3
)t3;
"

ads_user_online_count_min="
insert overwrite table ${APP}.ads_user_online_count_min
select * from ${APP}.ads_user_online_count_min
union
select
    '$do_date' dt,
    mins,
    count(*) cnt
from
(
    select
        user_id,
        from_unixtime(session_start_timestamp+60*pos,'yyyy-MM-dd HH:mm') mins
    from
    (
        select
            session_id,
            user_id,
            unix_timestamp(from_unixtime(session_start_timestamp,'yyyy-MM-dd HH:mm'),'yyyy-MM-dd HH:mm') session_start_timestamp,
            unix_timestamp(from_unixtime(session_end_timestamp,'yyyy-MM-dd HH:mm'),'yyyy-MM-dd HH:mm') session_end_timestamp
        from
        (
            select
                session_id,
                user_id,
                cast(min(page_start_time)/1000 as bigint) session_start_timestamp,
                cast(max(page_end_time)/1000 as bigint) session_end_timestamp
            from
            (
                select
                    user_id,
                    ts page_start_time,
                    ts+during_time page_end_time,
                    concat(user_id,'-',last_value(session_start_point,true) over(partition by user_id order by ts)) session_id
                from
                (
                    select
                        user_id,
                        ts,
                        during_time,
                        if(last_page_id is null,ts,null) session_start_point
                    from ${APP}.dwd_page_log
                    where dt='$do_date'
                )t1
            )t2
            group by session_id,user_id
        )t3
    )t4
    lateral view posexplode(split(repeat("0",cast((session_end_timestamp-session_start_timestamp)/60 as int)),"0")) tmp as pos,item
)t5
group by mins;
"

ads_visit_stats="
insert overwrite table ${APP}.ads_visit_stats
select * from ${APP}.ads_visit_stats
union
select
    '$do_date' dt,
    is_new,
    channel,
    count(distinct(mid_id)) uv_count,
    cast(sum(duration)/1000 as bigint) duration_sec,
    cast(avg(duration)/1000 as bigint) avg_duration_sec,
    sum(page_count) page_count,
    cast(avg(page_count) as bigint) avg_page_count,
    count(*) sv_count,
    sum(if(page_count=1,1,0)) bounce_count,
    cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
from
(
    select
        session_id,
        mid_id,
        is_new,
        channel,
        count(*) page_count,
        sum(during_time) duration
    from
    (
        select
            t1.mid_id,
            channel,
            is_new,
            last_page_id,
            page_id,
            during_time,
            concat(t1.mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by t1.mid_id order by ts)) session_id
        from
        (
            select
                mid_id,
                channel,
                last_page_id,
                page_id,
                during_time,
                dt,
                ts
            from ${APP}.dwd_page_log
            where dt='$do_date'
        )t1
        left join
        (
            select
                mid_id,
                if(visit_date_first='$do_date','1','0') is_new
            from ${APP}.dwt_visitor_topic
            where dt='$do_date'
        )t2
        on t1.mid_id=t2.mid_id
    )t3
    group by session_id,mid_id,is_new,channel
)t4
group by is_new,channel;
"
case $1 in
    "ads_activity_stats" )
        hive -e "$ads_activity_stats" 
    ;;
    "ads_coupon_stats" )
        hive -e "$ads_coupon_stats"
    ;;
    "ads_order_by_province" )
        hive -e "$ads_order_by_province" 
    ;;
    "ads_order_spu_stats" )
        hive -e "$ads_order_spu_stats" 
    ;;
    "ads_order_total" )
        hive -e "$ads_order_total" 
    ;;
    "ads_page_path" )
        hive -e "$ads_page_path" 
    ;;
    "ads_repeat_purchase" )
        hive -e "$ads_repeat_purchase" 
    ;;
    "ads_user_action" )
        hive -e "$ads_user_action" 
    ;;
    "ads_user_change" )
        hive -e "$ads_user_change" 
    ;;
    "ads_user_retention" )
        hive -e "$ads_user_retention" 
    ;;
    "ads_user_total" )
        hive -e "$ads_user_total" 
    ;;
    "ads_user_continuity" )
        hive -e "$ads_user_continuity" 
    ;;
    "ads_user_online_count_min" )
        hive -e "$ads_user_online_count_min" 
    ;;
    "ads_visit_stats" )
        hive -e "$ads_visit_stats" 
    ;;
    "all" )
        hive -e "$ads_activity_stats$ads_coupon_stats$ads_order_by_province$ads_order_spu_stats$ads_order_total$ads_page_path$ads_repeat_purchase$ads_user_action$ads_user_change$ads_user_retention$ads_user_total$ads_user_continuity$ads_user_online_count_min$ads_visit_stats"
    ;;
esac

3 Run the script

chmod 777 dwt_to_ads.sh

dwt_to_ads.sh all 2020-06-14 
