Requirement 7: Number of Users Active for the Last 3 Consecutive Weeks

阿萨德沐阳

Category: shell. Tags: big data

Article link: https://blog.csdn.net/qq_41508919/article/details/125499491

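This post shows how to use Hive to compute the number of users who were active in each of the last 3 consecutive weeks. It first identifies the DWS-layer weekly active-user detail table, then creates the ADS-layer table and loads the data for the week containing 2019-02-20. It then walks through an automation script that sets the statistics date, runs the Hive SQL, and queries the result. In production, such a script usually runs in the early hours of every Monday.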

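Users active for the last 3 consecutive weeks: this metric is usually computed on Monday over the data of the preceding 3 weeks, and it is computed once per week.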

13.1 DWS Layer

Use the weekly active-user detail table dws_uv_detail_wk as the DWS-layer data.
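To make the counting idea concrete, here is a minimal sketch on toy data. It assumes that dws_uv_detail_wk holds at most one row per mid_id per week, so a device that appears in exactly 3 of the 3 selected weeks was active in every one of them. The rows and the helper names sample_rows and continuous_users are hypothetical and only stand in for the real table.

```shell
#!/bin/bash
# Toy rows standing in for dws_uv_detail_wk: mid_id <TAB> wk_dt (hypothetical data).
sample_rows() {
    printf '%s\t%s\n' \
        m1 2019-02-04_2019-02-10 \
        m1 2019-02-11_2019-02-17 \
        m1 2019-02-18_2019-02-24 \
        m2 2019-02-04_2019-02-10 \
        m2 2019-02-18_2019-02-24 \
        m3 2019-02-18_2019-02-24
}

# Count the weekly rows per mid_id and keep only the ids seen in all 3 weeks.
# This mirrors "group by mid_id having count(*)=3" under the one-row-per-week assumption.
continuous_users() {
    awk -F'\t' '{ weeks[$1]++ } END { for (id in weeks) if (weeks[id] == 3) print id }' | sort
}

sample_rows | continuous_users
```

Only m1 is printed: m2 skipped the middle week and m3 was active in the last week only.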

13.2 ADS Layer

1) Table creation statement

hive (gmall)>
drop table if exists ads_continuity_wk_count;
create external table ads_continuity_wk_count( 
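    `dt` string COMMENT 'Statistics date; usually the Sunday that ends the last week, or the current date if computed daily',
    `wk_dt` string COMMENT 'Period covered by the statistic',
    `continuity_count` bigint COMMENT 'Number of users active in all 3 weeks'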
) 
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_continuity_wk_count';

2) Load the data for the week containing 2019-02-20

hive (gmall)>
insert into table ads_continuity_wk_count
select 
     '2019-02-20',
     concat(date_add(next_day('2019-02-20','MO'),-7*3),'_',date_add(next_day('2019-02-20','MO'),-1)),
     count(*)
from 
(
    select mid_id
    from dws_uv_detail_wk
    where wk_dt>=concat(date_add(next_day('2019-02-20','MO'),-7*3),'_',date_add(next_day('2019-02-20','MO'),-7*2-1)) 
    and wk_dt<=concat(date_add(next_day('2019-02-20','MO'),-7),'_',date_add(next_day('2019-02-20','MO'),-1))
    group by mid_id
    having count(*)=3
)t1;
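The date arithmetic in the query is easier to follow once the boundaries are written out. The sketch below reproduces it with GNU date, assuming that wk_dt has the form Monday_Sunday (as the concat expressions imply) and that next_day(d,'MO') returns the first Monday strictly after d. The function names next_monday and three_week_keys are hypothetical helpers, not part of the original project.

```shell
#!/bin/bash
# Hypothetical helper: first Monday strictly after the given date,
# mirroring Hive's next_day(date, 'MO').
next_monday() {
    local d=$1
    local dow
    dow=$(date -u -d "$d" +%u)              # 1 = Monday ... 7 = Sunday
    date -u -d "$d +$((8 - dow)) days" +%F
}

# Hypothetical helper: print the three wk_dt keys that the where clause selects.
three_week_keys() {
    local mon i
    mon=$(next_monday "$1")
    for i in 3 2 1; do
        # Week i starts 7*i days before next Monday and ends 6 days later.
        echo "$(date -u -d "$mon -$((7 * i)) days" +%F)_$(date -u -d "$mon -$((7 * i - 6)) days" +%F)"
    done
}

three_week_keys 2019-02-20
```

For 2019-02-20 this prints 2019-02-04_2019-02-10, 2019-02-11_2019-02-17 and 2019-02-18_2019-02-24. The first and last keys are the lower and upper bounds of the where clause, and the string comparison works because the keys sort chronologically.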

3) Query the result

hive (gmall)> select * from ads_continuity_wk_count;

13.3 Writing the Script

1) Create the script in the /home/atguigu/bin directory on hadoop102

[atguigu@hadoop102 bin]$ vim ads_continuity_wk_log.sh

Write the following content in the script:

#!/bin/bash

# Use the date passed as the first argument; otherwise default to yesterday
if [ -n "$1" ]; then
	do_date=$1
else
	do_date=$(date -d "-1 day" +%F)
fi

hive=/opt/module/hive/bin/hive
APP=gmall

echo "-----------Import date: $do_date-----------"

# Count the devices that have a row in each of the 3 weeks ending on the
# Sunday before next_day(do_date,'MO')
sql="
insert into table ${APP}.ads_continuity_wk_count
select 
     '$do_date',
     concat(date_add(next_day('$do_date','MO'),-7*3),'_',date_add(next_day('$do_date','MO'),-1)),
     count(*)
from 
(
    select mid_id
    from ${APP}.dws_uv_detail_wk
    where wk_dt>=concat(date_add(next_day('$do_date','MO'),-7*3),'_',date_add(next_day('$do_date','MO'),-7*2-1)) 
    and wk_dt<=concat(date_add(next_day('$do_date','MO'),-7),'_',date_add(next_day('$do_date','MO'),-1))
    group by mid_id
    having count(*)=3
)t1;"

$hive -e "$sql"
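Because the script splices do_date directly into the SQL text, it can help to check the argument before calling Hive. The following is a sketch of one possible check, not part of the original script: the function name resolve_do_date is hypothetical, and it relies on GNU date to reject impossible dates.

```shell
#!/bin/bash
# Hypothetical helper: echo a validated YYYY-MM-DD date, defaulting to yesterday
# when no argument is given. Returns non-zero on malformed input.
resolve_do_date() {
    local input=$1
    if [ -z "$input" ]; then
        date -d "-1 day" +%F
        return 0
    fi
    case "$input" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
            # The shape is right; let GNU date reject impossible dates such as 2019-02-30.
            if date -d "$input" +%F >/dev/null 2>&1; then
                echo "$input"
                return 0
            fi
            ;;
    esac
    echo "invalid date: $input" >&2
    return 1
}

do_date=$(resolve_do_date "2019-02-20") && echo "do_date=$do_date"
```

In the script above, the if/else block could then be replaced by do_date=$(resolve_do_date "$1") || exit 1, so that a malformed argument stops the run before any SQL is sent to Hive.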

2) Grant execute permission to the script