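Hive: Counting Students' Consecutive Attendance Days, Excluding Weekends

This post counts each student's consecutive attendance days, with weekends excluded, as a measure of how regularly the student attends. The Hive SQL below implements it.

1. Create a test table and insert sample data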

create table px_data_test_temp.temp_stu_attendance_tb (
    date_col         string,  -- attendance date, formatted yyyy-MM-dd
    stu_no           string,  -- student number
    is_attendence    string   -- '上课' (attended) or '缺勤' (absent)
);

insert into px_data_test_temp.temp_stu_attendance_tb  
(date_col, stu_no, is_attendence) 
values 
('2019-11-01','STU-00001','上课'),
('2019-11-02','STU-00001','上课'),
('2019-11-03','STU-00001','缺勤'),
('2019-11-04','STU-00001','上课'),
('2019-11-05','STU-00001','上课'),
('2019-11-06','STU-00001','上课'),
('2019-11-07','STU-00001','上课'),
('2019-11-08','STU-00001','上课'),
('2019-11-09','STU-00001','缺勤'),
('2019-11-10','STU-00001','缺勤'),
('2019-11-11','STU-00001','上课'),
('2019-11-12','STU-00001','上课'),
('2019-11-13','STU-00001','上课'),
('2019-11-14','STU-00001','上课'),
('2019-11-15','STU-00001','缺勤'),
('2019-11-16','STU-00001','缺勤'),
('2019-11-17','STU-00001','缺勤'),
('2019-11-18','STU-00001','上课'),
('2019-11-19','STU-00001','上课'),
('2019-11-20','STU-00001','上课'),
('2019-11-01','STU-00002','上课'),
('2019-11-02','STU-00002','缺勤'),
('2019-11-03','STU-00002','缺勤'),
('2019-11-04','STU-00002','上课'),
('2019-11-05','STU-00002','上课'),
('2019-11-06','STU-00002','上课'),
('2019-11-07','STU-00002','上课'),
('2019-11-08','STU-00002','上课'),
('2019-11-09','STU-00002','缺勤'),
('2019-11-10','STU-00002','缺勤'),
('2019-11-11','STU-00002','上课'),
('2019-11-12','STU-00002','上课'),
('2019-11-13','STU-00002','上课'),
('2019-11-14','STU-00002','上课'),
('2019-11-15','STU-00002','上课'),
('2019-11-16','STU-00002','缺勤'),
('2019-11-17','STU-00002','缺勤'),
('2019-11-18','STU-00002','上课'),
('2019-11-19','STU-00002','上课'),
('2019-11-20','STU-00002','上课');
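
The queries in the following sections detect weekends through a numeric weekday code, and the two expressions they mention number the days differently. As a rough cross-check outside Hive, the Python sketch below mimics both conventions and lists the weekend dates in the sample range. The helper names (`hive_dayofweek`, `pmod_weekday`, `is_weekend`) are hypothetical and exist only for this illustration.

```python
from datetime import date


def hive_dayofweek(d: date) -> int:
    """Mimic Hive's dayofweek(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    return d.isoweekday() % 7 + 1


def pmod_weekday(d: date) -> int:
    """Mimic pmod(datediff(d, '2012-01-01'), 7): 0 = Sunday, ..., 6 = Saturday.

    2012-01-01 was a Sunday, which is why it works as the reference date.
    """
    return (d - date(2012, 1, 1)).days % 7


def is_weekend(d: date) -> bool:
    """Weekend test expressed in the dayofweek() convention."""
    return hive_dayofweek(d) in (1, 7)


# Weekend dates among the sample rows (2019-11-01 through 2019-11-20).
sample_weekends = [
    date(2019, 11, n).isoformat()
    for n in range(1, 21)
    if is_weekend(date(2019, 11, n))
]
print(sample_weekends)
```

Under the `dayofweek()` convention the weekend codes are 1 and 7; under the `pmod` convention they are 0 and 6. A filter has to use the pair that matches the expression producing the code.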

2. Derive the start date (group key) of each run of consecutive attendance

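select date_col, stu_no, week_which_day,
       row_number() over(partition by stu_no order by date_col) as rk,
       -- group key: date minus (row number - 1) is identical for every row in one unbroken run of dates
       date_sub(date_col, (row_number() over(partition by stu_no order by date_col) - 1)) as from_day
  from
  (
    select date_col, stu_no, is_attendence,
           -- alternative: pmod(datediff(date_col, '2012-01-01'), 7) numbers Sunday as 0 and Saturday as 6
           dayofweek(date_col) as week_which_day  -- Hive dayofweek(): 1 = Sunday, ..., 7 = Saturday
      from px_data_test_temp.temp_stu_attendance_tb
     group by date_col, stu_no, is_attendence  -- removes duplicate rows
  ) t
 -- Drop absences ('缺勤') but keep weekend rows; if weekends were dropped here,
 -- every run would break at the weekend and no streak could exceed 5 days
 where is_attendence = '上课' or week_which_day in (1, 7);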

Result:
Intermediate result table
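
The grouping trick in this step, subtracting (row number - 1) from each date, can be hard to see from the SQL alone. Below is a minimal Python sketch of the same idea, assuming the dates are distinct and using hypothetical helper names (`run_keys`, `runs`). Dates that follow each other without a gap end up with the same key, so the key can be used to group them.

```python
from datetime import date, timedelta
from itertools import groupby


def run_keys(dates):
    """Pair each date with date - (row_number - 1), row_number starting at 1 in date order."""
    ordered = sorted(set(dates))
    # enumerate() starts at 0, which equals row_number - 1.
    return [(d, d - timedelta(days=i)) for i, d in enumerate(ordered)]


def runs(dates):
    """Split dates into unbroken runs by grouping on the key from run_keys()."""
    keyed = run_keys(dates)
    return [[d for d, _ in group] for _, group in groupby(keyed, key=lambda pair: pair[1])]


example = [date(2019, 11, 1), date(2019, 11, 2), date(2019, 11, 4), date(2019, 11, 5)]
for d, key in run_keys(example):
    print(d, "->", key)
print(runs(example))
```

In this example the first two dates share the key 2019-11-01 and the last two share 2019-11-02. The key only coincides with the real start date for the first run; for later runs it is shifted by the size of the gaps before them, so the actual start of a run is better taken as the minimum date within each group.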

3. Count each student's consecutive attendance days, excluding weekends

with t1_tb as
(
select date_col, stu_no, week_which_day,
       row_number() over(partition by stu_no order by date_col) as rk,
       -- group key: date minus (row number - 1) is identical for every row in one unbroken run of dates
       date_sub(date_col, (row_number() over(partition by stu_no order by date_col) - 1)) as from_day
  from
  (
    select date_col, stu_no, is_attendence,
           -- alternative: pmod(datediff(date_col, '2012-01-01'), 7) numbers Sunday as 0 and Saturday as 6
           dayofweek(date_col) as week_which_day  -- Hive dayofweek(): 1 = Sunday, ..., 7 = Saturday
      from px_data_test_temp.temp_stu_attendance_tb
     group by date_col, stu_no, is_attendence  -- removes duplicate rows
  ) t
 -- Drop absences ('缺勤') but keep weekend rows; if weekends were dropped here,
 -- every run would break at the weekend and no streak could exceed 5 days
 where is_attendence = '上课' or week_which_day in (1, 7)
)
select stu_no, start_date, end_date, continuous_days
from
(
    select
        stu_no, start_date, end_date, continuous_days,
        -- rank each student's streaks, longest first
        row_number() over (partition by stu_no order by continuous_days desc) as rk
    from
    (
        select
            stu_no,
            min(date_col) as start_date,
            max(date_col) as end_date,
            count(1) as continuous_days
        from t1_tb
        where week_which_day not in (1, 7)   -- exclude weekend rows from the count
        group by stu_no, from_day
    ) t1
) t2
where rk = 1;

Result:
Final result
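
As a sanity check on the overall logic, the sketch below replays the three steps in Python against the sample data from section 1. It is an illustration under stated assumptions, not the Hive job itself: weekends are taken to be Saturday and Sunday, the absent days per student are transcribed from the insert statement, and the names `ABSENT_DAYS`, `is_weekend`, and `longest_streak` are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Day-of-month values marked '缺勤' (absent) in the sample data, November 2019.
ABSENT_DAYS = {
    "STU-00001": {3, 9, 10, 15, 16, 17},
    "STU-00002": {2, 3, 9, 10, 16, 17},
}


def is_weekend(d: date) -> bool:
    return d.isoweekday() in (6, 7)  # Saturday, Sunday


def longest_streak(absent_days):
    """Return (start_date, end_date, weekday_count) of the longest streak."""
    all_days = [date(2019, 11, n) for n in range(1, 21)]
    # Keep attended days plus every weekend day, so weekends do not break a run.
    kept = [d for d in all_days if d.day not in absent_days or is_weekend(d)]
    groups = defaultdict(list)
    for i, d in enumerate(kept):
        key = d - timedelta(days=i)  # same key for every day of one unbroken run
        if not is_weekend(d):        # weekends bridge runs but are not counted
            groups[key].append(d)
    best = max(groups.values(), key=len)
    return best[0].isoformat(), best[-1].isoformat(), len(best)


for stu_no, absent in ABSENT_DAYS.items():
    print(stu_no, longest_streak(absent))
```

With this sample data the sketch reports a streak of 10 weekdays for STU-00001 (2019-11-01 to 2019-11-14, ended by the absence on Friday 2019-11-15) and 14 weekdays for STU-00002 (2019-11-01 to 2019-11-20, whose absences all fall on weekends).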
