已知表名:csdata,表中有两个字段:user_id,log_time
问:用户连续登录天数为3的是?。
user_id log_time
111111 2022/12/12 00:00:00
111111 2022/12/13 00:00:00
111111 2022/12/22 00:00:00
111111 2022/12/23 00:00:00
111111 2022/12/24 00:00:00
111111 2022/12/27 00:00:00
111111 2022/12/31 00:00:00
222222 2022/12/02 00:00:00
222222 2022/12/08 00:00:00
222222 2022/12/11 00:00:00
222222 2022/12/14 00:00:00
222222 2022/12/18 00:00:00
222222 2022/12/23 00:00:00
222222 2022/12/27 00:00:00
222222 2022/12/28 00:00:00
222222 2022/12/30 00:00:00
首先将表格数据用python导入数据库:
from sqlalchemy import create_engine
import pandas as pd
conn = create_engine(‘mysql+pymysql://root:root@localhost:3306/test’)
df_1=pd.read_excel(r"C:\Users******\csdata.xlsx")
df_1.to_sql(‘csdata’,con=conn,index=False)
第一步:用row_number 对每个用户进行分组依据登录时间先后排序
第二步:计算辅助列,用登录时间-排序 得到一个辅助列,如果得到的辅助列值相同,那么值相同的这几天就是连续的。
第三步,求出每个用户的所有连续登录天数
第四步,求出所有连续登录天数为3的用户
SELECT
user_id,
log_dt,
consecutive_day
FROM
(
SELECT
user_id,
log_dt,
diff,
COUNT(diff) AS consecutive_day
FROM
(
SELECT
user_id,
log_time,
log_dt,
rn,
date_sub(log_dt, INTERVAL rn DAY) AS diff
FROM
(
SELECT
user_id,
log_time,
substr(log_time, 1, 10) AS log_dt,
ROW_NUMBER () OVER (
PARTITION BY user_id
ORDER BY
substr(log_time, 1, 10)
) AS rn
FROM
csdata
) login_days
) login_days_diff
GROUP BY
user_id,
diff
) consecutive_day
WHERE
consecutive_day = 3
``